
US 20120266329A1

(19) United States
(12) Patent Application Publication
     Mathur et al.
(10) Pub. No.: US 2012/0266329 A1
(43) Pub. Date: Oct. 18, 2012

(54) NUCLEIC ACIDS AND PROTEINS AND METHODS FOR MAKING AND USING THEM

(75) Inventors: Eric J. Mathur, Carlsbad, CA (US); Cathy Chang, San Marcos, CA (US)

(73) Assignee: BP Corporation North America Inc., Houston, TX (US)

(22) Filed: Feb. 20, 2012

Related U.S. Application Data

(62) Division of application No. 11/817,403, filed on May 7, 2008, now Pat. No. 8,119,385, filed as application No. PCT/US2006/007642 on Mar. 3, 2006.

(60) Provisional application No. 60/658,984, filed on Mar. 4, 2005.

Publication Classification

(51) Int. Cl.
C12N 15/52 (2006.01)
C12N 15/85 (2006.01)
C12N 5/86 (2006.01)
C12N 15/867 (2006.01)
C12N 5/864 (2006.01)
C12N 5/8 (2006.01)
C12N 1/2 (2006.01)
C12N 5/10 (2006.01)
C12N 1/15 (2006.01)
C12N 1/19 (2006.01)
C12N 9/14 (2006.01)
C12N 9/16 (2006.01)
C12N 9/20 (2006.01)
C12N 9/90 (2006.01)
C12N 9/26 (2006.01)
C12N 9/88 (2006.01)
C12N 9/40 (2006.01)
C12N 9/78 (2006.01)
C12N 9/10 (2006.01)
C12N 9/24 (2006.01)
C12N 9/02 (2006.01)
C12N 9/06 (2006.01)
C12P 21/02 (2006.01)
C12Q 1/04 (2006.01)
C12N 9/96 (2006.01)
C12N 5/82 (2006.01)
C12N 15/53 (2006.01)
C12N 15/54 (2006.01)
C12N 15/57 (2006.01)
C12N 15/60 (2006.01)
[illegible entry]
A01H 5/00 (2006.01)
A01H 5/10 (2006.01)
C07K 14/00 (2006.01)
C12N 15/11 (2006.01)
A01H 1/06 (2006.01)
C12N 15/63 (2006.01)

(52) U.S. Cl. ...... 800/293; 435/320.1; 435/252.3; 435/325; 435/254.11; 435/254.2; 435/348; 435/419; 435/195; 435/196; 435/198; 435/233; 435/201; 435/232; 435/208; 435/227; 435/193; 435/200; 435/189; 435/191; 435/69.1; 435/34; 435/188; 536/23.2; 435/468; 800/298; 800/320; 800/317.2; 800/317.4; 800/320.3; 800/306; 800/312; 800/320.2; 800/317.3; 800/322; 800/320.1; 530/350; 536/23.1; 800/278; 800/294

(57) ABSTRACT

The invention provides polypeptides, including enzymes, structural proteins and binding proteins, polynucleotides encoding these polypeptides, and methods of making and using these polynucleotides and polypeptides. Polypeptides, including enzymes and antibodies, and nucleic acids of the invention can be used in industrial, experimental, food and feed processing, nutritional and pharmaceutical applications, e.g., for food and feed supplements, colorants, nutraceuticals, cosmetic and pharmaceutical needs.

[Drawing, Sheet 1 of 4: block diagram. Legible labels: INTERNAL STORAGE; DATA; RETRIEVING. Legible reference numeral: 120.]

[Drawing, Sheet 2 of 4: flowchart. Legible steps: open database of sequences; read first sequence in database; comparison of new sequence and stored sequence; go to next sequence in database. Legible reference numerals: 200, 252.]

[Drawing, Sheet 3 of 4, FIGURE 3: flowchart. Legible steps: read first character of first sequence; decision on whether characters remain to be read (YES/NO); display homology level between the first and second sequences. Legible reference numerals: 250, 252.]

[Drawing, Sheet 4 of 4: flowchart. Legible steps: store a first sequence to memory; open database of sequence features; read first feature from database; compare feature attributes with the first sequence; display found feature to the user; read next feature in database.]
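The flowchart steps that survive in the drawing sheets (read each stored sequence from a database, compare it with a new sequence character by character, and report a homology level) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the dictionary standing in for a database, and the 50% default threshold (chosen only because "at least about 50%" is the lowest identity level recited in the text). The comparison is a plain ungapped, position-by-position match and is not the BLAST algorithm referred to elsewhere in the specification.

```python
def percent_identity(first: str, second: str) -> float:
    """Ungapped, position-by-position identity over the shared length."""
    length = min(len(first), len(second))
    if length == 0:
        return 0.0
    # zip() stops at the shorter sequence, so matches are counted
    # over exactly `length` positions.
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / length


def scan_database(new_sequence: str, database: dict, threshold: float = 50.0) -> list:
    """Compare a new sequence with every stored sequence in turn.

    `database` is a hypothetical name -> sequence mapping; the result
    lists (name, identity level) for entries at or above `threshold`.
    """
    hits = []
    for name, stored in database.items():
        level = percent_identity(new_sequence.upper(), stored.upper())
        if level >= threshold:
            hits.append((name, level))
    return hits


if __name__ == "__main__":
    demo = {"entry_a": "ACGTACGT", "entry_b": "TTTTTTTT"}
    for name, level in scan_database("ACGTACGA", demo):
        print(f"{name}: {level:.1f}% identity")
```

A real comparison would align the sequences first (allowing gaps) and would restrict the calculation to the region of interest; the sketch only shows the loop structure of the flowcharts.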

NUCLEIC ACIDS AND PROTEINS AND METHODS FOR MAKING AND USING THEM

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of U.S. patent application Ser. No. 11/817,403, filed Aug. 29, 2007, now U.S. Pat. No. 8,119,385; which is the U.S. national phase, pursuant to 35 U.S.C. §371, of international application number PCT/US2006/007642, filed Mar. 3, 2006, designating the United States and published on Sep. 14, 2006 as publication number WO 2006/096527, which claims priority under 35 USC §119(e)(1) of prior U.S. provisional application No. 60/658,984, filed Mar. 4, 2005, all of which are hereby incorporated by reference.

FIELD OF THE INVENTION

[0002] This invention relates to molecular and cellular biology and biochemistry. In one aspect, the invention provides polypeptides, including enzymes, structural proteins and binding proteins (e.g., ligands, receptors), polynucleotides encoding these polypeptides, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, structural proteins and binding proteins, including thermostable and thermotolerant activity, and polynucleotides encoding these enzymes, structural proteins and binding proteins and making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used in a variety of pharmaceutical, agricultural and industrial contexts, including the manufacture of cosmetics and nutraceuticals.

[0003] Additionally, the polypeptides of the invention can be used in food processing, brewing, bath additives, alcohol production, peptide synthesis, enantioselectivity, hide preparation in the leather industry, waste management and animal degradation, silver recovery in the photographic industry, medical treatment, silk degumming, biofilm degradation, biomass conversion to ethanol, biodefense, antimicrobial agents and disinfectants, personal care and cosmetics, biotech reagents, in corn wet milling and pharmaceuticals such as digestive aids and anti-inflammatory (anti-phlogistic) agents.

BACKGROUND

[0004] The invention provides isolated and recombinant polypeptides, including enzymes, structural proteins and binding proteins, polynucleotides encoding these polypeptides, and methods of making and using these polynucleotides and polypeptides. The polypeptides of the invention, and the polynucleotides encoding the polypeptides of the invention, encompass many classes of enzymes, structural proteins and binding proteins. In one aspect, the enzymes and proteins of the invention include, e.g. aldolases, alpha-galactosidases, amidases, e.g. secondary amidases, amylases, catalases, carotenoid pathway enzymes, dehalogenases, endoglucanases, epoxide hydrolases, esterases, hydrolases, glucosidases, glycosidases, inteins, isomerases, laccases, lipases, monooxygenases, nitroreductases, nitrilases, P450 enzymes, pectate lyases, phosphatases, phospholipases, phytases, polymerases and xylanases. The invention also provides isolated and recombinant polypeptides, including enzymes, structural proteins and binding proteins, polynucleotides encoding these polypeptides, having the activities described in Table 1, Table 2 or Table 3, below. The enzymes and proteins of the invention have utility in a variety of applications.

SUMMARY

[0005] The invention provides isolated or recombinant nucleic acids comprising a nucleic acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary nucleic acid of the invention, e.g., including SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, and all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897, over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more residues, encodes at least one polypeptide having an enzyme, structural or binding activity, and the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection. In one aspect, the enzymes and proteins of the invention include, e.g. aldolases, alpha-galactosidases, amidases, e.g. secondary amidases, amylases, catalases, carotenoid pathway enzymes, dehalogenases, endoglucanases, epoxide hydrolases, esterases, hydrolases, glucosidases, glycosidases, inteins, isomerases, laccases, lipases, monooxygenases, nitroreductases, nitrilases, P450 enzymes, pectate lyases, phosphatases, phospholipases, phytases, polymerases and xylanases. In another aspect, the isolated and recombinant polypeptides of the invention, including enzymes, structural proteins and binding proteins, and polynucleotides encoding these polypeptides, of the invention have activity as described in Table 1, Table 2 or Table 3, below.

[0006] In one aspect, the invention also provides isolated or recombinant nucleic acids with a common novelty in that they are all derived from a common source, e.g., an environmental source, mixed environmental sources or mixed cultures. The invention provides isolated or recombinant nucleic acids isolated from a common source, e.g. an environmental source, mixed environmental sources or mixed cultures comprising a polynucleotide of the invention, e.g., an exemplary sequence of the invention, including SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, and all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897, over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more residues, encodes at least one polypeptide having an enzyme, structural or binding activity, and the sequence identities are determined by analysis with a sequence comparison algorithm or by a visual inspection. In one aspect, the enzymes and proteins of the invention include, e.g. aldolases, alpha-galactosidases, amidases, e.g. secondary amidases, amylases, catalases, carotenoid pathway enzymes, dehalogenases, endoglucanases, epoxide hydrolases, esterases, hydrolases, glucosidases, glycosidases, inteins, isomerases, laccases, lipases, monooxygenases, nitroreductases, nitrilases, P450 enzymes, pectate lyases, phosphatases, phospholipases, phytases, polymerases and xylanases. In another aspect, the isolated and recombinant polypeptides of the invention, including enzymes, structural proteins and binding proteins, and polynucleotides encoding these polypeptides, of the invention have activity as described in Table 1, Table 2 or Table 3, below.

[0007] In alternative aspects, the isolated or recombinant nucleic acid encodes a polypeptide comprising an exemplary sequence of the invention, e.g., including sequences as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898. In one aspect these polypeptides have an enzyme, structural or binding activity. In one aspect, the enzymes and proteins of the invention include, e.g. aldolases, alpha-galactosidases, amidases, e.g. secondary amidases, amylases, catalases, carotenoid pathway enzymes, dehalogenases, endoglucanases, epoxide hydrolases, esterases, hydrolases, glucosidases, glycosidases, inteins, isomerases, laccases, lipases, monooxygenases, nitroreductases, nitrilases, P450 enzymes, pectate lyases, phosphatases, phospholipases, phytases, polymerases and xylanases. In another aspect, the isolated and recombinant polypeptides of the invention, including enzymes, structural proteins and binding proteins, and polynucleotides encoding these polypeptides, of the invention have activity as described in Table 1, Table 2 or Table 3, below.

[0008] In alternative aspects, the enzyme, structural or binding activity comprises a recombinase activity, a helicase activity, a DNA replication activity, a DNA recombination activity, an isomerase, a trans-isomerase activity or topoisomerase activity, a methyl transferase activity, an aminotransferase activity, a uracil-5-methyltransferase activity, a cysteinyl tRNA synthetase activity, a hydrolase, an esterase activity, a phosphoesterase activity, an acetylmuramyl pentapeptide phosphotransferase activity, a glycosyltransferase activity, an acetyltransferase activity, an acetylglucosamine phosphate transferase activity, a centromere binding activity, a telomerase activity or a transcriptional regulatory activity, a heat shock protein activity, a protease activity, a proteinase activity, a peptidase activity, a carboxypeptidase activity, an endonuclease activity, an exonuclease activity, a RecB family exonuclease activity, a polymerase activity, a carbamoyl phosphate synthetase activity, a methyl-thioadenine synthetase activity, an oxidoreductase activity, an Fe-S oxidoreductase activity, a flavodoxin reductase activity, a permease activity, a thymidylate activity, a dehydrogenase activity, a pyrophosphorylase activity, a coenzyme metabolism activity, a dinucleotide-utilizing enzyme activity, a molybdopterin or thiamine biosynthesis activity, a beta-lactamase activity, a ligand binding activity, an ion transport activity, an ion metabolism activity, a tellurite resistance protein activity, an inorganic ion transport activity, a nucleotide transport activity, a nucleotide metabolism activity, an actin or myosin activity, a lipase activity or a lipid acyl hydrolase (LAH) activity, a cell envelope biogenesis activity, an outer membrane synthesis activity, a ribosomal structure synthesis activity, a translational processing activity, a transcriptional initiation activity, a TATA-binding activity, a signal transduction activity, an energy metabolism activity, an ATPase activity, an information storage and/or processing activity, and/or any of the polypeptide activities as set forth in Table 1, Table 2 or Table 3, below.

[0009] In one aspect, the sequence comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set to blastall -p blastp -d "nr pataa" -F F, and all other options are set to default.

[0010] Another aspect of the invention is an isolated or recombinant nucleic acid including at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more consecutive bases of a nucleic acid sequence of the invention, sequences substantially identical thereto, and the sequences complementary thereto.

[0011] In one aspect, the isolated or recombinant nucleic acid encodes a polypeptide having an enzyme, structural or binding activity, that is thermostable. The polypeptide can retain activity under conditions comprising a temperature range of between about 37° C. to about 95° C.; between about 55° C. to about 85° C., between about 70° C. to about 95° C., or, between about 90° C. to about 95° C.

[0012] In another aspect, the isolated or recombinant nucleic acid encodes a polypeptide having an enzyme, structural or binding activity, which is thermotolerant. The polypeptide can retain activity after exposure to a temperature in the range from greater than 37° C. to about 95° C. or anywhere in the range from greater than 55° C. to about 85° C. The polypeptide can retain activity after exposure to a temperature in the range between about 1° C. to about 5° C., between about 5° C. to about 15° C., between about 15° C. to about 25° C., between about 25° C. to about 37° C., between about 37° C. to about 95° C., between about 55° C. to about 85° C., between about 70° C. to about 75° C., or between about 90° C. to about 95° C., or more. In one aspect, the polypeptide retains activity after exposure to a temperature in the range from greater than 90° C. to about 95° C. at about pH 4.5.

[0013] The invention provides isolated or recombinant nucleic acids comprising a sequence that hybridizes under stringent conditions to a nucleic acid comprising a sequence of the invention, e.g., an exemplary sequence of the invention, including SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, and all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897, or fragments or subsequences thereof. In one aspect, the nucleic acid encodes a polypeptide having an enzyme, structural or binding activity. The nucleic acid can be at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200 or more residues in length or the full length of the gene or transcript.

In one aspect, the stringent conditions include a wash step comprising a wash in 0.2×SSC at a temperature of about 65° C. for about 15 minutes.

[0014] The invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity, wherein the probe comprises at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more, consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof, wherein the probe identifies the nucleic acid by binding or hybridization. The probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a sequence comprising a sequence of the invention, or fragments or subsequences thereof.

[0015] The invention provides a nucleic acid probe for identifying a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity, wherein the probe comprises a nucleic acid comprising a sequence at least about 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more residues having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to a nucleic acid of the invention. In one aspect, the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection. In alternative aspects, the probe can comprise an oligonucleotide comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 consecutive bases of a nucleic acid sequence of the invention, or a subsequence thereof.

[0016] The invention provides an amplification primer pair for amplifying a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity, wherein the primer pair is capable of amplifying a nucleic acid comprising a sequence of the invention, or fragments or subsequences thereof. One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising at least about 10 to 50, or more, consecutive bases of the sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more consecutive bases of the sequence.

[0017] The invention provides amplification primer pairs, wherein the primer pair comprises a first member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more residues of a nucleic acid of the invention, and a second member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more residues of the complementary strand of the first member.

[0018] The invention provides polypeptide-, enzyme-, protein-, e.g. structural or binding protein-encoding nucleic acids generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. The invention provides methods of making a polypeptide, enzyme, protein, e.g. structural or binding protein, by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental library.

[0019] The invention provides methods of amplifying a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity, comprising amplification of a template nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic acid sequence of the invention, or fragments or subsequences thereof.

[0020] The invention provides expression cassettes comprising a nucleic acid of the invention or a subsequence thereof. In one aspect, the expression cassette can comprise the nucleic acid that is operably linked to a promoter. The promoter can be a viral, bacterial, mammalian or plant promoter. In one aspect, the plant promoter can be a potato, rice, corn, wheat, tobacco or barley promoter. The promoter can be a constitutive promoter. The constitutive promoter can comprise CaMV35S. In another aspect, the promoter can be an inducible promoter. In one aspect, the promoter can be a tissue-specific promoter or an environmentally regulated or a developmentally regulated promoter. Thus, the promoter can be, e.g., a seed-specific, a leaf-specific, a root-specific, a stem-specific or an abscission-induced promoter. In one aspect, the expression cassette can further comprise a plant or plant virus expression vector.

[0021] The invention provides cloning vehicles comprising an expression cassette (e.g., a vector) of the invention or a nucleic acid of the invention. The cloning vehicle can be a viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome. The viral vector can comprise an adenovirus vector, a retroviral vector or an adeno-associated viral vector. The cloning vehicle can comprise a bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC).

[0022] The invention provides a transformed cell comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention, or a cloning vehicle of the invention. In one aspect, the transformed cell can be a bacterial cell, a mammalian cell, a fungal cell, a yeast cell, an insect cell or a plant cell. In one aspect, the plant cell can be a cereal, a potato, wheat, rice, corn, tobacco or barley cell.

[0023] The invention provides transgenic non-human animals comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention. In one aspect, the animal is a mouse, a rat, a pig, a goat or a sheep.

[0024] The invention provides transgenic plants comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention. The transgenic plant can be a cereal plant, a corn plant, a potato plant, a tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant, a rice plant, a barley plant or a tobacco plant.

[0025] The invention provides transgenic seeds comprising a nucleic acid of the invention or an expression cassette (e.g., a vector) of the invention. The transgenic seed can be a cereal plant, a corn seed, a wheat kernel, an oilseed, a rapeseed, a soybean seed, a palm kernel, a sunflower seed, a sesame seed, a peanut or a tobacco plant seed.
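The primer-pair construction described above (a first member taken from the first 5' residues of a nucleic acid, and a second member taken from the first 5' residues of its complementary strand) can be sketched as follows. This is an illustrative sketch only: the function names, the default primer length of 20 and the example template are assumptions, not taken from the specification. Read 5' to 3', the complementary strand is the reverse complement of the template, so the second member corresponds to the reverse complement of the template's 3' end.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Derive a (first member, second member) primer pair from a template.

    The first member is the first `length` residues of the template;
    the second is the first `length` residues of the complementary strand.
    The lower bound of 12 mirrors the shortest length recited in the text.
    """
    if not 12 <= length <= len(template):
        raise ValueError("primer length must be between 12 and the template length")
    first = template[:length]
    second = reverse_complement(template)[:length]
    return first, second


if __name__ == "__main__":
    # Hypothetical 21-base template used purely for illustration.
    first, second = primer_pair("ATGGCCAAGTTTCCCGGGTAA", length=12)
    print("first member: ", first)
    print("second member:", second)
```

Practical primer design would also weigh melting temperature, GC content and self-complementarity, none of which the sketch attempts.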

[0026] The invention provides an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. The invention provides methods of inhibiting the translation of a polypeptide, enzyme, protein, e.g. structural or binding protein message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. In one aspect, the antisense oligonucleotide is between about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length, e.g., 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more bases in length.

[0027] The invention provides methods of inhibiting the translation of a polypeptide, enzyme, protein, e.g. structural or binding protein message in a cell comprising administering to the cell or expressing in the cell an antisense oligonucleotide comprising a nucleic acid sequence complementary to or capable of hybridizing under stringent conditions to a nucleic acid of the invention. The invention provides double-stranded inhibitory RNA (RNAi, or RNA interference) molecules (including small interfering RNA, or siRNAs, for inhibiting transcription, and microRNAs, or miRNAs, for inhibiting translation) comprising a subsequence of a sequence of the invention. In one aspect, the RNAi is about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more duplex nucleotides in length. The invention provides methods of inhibiting the expression of a polypeptide, enzyme, protein, peptide, e.g. structural or binding protein in a cell comprising administering to the cell or expressing in the cell a double-stranded inhibitory RNA (iRNA, including small interfering RNA, or siRNAs, for inhibiting transcription, and microRNAs, or miRNAs, for inhibiting translation), wherein the RNA comprises a subsequence of a sequence of the invention.

[0028] The invention provides isolated or recombinant polypeptides encoded by a nucleic acid of the invention. In alternative aspects, the polypeptide can have a sequence as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898 (the exemplary sequences of the invention), or subsequences thereof, including fragments having enzymatic and/or substrate binding activity. The polypeptide can have an enzyme, structural or binding activity.

[0029] In alternative aspects, the enzyme, structural or binding activity comprises a recombinase activity, a helicase activity, a DNA replication activity, a DNA recombination activity, an isomerase, a trans-isomerase activity or topoisomerase activity, a methyl transferase activity, an aminotransferase activity, a uracil-5-methyltransferase activity, a cysteinyl tRNA synthetase activity, a hydrolase, an esterase activity, a phosphoesterase activity, an acetylmuramyl pentapeptide phosphotransferase activity, a glycosyltransferase activity, an acetyltransferase activity, an acetylglucosamine phosphate transferase activity, a centromere binding activity, a telomerase activity or a transcriptional regulatory activity, a heat shock protein activity, a protease activity, a proteinase activity, a peptidase activity, a carboxypeptidase activity, an endonuclease activity, an exonuclease activity, a RecB family exonuclease activity, a polymerase activity, a carbamoyl phosphate synthetase activity, a methyl-thioadenine synthetase activity, an oxidoreductase activity, an Fe-S oxidoreductase activity, a flavodoxin reductase activity, a permease activity, a thymidylate activity, a dehydrogenase activity, a pyrophosphorylase activity, a coenzyme metabolism activity, a dinucleotide-utilizing enzyme activity, a molybdopterin or thiamine biosynthesis activity, a beta-lactamase activity, a ligand binding activity, an ion transport activity, an ion metabolism activity, a tellurite resistance protein activity, an inorganic ion transport activity, a nucleotide transport activity, a nucleotide metabolism activity, an actin or myosin activity, a lipase activity or a lipid acyl hydrolase (LAH) activity, a cell envelope biogenesis activity, an outer membrane synthesis activity, a ribosomal structure synthesis activity, a translational processing activity, a transcriptional initiation activity, a TATA-binding activity, a signal transduction activity, an energy metabolism activity, an ATPase activity, an information storage and/or processing activity, and/or any of the polypeptide activities as set forth in Table 1, Table 2 or Table 3, below.

[0030] Exemplary polypeptide or peptide sequences of the invention include SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898, and subsequences thereof and variants thereof. Exemplary polypeptides also include fragments of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600 or more residues in length, or over the full length of an enzyme. Exemplary polypeptide or peptide sequences of the invention include sequence encoded by a nucleic acid of the invention. Exemplary polypeptide or peptide sequences of the invention include polypeptides or peptides specifically bound by an antibody of the invention.

[0031] In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein, is thermostable. The polypeptide, enzyme, protein, e.g. structural or binding protein can retain activity under conditions comprising a temperature range of between about 1° C. to about 5° C., between about 5° C. to about 15° C., between about 15° C. to about 25° C., between about 25° C. to about 37° C., between about 37° C. to about 95° C., between about 55° C. to about 85° C., between about 70° C. to about 75° C., or between about 90° C. to about 95° C., or more. In another aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein can be thermotolerant. The polypeptide, enzyme, protein, e.g. structural or binding protein can retain activity after exposure to a temperature in the range from greater than 37° C. to about 95° C., or in the range from greater than 55° C. to about 85° C. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein can retain activity after exposure to a temperature in the range from greater than 90° C. to about 95° C. at pH 4.5.

[0032] Another aspect of the invention provides an isolated or recombinant polypeptide or peptide including at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 or more consecutive bases of a polypeptide or peptide sequence of the invention, sequences substantially identical thereto, and the sequences complementary thereto. The peptide can be, e.g., an immunogenic fragment, a motif (e.g., a [illegible]), a signal sequence, a prepro sequence or an [illegible].

0033. The invention provides isolated or recombinant nucleic acids comprising a sequence encoding a polypeptide, enzyme, protein, e.g. structural or binding protein having any of the activities as set forth in Tables 1, 2 or 3, and a signal sequence, wherein the nucleic acid comprises a sequence of the invention. In one aspect, the isolated or recombinant polypeptide can comprise the polypeptide of the invention comprising a heterologous signal sequence or a heterologous preprosequence, such as a heterologous enzyme or non-enzyme signal sequence. The invention provides isolated or recombinant nucleic acids comprising a sequence encoding a polypeptide, enzyme, protein, e.g. structural or binding protein having any of the activities as set forth in Tables 1, 2 or 3, wherein the sequence does not contain a signal sequence and the nucleic acid comprises a sequence of the invention. In one aspect, the invention provides an isolated or recombinant polypeptide comprising a polypeptide of the invention lacking all or part of a signal sequence.

0034. In one aspect, the invention provides chimeric proteins comprising a first domain comprising a signal sequence of the invention and at least a second domain. The protein can be a fusion protein. The second domain can comprise an enzyme. The enzyme can be a non-enzyme.

0035. The invention provides chimeric polypeptides comprising at least a first domain comprising signal peptide (SP), a prepro sequence and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro sequence and/or catalytic domain (CD). In one aspect, the heterologous polypeptide or peptide is not an enzyme. The heterologous polypeptide or peptide can be amino terminal to, carboxy terminal to or on both ends of the signal peptide (SP), prepro sequence and/or catalytic domain (CD).

0036. The invention provides isolated or recombinant nucleic acids encoding a chimeric polypeptide, wherein the chimeric polypeptide comprises at least a first domain comprising signal peptide (SP), a prepro domain and/or a catalytic domain (CD) of the invention and at least a second domain comprising a heterologous polypeptide or peptide, wherein the heterologous polypeptide or peptide is not naturally associated with the signal peptide (SP), prepro domain and/or catalytic domain (CD).

0037. The invention provides isolated or recombinant signal sequences (e.g., signal peptides) consisting of or comprising a sequence as set forth in residues 1 to 14, 1 to 15, 1 to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38, 1 to 40, 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46 or 1 to 47, of a polypeptide of the invention, including the exemplary polypeptides of the invention (including SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898). In one aspect, the invention provides signal sequences comprising the first 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70 or more amino terminal residues of a polypeptide of the invention.

0038. In one aspect, the enzyme, structural or binding activity comprises a specific activity at about 37° C. in the range from about 1 to about 1200 units per milligram of protein, or, about 100 to about 1000 units per milligram of protein. In another aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein activity comprises a specific activity from about 100 to about 1000 units per milligram of protein, or, from about 500 to about 750 units per milligram of protein. Alternatively, the enzyme, structural or binding activity comprises a specific activity at 37° C. in the range from about 1 to about 750 units per milligram of protein, or, from about 500 to about 1200 units per milligram of protein. In one aspect, the enzyme, structural or binding activity comprises a specific activity at 37° C. in the range from about 1 to about 500 units per milligram of protein, or, from about 750 to about 1000 units per milligram of protein. In another aspect, the enzyme, structural or binding activity comprises a specific activity at 37° C. in the range from about 1 to about 250 units per milligram of protein. Alternatively, the enzyme, structural or binding activity comprises a specific activity at 37° C. in the range from about 1 to about 100 units per milligram of protein.

0039. In another aspect, thermotolerance comprises retention of at least half of the specific activity of the enzyme, structural or binding protein at 37° C. after being heated to the elevated temperature. Alternatively, thermotolerance can comprise retention of specific activity at 37° C. in the range from about 1 to about 1200 units per milligram of protein, or, from about 500 to about 1000 units per milligram of protein, after being heated to the elevated temperature. In another aspect, thermotolerance can comprise retention of specific activity at 37° C. in the range from about 1 to about 500 units per milligram of protein after being heated to the elevated temperature.

0040. The invention provides the isolated or recombinant polypeptide of the invention, wherein the polypeptide comprises at least one glycosylation site. In one aspect, glycosylation can be an N-linked glycosylation. In one aspect, the polypeptide can be glycosylated after being expressed in a P. pastoris or a S. pombe.

0041. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein can retain activity under conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4. In another aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein can retain activity under conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11. In one aspect, the polypeptide can retain an enzyme, structural or binding activity after exposure to conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4. In another aspect, the polypeptide can retain enzyme, structural or binding activity after exposure to conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11.

0042. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein of the invention has activity under alkaline conditions, e.g., the alkaline conditions of the gut, e.g., the small intestine. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein can retain activity after exposure to the acidic pH of the stomach.

0043. The invention provides protein preparations comprising a polypeptide of the invention, wherein the protein preparation comprises a liquid, a solid or a gel.
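As an illustration only, and not part of the disclosure, the thermotolerance criterion of paragraph 0039 (retention of at least half of the specific activity measured at 37° C. after heating) reduces to a ratio test. The Python sketch below assumes specific activity is expressed in units per milligram of protein; the function names, the `min_retained` parameter and the example values are hypothetical.

```python
def specific_activity(units: float, protein_mg: float) -> float:
    """Specific activity in units per milligram of protein."""
    if protein_mg <= 0:
        raise ValueError("protein_mg must be positive")
    return units / protein_mg


def is_thermotolerant(before: float, after: float, min_retained: float = 0.5) -> bool:
    """Return True if the specific activity measured at 37 C after heating
    is at least `min_retained` (default: one half) of the value measured
    before heating. Both values must use the same units."""
    if before <= 0:
        raise ValueError("specific activity before heating must be positive")
    return after / before >= min_retained


# Hypothetical measurements: 1200 units in 2 mg protein before heating,
# 660 units in 2 mg protein after heating.
before = specific_activity(1200.0, 2.0)   # 600 units/mg
after = specific_activity(660.0, 2.0)     # 330 units/mg
print(is_thermotolerant(before, after))   # 330 / 600 = 0.55, so True
```

The threshold is a parameter so that a stricter or looser retention requirement can be checked with the same function.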

0044. The invention provides heterodimers comprising a polypeptide of the invention and a second protein or domain. The second member of the heterodimer can be a different enzyme or another protein. In one aspect, the second domain can be a polypeptide and the heterodimer can be a fusion protein. In one aspect, the second domain can be an epitope or a tag. In one aspect, the invention provides homodimers comprising a polypeptide of the invention.

0045. The invention provides immobilized polypeptides having enzyme, structural or binding activity, wherein the polypeptide comprises a polypeptide of the invention, a polypeptide encoded by a nucleic acid of the invention, or a polypeptide comprising a polypeptide of the invention and a second domain. In one aspect, the polypeptide can be immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a microelectrode, a graphitic particle, a bead, a gel, a plate, an array or a capillary tube.

0046. The invention provides arrays comprising an immobilized nucleic acid of the invention. The invention provides arrays comprising an antibody of the invention.

0047. The invention provides isolated or recombinant antibodies that specifically bind to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention. These antibodies of the invention can be a monoclonal or a polyclonal antibody. The invention provides hybridomas comprising an antibody of the invention, e.g., an antibody that specifically binds to a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the invention. The invention provides nucleic acids encoding these antibodies.

0048. The invention provides methods of isolating or identifying a polypeptide having enzyme, structural or binding activity comprising the steps of: (a) providing an antibody of the invention; (b) providing a sample comprising polypeptides; and (c) contacting the sample of step (b) with the antibody of step (a) under conditions wherein the antibody can specifically bind to the polypeptide, thereby isolating or identifying a polypeptide having an enzyme, structural or binding activity.

0049. The invention provides methods of making an anti-polypeptide, anti-enzyme, or anti-protein, e.g. anti-structural or anti-binding protein, antibody comprising administering to a non-human animal a nucleic acid of the invention or a polypeptide of the invention or subsequences thereof in an amount sufficient to generate a humoral immune response, thereby making an anti-polypeptide, anti-enzyme, or anti-protein, e.g. anti-structural or anti-binding protein, antibody. The invention provides methods of making an anti-polypeptide, anti-enzyme, or anti-protein, e.g. anti-structural or anti-binding protein, immune response comprising administering to a non-human animal a nucleic acid of the invention or a polypeptide of the invention or subsequences thereof in an amount sufficient to generate an immune response.

0050. The invention provides methods of producing a recombinant polypeptide comprising the steps of: (a) providing a nucleic acid of the invention operably linked to a promoter; and (b) expressing the nucleic acid of step (a) under conditions that allow expression of the polypeptide, thereby producing a recombinant polypeptide. In one aspect, the method can further comprise transforming a host cell with the nucleic acid of step (a) followed by expressing the nucleic acid of step (a), thereby producing a recombinant polypeptide in a transformed cell.

0051. The invention provides methods for identifying a polypeptide having enzyme, structural or binding activity comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing an enzyme, structural or binding activity substrate; and (c) contacting the polypeptide or a fragment or variant thereof of step (a) with the substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of a reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of the reaction product detects a polypeptide having an enzyme, structural or binding activity.

0052. The invention provides methods for identifying a polypeptide, enzyme, protein, e.g. structural or binding protein, substrate comprising the following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test substrate; and (c) contacting the polypeptide of step (a) with the test substrate of step (b) and detecting a decrease in the amount of substrate or an increase in the amount of reaction product, wherein a decrease in the amount of the substrate or an increase in the amount of a reaction product identifies the test substrate as a polypeptide, enzyme, protein, e.g. structural or binding protein, substrate.

0053. The invention provides methods of determining whether a test compound specifically binds to a polypeptide comprising the following steps: (a) expressing a nucleic acid or a vector comprising the nucleic acid under conditions permissive for translation of the nucleic acid to a polypeptide, wherein the nucleic acid comprises a nucleic acid of the invention, or, providing a polypeptide of the invention; (b) providing a test compound; (c) contacting the polypeptide with the test compound; and (d) determining whether the test compound of step (b) specifically binds to the polypeptide.

0054. The invention provides methods for identifying a modulator of an enzyme, structural or binding activity comprising the following steps: (a) providing a polypeptide of the invention or a polypeptide encoded by a nucleic acid of the invention; (b) providing a test compound; (c) contacting the polypeptide of step (a) with the test compound of step (b) and measuring an activity of the polypeptide, enzyme, protein, e.g. structural or binding protein, wherein a change in the enzyme, structural or binding activity measured in the presence of the test compound compared to the activity in the absence of the test compound provides a determination that the test compound modulates the enzyme, structural or binding activity. In one aspect, the enzyme, structural or binding activity can be measured by providing a polypeptide, enzyme, protein, e.g. structural or binding protein, substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product, or, an increase in the amount of the substrate or a decrease in the amount of a reaction product. A decrease in the amount of the substrate or an increase in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an activator of enzyme, structural or binding activity. An increase in the amount of the substrate or a decrease in the amount of the reaction product with the test compound as compared to the amount of substrate or reaction product without the test compound identifies the test compound as an inhibitor of enzyme, structural or binding activity.
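As an illustration only, the activator and inhibitor rule of paragraph 0054 can be written as a comparison of reaction product formed with and without the test compound. The Python sketch below is not part of the disclosure; the function name, the relative `tolerance` used to ignore measurement noise and the example values are all assumptions.

```python
def classify_modulator(product_without: float,
                       product_with: float,
                       tolerance: float = 0.05) -> str:
    """Classify a test compound from the amount of reaction product formed
    without and with the compound (same assay, same units).

    More product with the compound -> "activator"
    Less product with the compound -> "inhibitor"
    Relative change within `tolerance` -> "no effect detected"
    The tolerance is a hypothetical noise margin, not taken from the text.
    """
    if product_without <= 0:
        raise ValueError("control assay must show a positive amount of product")
    change = (product_with - product_without) / product_without
    if change > tolerance:
        return "activator"
    if change < -tolerance:
        return "inhibitor"
    return "no effect detected"


print(classify_modulator(10.0, 15.0))  # more product with the compound
print(classify_modulator(10.0, 4.0))   # less product with the compound
```

The same comparison could be made on remaining substrate instead of product, with the signs reversed.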

0055. The invention provides computer systems comprising a processor and a data storage device wherein said data storage device has stored thereon a polypeptide sequence or a nucleic acid sequence of the invention (e.g., a polypeptide encoded by a nucleic acid of the invention). In one aspect, the computer system can further comprise a sequence comparison algorithm and a data storage device having at least one reference sequence stored thereon. In another aspect, the sequence comparison algorithm comprises a computer program that indicates polymorphisms. In one aspect, the computer system can further comprise an identifier that identifies one or more features in said sequence. The invention provides computer readable media having stored thereon a polypeptide sequence or a nucleic acid sequence of the invention. The invention provides methods for identifying a feature in a sequence comprising the steps of: (a) reading the sequence using a computer program which identifies one or more features in a sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) identifying one or more features in the sequence with the computer program. The invention provides methods for comparing a first sequence to a second sequence comprising the steps of: (a) reading the first sequence and the second sequence through use of a computer program which compares sequences, wherein the first sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention; and (b) determining differences between the first sequence and the second sequence with the computer program. The step of determining differences between the first sequence and the second sequence can further comprise the step of identifying polymorphisms. In one aspect, the method can further comprise an identifier that identifies one or more features in a sequence. In another aspect, the method can comprise reading the first sequence using a computer program and identifying one or more features in the sequence.

0056. The invention provides methods for isolating or recovering a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, from an environmental sample comprising the steps of: (a) providing an amplification primer sequence pair for amplifying a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, wherein the primer pair is capable of amplifying a nucleic acid of the invention; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to the amplification primer pair; and, (c) combining the nucleic acid of step (b) with the amplification primer pair of step (a) and amplifying nucleic acid from the environmental sample, thereby isolating or recovering a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein from an environmental sample. One or each member of the amplification primer sequence pair can comprise an oligonucleotide comprising an amplification primer sequence pair of the invention, e.g., having at least about 10 to 50 consecutive bases of a sequence of the invention.

0057. The invention provides methods for isolating or recovering a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein from an environmental sample comprising the steps of: (a) providing a polynucleotide probe comprising a nucleic acid of the invention or a subsequence thereof; (b) isolating a nucleic acid from the environmental sample or treating the environmental sample such that nucleic acid in the sample is accessible for hybridization to a polynucleotide probe of step (a); (c) combining the isolated nucleic acid or the treated environmental sample of step (b) with the polynucleotide probe of step (a); and (d) isolating a nucleic acid that specifically hybridizes with the polynucleotide probe of step (a), thereby isolating or recovering a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein from an environmental sample. The environmental sample can comprise a water sample, a liquid sample, a soil sample, an air sample or a biological sample. In one aspect, the biological sample can be derived from a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell.

0058. The invention provides methods of generating a variant of a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity comprising the steps of: (a) providing a template nucleic acid comprising a nucleic acid of the invention; and (b) modifying, deleting or adding one or more nucleotides in the template sequence, or a combination thereof, to generate a variant of the template nucleic acid. In one aspect, the method can further comprise expressing the variant nucleic acid to generate a variant polypeptide, enzyme, protein, e.g. structural or binding protein. The modifications, additions or deletions can be introduced by a method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR) or a combination thereof. In another aspect, the modifications, additions or deletions are introduced by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.

0059. In one aspect, the method can be iteratively repeated until a polypeptide, enzyme, protein, e.g. structural or binding protein having an altered or different activity or an altered or different stability from that of a polypeptide encoded by the template nucleic acid is produced. In one aspect, the variant polypeptide, enzyme, protein, e.g. structural or binding protein is thermotolerant, and retains some activity after being exposed to an elevated temperature. In another aspect, the variant polypeptide, enzyme, protein, e.g. structural or binding protein has increased glycosylation as compared to the polypeptide, enzyme, protein, e.g. structural or binding protein encoded by a template nucleic acid. Alternatively, the variant polypeptide, enzyme, protein, e.g. structural or binding protein has an enzyme, structural or binding activity under a high temperature, wherein the polypeptide, enzyme, protein, e.g. structural or binding protein encoded by the template nucleic acid is not active under the high temperature. In one aspect, the method can be iteratively repeated until a polypeptide, enzyme, protein, e.g. structural or binding protein coding sequence having an altered codon usage from that of the template nucleic acid is produced. In another aspect, the method can be iteratively repeated until a polypeptide, enzyme, protein, e.g. structural or binding protein gene having higher or lower level of message expression or stability from that of the template nucleic acid is produced.
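As an illustration only, a coding sequence with an altered codon usage, as mentioned in paragraph 0059, can be obtained by swapping codons for synonymous ones so that the encoded protein is unchanged. The Python sketch below is not part of the disclosure. It uses the standard genetic code, and the `preferred` mapping in the example is hypothetical rather than measured codon usage of any real host cell.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon with the preferred synonymous codon for its amino
    acid, when `preferred` names one; other codons are left unchanged."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        out.append(new_codon)
    return "".join(out)


# Hypothetical host preference: CTG for leucine, CGT for arginine.
template = "ATGCTAAGATAA"                      # Met-Leu-Arg-stop
variant = recode(template, {"L": "CTG", "R": "CGT"})
print(variant)                                 # ATGCTGCGTTAA
print(translate(template) == translate(variant))  # True: same protein
```

Passing a mapping of less preferred codons instead would give the opposite adjustment with the same function.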

0060. The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding a polypeptide having an enzyme, structural or binding activity; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.

0061. The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity; the method comprising the following steps: (a) providing a nucleic acid of the invention; and, (b) identifying a codon in the nucleic acid of step (a) and replacing it with a different codon encoding the same amino acid as the replaced codon, thereby modifying codons in a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein.

0062. The invention provides methods for modifying codons in a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity to increase its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, polypeptide; and, (b) identifying a non-preferred or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to increase its expression in a host cell.

0063. The invention provides methods for modifying a codon in a nucleic acid encoding a polypeptide having an enzyme, structural or binding activity to decrease its expression in a host cell, the method comprising the following steps: (a) providing a nucleic acid of the invention; and (b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it with a non-preferred or less preferred codon encoding the same amino acid as the replaced codon, wherein a preferred codon is a codon over-represented in coding sequences in genes in a host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell, thereby modifying the nucleic acid to decrease its expression in a host cell. In one aspect, the host cell can be a bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell or a mammalian cell.

0064. The invention provides methods for producing a library of nucleic acids encoding a plurality of modified polypeptides, enzymes, proteins, e.g. structural or binding proteins, active sites or substrate binding sites, wherein the modified active sites or substrate binding sites are derived from a first nucleic acid comprising a sequence encoding a first active site or a first substrate binding site, the method comprising the following steps: (a) providing a first nucleic acid encoding a first active site or first substrate binding site, wherein the first nucleic acid sequence comprises a sequence that hybridizes under stringent conditions to a nucleic acid of the invention, and the nucleic acid encodes a polypeptide, enzyme, protein, e.g. structural or binding protein, active site or a polypeptide, enzyme, protein, e.g. structural or binding protein, substrate binding site; (b) providing a set of mutagenic oligonucleotides that encode naturally-occurring amino acid variants at a plurality of targeted codons in the first nucleic acid; and, (c) using the set of mutagenic oligonucleotides to generate a set of active site-encoding or substrate binding site-encoding variant nucleic acids encoding a range of amino acid variations at each amino acid codon that was mutagenized, thereby producing a library of nucleic acids encoding a plurality of modified polypeptide, enzyme, protein, e.g. structural or binding protein, active sites or substrate binding sites. In one aspect, the method comprises mutagenizing the first nucleic acid of step (a) by a method comprising an optimized directed evolution system, Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR), error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, and a combination thereof. In another aspect, the method comprises mutagenizing the first nucleic acid of step (a) or variants by a method comprising recombination, recursive sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.

0065. The invention provides methods for making a small molecule comprising the steps of: (a) providing a plurality of biosynthetic enzymes capable of synthesizing or modifying a small molecule, wherein one of the enzymes comprises an enzyme encoded by a nucleic acid of the invention; (b) providing a substrate for at least one of the enzymes of step (a); and, (c) reacting the substrate of step (b) with the enzymes under conditions that facilitate a plurality of biocatalytic reactions to generate a small molecule by a series of biocatalytic reactions.

0066. The invention provides methods for modifying a small molecule comprising the steps: (a) providing an enzyme encoded by a nucleic acid of the invention; (b) providing a small molecule; and, (c) reacting the enzyme of step (a) with the small molecule of step (b) under conditions that facilitate an enzymatic reaction catalyzed by the enzyme, thereby modifying a small molecule by an enzymatic reaction. In one aspect, the method comprises providing a plurality of small molecule substrates for the enzyme of step (a), thereby generating a library of modified small molecules produced by at least one enzymatic reaction catalyzed by the enzyme. In one aspect, the method further comprises a plurality of additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by the enzymes to form a library of modified small molecules produced by the plurality of enzymatic reactions. In one aspect, the method further comprises the step of testing the library to determine if a particular modified small molecule that exhibits a desired activity is present within the library. The step of testing the library can further comprise the steps of systematically eliminating all but one of the biocatalytic reactions used to produce a portion of the plurality of the modified small molecules within the library by testing the portion of the modified small molecule for the presence or absence of the particular modified small molecule with a desired activity, and identifying at least one specific biocatalytic reaction that produces the particular modified small molecule of desired activity.

0067. The invention provides methods for determining a functional fragment of a polypeptide, enzyme, protein, e.g. structural or binding protein, comprising the steps of: (a) providing a polypeptide, enzyme, protein, e.g. structural or binding protein, wherein the enzyme comprises a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, or a subsequence thereof; and (b) deleting a plurality of amino acid residues from the sequence of step (a) and testing the remaining subsequence for an enzyme, structural or binding activity, thereby determining a functional fragment of a polypeptide, enzyme, protein, e.g. structural or binding protein. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein activity is measured by providing a polypeptide, enzyme, protein, e.g. structural or binding protein, substrate and detecting a decrease in the amount of the substrate or an increase in the amount of a reaction product.

0068. The invention provides methods for whole cell engineering of new or modified phenotypes by using real-time metabolic flux analysis, the method comprising the following steps: (a) making a modified cell by modifying the genetic composition of a cell, wherein the genetic composition is modified by addition to the cell of a nucleic acid of the invention; (b) culturing the modified cell to generate a plurality of modified cells; (c) measuring at least one metabolic parameter of the cell by monitoring the cell culture of step (b) in real time; and, (d) analyzing the data of step (c) to determine if the measured parameter differs from a comparable measurement in an unmodified cell under similar conditions, thereby identifying an engineered phenotype in the cell using real-time metabolic flux analysis. In one aspect, the genetic composition of the cell can be modified by a method comprising deletion of a sequence or modification of a sequence in the cell, or, knocking out the expression of a gene. In one aspect, the method can further comprise selecting a cell comprising a newly engineered phenotype. In another aspect, the method can comprise culturing the selected cell, thereby generating a new cell strain comprising a newly engineered phenotype.

0069. The invention provides methods of increasing thermotolerance or thermostability of a polypeptide, enzyme, protein, e.g. structural or binding protein, polypeptide, the method comprising glycosylating a polypeptide, enzyme, protein, e.g. structural or binding protein, wherein the polypeptide, enzyme, protein, e.g. structural or binding protein comprises at least thirty contiguous amino acids of a polypeptide of the invention; or a polypeptide encoded by a nucleic acid sequence of the invention, thereby increasing thermotolerance or thermostability of the polypeptide, enzyme, protein, e.g. structural or binding protein. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein specific activity can be thermostable or thermotolerant at a temperature in the range from greater than about 37° C. to about 95° C.

0070. The invention provides methods for overexpressing a recombinant polypeptide, enzyme, protein, e.g. structural or binding protein, in a cell comprising expressing a vector comprising a nucleic acid comprising a nucleic acid of the invention or a nucleic acid sequence of the invention, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, wherein overexpression is effected by use of a high activity promoter, a dicistronic vector or by gene amplification of the vector.

0071. The invention provides methods of making a transgenic plant comprising the following steps: (a) introducing a heterologous nucleic acid sequence into the cell, wherein the heterologous nucleic sequence comprises a nucleic acid sequence of the invention, thereby producing a transformed plant cell; and (b) producing a transgenic plant from the transformed cell. In one aspect, the step (a) can further comprise introducing the heterologous nucleic acid sequence by electroporation or microinjection of plant cell protoplasts. In another aspect, the step (a) can further comprise introducing the heterologous nucleic acid sequence directly to plant tissue by DNA particle bombardment. Alternatively, the step (a) can further comprise introducing the heterologous nucleic acid sequence into the plant cell DNA using an Agrobacterium tumefaciens host. In one aspect, the plant cell can be a potato, corn, rice, wheat, tobacco, or barley cell.

0072. The invention provides methods of expressing a heterologous nucleic acid sequence in a plant cell comprising the following steps: (a) transforming the plant cell with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic sequence comprises a nucleic acid of the invention; (b) growing the plant under conditions wherein the heterologous nucleic acid sequence is expressed in the plant cell. The invention provides methods of expressing a heterologous nucleic acid sequence in a plant cell comprising the following steps: (a) transforming the plant cell with a heterologous nucleic acid sequence operably linked to a promoter, wherein the heterologous nucleic sequence comprises a sequence of the invention; (b) growing the plant under conditions wherein the heterologous nucleic acid sequence is expressed in the plant cell.

0073. The invention provides feeds or foods comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention. In one aspect, the invention provides a food, feed, a liquid, e.g., a beverage (such as a fruit juice or a beer), a bread or a dough or a bread product, or a beverage precursor (e.g., a wort), comprising a polypeptide of the invention. The invention provides food or nutritional supplements for an animal comprising a polypeptide of the invention, e.g., a polypeptide encoded by the nucleic acid of the invention.

0074. In one aspect, the polypeptide in the food or nutritional supplement can be glycosylated. The invention provides edible enzyme delivery matrices comprising a polypeptide of the invention, e.g., a polypeptide encoded by the nucleic acid of the invention. In one aspect, the delivery matrix comprises a pellet. In one aspect, the polypeptide can be glycosylated. In one aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein activity is thermotolerant. In another aspect, the polypeptide, enzyme, protein, e.g. structural or binding protein activity is thermostable.

0075. The invention provides a food, a feed or a nutritional supplement comprising a polypeptide of the invention. The invention provides methods for utilizing a polypeptide, enzyme, protein, e.g. structural or binding protein, as a nutritional supplement in an animal diet, the method comprising: preparing a nutritional supplement containing a polypeptide, enzyme, protein, e.g. structural or binding protein, comprising at least thirty contiguous amino acids of a polypeptide of the invention; and administering the nutritional supplement to an animal. The animal can be a human, a ruminant or a monogastric animal. The polypeptide, enzyme, protein, e.g. structural or binding protein can be prepared by expression of a polynucleotide encoding the polypeptide, enzyme, protein, e.g. structural or binding protein in an organism selected from the group consisting of a bacterium, a yeast, a plant, an insect, a fungus and an animal. The organism can be selected from the group consisting of an S. pombe, S. cerevisiae, Pichia pastoris, E. coli, Streptomyces sp., Bacillus sp. and Lactobacillus sp.

0083. FIG. 3 is a flow diagram illustrating one aspect of a process in a computer for determining whether two sequences are homologous.

0084. FIG. 4 is a flow diagram illustrating one aspect of an identifier process 300 for detecting the presence of a feature in a sequence.

0085. Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

0086. The invention provides isolated and recombinant polypeptides, including enzymes, structural proteins and

0076.
The invention provides edible enzyme delivery binding proteins, polynucleotides encoding these polypep matrix comprising thermostable recombinant polypeptide, tides, and methods of making and using these polynucleotides enzyme, protein, e.g. structural or binding protein of the and polypeptides. The polypeptides of the invention, and the invention. The invention provides methods for delivering a polynucleotides encoding the polypeptides of the invention, polypeptide, enzyme, protein, e.g. structural or binding pro encompass many classes of enzymes, structural proteins and tein, Supplement to an animal, the method comprising: pre binding proteins. In one aspect, the enzymes and proteins of paring an edible enzyme delivery matrix in the form of pellets the invention comprise, e.g. aldolases, alpha-galactosidases, comprising a granulate edible carrier and thermostable amidases, e.g. secondary amidases, amylases, catalases, caro recombinant polypeptide, enzyme, protein, e.g. structural or tenoid pathway enzymes, dehalogenases, endoglucanases, binding protein, wherein the pellets readily disperse the epoxide hydrolases, esterases, hydrolases, glucosidases, gly polypeptide, enzyme, protein, e.g. structural or binding pro cosidases, inteins, isomerases, laccases, lipases, monooxyge tein contained therein into aqueous media, and administering nases, nitroreductases, nitrilases, P450 enzymes, pectate the edible enzyme delivery matrix to the animal. The recom lyases, phosphatases, phospholipases, phytases, polymerases binant polypeptide, enzyme, protein, e.g. structural or bind and Xylanases, which are more specifically described below. ing protein can comprise a polypeptide of the invention. The The invention also provides isolated and recombinant polypeptide, enzyme, protein, e.g. 
structural or binding pro polypeptides, including enzymes, structural proteins and tein can be glycosylated to provide thermostability at pellet binding proteins, polynucleotides encoding these polypep izing conditions. The delivery matrix can be formed by pel tides, having the activities described in Table 1, Table 2 or letizing a mixture comprising a grain germ and a polypeptide, Table 3, below. enzyme, protein, e.g. structural or binding protein. The pel letizing conditions can include application of steam. The Aldolases pelletizing conditions can comprise application of a tempera I0087. In one aspect, the invention provides aldolases, ture in excess of about 80° C. for about 5 minutes and the polynucleotides encoding them, and methods of making and enzyme retains a specific activity of at least 350 to about 900 using these polynucleotides and polypeptides. In one aspect, units per milligram of enzyme. the invention is directed to polypeptides, e.g., enzymes, hav 0077. In one aspect, invention provides a pharmaceutical ing an aldolase activity, including thermostable and thermo composition comprising a polypeptide, enzyme, protein, e.g. tolerant aldolase activity, and polynucleotides encoding these structural or binding protein, of the invention, or a polypep enzymes, and making and using these polynucleotides and tide encoded by a nucleic acid of the invention. In one aspect, polypeptides. In one aspect, the aldolase activity comprises the pharmaceutical composition acts as a digestive aid. of the formation of a carbon-carbon bond. In one 0078. The details of one or more aspects of the invention aspect, the aldolase activity comprises an aldol condensation. are set forth in the accompanying drawings and the descrip The aldol condensation can have an aldol donor Substrate tion below. 
Other features, objects, and advantages of the comprising an acetaldehyde and an aldol acceptor Substrate invention will be apparent from the description and drawings, comprising an . The aldol condensation can yield a and from the claims. product of a single chirality. In one aspect, the aldolase activ 0079 All publications, patents, patent applications, Gen ity is enantioselective. The aldolase activity can comprise a Bank sequences and ATCC deposits, cited herein are hereby 2-deoxyribose-5-phosphate aldolase (DERA) activity. The expressly incorporated by reference for all purposes. aldolase activity can comprise catalysis of the condensation of acetaldehyde as donor and a 2CR)-hydroxy-3-(hydroxy or mercapto)-propionaldehyde derivative to form a 2-deox BRIEF DESCRIPTION OF DRAWINGS ySugar. The aldolase activity can comprise catalysis of the condensation of acetaldehyde as donor and a 2-substituted 0080. The following drawings are illustrative of aspects of acetaldehyde acceptor to form a 2,4,6-trideoxyhexose via a the invention and are not meant to limit the scope of the 4-substituted-3-hydroxybutanal intermediate. The aldolase invention as encompassed by the claims. activity can comprise catalysis of the generation of chiral 0081 FIG. 1 is a block diagram of a computer system. using two acetaldehydes as Substrates. The aldo 0082 FIG. 2 is a flow diagram illustrating one aspect of a lase activity can comprises enantioselective assembling of process for comparing a new nucleotide or protein sequence chiral B,8-dihydroxyheptanoic acid side chains. The aldolase with a database of sequences in order to determine the homol activity can comprise enantioselective assembling of the core ogy levels between the new sequence and the sequences in the of R—(R*,R)-2-(4-fluorophenyl)-b.d-dihydroxy-5-(1- database. methylethyl)-3-phenyl-4-(phenylamino)-carbonyl)-1H-pyr US 2012/0266329 A1 Oct. 18, 2012

role-1-heptanoic acid (Atorvastatin, or LIPITOR™), rosuvastatin (CRESTOR™) and/or fluvastatin (LESCOL™). The aldolase activity can comprise, with an oxidation step, synthesis of a 3R,5S-6-chloro-2,4,6-trideoxy-erythro-hexonolactone.

Alpha-Galactosidases

[0088] In one aspect, the invention provides alpha-galactosidases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an alpha-galactosidase activity, including thermostable and thermotolerant alpha-galactosidase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0089] An alpha galactosidase hydrolyses the non-reducing terminal alpha 1-3,4,6 linked galactose from poly- and oligosaccharides. These saccharides are commonly found in legumes and are difficult to digest. As such, alpha-galactosidases can be used as a digestive aid to break down raffinose, stachyose, and verbascose, found in such foods as beans and other gassy foods.

Amidases

[0090] In one aspect, the invention provides amidases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an amidase activity, including thermostable and thermotolerant amidase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. In one aspect, the amidases of the invention are used in the removal of , phenylalanine or methionine from the N-terminal end of peptides in peptide or peptidomimetic synthesis. In one aspect, the enzyme of the invention, e.g. an amidase, is selective for the L, or "natural" enantiomer of the amino acid derivatives and is therefore useful for the production of optically active compounds. These reactions can be performed in the presence of the chemically more reactive ester functionality, a step which is very difficult to achieve with nonenzymatic methods. The enzyme is also able to tolerate high temperatures (at least 70° C.), and high concentrations of organic solvents (>40% DMSO), both of which cause a disruption of secondary structure in peptides, which enables cleavage of otherwise resistant bonds.

Secondary Amidases

[0091] In one aspect, the invention provides secondary amidases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a secondary amidase activity, including thermostable and thermotolerant secondary amidase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0092] Secondary amidases include a variety of useful enzymes including peptidases, , and hydantoinases. This class of enzymes can be used in a range of commercial applications. For example, secondary amidases can be used to: 1) increase flavor in food, in particular cheese (known as enzyme ripened cheese); 2) promote bacterial and fungal killing; 3) modify and de-protect fine chemical intermediates; 4) synthesize peptide bonds; and 5) carry out chiral resolutions. Particularly, there is a need in the art for an enzyme capable of hydrolyzing Cephalosporin C.

Amylases

[0093] In one aspect, the invention provides amylases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an amylase activity, including thermostable and thermotolerant amylase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0094] In one aspect, the polypeptides of the invention can be used as amylases, for example, alpha amylases or glucoamylases, to catalyze the hydrolysis of starch into sugars. In one aspect, the invention is directed to polypeptides having thermostable amylase activity, such as alpha amylases or glucoamylase activity, e.g., a 1,4-alpha-D-glucan glucohydrolase activity. In one aspect, the polypeptides of the invention can be used as amylases, for example, alpha amylases or glucoamylases, to catalyze the hydrolysis of starch into sugars, such as glucose. The invention is also directed to nucleic acid constructs, vectors, and host cells comprising the nucleic acid sequences of the invention as well as recombinant methods for producing the polypeptides of the invention. The invention is also directed to the use of amylases of the invention in starch conversion processes, including production of high fructose corn syrup (HFCS), ethanol, dextrose, and dextrose syrups.

[0095] Commercially, glucoamylases are used to further hydrolyze cornstarch, which has already been partially hydrolyzed with an alpha-amylase. The glucose produced in this reaction may then be converted to a mixture of glucose and fructose by a glucose isomerase enzyme. This mixture, or one enriched with fructose, is the high fructose corn syrup commercialized throughout the world. In general, starch to fructose processing consists of four steps: liquefaction of granular starch, saccharification of the liquefied starch into dextrose, purification, and isomerization to fructose. The object of a starch liquefaction process is to convert a concentrated suspension of starch polymer granules into a solution of soluble shorter chain length dextrins of low viscosity.

[0096] The amylases of the invention can be used in automatic dish wash (ADW) products and laundry detergent. In ADW products, the amylase will function at pH 10-11 and at 45-60° C. in the presence of calcium chelators and oxidative conditions. For laundry, activity at pH 9-10 and 40° C. in the appropriate detergent matrix will be required. Amylases are also useful in textile desizing, brewing processes, starch modification in the paper and pulp industry and other processes described in the art.

[0097] Amylases can be used commercially in the initial stages (liquefaction) of starch processing; in wet corn milling; in alcohol production; as cleaning agents in detergent matrices; in the textile industry for starch desizing; in baking applications; in the beverage industry; in oilfields in drilling processes; in inking of recycled paper and in animal feed. Amylases are also useful in textile desizing, brewing processes, starch modification in the paper and pulp industry and other processes.

Carotenoid Pathway Enzymes

[0098] The invention provides novel enzymes, and the polynucleotides encoding them, involved in carotenoid (such as lycopenes and luteins), astaxanthin and/or isoprenoid synthesis. The invention also provides novel genes in the carotenoid, astaxanthin and isoprenoid biosynthetic pathways comprising at least one enzyme of the invention. For example, in alternative aspects, the invention provides one or more nucleic acid coding sequences (CDSs, or ORFs) encoding all, or at least one, enzyme(s) involved in a desired biosynthetic pathway for carotenoids, astaxanthins and/or isoprenoids. The nucleic acid coding sequence(s) can be expressed through an expression plasmid, vector, engineered virus or any episomal expression system, or, can be integrated into the genome of the host cell. In one aspect, the enzyme(s) involved in the biosynthetic pathway system comprise a novel combination of enzymes. In another aspect, the enzyme(s) involved in the biosynthetic pathway system comprise at least one novel enzyme of the invention where nucleic acids used in the system encode a novel enzyme of the invention.

[0099] Carotenoids are natural pigments which have antioxidant and anti-carcinogenic activity. They are free radical scavengers, and as such, strong antioxidants. Carotenoids have a conjugated backbone structure and are very rigid molecules, having a backbone consisting of 9 to 11 alternating single/double bonds and have very similar electro-optical properties as polyacetylene. Astaxanthins are abundant naturally occurring carotenoids. They contain an internal unit similar to beta-carotene but have two terminal carbonyl and hydroxyl functionalities. These compounds are useful for food and feed supplements, colorants, nutraceuticals, cosmetic and pharmaceutical needs. Isoprenoids are compounds biosynthesized from or containing isoprene (unsaturated branched chain five-carbon hydrocarbon) units, including terpenes, carotenoids, fat soluble vitamins, ubiquinone, rubber, and some steroids. Biosynthetic pathways for carotenoids, astaxanthins and isoprenoids are known; most of these published pathways are derived from one organism or a combination of genes from a few species.

Catalases

[0100] In one aspect, the invention provides catalases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a catalase activity, including thermostable and thermotolerant catalase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0101] In processes where hydrogen peroxide is a by-product, catalases of the invention can be used to destroy or detect hydrogen peroxide, e.g., in production of glyoxylic acid and in glucose sensors. Also, in processes where hydrogen peroxide is used as a bleaching or antibacterial agent, catalases of the invention can be used to destroy residual hydrogen peroxide, e.g. in contact lens cleaning, in bleaching steps in pulp and paper production, and in the pasteurization of dairy products. Further, such catalases of the invention can be used as catalysts for oxidation reactions, e.g. epoxidation and hydroxylation.

Dehalogenases

[0102] In one aspect, the invention provides dehalogenases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a dehalogenase activity, including thermostable and thermotolerant dehalogenase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0103] Environmental pollutants consist of a large quantity and variety of chemicals; many of these are toxic, environmental hazards that were designated in 1979 as priority pollutants by the U.S. Environmental Protection Agency. Microbial and enzymatic biodegradation is one method for the elimination of these pollutants. Accordingly, methods have been designed to treat commercial wastes and to bioremediate polluted environments via microbial and related enzymatic processes. Unfortunately, many chemical pollutants are either resistant to microbial degradation or are toxic to potential microbial-degraders when present in high concentrations and certain combinations.

[0104] Dehalogenases, e.g. haloalkane dehalogenases, of the invention can cleave carbon-halogen bonds in haloalkanes and halocarboxylic acids by hydrolysis, thus converting them to their corresponding alcohols. This reaction can be used for detoxification involving haloalkanes, such as ethyl chloride, methyl chloride, and 1,2-dichloroethane (e.g., detoxification of toxic composition, e.g., pesticides, poisons, chemical warfare agents and the like comprising haloalkanes).

[0105] The present invention provides a number of dehalogenase enzymes useful in bioremediation having improved enzymatic characteristics. The polynucleotides and polynucleotide products of the invention are useful in, for example, groundwater treatment involving transformed host cells containing a polynucleotide or polypeptide of the invention (e.g., the bacteria Xanthobacter autotrophicus) and the haloalkane 1,2-dichloroethane as well as removal of polychlorinated biphenyls (PCBs) from soil sediment.

[0106] The haloalkane dehalogenases of the invention are useful in carbon-halide reduction efforts. The enzymes of the invention initiate the degradation of haloalkanes. Alternatively, host cells containing a dehalogenase polynucleotide or polypeptide of the invention can feed on the haloalkanes and produce the detoxifying enzyme.

Endoglucanases

[0107] In one aspect, the invention provides endoglucanases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an endoglucanase activity, including thermostable and thermotolerant endoglucanase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0108] In one aspect, the enzymes of the invention have a glucanase, e.g., an endoglucanase, activity, e.g., catalyzing hydrolysis of internal endo-β-1,4- and/or β-1,3-glucanase linkages. In one aspect, the endoglucanase activity (e.g., endo-1,4-beta-D-glucan 4-glucano hydrolase activity) comprises hydrolysis of 1,4- and/or β-1,3-beta-D-glycosidic linkages in cellulose, cellulose derivatives (e.g., carboxy methyl cellulose and hydroxy ethyl cellulose) lichenin, beta-1,4 bonds in mixed beta-1,3 glucans, such as cereal beta-D-glucans or xyloglucans and other plant material containing cellulosic parts.

[0109] Endoglucanases of the invention (e.g., endo-beta-1,4-glucanases, EC 3.2.1.4; endo-beta-1,3(1)-glucanases, EC 3.2.1.6; endo-beta-1,3-glucanases, EC 3.2.1.39) can hydrolyze internal β-1,4- and/or β-1,3-glucosidic linkages in cel

lulose and glucan to produce smaller molecular weight glucose and glucose oligomers. Glucans are polysaccharides formed from 1,4-β- and/or 1,3-glycoside-linked D-glucopyranose. Endoglucanases of the invention can be used in the food industry, for baking and fruit and vegetable processing, breakdown of agricultural waste, in the manufacture of animal feed, in pulp and paper production, textile manufacture and household and industrial cleaning agents. Endoglucanases are produced by fungi and bacteria.

[0110] Beta-glucans are major non-starch polysaccharides of cereals. The glucan content can vary significantly depending on variety and growth conditions. The physicochemical properties of this polysaccharide are such that it gives rise to viscous solutions or even gels under oxidative conditions. In addition glucans have high water-binding capacity. All of these characteristics present problems for several industries including brewing, baking, animal nutrition. In brewing applications, the presence of glucan results in wort filterability and haze formation issues. In baking applications (especially for cookies and crackers), glucans can create sticky doughs that are difficult to machine and reduce biscuit size. In addition, this is implicated in rapid rehydration of the baked product resulting in loss of crispiness and reduced shelf-life. For monogastric animal feed applications with cereal diets, beta-glucan is a contributing factor to viscosity of gut contents and thereby adversely affects the digestibility of the feed and animal growth rate. For ruminant animals, these beta-glucans represent substantial components of fiber intake and more complete digestion of glucans would facilitate higher feed conversion efficiencies. It is desirable for animal feed endoglucanases to be active in the animal stomach.

[0111] Endoglucanases of the invention can be used in the digestion of cellulose, a beta-1,4-linked glucan found in all plant material. Cellulose is the most abundant polysaccharide in nature. Enzymes of the invention that digest cellulose have utility in the pulp and paper industry, in textile manufacture and in household and industrial cleaning agents.

Epoxide Hydrolases

[0112] In one aspect, the invention provides epoxide hydrolases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an epoxide hydrolase activity, including thermostable and thermotolerant epoxide hydrolase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used as epoxide hydrolases to catalyze the hydrolysis of epoxides and arene oxides to their corresponding diols.

[0113] Epoxide hydrolases catalyze the hydrolysis of epoxides and arene oxides to their corresponding diols. Epoxide hydrolases from microbial sources are highly versatile biocatalysts for the asymmetric hydrolysis of epoxides on a preparative scale. Besides kinetic resolution, which furnishes the corresponding vicinal diol and remaining non-hydrolyzed epoxide in nonracemic form, enantioconvergent processes are possible. These are highly attractive as they lead to the formation of a single enantiomeric diol from a racemic oxirane.

[0114] Microsomal epoxide hydrolases are biotransformation enzymes that catalyze the conversion of a broad array of xenobiotic epoxide substrates to more polar diol metabolites, see, e.g., Omiecinski (2000) Toxicol. Lett. 112-113:365-370. Microsomal epoxide hydrolases catalyze the addition of water to epoxides in a two-step reaction involving initial attack of an active site carboxylate on the oxirane to give an ester intermediate followed by hydrolysis of the ester. Soluble epoxide hydrolases play a role in the biosynthesis of inflammation mediators.

[0115] Epoxide hydrolases of the invention can be used in the detoxification of epoxides or in the biosynthesis of hormones. Additionally, epoxide hydrolases of the invention can efficiently process several substrates, leading to enantiomerically enriched epoxides (the unreacted enantiomer) and/or to the corresponding vicinal diols.

Esterases

[0116] In one aspect, the invention provides esterases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an esterase activity, including thermostable and thermotolerant esterase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0117] Many esterases are known and have been discovered in a broad variety of organisms, including bacteria, yeast and higher animals and plants. A principal example of esterases are the lipases, which are used in the hydrolysis of lipids, acidolysis (replacement of an esterified fatty acid with a free fatty acid) reactions, transesterification (exchange of fatty acids between triglycerides) reactions, and in ester synthesis. The major industrial applications for lipases include: the detergent industry, where they are employed to decompose fatty materials in laundry stains into easily removable hydrophilic substances; the food and beverage industry where they are used in the manufacture of cheese, the ripening and flavoring of cheese, as antistaling agents for bakery products, and in the production of margarine and other spreads with natural butter flavors; in waste systems; and in the pharmaceutical industry where they are used as digestive aids.

[0118] Alternatively, esterases of the invention can be used in detergent compositions. In one aspect, the esterase can be a nonsurface-active esterase. In another aspect, the esterase can be a surface-active esterase. The esterase can be formulated in a non-aqueous liquid composition, a cast solid, a granular form, a particulate form, a compressed tablet, a gel form, a paste or a slurry form.

[0119] In another aspect, the invention provides fabrics or clothing comprising an esterase of the invention. In another aspect, esterases of the invention are used to treat a lipid-containing fabric.

[0120] In another aspect, the invention provides foods and drinks comprising an esterase of the invention. The invention also provides cheeses comprising an esterase of the invention. Additionally, the invention provides methods for the manufacture of cheese comprising the following steps: (a) providing a polypeptide having an esterase activity, wherein the polypeptide comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention; (b) providing a cheese precursor; and (c) contacting the polypeptide of step (a) with the precursor of step (b) under conditions wherein the esterase can catalyze cheese manufacturing processes. In one aspect, the method can comprise the process of ripening and flavoring of cheese.
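The kinetic resolution of racemic epoxides described above for epoxide hydrolases is usually characterized by the enantiomeric excess of the unreacted epoxide and of the diol product. The following Python sketch is illustrative only and is not part of the disclosure: the function names are hypothetical, and the formulas are the standard textbook relations for an irreversible kinetic resolution, assumed here rather than taken from the text.

```python
import math


def enantiomeric_excess(major: float, minor: float) -> float:
    """Enantiomeric excess as a fraction: (major - minor) / (major + minor)."""
    total = major + minor
    if total <= 0:
        raise ValueError("amounts must sum to a positive value")
    return (major - minor) / total


def conversion(ee_substrate: float, ee_product: float) -> float:
    """Fractional conversion c = ee_s / (ee_s + ee_p) for a kinetic resolution."""
    denom = ee_substrate + ee_product
    if denom <= 0:
        raise ValueError("ee_s + ee_p must be positive")
    return ee_substrate / denom


def selectivity_factor(c: float, ee_substrate: float) -> float:
    """Enantiomeric ratio E from conversion and substrate ee.

    Assumed relation (irreversible kinetic resolution):
    E = ln[(1 - c)(1 - ee_s)] / ln[(1 - c)(1 + ee_s)]
    """
    if not 0.0 < c < 1.0:
        raise ValueError("conversion must lie strictly between 0 and 1")
    if not 0.0 <= ee_substrate < 1.0:
        raise ValueError("ee_s must lie in [0, 1)")
    denominator = math.log((1.0 - c) * (1.0 + ee_substrate))
    if denominator == 0.0:
        raise ValueError("E is undefined for this combination of c and ee_s")
    return math.log((1.0 - c) * (1.0 - ee_substrate)) / denominator


if __name__ == "__main__":
    # Hypothetical measurements: remaining epoxide and diol each at 95:5.
    ee_s = enantiomeric_excess(95.0, 5.0)
    ee_p = enantiomeric_excess(95.0, 5.0)
    c = conversion(ee_s, ee_p)
    print(f"ee_s={ee_s:.2f} ee_p={ee_p:.2f} c={c:.2f} E={selectivity_factor(c, ee_s):.1f}")
```

A higher E means the enzyme discriminates more strongly between the two epoxide enantiomers, so the unreacted epoxide can be recovered in high enantiomeric excess at a conversion close to 50%.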

[0121] In another aspect, the invention provides margarines and spreads comprising an enzyme of the invention. The invention provides methods for production of margarine or other spreads with natural butter flavors comprising the following steps: (a) providing a polypeptide having an esterase activity, wherein the polypeptide comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention; (b) providing a margarine or a spread precursor; and (c) contacting the polypeptide of step (a) with the precursor of step (b) under conditions wherein the esterase can catalyze processes involved in margarine or spread production.

[0122] The invention provides methods for treating solid or liquid waste products comprising the following steps: (a) providing a polypeptide having an esterase activity, wherein the polypeptide comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention; (b) providing a solid or a liquid waste; and (c) contacting the polypeptide of step (a) and the waste of step (b) under conditions wherein the polypeptide can treat the waste. The invention provides solid or liquid waste products comprising a polypeptide of the invention.

[0123] The invention provides methods for aiding digestion in a mammal comprising (a) providing a polypeptide having an esterase activity, wherein the polypeptide comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention; (b) providing a composition comprising a substrate for the polypeptide of step (a); (c) feeding or administering to the mammal the polypeptide of step (a) with a feed or food comprising a substrate for the polypeptide of step (a), thereby helping digestion in the mammal. In one aspect, the mammal is a human.

[0124] The invention provides pharmaceutical compositions comprising a polypeptide and/or a nucleic acid of the invention, e.g., a pharmaceutical composition for use as a digestive aid in a mammal comprising a polypeptide having an esterase activity, wherein the polypeptide comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention. In one aspect, the mammal comprises a human. The enzymes of the invention are used in the manufacture of medicaments.

[0125] The invention provides bakery products comprising a polypeptide of the invention. The invention provides antistaling agents for bakery products comprising a polypeptide having an esterase activity, wherein the polypeptide comprises a polypeptide of the invention, or, a polypeptide encoded by a nucleic acid of the invention.

[0126] The invention provides methods for hydrolyzing, breaking up or disrupting an ester-comprising composition comprising the following steps: (a) providing a polypeptide of the invention having an esterase activity, or a polypeptide encoded by a nucleic acid of the invention; (b) providing a composition comprising a protein; and (c) contacting the polypeptide of step (a) with the composition of step (b) under conditions wherein the esterase hydrolyzes, breaks up or disrupts the ester-comprising composition.

polypeptide of step (a) with the composition of step (b) under conditions wherein esterase removes or liquefies the ester-comprising compositions.

Hydrolases

[0128] In one aspect, the invention provides hydrolases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a hydrolase activity, e.g., an esterase, acylase, lipase, or protease activity, including thermostable and thermotolerant hydrolase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. The hydrolase activities of the polypeptides and peptides of the invention include esterase activity, lipase activity (hydrolysis of lipids), acidolysis reactions (to replace an esterified fatty acid with a free fatty acid), transesterification reactions (exchange of fatty acids between triglycerides), ester synthesis, ester interchange reactions, phospholipase activity (e.g., phospholipase A, B, C and D activity, patatin activity, lipid acyl hydrolase (LAH) activity) and protease activity (hydrolysis of peptide bonds). The polypeptides of the invention can be used in a variety of pharmaceutical, agricultural and industrial contexts, including the manufacture of cosmetics and nutraceuticals.

[0129] In one aspect, the polypeptides of the invention are used in the biocatalytic synthesis of structured lipids (lipids that contain a defined set of fatty acids distributed in a defined manner on the glycerol backbone), including cocoa butter alternatives (CBA), lipids containing poly-unsaturated fatty acids (PUFAs), diacylglycerides, e.g., 1,3-diacyl glycerides (DAGs), monoglycerides, e.g., 2-monoglycerides (MAGs) and triacylglycerides (TAGs). In one aspect, the polypeptides of the invention are used to modify oils, such as fish, animal and vegetable oils, and lipids, such as poly-unsaturated fatty acids. The hydrolases of the invention having lipase activity can modify oils by hydrolysis, alcoholysis, esterification, transesterification and/or interesterification. The methods of the invention can use lipases with defined regio-specificity or defined chemoselectivity in biocatalytic synthetic reactions. In another aspect, the polypeptides of the invention are used to synthesize enantiomerically pure chiral products.

[0130] Additionally, the polypeptides of the invention can be used in food processing, brewing, bath additives, alcohol production, peptide synthesis, enantioselectivity, hide preparation in the leather industry, waste management and animal degradation, silver recovery in the photographic industry, medical treatment, silk degumming, biofilm degradation, biomass conversion to ethanol, biodefense, antimicrobial agents and disinfectants, personal care and cosmetics, biotech reagents, in increasing starch yield from corn wet milling and pharmaceuticals such as digestive aids and anti-inflammatory (anti-phlogistic) agents.

[0131] The major industrial applications for hydrolases, e.g., esterases, lipases, phospholipases and proteases, include the detergent industry, where they are employed to decompose fatty materials in laundry stains into easily removable

[0127]
Alternatively, the invention provides methods for hydrophilic substances; the food and beverage industry where liquefying or removing ester-comprising compositions com they are used in the manufacture of cheese, the ripening and prising the following steps: (a) providing a polypeptide of the flavoring of cheese, as antistaling agents for bakery products, invention having an esterase activity, or a polypeptide and in the production of margarine and other spreads with encoded by a nucleic acid of the invention; (b) providing a natural butter flavors; in waste systems; and in the pharma composition comprising a protein; and (c) contacting the ceutical industry where they are used as digestive aids. US 2012/0266329 A1 Oct. 18, 2012

[0132] Oils and fats are an important renewable raw material for the chemical industry. They are available in large quantities from the processing of oilseeds from plants like rice bran oil, rapeseed (canola), sunflower, olive, palm or soy. Other sources of valuable oils and fats include fish, restaurant waste, and rendered animal fats. These fats and oils are a mixture of triglycerides or lipids, i.e. fatty acids (FAs) esterified on a glycerol scaffold. Each oil or fat contains a wide variety of different lipid structures, defined by the FA content and their regiochemical distribution on the glycerol backbone. These properties of the individual lipids determine the physical properties of the pure triglyceride. Hence, the triglyceride content of a fat or oil to a large extent determines the physical, chemical and biological properties of the oil. The value of lipids increases greatly as a function of their purity. High purity can be achieved by fractional chromatography or distillation, separating the desired triglyceride from the mixed background of the fat or oil source. However, this is costly and yields are often limited by the low levels at which the triglyceride occurs naturally. In addition, the purity of the product is often compromised by the presence of many structurally and physically or chemically similar triglycerides in the oil.

[0133] An alternative to purifying triglycerides or other lipids from a natural source is to synthesize the lipids. The products of such processes are called structured lipids because they contain a defined set of fatty acids distributed in a defined manner on the glycerol backbone. The value of lipids also increases greatly by controlling the fatty acid content and distribution within the lipid. Lipases can be used to affect such control.

[0134] Phospholipases are enzymes that hydrolyze the ester bonds of phospholipids. Corresponding to their importance in the metabolism of phospholipids, these enzymes are widespread among prokaryotes and eukaryotes. The phospholipases affect the metabolism, construction and reorganization of biological membranes and are involved in signal cascades. Several types of phospholipases are known which differ in their specificity according to the position of the bond attacked in the phospholipid molecule. Phospholipase A1 (PLA1) removes the 1-position fatty acid to produce free fatty acid and 1-lyso-2-acylphospholipid. Phospholipase A2 (PLA2) removes the 2-position fatty acid to produce free fatty acid and 1-acyl-2-lysophospholipid. PLA1 and PLA2 enzymes can be intra- or extra-cellular, membrane-bound or soluble. Intracellular PLA2 is found in almost every mammalian cell. Phospholipase C (PLC) removes the phosphate moiety to produce 1,2-diacylglycerol and phospho base. Phospholipase D (PLD) produces 1,2-diacylglycerophosphate and base group. PLC and PLD are important in cell function and signaling. Patatins are another type of phospholipase thought to work as a PLA.

[0135] In general, enzymes, including hydrolases such as esterases, lipases and proteases, are active over a narrow range of environmental conditions (temperature, pH, etc.), and many are highly specific for particular substrates. The narrow range of activity for a given enzyme limits its applicability and creates a need for a selection of enzymes that (a) have similar activities but are active under different conditions or (b) have different substrates. For instance, an enzyme capable of catalyzing a reaction at 50° C. may be so inefficient at 35° C. that its use at the lower temperature will not be feasible. For this reason, laundry detergents generally contain a selection of proteolytic enzymes (e.g., polypeptides of the invention), allowing the detergent to be used over a broad range of wash temperature and pH. In view of the specificity of enzymes and the growing use of hydrolases in industry, research, and medicine, there is an ongoing need in the art for new enzymes and new enzyme inhibitors.

Glucosidases

[0136] In one aspect, the invention provides glucosidases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a glucosidase activity, including thermostable and thermotolerant glucosidase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0137] Alpha-glucosidases of the invention can catalyze the hydrolysis of starches into sugars. Alpha-glucosidases can hydrolyze terminal non-reducing 1,4- or 1,6-linked α-D-glucose residues in starch, with release of α-D-glucose.

[0138] Alpha-glucosidases of the invention can be used commercially in the stages liquefaction and saccharification of starch processing; in wet corn milling; in alcohol production; as cleaning agents in detergent matrices; in the textile industry for starch desizing; in baking applications; in the beverage industry; in oilfields in drilling processes; in inking of recycled paper and in animal feed. Alpha-glucosidases of the invention are also useful in textile desizing, brewing processes, starch modification in the paper and pulp industry and other processes.

Glycosidases

[0139] In one aspect, the invention provides glycosidases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a glycosidase activity, including thermostable and thermotolerant glycosidase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides. Glycosidase enzymes of the invention can have more specific activity as glucosidases, α-galactosidases, β-galactosidases, β-mannosidases, β-mannanases, endoglucanases, and pullulanases.

[0140] α-Galactosidases of the invention can catalyze the hydrolysis of galactose groups on a polysaccharide backbone or hydrolyze the cleavage of di- or oligosaccharides comprising galactose. β-Mannanases of the invention can catalyze the hydrolysis of mannose groups internally on a polysaccharide backbone or hydrolyze the cleavage of di- or oligosaccharides comprising mannose groups. β-Mannosidases of the invention can hydrolyze non-reducing, terminal mannose residues on a mannose-containing polysaccharide and the cleavage of di- or oligosaccharides comprising mannose groups.

[0141] Guar gum is a branched galactomannan polysaccharide composed of a β-1,4 linked mannose backbone with α-1,6 linked galactose sidechains. The enzymes required for the degradation of guar are β-mannanase, β-mannosidase and α-galactosidase. β-Mannanase hydrolyses the mannose backbone internally and β-mannosidase hydrolyses non-reducing, terminal mannose residues. α-Galactosidase hydrolyses α-linked galactose groups.

[0142] Galactomannan polysaccharides and the enzymes of the invention that degrade them have a variety of applications. Guar is commonly used as a thickening agent in food and is utilized in hydraulic fracturing in oil and gas recovery.

Consequently, galactomannanases are industrially relevant for the degradation and modification of guar. Furthermore, a need exists for thermostable galactomannanases that are active in extreme conditions associated with oil drilling and well stimulation.

[0143] There are other applications for these enzymes in various industries, such as in the beet sugar industry. 20-30% of the domestic U.S. sucrose consumption is sucrose from sugar beets. Raw beet sugar can contain a small amount of raffinose when the sugar beets are stored before processing and rotting begins to set in. Raffinose inhibits the crystallization of sucrose and also constitutes a hidden quantity of sucrose. Thus, there is merit to eliminating raffinose from raw beet sugar. α-Galactosidase has also been used as a digestive aid to break down raffinose, stachyose, and verbascose in such foods as beans and other gassy foods.

[0144] β-Galactosidases of the invention can be used for the production of lactose-free dietary milk products. Additionally, β-galactosidases of the invention can be used for the enzymatic synthesis of oligosaccharides via transglycosylation reactions.

[0145] Pullulanase is well known as a debranching enzyme of pullulan and starch. The enzyme of the invention can hydrolyze α-1,6-glucosidic linkages on these polymers. Starch degradation for the production of sweeteners (glucose or maltose) is a very important industrial application of this enzyme. The degradation of starch is developed in two stages. The first stage involves the liquefaction of the substrate with α-amylase, and the second stage, or saccharification stage, is performed by β-amylase with pullulanase added as a debranching enzyme, to obtain better yields.

[0146] Endoglucanases of the invention can be used in a variety of industrial applications. For instance, the endoglucanases of the invention can hydrolyze the internal β-1,4-glycosidic bonds in cellulose, which may be used for the conversion of plant biomass into fuels and chemicals. Endoglucanases of the invention also have applications in detergent formulations, the textile industry, in animal feed, in waste treatment, oil drilling and well stimulation, and in the fruit juice and brewing industry for the clarification and extraction of juices.

Inteins

[0147] In one aspect, the invention provides inteins, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In another aspect, the invention provides a chimeric protein comprising at least three domains, wherein the first domain comprises at least one enzyme domain or a binding protein domain, the second domain comprises at least one intein domain and a third domain comprising a detectable moiety domain, at least one intein domain is positioned between at least one enzyme or binding protein and at least one detectable moiety domain, and the intein domain has at least one cleavage or splicing activity.

[0148] In one aspect, the detectable moiety domain comprises a detectable peptide or polypeptide. The detectable peptide or polypeptide can be a fluorescent peptide or polypeptide. The detectable peptide or polypeptide can be a bioluminescent or a chemiluminescent peptide or polypeptide. In one aspect, the bioluminescent or chemiluminescent polypeptide comprises a green fluorescent protein (GFP), an aequorin, an obelin, a mnemiopsin or a berovin. In one aspect, the detectable moiety domain comprises an enzyme that generates a detectable signal. The enzyme that generates a detectable signal can comprise an alpha-galactosidase, an antibiotic (e.g., chloramphenicol acetyltransferase) or a kinase. The detectable moiety domain can comprise a radioactive isotope.

[0149] In one aspect, the chimeric protein is a recombinant fusion protein. In one aspect, the intein domain splicing activity results in cleavage of the enzyme domain from the intein domain and detectable domain. The intein domain splicing activity can result in cleavage of the enzyme domain from the intein domain and detectable domain and cleavage of the detectable domain from the intein domain. In one aspect, the intein domain splicing activity results in cleavage of the detectable domain from the intein domain. In one aspect, the intein domain has only splicing activity. The intein domain can have only cleaving activity.

[0150] In one aspect, at least one domain is separated from another domain by a linker. The linker can be a flexible linker. The intein domain can be separated from the detectable moiety domain and the enzyme domain by a linker.

Isomerases

[0151] In one aspect, the invention provides isomerases, e.g. xylose isomerases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having an isomerase activity, e.g. xylose isomerase activity, including thermostable and thermotolerant isomerase activity, e.g. xylose isomerase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0152] In one aspect, the invention provides xylose isomerase enzymes, polynucleotides encoding the enzymes, methods of making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used in a variety of agricultural and industrial contexts. For example, the polypeptides of the invention can be used for converting glucose to fructose or for manufacturing high content fructose syrups in large quantities. Other examples include use of the polypeptides of the invention in confectionary, brewing, alcohol and soft drinks production, and in diabetic foods and sweeteners.

Laccases

[0153] In one aspect, the invention provides laccases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a laccase activity, including thermostable and thermotolerant laccase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0154] In one aspect, the invention provides methods of depolymerizing lignin, e.g., in a pulp or paper manufacturing process, using a polypeptide of the invention. In another aspect, the invention provides methods for oxidizing products that can be mediators of laccase-catalyzed oxidation reactions, e.g., 2,2-azinobis-(3-ethylbenzthiazoline-6-sulfonate) (ABTS), 1-hydroxybenzotriazole (HBT), 2,2,6,6-tetramethylpiperidin-1-yloxy (TEMPO), dimethoxyphenol, dihydroxyfumaric acid (DHF) and the like.

[0155] Laccases are a subclass of the multicopper oxidase super family of enzymes, which includes ascorbate oxidases and the mammalian protein, ceruloplasmin. Laccases are one of the oldest known enzymes and were first implicated in the oxidation of urushiol and laccol. In one aspect, reactions catalyzed by laccases of the invention comprise the oxidation of phenolic substrates. The major target application has been in the delignification of wood fibers during the preparation of pulp.

Lipases

[0156] In one aspect, the invention provides lipases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a lipase activity, including thermostable and thermotolerant lipase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0157] In one aspect, the lipases of the invention can be used in a variety of pharmaceutical, agricultural and industrial contexts, including the manufacture of cosmetics and nutraceuticals. In one aspect, the lipases of the invention are used in the biocatalytic synthesis of structured lipids (lipids that contain a defined set of fatty acids distributed in a defined manner on the glycerol backbone), including cocoa butter alternatives (CBA), lipids containing poly-unsaturated fatty acids (PUFAs), diacylglycerides, e.g., 1,3-diacyl glycerides (DAGs), monoglycerides, e.g., 2-monoglycerides (MAGs) and triacylglycerides (TAGs). In one aspect, the polypeptides of the invention are used to modify oils, such as fish, animal and vegetable oils, and lipids, such as poly-unsaturated fatty acids. The lipases of the invention can modify oils by hydrolysis, alcoholysis, esterification, transesterification and/or interesterification. The methods of the invention use lipases with defined regio-specificity or defined chemoselectivity in biocatalytic synthetic reactions. In another aspect, the polypeptides of the invention are used to synthesize enantiomerically pure chiral products.

[0158] The invention provides lipase enzymes, polynucleotides encoding the enzymes, methods of making and using these polynucleotides and polypeptides. The polypeptides of the invention can be used in a variety of pharmaceutical, agricultural and industrial contexts, including the manufacture of cosmetics and nutraceuticals. In one aspect, the polypeptides of the invention are used in the biocatalytic synthesis of structured lipids (lipids that contain a defined set of fatty acids distributed in a defined manner on the glycerol backbone), including cocoa butter alternatives, poly-unsaturated fatty acids (PUFAs), 1,3-diacyl glycerides (DAGs), 2-monoglycerides (MAGs) and triacylglycerides (TAGs), such as 1,3-dipalmitoyl-2-oleoylglycerol (POP), 1,3-distearoyl-2-oleoylglycerol (SOS), 1-palmitoyl-2-oleoyl-3-stearoylglycerol (POS) or 1-oleoyl-2,3-dimyristoylglycerol (OMM), long chain polyunsaturated fatty acids such as arachidonic acid, docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA).

[0159] In one aspect, the invention provides synthesis (using lipases of the invention) of a triglyceride mixture composed of POS (Palmitic-Oleic-Stearic), POP (Palmitic-Oleic-Palmitic) and SOS (Stearic-Oleic-Stearic) from glycerol. This synthesis uses free fatty acids versus fatty acid esters. In one aspect, this reaction can be performed in one pot with sequential addition of fatty acids using crude glycerol and free fatty acids and fatty acid esters. In one aspect, stearate and palmitate are mixed together to generate mixtures of DAGs. In one aspect, the diacylglycerides are subsequently acylated with oleate to give components of cocoa butter equivalents. In alternative aspects, the proportions of POS, POP and SOS can be varied according to: stearate to palmitate ratio; selectivity of enzyme for palmitate versus stearate; or enzyme enantioselectivity (could alter levels of POS/SOP). One-pot synthesis of cocoa butter equivalents or other cocoa butter alternatives is possible using this aspect of the invention.

[0160] In one aspect, lipases that exhibit regioselectivity and/or chemoselectivity are used in the structured synthesis of lipids or in the processing of lipids. Thus, the methods of the invention use lipases with defined regio-specificity or defined chemoselectivity (e.g., a fatty acid specificity) in a biocatalytic synthetic reaction. For example, the methods of the invention can use lipases with SN1, SN2 and/or SN3 regio-specificity, or combinations thereof. In one aspect, the methods of the invention use lipases that exhibit regioselectivity for the 2-position of a triacylglyceride (TAG). This SN2 regioselectivity can be used in the synthesis of a variety of structured lipids, e.g., triacylglycerides (TAGs), including 1,3-DAGs and components of cocoa butter.

[0161] The methods and compositions (lipases) of the invention can be used in the biocatalytic synthesis of structured lipids, and the production of nutraceuticals (e.g., polyunsaturated fatty acids and oils), various foods and food additives (e.g., emulsifiers, fat replacers, margarines and spreads), cosmetics (e.g., emulsifiers, creams), pharmaceuticals and drug delivery agents (e.g., liposomes, tablets, formulations), and animal feed additives (e.g., polyunsaturated fatty acids, such as linoleic acids) comprising lipids made by the structured synthesis methods of the invention or processed by the methods of the invention.

[0162] In one aspect, lipases of the invention can act on fluorogenic fatty acid (FA) esters, e.g., umbelliferyl FA esters. In one aspect, profiles of FA specificities of lipases made or modified by the methods of the invention can be obtained by measuring their relative activities on a series of umbelliferyl FA esters, such as palmitate, stearate, oleate, laurate, PUFA, butyrate.

[0163] The methods and compositions (lipases) of the invention can be used to synthesize enantiomerically pure chiral products. In one aspect, the methods and compositions (lipases) of the invention can be used to prepare a D-amino acid and corresponding esters from a racemic mix. For example, D-aspartic acid can be prepared from racemic aspartic acid. In one aspect, optically active D-homophenylalanine and/or its esters are prepared. The enantioselectively synthesized D-homophenylalanine can be starting material for many drugs, such as Enalapril, Lisinopril, and Quinapril, used in the treatment of hypertension and congestive heart failure. The D-aspartic acid and its derivatives made by the methods and compositions of the invention can be used in pharmaceuticals, e.g., for the inhibition of argininosuccinate synthetase to prevent or treat sepsis or -induced systemic hypotension or as immunosuppressive agents. The D-aspartic acid and its derivatives made by the methods and compositions of the invention can be used as taste modifying compositions for foods, e.g., as sweeteners (e.g., ALITAME™). For example, the methods and compositions (lipases) of the invention can be used to synthesize an optical isomer S(+) of 2-(6-methoxy-2-naphthyl)propionic acid from a racemic (R,S) ester of 2-(6-methoxy-2-naphthyl)propionic acid.

[0164] In one aspect, the methods and compositions (lipases) of the invention can be used for stereoselectively hydrolyzing racemic mixtures of esters of 2-substituted acids, e.g., 2-aryloxy substituted acids, such as R-2-(4-hydroxyphenoxy)propionic acid, 2-arylpropionic acid, ketoprofen to synthesize enantiomerically pure chiral products.

[0165] The methods and compositions (lipases) of the invention can be used to hydrolyze oils, such as fish, animal and vegetable oils, and lipids, such as poly-unsaturated fatty acids. In one aspect, the polypeptides of the invention are used to process fatty acids (such as poly-unsaturated fatty acids), e.g., fish oil fatty acids, for use in or as a feed additive. Addition of poly-unsaturated fatty acids (PUFAs) to feed for dairy cattle has been demonstrated to result in improved fertility and milk yields. Fish oil contains a high level of PUFAs and therefore is a potentially inexpensive source for PUFAs as a starting material for the methods of the invention. The biocatalytic methods of the invention can process fish oil under mild conditions, thus avoiding harsh conditions utilized in some processes. Harsh conditions may promote unwanted isomerization, polymerization and oxidation of the PUFAs. In one aspect, the methods of the invention comprise lipase-catalyzed total hydrolysis of fish oil or selective hydrolysis of PUFAs from fish oil to provide a mild alternative that would leave the high-value PUFAs intact. In one aspect, the methods further comprise hydrolysis of lipids by chemical or physical splitting of the fat.

[0166] In one aspect, the lipases and methods of the invention are used for the total hydrolysis of fish oil. Lipases can be screened for their ability to catalyze the total hydrolysis of fish oil under different conditions using. In alternative aspects, a single or multiple lipases are used to catalyze the total splitting of the fish oil. Several lipases of the invention may need to be used, owing to the presence of the PUFAs. In one aspect, a PUFA-specific lipase of the invention is combined with a general lipase to achieve the desired effect.

[0167] The methods and compositions (lipases) of the invention can be used to catalyze the partial or total hydrolysis of other oils, e.g. olive oils, that do not contain PUFAs.

[0168] The methods and compositions (lipases) of the invention can be used to catalyze the hydrolysis of PUFA glycerol esters. These methods can be used to make feed additives. In one aspect, lipases of the invention catalyze the release of PUFAs from simple esters and fish oil. Standard assays and analytical methods can be utilized.

[0169] The methods and compositions (lipases) of the invention can be used to selectively hydrolyze saturated esters over unsaturated esters into acids or alcohols. The methods and compositions (lipases) of the invention can be used to treat latexes for a variety of purposes, e.g., to treat latexes used in hair fixative compositions to remove unpleasant odors. The methods and compositions (lipases) of the invention can be used in the treatment of a lipase deficiency in an animal, e.g., a mammal, such as a human. The methods and compositions (lipases) of the invention can be used to prepare lubricants, such as hydraulic oils. The methods and compositions (lipases) of the invention can be used in making and using detergents. The methods and compositions (lipases) of the invention can be used in processes for the chemical finishing of fabrics, fibers or yarns. In one aspect, the methods and compositions (lipases) of the invention can be used for obtaining flame retardancy in a fabric using, e.g., a halogen-substituted carboxylic acid or an ester thereof, i.e. a fluorinated, chlorinated or brominated carboxylic acid or an ester thereof.

Monooxygenases

[0170] In one aspect, the invention provides monooxygenases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a monooxygenase activity, including thermostable and thermotolerant monooxygenase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0171] In one aspect, the monooxygenases of the invention have commercial utility as biocatalysts for use in the synthesis of aromatic and aliphatic esters and their derivatives, such as acids and alcohols. In one aspect, the monooxygenases of the invention are used in the catalysis of sulfoxidation reactions. In one aspect, the invention provides Baeyer-Villiger monooxygenases, polynucleotides encoding the Baeyer-Villiger monooxygenases, and methods of using these Baeyer-Villiger monooxygenases and polynucleotides. In one aspect, the invention provides methods of producing chiral synthetic intermediates using Baeyer-Villiger monooxygenases.

[0172] In one aspect, the monooxygenase activity comprises catalysis of sulfoxidation reactions. The monooxygenase activity can comprise an asymmetric sulfoxidation reaction. The monooxygenase activity can be enantiospecific. In one aspect, it can generate a substantially chiral product.

[0173] In one aspect, the monooxygenase activity comprises generation of an ester or a lactone having at least one of the following structures:

[Chemical structures not reproduced.]

[0174] wherein: R1, R2, R3 and R4 are each independently selected from -H, substituted or unsubstituted alkyl, alkenyl, alkynyl, aryl, heteroaryl, cycloalkyl, and heterocyclic; wherein the substituted groups are substituted with one or more of lower alkyl, hydroxy, alkoxy, mercapto, cycloalkyl, heterocyclic, aryl, heteroaryl, aryloxy, and halogen, or two or more of R1, R2, R3 and R4 may together form cyclic moieties; and, R5 is selected from substituted or unsubstituted alkylene, alkenylene, alkynylene, arylene, heteroarylene, cycloalkylene, and heterocyclic; wherein the substitutions are substituted with one or more of lower alkyl, hydroxy, alkoxy, mercapto, cycloalkyl, heterocyclic, aryl, heteroaryl, aryloxy, and halogen.

[0175] In one aspect, the monooxygenase activity comprises oxidation of a cycloalkanone to produce a chiral lactone. The cycloalkanone can comprise a cyclobutanone, a cyclopentanone, a cyclohexanone, a 2-methylcyclopentanone, a 2-methylcyclohexanone, a cyclohex-2-ene-1-one, a 2-(cyclohex-1-enyl)cyclohexanone, a 1,2-cyclohexanedione, a 1,3-cyclohexanedione or a 1,4-cyclohexanedione.

[0176] In one aspect, the monooxygenase activity comprises a chlorophenol 4-monooxygenase activity or a xylene monooxygenase activity.
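The monooxygenase paragraphs describe an enantiospecific activity that can generate a substantially chiral product, without giving a numerical measure. One common way to quantify this is the enantiomeric excess (ee). The sketch below is illustrative only: the function name and the example amounts are hypothetical, and the specification itself does not state any ee threshold.

```python
def enantiomeric_excess(major_amount, minor_amount):
    """Return the enantiomeric excess, in percent, of a two-enantiomer mixture.

    ee (%) = 100 * |major - minor| / (major + minor)

    The amounts may be moles, concentrations or chromatographic peak areas,
    provided both use the same unit.
    """
    total = major_amount + minor_amount
    if major_amount < 0 or minor_amount < 0 or total <= 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return 100.0 * abs(major_amount - minor_amount) / total


if __name__ == "__main__":
    # Hypothetical peak areas for the two enantiomers of a lactone product.
    print(enantiomeric_excess(99.0, 1.0))
    # A racemic mixture has no excess of either enantiomer.
    print(enantiomeric_excess(50.0, 50.0))
```

A racemic mixture gives 0% ee and a single pure enantiomer gives 100% ee, so values close to 100% correspond to what the text calls a substantially chiral product.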

[0177] The invention provides a pharmaceutical composition comprising a polypeptide of the invention.

[0178] The invention provides a method for converting a ketone to its corresponding ester comprising contacting the ketone with a polypeptide of the invention under conditions wherein the polypeptide catalyzes the conversion of the ketone to its corresponding ester. In one aspect, the polypeptide has a monooxygenase activity that is enantiospecific to generate a substantially chiral product. In one aspect, the ester is an aromatic or an aliphatic ester.

[0179] The invention provides a method for converting a cycloaliphatic ketone to its corresponding lactone comprising contacting the cycloaliphatic ketone with a polypeptide of the invention under conditions wherein the polypeptide catalyzes the conversion of the cycloaliphatic ketone to its corresponding lactone. In one aspect, the polypeptide has a monooxygenase activity that is enantiospecific to generate a substantially chiral product. In one aspect, the ester or lactone has at least one of the following structures:

[structural formulas not reproduced]

[0180] wherein: R1, R2, R3 and R4 are each independently selected from -H, substituted or unsubstituted alkyl, alkenyl, alkynyl, aryl, heteroaryl, cycloalkyl, and heterocyclic; wherein the substituted groups are substituted with one or more of lower alkyl, hydroxy, alkoxy, mercapto, cycloalkyl, heterocyclic, aryl, heteroaryl, aryloxy, and halogen, or two or more of R1, R2, R3 and R4 may together form cyclic moieties, and, R' is selected from substituted or unsubstituted alkylene, alkenylene, alkynylene, arylene, heteroarylene, cycloalkylene, and heterocyclic; wherein the substitutions are substituted with one or more of lower alkyl, hydroxy, alkoxy, mercapto, cycloalkyl, heterocyclic, aryl, heteroaryl, aryloxy, and halogen.

Nitroreductases

[0181] In one aspect, the invention provides nitroreductases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a nitroreductase activity, including thermostable and thermotolerant nitroreductase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0182] Nitroreductases can catalyze the six-electron reduction of nitro compounds to the corresponding amines. Amines have a variety of applications as synthons and advanced pharmaceutical intermediates. There are markets for both aromatic amines and chiral aliphatic amines.

[0183] Nitroreductases of the invention fall into two main classes. These are the oxygen-sensitive and oxygen-insensitive nitroreductases. The oxygen-sensitive enzyme can catalyze nitroreduction only under anaerobic conditions. A nitro anion radical is formed by a one-electron transfer and is immediately reoxidized in the presence of oxygen, thus generating a futile cycle whereby reducing equivalents are consumed without nitroreduction. On the other hand, the oxygen-insensitive nitroreductases catalyze nitroreduction in a series of two-electron transfers, first via the nitroso and then the hydroxylamine intermediates before forming the amine.

Nitrilases

[0184] In one aspect, the invention provides nitrilases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a nitrilase activity, including thermostable and thermotolerant nitrilase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0185] Nitrilases of the invention can be used for hydrolyzing a nitrile to a carboxylic acid. In one embodiment, the conditions of the reaction comprise aqueous conditions. In another embodiment, the conditions comprise a pH of about 8.0 and/or a temperature from about 37° C. to about 45° C. Nitrilases of the invention can also be used for hydrolyzing a cyanohydrin moiety or an aminonitrile moiety of a molecule. Alternatively, the nitrilases of the invention can be used for making a chiral α-hydroxy acid molecule, a chiral amino acid molecule, a chiral β-hydroxy acid molecule, or a chiral gamma-hydroxy acid molecule. In one embodiment, the chiral molecule is an (R)-enantiomer. In another embodiment, the chiral molecule is an (S)-enantiomer. In one embodiment of the invention, one particular enzyme can have R-specificity for one particular substrate and the same enzyme can have S-specificity for a different particular substrate.

[0186] In one aspect, nitrilases of the invention can be used for making a composition or an intermediate thereof, wherein the nitrilase of the invention hydrolyzes a cyanohydrin or an aminonitrile moiety. In one embodiment, the composition or intermediate thereof comprises (S)-2-amino-4-phenyl butanoic acid. In a further embodiment, the composition or intermediate thereof comprises an L-amino acid. In a further embodiment, the composition comprises a food additive or a pharmaceutical drug.

[0187] In another aspect, nitrilases of the invention can be used for making an (R)-ethyl 4-cyano-3-hydroxybutyric acid, wherein the nitrilase of the invention acts upon a hydroxyglutaryl nitrile and selectively produces an (R)-enantiomer, so as to make (R)-ethyl 4-cyano-3-hydroxybutyric acid. In one embodiment, the enantiomeric excess (ee) is at least 95% or at least 99%. In another embodiment, the hydroxyglutaryl nitrile comprises 1,3-dicyano-2-hydroxy-propane or 3-hydroxyglutaronitrile.

[0188] In another aspect, nitrilases of the invention can be used for making an (S)-ethyl 4-cyano-3-hydroxybutyric acid, wherein the nitrilase of the invention acts upon a hydroxyglutaryl nitrile and selectively produces an (S)-enantiomer, so as to make (S)-ethyl 4-cyano-3-hydroxybutyric acid.

[0189] In another aspect, the nitrilases of the invention can be used for making an (R)-mandelic acid, wherein the nitrilase of the invention acts upon a mandelonitrile to produce an (R)-mandelic acid. In one embodiment, the (R)-mandelic acid comprises (R)-2-chloromandelic acid. In another embodiment, the (R)-mandelic acid comprises an aromatic ring substitution in the ortho-, meta-, or para-positions; a 1-naphthyl derivative of (R)-mandelic acid, a pyridyl derivative of (R)-mandelic acid or a thienyl derivative of (R)-mandelic acid or a combination thereof.
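The enantiomeric excess thresholds quoted above for the (R)-ethyl 4-cyano-3-hydroxybutyric acid embodiment (at least 95% or at least 99%) can be made concrete with a short calculation. The following is a minimal sketch using the conventional definition of ee; the function names and the example amounts are hypothetical and are not taken from the specification.

```python
def enantiomeric_excess(desired: float, other: float) -> float:
    """Return the ee, in percent, of the desired enantiomer.

    `desired` and `other` are amounts (e.g., moles or peak areas) of the
    two enantiomers, in the same units. A negative result means the
    undesired enantiomer predominates.
    """
    total = desired + other
    if total <= 0:
        raise ValueError("total amount of enantiomers must be positive")
    return 100.0 * (desired - other) / total


def meets_ee_threshold(desired: float, other: float, threshold_percent: float) -> bool:
    """Check whether the ee of the desired enantiomer reaches a threshold."""
    return enantiomeric_excess(desired, other) >= threshold_percent


if __name__ == "__main__":
    # Hypothetical product mixture: 97.5 parts (R) and 2.5 parts (S).
    r_amount, s_amount = 97.5, 2.5
    print(enantiomeric_excess(r_amount, s_amount))
    print(meets_ee_threshold(r_amount, s_amount, 95.0))
    print(meets_ee_threshold(r_amount, s_amount, 99.0))
```

Under this definition, an ee of 95% corresponds to a 97.5:2.5 ratio of enantiomers and an ee of 99% to a 99.5:0.5 ratio.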

[0190] In another aspect, the nitrilases of the invention can be used for making an (S)-mandelic acid, wherein the nitrilase of the invention acts upon a mandelonitrile to produce an (S)-mandelic acid. In one embodiment, the (S)-mandelic acid comprises (S)-methylbenzyl cyanide and the mandelonitrile comprises (S)-methoxy-benzyl cyanide. In one embodiment, the (S)-mandelic acid comprises an aromatic ring substitution in the ortho-, meta-, or para-positions; a 1-naphthyl derivative of (S)-mandelic acid, a pyridyl derivative of (S)-mandelic acid or a thienyl derivative of (S)-mandelic acid or a combination thereof.

[0191] In yet another aspect, the nitrilases of the invention can be used for making an (S)-phenyl lactic acid derivative or an (R)-phenyl lactic acid derivative, wherein the nitrilase of the invention acts upon a phenyllactonitrile and selectively produces an (S)-enantiomer or an (R)-enantiomer, thereby producing an (S)-phenyl lactic acid derivative or an (R)-phenyl lactic acid derivative.

P450 Enzymes

[0192] In one aspect, the invention provides P450 enzymes, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a P450 enzymatic activity, including thermostable and thermotolerant P450 enzymatic activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0193] P450s are oxidative enzymes that are widespread in nature, and polypeptides of the invention having P450 activity can be used in processes such as detoxifying xenobiotics, catabolism of unusual carbon sources and biosynthesis of secondary metabolites (e.g., detoxification of toxic compositions, e.g., pesticides, poisons, chemical warfare agents and the like). These enzymes activate molecular oxygen using an iron-heme center and utilize an electron shuttle to support the epoxidation reaction.

[0194] In one aspect, the P450 activity comprises a monooxygenation reaction. In one aspect, the P450 activity comprises catalysis of incorporation of oxygen into a substrate. In one aspect, the P450 activity can further comprise hydroxylation of aliphatic or aromatic carbons. In another aspect, the P450 activity can comprise epoxidation. Alternatively, the P450 activity can comprise N-, O-, or S-dealkylation. In one aspect, the P450 activity can comprise dehalogenation. In another aspect the P450 activity can comprise [...]. Alternatively, the P450 activity can comprise N-oxidation or N-hydroxylation. In one aspect, the P450 activity can comprise sulphoxide formation.

[0195] In one aspect, the epoxidase activity further comprises an alkene substrate. The epoxidase activity can further comprise production of a chiral product. In one aspect, the epoxidase activity can be enantioselective.

Pectate Lyases

[0196] In one aspect, the invention provides pectate lyases, e.g. pectinases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a pectate lyase, e.g. a pectinase activity, including thermostable and thermotolerant pectate lyase, e.g. a pectinase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0197] The pectate lyases, e.g. pectinases, of the invention can be used to catalyze the beta-elimination or hydrolysis of pectin and/or polygalacturonic acid, such as 1,4-linked alpha-D-galacturonic acid. They can be used in a variety of industrial applications, e.g., to treat plant cell walls, such as those in cotton or other natural fibers. In another exemplary industrial application, the polypeptides of the invention can be used in textile scouring.

[0198] In one aspect, pectate lyase activity comprises catalysis of beta-elimination (trans-elimination) or hydrolysis of pectin or polygalacturonic acid (pectate). The pectate lyase activity can comprise the breakup or dissolution of plant cell walls. The pectate lyase activity can comprise beta-elimination (trans-elimination) or hydrolysis of 1,4-linked alpha-D-galacturonic acid. The pectate lyase activity can comprise catalysis of beta-elimination (trans-elimination) or hydrolysis of methyl-esterified galacturonic acid. The pectate lyase activity can be exo-acting or endo-acting. In one aspect, the pectate lyase activity is endo-acting and acts at random sites within a polymer chain to give a mixture of oligomers. In one aspect, the pectate lyase activity is exo-acting and acts from one end of a polymer chain and produces monomers or dimers. The pectate lyase activity can catalyze the random cleavage of alpha-1,4-glycosidic linkages in pectic acid (polygalacturonic acid) by trans-elimination or hydrolysis. The pectate lyase activity can comprise activity the same or similar to pectate lyase (EC 4.2.2.2), poly(1,4-alpha-D-galacturonide) lyase, polygalacturonate lyase (EC 4.2.2.2), pectin lyase (EC 4.2.2.10), polygalacturonase (EC 3.2.1.15), exo-polygalacturonase (EC 3.2.1.67), exo-polygalacturonate lyase (EC 4.2.2.9) or exo-poly-alpha-galacturonosidase (EC 3.2.1.82). The pectate lyase activity can comprise beta-elimination (trans-elimination) or hydrolysis of galactan to galactose or galactooligomers. The pectate lyase activity can comprise beta-elimination (trans-elimination) or hydrolysis of a plant fiber. The plant fiber can comprise cotton fiber, hemp fiber or flax fiber.

[0199] The pectate lyases, e.g. pectinases, of the invention can be used for hydrolyzing, breaking up or disrupting a pectin- or pectate (polygalacturonic acid)-comprising composition, for liquefying or removing a pectin or pectate (polygalacturonic acid) from a composition. Alternatively, the pectate lyases, e.g. pectinases, of the invention can be used in detergent compositions. In one aspect, the pectate lyase is a nonsurface-active pectate lyase or a surface-active pectate lyase. The pectate lyase can be formulated in a non-aqueous liquid composition, a cast solid, a granular form, a particulate form, a compressed tablet, a gel form, a paste or a slurry form.

[0200] In one aspect, the pectate lyases, e.g. pectinases, of the invention can be used for washing an object. In another aspect, textiles or fabrics comprise a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein the polypeptide has pectate lyase, e.g. pectinase activity. Additionally, the pectate lyases, e.g. pectinases, of the invention can be used for fiber, thread, textile or fabric scouring. In one aspect, the pectate lyase is an alkaline-active and thermostable pectate lyase. The desizing and scouring treatments can be combined in a single bath. The method can further comprise addition of an alkaline and thermostable amylase. The desizing or scouring treatments can comprise conditions of between about pH 8.5 to pH 10.0 and temperatures of about 40° C. The method can further comprise the addition of a bleaching step. The desizing, scouring and bleaching treatments can be done simultaneously or sequentially in a single-bath container. The bleaching treatment can comprise hydrogen peroxide or at least one peroxy compound that can generate hydrogen peroxide when dissolved in water, or combinations thereof, and at least one bleach activator. The fiber, thread, textile or fabric can comprise a cellulosic material. The cellulosic material can comprise a crude fiber, a yarn, a woven or knit textile, a cotton, a linen, a flax, a ramie, a rayon, a hemp, a jute or a blend of natural or synthetic fibers.

[0201] Alternatively, the pectate lyases, e.g. pectinases, of the invention can be used in feeds or foods. For example, the pectate lyases, e.g. pectinases, of the invention can be used to improve the extraction of oil from an oil-rich plant material. In one aspect, the oil-rich plant material comprises an oil-rich seed. The oil can be a soybean oil, an olive oil, a rapeseed (canola) oil or a sunflower oil.

[0202] In another aspect, the pectate lyases, e.g. pectinases, of the invention can be used for preparing a fruit or vegetable juice, syrup, puree or extract. In yet another aspect, the pectate lyases, e.g. pectinases, of the invention can be used for treating a paper or a paper or wood pulp. Alternatively, the invention provides papers or paper products or paper pulps comprising a pectate lyase of the invention, or a polypeptide encoded by a nucleic acid of the invention.

[0203] In yet another aspect, the invention provides pharmaceutical compositions comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein the polypeptide has pectate lyase, e.g. pectinase activity. The pharmaceutical composition can act as a digestive aid.

[0204] Alternatively, the invention provides oral care products comprising a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of the invention, wherein the polypeptide has pectate lyase, e.g. pectinase activity. The oral care product can comprise a toothpaste, a dental cream, a gel or a tooth powder, an odontic, a mouth wash, a pre- or post-brushing rinse formulation, a chewing gum, a lozenge or a candy.

Phosphatases

[0205] In one aspect, the invention provides phosphatases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a phosphatase activity, including thermostable and thermotolerant phosphatase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0206] Phosphatases are a group of enzymes that remove phosphate groups from organophosphate ester compounds. There are numerous phosphatases, including alkaline phosphatases, phosphodiesterases and phytases.

[0207] Alkaline phosphatases are widely distributed enzymes and are composed of a group of enzymes which hydrolyze organic phosphate ester bonds at alkaline pH.

[0208] Phosphodiesterases are capable of hydrolyzing nucleic acids by hydrolyzing the phosphodiester bridges of DNA and RNA. The classification of phosphodiesterases depends upon which side of the phosphodiester bridge is attacked. The 3' enzymes specifically hydrolyze the ester linkage between the 3' carbon and the phosphoric group whereas the 5' enzymes hydrolyze the ester linkage between the phosphoric group and the 5' carbon of the phosphodiester bridge. The best known of the class 3' enzymes is a phosphodiesterase from the venom of the rattlesnake or from a Russell's viper, which hydrolyses all the 3' bonds in either RNA or DNA, liberating nearly all the nucleotide units as nucleotide 5' phosphates. This enzyme requires a free 3' hydroxyl group on the terminal nucleotide residue and proceeds stepwise from that end of the polynucleotide chain. This enzyme and all other nucleases which attack only at the ends of the polynucleotide chains are called exonucleases. The 5' enzymes are represented by a phosphodiesterase from bovine spleen, also an exonuclease, which hydrolyses all the 5' linkages of both DNA and RNA and thus liberates only nucleoside 3' phosphates. It begins its attack at the end of the chain having a free 3' hydroxyl group.

[0209] Phytase enzymes remove phosphate from phytic acid (inositol hexaphosphoric acid), a compound found in plants such as corn, wheat and rice. The enzyme has commercial use for the treatment of animal feed, making the inositol of the phytic acid available for animal nutrition. Phytases are used to improve the utilization of natural phosphorus in animal feed. Use of phytase as a feed additive enables the animal to metabolize a larger degree of its cereal feed's natural mineral content, thereby reducing or altogether eliminating the need for synthetic phosphorus additives. More important than the reduced need for phosphorus additives is the corresponding reduction of phosphorus in pig and chicken waste. Many European countries severely limit the amount of manure that can be spread per acre due to concerns regarding phosphorus contamination of ground water.

[0210] Alkaline phosphatases hydrolyze monophosphate esters, releasing an inorganic phosphate and the cognate alcohol compound. It is non-specific with respect to the alcohol moiety and it is this feature which accounts for the many uses of this enzyme.

Phospholipases

[0211] In one aspect, the invention provides phospholipases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a phospholipase activity, including thermostable and thermotolerant phospholipase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0212] Phospholipases are enzymes that hydrolyze the ester bonds of phospholipids. Corresponding to their importance in the metabolism of phospholipids, these enzymes are widespread among prokaryotes and eukaryotes. The phospholipases affect the metabolism, construction and reorganization of biological membranes and are involved in signal cascades. Several types of phospholipases are known which differ in their specificity according to the position of the bond attacked in the phospholipid molecule. Phospholipase A1 (PLA1) removes the 1-position fatty acid to produce free fatty acid and 1-lyso-2-acylphospholipid. Phospholipase A2 (PLA2) removes the 2-position fatty acid to produce free fatty acid and 1-acyl-2-lysophospholipid. PLA1 and PLA2 enzymes can be intra- or extra-cellular, membrane-bound or soluble. Intracellular PLA2 is found in almost every mammalian cell. Phospholipase C (PLC) removes the phosphate moiety to produce 1,2-diacylglycerol and phospho base. Phospholipase D (PLD) produces 1,2-diacylglycerophosphate and base group. PLC and PLD are important in cell function and signaling. PLD had been the dominant phospholipase in biocatalysis. Patatins are another type of phospholipase, thought to work as a PLA.

[0213] The invention provides methods for cleaving a glycerolphosphate ester linkage comprising the following steps: (a) providing a polypeptide having a phospholipase activity, wherein the polypeptide comprises an amino acid sequence of the invention, or the polypeptide is encoded by a nucleic acid of the invention; (b) providing a composition comprising a glycerolphosphate ester linkage; and, (c) contacting the polypeptide of step (a) with the composition of step (b) under conditions wherein the polypeptide cleaves the glycerolphosphate ester linkage. In one aspect, the conditions comprise between about pH 5 to about 5.5, or, between about pH 4.5 to about 5.0. In one aspect, the conditions comprise a temperature of between about 40° C. and about 70° C. In one aspect, the composition comprises a vegetable oil. In one aspect, the composition comprises an oilseed phospholipid. In one aspect, the cleavage reaction can generate a water extractable phosphorylated base and a diglyceride.

[0214] Phospholipases of the invention can be used in oil degumming, wherein the phospholipase is used under conditions wherein the phospholipase can cleave ester linkages in an oil, thereby degumming the oil. In one aspect, the oil is a vegetable oil. In another aspect, the vegetable oil comprises oilseed. The vegetable oil can comprise palm oil, rapeseed oil, corn oil, soybean oil, canola oil, sesame oil, peanut oil or sunflower oil. In one aspect, the method further comprises addition of a phospholipase of the invention, another phospholipase, another enzyme, or a combination thereof.

[0215] In another aspect of the invention, phospholipases of the invention can be used for converting a non-hydratable phospholipid to a hydratable form or for caustic refining of a phospholipid-containing composition. In the latter use, the polypeptide of the invention can be added before caustic refining and the composition comprising the phospholipid can comprise a plant and the polypeptide can be expressed transgenically in the plant, the polypeptide having a phospholipase activity can be added during crushing of a seed or other plant part, or, the polypeptide having a phospholipase activity is added following crushing or prior to refining. The polypeptide can be added during caustic refining and varying levels of acid and caustic can be added depending on levels of phosphorus and levels of free fatty acids. The polypeptide can be added after caustic refining: in an intense mixer or retention mixer prior to separation; following a heating step; in a centrifuge; in a soapstock; in a washwater; or, during bleaching or deodorizing steps.

[0216] In yet another aspect, the phospholipases of the invention can be used for purification of a phytosterol or a triterpene. The phytosterol or a triterpene can comprise a plant sterol. The plant sterol can be derived from a vegetable oil. The vegetable oil can comprise a coconut oil, canola oil, cocoa butter oil, corn oil, cottonseed oil, linseed oil, olive oil, palm oil, peanut oil, oil derived from a rice bran, safflower oil, sesame oil, soybean oil or a sunflower oil. The method can comprise use of nonpolar solvents to quantitatively extract free phytosterols and phytosteryl fatty-acid esters. The phytosterol or a triterpene can comprise a β-sitosterol, a campesterol, a stigmasterol, a stigmastanol, a β-sitostanol, a sitostanol, a desmosterol, a chalinasterol, a poriferasterol, a clionasterol or a brassicasterol.

[0217] In one embodiment, the phospholipases of the invention can be used for refining a crude oil. The polypeptide having a phospholipase activity can be in a water solution that is added to the composition. The water level can be between about 0.5 to 5%. The process time can be less than about 2 hours, less than about 60 minutes, less than about 30 minutes, less than 15 minutes, or less than 5 minutes. The hydrolysis conditions can comprise a temperature of between about 25° C. and 70° C. The hydrolysis conditions can comprise use of caustics. The hydrolysis conditions can comprise a pH of between about pH 3 and pH 10, between about pH 4 and pH 9, or between about pH 5 and pH 8. The hydrolysis conditions can comprise addition of emulsifiers and/or mixing after the contacting of step (c). The methods can comprise addition of an emulsion-breaker and/or heat to promote separation of an aqueous phase. The methods can comprise degumming before the contacting step to collect lecithin by centrifugation and then adding a PLC, a PLC and/or a PLA to remove non-hydratable phospholipids. The methods can comprise water degumming of crude oil to less than 10 ppm for edible oils and subsequent physical refining to less than about 50 ppm for biodiesel oils. The methods can comprise addition of acid to promote hydration of non-hydratable phospholipids.

Phytases

[0218] In one aspect, the invention provides phytases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a phytase activity, including thermostable and thermotolerant phytase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0219] Conversion of phytate to inositol and inorganic phosphorus can be catalyzed by phytase enzymes. Phytases such as phytase #EC 3.1.3.8 are capable of catalyzing the hydrolysis of myo-inositol hexaphosphate to D-myo-inositol 1,2,4,5,6-pentaphosphate and orthophosphate. Other phytases hydrolyze inositol pentaphosphate to tetra-, tri-, and lower phosphates. Acid phosphatases are enzymes that catalytically hydrolyze a wide variety of phosphate esters. For example, #EC 3.1.3.2 enzymes catalyze the hydrolysis of orthophosphoric monoesters to orthophosphate products.

[0220] Phytases of the invention can be used in producing phytase as a feed additive, e.g. for monogastric animals, fish, poultry, ruminants and other non-ruminants. Phytases of the invention can also be used for producing animal feed from certain industrial processes, e.g., wheat and corn waste products. In one aspect, the wet milling process of corn produces glutens sold as animal feeds. The addition of phytase improves the nutritional value of the feed product.

[0221] Phytases of the invention may also be used in dietary aids or in pharmaceutical compositions, for reducing pollution and increasing nutrient availability in an environment or environmental sample by degrading environmental phytic acid, for liberating minerals from phytates in plant materials either in vitro, i.e., in feed treatment processes, or in vivo, i.e., by administering the enzymes to animals.

Polymerases

[0222] In one aspect, the invention provides polymerases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a polymerase activity, including thermostable and thermotolerant polymerase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0223] The polymerase enzymes of the invention can have different polymerase activities at various high temperatures. In one aspect, the polymerase activity comprises addition of deoxynucleotides at the 3' hydroxyl end of a polynucleotide. The invention also provides kits, e.g., diagnostic kits, and methods for performing various amplification reactions, e.g., polymerase chain reactions, transcription amplifications, ligase chain reactions, self-sustained sequence replication or Q Beta replicase amplifications.

[0224] In one aspect, the polymerase activity comprises addition of nucleotides at the 3' hydroxyl end of a nucleic acid. The polymerase activity can comprise a 5'→3' polymerase activity, a 3'→5' exonuclease activity or a 5'→3' exonuclease activity or all or a combination thereof. In one aspect, the polymerase activity comprises only a 5'→3' polymerase activity, but not a 3'→5' exonuclease activity or a 5'→3' exonuclease activity. In another aspect, the polymerase activity can comprise a 5'→3' polymerase activity and a 3'→5' exonuclease activity, but not a 5'→3' exonuclease activity. Alternatively, the polymerase activity can comprise a 5'→3' polymerase activity and a 5'→3' exonuclease activity, but not a 3'→5' exonuclease activity. The polymerase activity can comprise addition of dUTP or dITP. The polymerase activity can comprise addition of a modified or a non-natural nucleotide to a polynucleotide, such as an analog of guanine, cytosine, thymine, or uracil, e.g., a 2-aminopurine, an inosine or a 5-methylcytosine.

[0225] In one aspect, the polymerase activity can comprise strand displacement properties. In one aspect, the polymerase activity comprises reverse transcriptase activity.

Proteases

[0226] In one aspect, the invention provides proteases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a protease activity, including thermostable and thermotolerant protease activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0227] Proteases of the invention can be carbonyl hydrolases which act to cleave peptide bonds of proteins or peptides. Proteolytic enzymes are ubiquitous in occurrence, found in all living organisms, and are essential for cell growth and differentiation. The extracellular proteases are of commercial value and find multiple applications in various industrial sectors. Industrial applications of proteases include food processing, brewing, alcohol production, peptide synthesis, enantioselectivity, hide preparation in the leather industry, waste management and animal degradation, silver recovery in the photographic industry, medical treatment, silk degumming, biofilm degradation, biomass conversion to ethanol, biodefense, antimicrobial agents and disinfectants, personal care and cosmetics, biotech reagents and in increasing starch yield from corn wet milling. Additionally, proteases are important components of laundry detergents and other products. Within biological research, proteases are used in purification processes to degrade unwanted proteins. It is often desirable to employ proteases of low specificity or mixtures of more specific proteases to obtain the necessary degree of degradation.

[0228] Proteases are classified according to their catalytic mechanisms. The International Union of Biochemistry and Molecular Biology (IUBMB) recognizes four mechanistic classes: (1) the serine proteases; (2) the cysteine proteases; (3) the aspartic proteases; and (4) the metalloproteases. In addition, the IUBMB recognizes a class of endopeptidases (oligopeptidases) of unknown catalytic mechanism. The serine proteases have alkaline pH optima, the metalloproteases are optimally active around neutrality, and the cysteine and aspartic enzymes have acidic pH optima. The serine protease class comprises two distinct families: the chymotrypsin family, which includes the mammalian enzymes such as chymotrypsin, trypsin, elastase, or kallikrein, and the subtilisin family, which includes the bacterial enzymes such as subtilisin. Serine proteases are used for a variety of industrial purposes, such as laundry detergents to aid in the removal of proteinaceous stains. In the food processing industry, serine proteases are used to produce protein-rich concentrates from fish and livestock, and in the preparation of dairy products.

[0229] The proteases of the invention can be used in a variety of diagnostic, therapeutic, and industrial contexts. The proteases of the invention can be used as, e.g., an additive for a detergent, for processing foods and for chemical synthesis utilizing a reverse reaction. Additionally, the proteases of the invention can be used in food processing, brewing, bath additives, alcohol production, peptide synthesis, enantioselectivity, hide preparation in the leather industry, waste management and animal degradation, silver recovery in the photographic industry, medical treatment, silk degumming, biofilm degradation, biomass conversion to ethanol, biodefense, antimicrobial agents and disinfectants, personal care and cosmetics, biotech reagents, in increasing starch yield from corn wet milling and pharmaceuticals such as digestive aids and anti-inflammatory (anti-phlogistic) agents.

Xylanases

[0230] In one aspect, the invention provides xylanases, polynucleotides encoding them, and methods of making and using these polynucleotides and polypeptides. In one aspect, the invention is directed to polypeptides, e.g., enzymes, having a xylanase activity, including thermostable and thermotolerant xylanase activity, and polynucleotides encoding these enzymes, and making and using these polynucleotides and polypeptides.

[0231] Xylanases (e.g., endo-1,4-beta-xylanase, EC 3.2.1.8) of the invention can hydrolyze internal β-1,4-xylosidic linkages in xylan to produce smaller molecular weight xylose and xylo-oligomers. Xylans are polysaccharides formed from 1,4-β-glycoside-linked D-xylopyranoses. Xylanases of the invention are of considerable commercial value, being used in the food industry, for baking and fruit and vegetable processing, breakdown of agricultural waste, in the manufacture of animal feed and in pulp and paper production.

[0232] Arabinoxylans are major non-starch polysaccharides of cereals representing 2.5-7.1% w/w depending on variety and growth conditions. The physicochemical properties of this polysaccharide are such that it gives rise to viscous solutions or even gels under oxidative conditions. In addition, arabinoxylans have high water-binding capacity and may have a role in protein foam stability. All of these characteristics present problems for several industries including brewing, baking, animal nutrition and paper manufacturing. In brewing applications, the presence of xylan results in wort filterability and haze formation issues. In baking applications (especially for cookies and crackers), these arabinoxylans create sticky doughs that are difficult to machine and reduce biscuit size. In addition, this carbohydrate is implicated in rapid rehydration of the baked product resulting in loss of crispiness and reduced shelf-life.

or beer, in reducing viscosity of plant material, or in increasing viscosity or gel strength of food products such as jam, marmalade, jelly, juice, paste, soup, salsa, etc. Xylanases of the invention may also be used in hydrolysis of hemicellulose for which it is selective, particularly in the presence of cellulose. In addition, xylanases of the invention can also be used in the production of ethanol, in transformation of a microbe
For monogastric animal that produces ethanol, in production of oenological tannins feed applications with cereal diets, arabinoxylan is a major and enzymatic composition, in stimulating the natural contributing factor to viscosity of gut contents and thereby defenses of plants, in production of Sugars from hemicellu adversely affects the digestibility of the feed and animal lose Substrates, in the cleaning of fruit, vegetables, mud or growth rate. For ruminant animals, these polysaccharides clay containing soils, in cleaning beer filtration membranes, represent Substantial components of fiber intake and more and in killing or inhibiting microbial cells. complete digestion of arabinoxylans would facilitate higher 0237 Table 1, below, lists the various EC (Enzyme Com feed conversion efficiencies. mission) Numbers along with the corresponding mode of 0233 Xylanases are currently used as additives (dough action for each enzyme class, Subclass and Sub-Subclass. conditioners) in dough processing for the hydrolysis of water Enzyme nomenclature is based upon the recommendations of soluble arabinoxylan. In baking applications (especially for the Nomenclature Committee of the International Union of cookies and crackers), arabinoxylan creates sticky doughs Biochemistry and Molecular Biology (IUBMB). Table 2, that are difficult to machine and reduce biscuit size. In addi below, lists the various EC Numbers along with the corre tion, this carbohydrate is implicated in rapid rehydration of sponding name given to each enzyme class, Subclass and the baked product resulting in loss of crispiness and reduced Sub-Subclass. Tables 1 and 2 list exemplary enzymatic activi shelf-life. ties of polypeptides of the invention, as can be determined by 0234. 
The enhancement of xylan digestion in animal feed sequence identity (e.g., homology); and in one embodiment a may improve the availability and digestibility of valuable sequence of the invention comprises an enzyme having at carbohydrate and protein feed nutrients. For monogastric ani least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, mal feed applications with cereal diets, arabinoxylan is a 59%, 60%, 61%, 62%, 63%, 64%. 65%, 66%, 67%, 68%, major contributing factor to viscosity of gut contents and 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, thereby adversely affects the digestibility of the feed and 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, animal growth rate. For ruminant animals, these polysaccha 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, rides represent Substantial components of fiber intake and 99%, or more sequence identity (homology) to an enzyme more complete digestion would facilitate higher feed conver encoded by an exemplary sequence of the invention, includ sion efficiencies. It is desirable for animal feed xylanases to ing all odd numbered SEQID NO:1 to SEQID NO:26,897, or be active in the animal stomach. This requires a feed enzyme an exemplary polypeptide of the invention, including all even to have high activity at 37°C. and at low pH for monogastrics numbered SEQID NO:2 to SEQID NO:26,898, and with an (pH 2-4) and near neutral pH for ruminants (pH 6.5-7). The exemplary function as listed in Table 1 or Table 2. enzyme should also possess resistance to animal gut Xyla 0238 Table 3, below, contains the exemplary SEQ ID nases and stability at the higher temperatures involved in feed NO:s of the invention, and the closest hit (BLAST) informa pelleting. As such, there is a need in the art for Xylanase feed tion for the polynucleotides and polypeptides of the inven additives for monogastric feed with high specific activity, tion. This information includes the closest hit organism, activity at 35-40° C. 
and pH 2-4, half life greater than 30 accession number, definition of the closest hit, EC number, minutes in SGF and a half-life >5 minutes at 85°C. in for percentage amino acid identity and the percent nucleotide mulated State. For ruminant feed, there is a need for Xylanase identity, along with the Evalue for the closest hits. The infor feed additives that have a high specific activity, activity at mation contained in Table 3 identifies exemplary activities of 35-40°C. and pH 6.5-7.0, halflife greater than 30 minutes in polypeptides of the invention, based on sequence identity SRF and stability as a concentrated dry powder. (homology). In one embodiment a sequence of the invention 0235. In one aspect, the Xylanases of the invention are also comprises an enzyme with at least 50%, 51%, 52%. 53%, used in improving the quality and quantity of milk protein 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, production in lactating cows, increasing the amount of 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, soluble saccharides in the stomach and Small intestine of pigs, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, improving late egg production efficiency and egg yields in 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, hens. Additionally, Xylanases of the inventions can be used in 94%. 95%, 96%, 97%, 98%, 99%, or more sequence identity biobleaching and treatment of chemical pulps, biobleaching (homology) to an enzyme as listed in Table 3. and treatment of wood or paper pulps, in reducing lignin in wood and modifying wood, as feed additives and/or Supple TABLE 1 ments or in manufacturing cellulose solutions. Detergent compositions comprising Xylanases of the invention are used EC (Enzyme Commission) Numbers with the corresponding mode of for fruit, vegetables and/or mud and clay compounds. action for each enzyme class, Subclass and Sub-Subclass 0236. In another aspect, Xylanases of the invention can be 1.——.— . 
used in compositions for the treatments and/or prophylaxis of 1.1.—— Acting on the CH-OH group of donors. 1.1.1.— With NAD(+) or NADP(+) as acceptor. coccidiosis. In yet another aspect, Xylanases of the invention 1.1.2.— With a as acceptor. can be used in the production of water soluble dietary fiber, in 1.1.3.— With oxygen as acceptor. improving the filterability, separation and production of 1.1.4.— With a as acceptor. starch, the beverage industry in improving filterability of wort US 2012/0266329 A1 Oct. 18, 2012 25

1.1.5.-  With a quinone or similar compound as acceptor.
1.1.99.-  With other acceptors.
1.2.-.-  Acting on the aldehyde or oxo group of donors.
1.2.1.-  With NAD(+) or NADP(+) as acceptor.
1.2.2.-  With a cytochrome as acceptor.
1.2.3.-  With oxygen as acceptor.
1.2.4.-  With a disulfide as acceptor.
1.2.7.-  With an iron-sulfur protein as acceptor.
1.2.99.-  With other acceptors.
1.3.-.-  Acting on the CH-CH group of donors.
1.3.1.-  With NAD(+) or NADP(+) as acceptor.
1.3.2.-  With a cytochrome as acceptor.
1.3.3.-  With oxygen as acceptor.
1.3.5.-  With a quinone or related compound as acceptor.
1.3.7.-  With an iron-sulfur protein as acceptor.
1.3.99.-  With other acceptors.
1.4.-.-  Acting on the CH-NH(2) group of donors.
1.4.1.-  With NAD(+) or NADP(+) as acceptor.
1.4.2.-  With a cytochrome as acceptor.
1.4.3.-  With oxygen as acceptor.
1.4.4.-  With a disulfide as acceptor.
1.4.7.-  With an iron-sulfur protein as acceptor.
1.4.99.-  With other acceptors.
1.5.-.-  Acting on the CH-NH group of donors.
1.5.1.-  With NAD(+) or NADP(+) as acceptor.
1.5.3.-  With oxygen as acceptor.
1.5.4.-  With a disulfide as acceptor.
1.5.5.-  With a quinone or similar compound as acceptor.
1.5.8.-  With a flavin as acceptor.
1.5.99.-  With other acceptors.
1.6.-.-  Acting on NADH or NADPH.
1.6.1.-  With NAD(+) or NADP(+) as acceptor.
1.6.2.-  With a heme protein as acceptor.
1.6.3.-  With oxygen as acceptor.
1.6.4.-  With a disulfide as acceptor.
1.6.5.-  With a quinone or similar compound as acceptor.
1.6.6.-  With a nitrogenous group as acceptor.
1.6.8.-  With a flavin as acceptor.
1.6.99.-  With other acceptors.
1.7.-.-  Acting on other nitrogenous compounds as donors.
1.7.1.-  With NAD(+) or NADP(+) as acceptor.
1.7.2.-  With a cytochrome as acceptor.
1.7.3.-  With oxygen as acceptor.
1.7.7.-  With an iron-sulfur protein as acceptor.
1.7.99.-  With other acceptors.
1.8.-.-  Acting on a sulfur group of donors.
1.8.1.-  With NAD(+) or NADP(+) as acceptor.
1.8.2.-  With a cytochrome as acceptor.
1.8.3.-  With oxygen as acceptor.
1.8.4.-  With a disulfide as acceptor.
1.8.5.-  With a quinone or similar compound as acceptor.
1.8.7.-  With an iron-sulfur protein as acceptor.
1.8.98.-  With other, known, acceptors.
1.8.99.-  With other acceptors.
1.9.-.-  Acting on a heme group of donors.
1.9.3.-  With oxygen as acceptor.
1.9.6.-  With a nitrogenous group as acceptor.
1.9.99.-  With other acceptors.
1.10.-.-  Acting on diphenols and related substances as donors.
1.10.1.-  With NAD(+) or NADP(+) as acceptor.
1.10.2.-  With a cytochrome as acceptor.
1.10.3.-  With oxygen as acceptor.
1.10.99.-  With other acceptors.
1.11.-.-  Acting on a peroxide as acceptor (peroxidases).
1.12.-.-  Acting on hydrogen as donor.
1.12.1.-  With NAD(+) or NADP(+) as acceptor.
1.12.2.-  With a cytochrome as acceptor.
1.12.5.-  With a quinone or similar compound as acceptor.
1.12.7.-  With an iron-sulfur protein as acceptor.
1.12.98.-  With other known acceptors.
1.12.99.-  With other acceptors.
1.13.-.-  Acting on single donors with incorporation of molecular oxygen.
1.13.11.-  With incorporation of two atoms of oxygen.
1.13.12.-  With incorporation of one atom of oxygen.
1.14.-.-  Acting on paired donors, with incorporation or reduction of molecular oxygen.
1.14.11.-  With 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors.
1.14.12.-  With NADH or NADPH as one donor, and incorporation of two atoms of oxygen into one donor.
1.14.13.-  With NADH or NADPH as one donor, and incorporation of one atom of oxygen.
1.14.14.-  With reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen.
1.14.15.-  With a reduced iron-sulfur protein as one donor, and incorporation of one atom of oxygen.
1.14.16.-  With reduced pteridine as one donor, and incorporation of one atom of oxygen.
1.14.17.-  With reduced ascorbate as one donor, and incorporation of one atom of oxygen.
1.14.18.-  With another compound as one donor, and incorporation of one atom of oxygen.
1.14.19.-  With oxidation of a pair of donors resulting in the reduction of molecular oxygen to two molecules of water.
1.14.20.-  With 2-oxoglutarate as one donor, and the other dehydrogenated.
1.14.21.-  With NADH or NADPH as one donor, and the other dehydrogenated.
1.15.-.-  Acting on superoxide as acceptor.
1.16.-.-  Oxidizing metal ions.
1.16.1.-  With NAD(+) or NADP(+) as acceptor.
1.16.3.-  With oxygen as acceptor.
1.16.8.-  With flavin as acceptor.
1.17.-.-  Acting on CH or CH(2) groups.
1.17.1.-  With NAD(+) or NADP(+) as acceptor.
1.17.3.-  With oxygen as acceptor.
1.17.4.-  With a disulfide as acceptor.
1.17.5.-  With a quinone or similar compound as acceptor.
1.17.99.-  With other acceptors.
1.18.-.-  Acting on iron-sulfur proteins as donors.
1.18.1.-  With NAD(+) or NADP(+) as acceptor.
1.18.6.-  With dinitrogen as acceptor.
1.18.96.-  With other, known, acceptors.
1.18.99.-  With H(+) as acceptor.
1.19.-.-  Acting on reduced flavodoxin as donor.
1.19.6.-  With dinitrogen as acceptor.
1.20.-.-  Acting on phosphorus or arsenic in donors.
1.20.1.-  Acting on phosphorus or arsenic in donors, with NAD(P)(+) as acceptor.
1.20.4.-  Acting on phosphorus or arsenic in donors, with disulfide as acceptor.
1.20.98.-  Acting on phosphorus or arsenic in donors, with other, known acceptors.
1.20.99.-  Acting on phosphorus or arsenic in donors, with other acceptors.
1.21.-.-  Acting on X-H and Y-H to form an X-Y bond.
1.21.3.-  With oxygen as acceptor.
1.21.4.-  With a disulfide as acceptor.
1.21.99.-  With other acceptors.
1.97.-.-  Other oxidoreductases.
2.-.-.-  Transferases.
2.1.-.-  Transferring one-carbon groups.
2.1.1.-  Methyltransferases.

2.1.2.-  Hydroxymethyl-, formyl- and related transferases.
2.1.3.-  Carboxyl- and carbamoyltransferases.
2.1.4.-  Amidinotransferases.
2.2.-.-  Transferring aldehyde or ketone residues.
2.2.1.-  Transketolases and transaldolases.
2.3.-.-  Acyltransferases.
2.3.1.-  Transferring groups other than amino-acyl groups.
2.3.2.-  Aminoacyltransferases.
2.3.3.-  Acyl groups converted into alkyl on transfer.
2.4.-.-  Glycosyltransferases.
2.4.1.-  Hexosyltransferases.
2.4.2.-  Pentosyltransferases.
2.4.99.-  Transferring other glycosyl groups.
2.5.-.-  Transferring alkyl or aryl groups, other than methyl groups.
2.6.-.-  Transferring nitrogenous groups.
2.6.1.-  Transaminases (aminotransferases).
2.6.3.-  Oximinotransferases.
2.6.99.-  Transferring other nitrogenous groups.
2.7.-.-  Transferring phosphorous-containing groups.
2.7.1.-  Phosphotransferases with an alcohol group as acceptor.
2.7.2.-  Phosphotransferases with a carboxyl group as acceptor.
2.7.3.-  Phosphotransferases with a nitrogenous group as acceptor.
2.7.4.-  Phosphotransferases with a phosphate group as acceptor.
2.7.6.-  Diphosphotransferases.
2.7.7.-  Nucleotidyltransferases.
2.7.8.-  Transferases for other substituted phosphate groups.
2.7.9.-  Phosphotransferases with paired acceptors.
2.8.-.-  Transferring sulfur-containing groups.
2.8.1.-  Sulfurtransferases.
2.8.2.-  Sulfotransferases.
2.8.3.-  CoA-transferases.
2.8.4.-  Transferring alkylthio groups.
2.9.-.-  Transferring selenium-containing groups.
2.9.1.-  Selenotransferases.
3.-.-.-  Hydrolases.
3.1.-.-  Acting on ester bonds.
3.1.1.-  Carboxylic ester hydrolases.
3.1.2.-  Thiolester hydrolases.
3.1.3.-  Phosphoric monoester hydrolases.
3.1.4.-  Phosphoric diester hydrolases.
3.1.5.-  Triphosphoric monoester hydrolases.
3.1.6.-  Sulfuric ester hydrolases.
3.1.7.-  Diphosphoric monoester hydrolases.
3.1.8.-  Phosphoric triester hydrolases.
3.1.11.-  Exodeoxyribonucleases producing 5'-phosphomonoesters.
3.1.13.-  Exoribonucleases producing 5'-phosphomonoesters.
3.1.14.-  Exoribonucleases producing 3'-phosphomonoesters.
3.1.15.-  Exonucleases active with either ribo- or deoxyribonucleic acid and producing 5'-phosphomonoesters.
3.1.16.-  Exonucleases active with either ribo- or deoxyribonucleic acid producing 3'-phosphomonoesters.
3.1.21.-  Endodeoxyribonucleases producing 5'-phosphomonoesters.
3.1.22.-  Endodeoxyribonucleases producing other than 5'-phosphomonoesters.
3.1.25.-  Site-specific endodeoxyribonucleases specific for altered bases.
3.1.26.-  Endoribonucleases producing 5'-phosphomonoesters.
3.1.27.-  Endoribonucleases producing other than 5'-phosphomonoesters.
3.1.30.-  Endoribonucleases active with either ribo- or deoxyribonucleic acid and producing 5'-phosphomonoesters.
3.1.31.-  Endoribonucleases active with either ribo- or deoxyribonucleic acid and producing 3'-phosphomonoesters.
3.2.-.-  Glycosylases.
3.2.1.-  Glycosidases, i.e. enzymes hydrolyzing O- and S-glycosyl compounds.
3.2.2.-  Hydrolyzing N-glycosyl compounds.
3.3.-.-  Acting on ether bonds.
3.3.1.-  Thioether and trialkylsulfonium hydrolases.
3.3.2.-  Ether hydrolases.
3.4.-.-  Acting on peptide bonds (peptide hydrolases).
3.4.11.-  Aminopeptidases.
3.4.13.-  Dipeptidases.
3.4.14.-  Dipeptidyl-peptidases and tripeptidyl-peptidases.
3.4.15.-  Peptidyl-dipeptidases.
3.4.16.-  Serine-type carboxypeptidases.
3.4.17.-  Metallocarboxypeptidases.
3.4.18.-  Cysteine-type carboxypeptidases.
3.4.19.-  Omega peptidases.
3.4.21.-  Serine endopeptidases.
3.4.22.-  Cysteine endopeptidases.
3.4.23.-  Aspartic endopeptidases.
3.4.24.-  Metalloendopeptidases.
3.4.25.-  Threonine endopeptidases.
3.4.99.-  Endopeptidases of unknown catalytic mechanism.
3.5.-.-  Acting on carbon-nitrogen bonds, other than peptide bonds.
3.5.1.-  In linear amides.
3.5.2.-  In cyclic amides.
3.5.3.-  In linear amidines.
3.5.4.-  In cyclic amidines.
3.5.5.-  In nitriles.
3.5.99.-  In other compounds.
3.6.-.-  Acting on acid anhydrides.
3.6.1.-  In phosphorous-containing anhydrides.
3.6.2.-  In sulfonyl-containing anhydrides.
3.6.3.-  Acting on acid anhydrides; catalyzing transmembrane movement of substances.
3.6.4.-  Acting on acid anhydrides; involved in cellular and subcellular movement.
3.6.5.-  Acting on GTP; involved in cellular and subcellular movement.
3.7.-.-  Acting on carbon-carbon bonds.
3.7.1.-  In ketonic substances.
3.8.-.-  Acting on halide bonds.
3.8.1.-  In C-halide compounds.
3.9.-.-  Acting on phosphorus-nitrogen bonds.
3.10.-.-  Acting on sulfur-nitrogen bonds.
3.11.-.-  Acting on carbon-phosphorus bonds.
3.12.-.-  Acting on sulfur-sulfur bonds.
3.13.-.-  Acting on carbon-sulfur bonds.
4.-.-.-  Lyases.
4.1.-.-  Carbon-carbon lyases.
4.1.1.-  Carboxy-lyases.
4.1.2.-  Aldehyde-lyases.
4.1.3.-  Oxo-acid-lyases.
4.1.99.-  Other carbon-carbon lyases.
4.2.-.-  Carbon-oxygen lyases.
4.2.1.-  Hydro-lyases.
4.2.2.-  Acting on polysaccharides.
4.2.3.-  Acting on phosphates.
4.2.99.-  Other carbon-oxygen lyases.
4.3.-.-  Carbon-nitrogen lyases.
4.3.1.-  Ammonia-lyases.

4.3.2.-  Lyases acting on amides, amidines, etc.
4.3.3.-  Amine-lyases.
4.3.99.-  Other carbon-nitrogen-lyases.
4.4.-.-  Carbon-sulfur lyases.
4.5.-.-  Carbon-halide lyases.
4.6.-.-  Phosphorus-oxygen lyases.
4.99.-.-  Other lyases.
5.-.-.-  Isomerases.
5.1.-.-  Racemases and epimerases.
5.1.1.-  Acting on amino acids and derivatives.
5.1.2.-  Acting on hydroxy acids and derivatives.
5.1.3.-  Acting on carbohydrates and derivatives.
5.1.99.-  Acting on other compounds.
5.2.-.-  Cis-trans-isomerases.
5.3.-.-  Intramolecular oxidoreductases.
5.3.1.-  Interconverting aldoses and ketoses.
5.3.2.-  Interconverting keto- and enol-groups.
5.3.3.-  Transposing C=C bonds.
5.3.4.-  Transposing S-S bonds.
5.3.99.-  Other intramolecular oxidoreductases.
5.4.-.-  Intramolecular transferases (mutases).
5.4.1.-  Transferring acyl groups.
5.4.2.-  Phosphotransferases (phosphomutases).
5.4.3.-  Transferring amino groups.
5.4.4.-  Transferring hydroxy groups.
5.4.99.-  Transferring other groups.
5.5.-.-  Intramolecular lyases.
5.99.-.-  Other isomerases.
6.-.-.-  Ligases.
6.1.-.-  Forming carbon-oxygen bonds.
6.1.1.-  Ligases forming aminoacyl-tRNA and related compounds.
6.2.-.-  Forming carbon-sulfur bonds.
6.2.1.-  Acid-thiol ligases.
6.3.-.-  Forming carbon-nitrogen bonds.
6.3.1.-  Acid-ammonia (or amide) ligases (amide synthases).
6.3.2.-  Acid--D-amino-acid ligases (peptide synthases).
6.3.3.-  Cyclo-ligases.
6.3.4.-  Other carbon-nitrogen ligases.
6.3.5.-  Carbon-nitrogen ligases with glutamine as amido-N-donor.
6.4.-.-  Forming carbon-carbon bonds.
6.5.-.-  Forming phosphoric ester bonds.
6.6.-.-  Forming nitrogen-metal bonds.
6.6.1.-  Forming nitrogen-metal bonds.

TABLE 2
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.

ENZYME: 1.-.-.-

1.1.1.1  Alcohol dehydrogenase.
1.1.1.2  Alcohol dehydrogenase (NADP+).
1.1.1.3  Homoserine dehydrogenase.
1.1.1.4  (R,R)-butanediol dehydrogenase.
1.1.1.5  Acetoin dehydrogenase.
1.1.1.6  Glycerol dehydrogenase.
1.1.1.7  Propanediol-phosphate dehydrogenase.
1.1.1.8  Glycerol-3-phosphate dehydrogenase (NAD+).
1.1.1.9  D-xylulose reductase.
1.1.1.10  L-xylulose reductase.
1.1.1.11  D-arabinitol 4-dehydrogenase.
1.1.1.12  L-arabinitol 4-dehydrogenase.
1.1.1.13  L-arabinitol 2-dehydrogenase.
1.1.1.14  L-iditol 2-dehydrogenase.
1.1.1.15  D-iditol 2-dehydrogenase.
1.1.1.16  Galactitol 2-dehydrogenase.
1.1.1.17  Mannitol-1-phosphate 5-dehydrogenase.
1.1.1.18  Inositol 2-dehydrogenase.
1.1.1.19  L-glucuronate reductase.
1.1.1.20  Glucuronolactone reductase.
1.1.1.21  Aldehyde reductase.
1.1.1.22  UDP-glucose 6-dehydrogenase.
1.1.1.23  Histidinol dehydrogenase.
1.1.1.24  Quinate dehydrogenase.
1.1.1.25  Shikimate dehydrogenase.
1.1.1.26  Glyoxylate reductase.
1.1.1.27  L-lactate dehydrogenase.
1.1.1.28  D-lactate dehydrogenase.
1.1.1.29  Glycerate dehydrogenase.
1.1.1.30  3-hydroxybutyrate dehydrogenase.
1.1.1.31  3-hydroxyisobutyrate dehydrogenase.
1.1.1.32  Mevaldate reductase.
1.1.1.33  Mevaldate reductase (NADPH).
1.1.1.34  Hydroxymethylglutaryl-CoA reductase (NADPH).
1.1.1.35  3-hydroxyacyl-CoA dehydrogenase.
1.1.1.36  Acetoacetyl-CoA reductase.
1.1.1.37  Malate dehydrogenase.
1.1.1.38  Malate dehydrogenase (oxaloacetate-decarboxylating).
1.1.1.39  Malate dehydrogenase (decarboxylating).
1.1.1.40  Malate dehydrogenase (oxaloacetate-decarboxylating) (NADP+).
1.1.1.41  Isocitrate dehydrogenase (NAD+).
1.1.1.42  Isocitrate dehydrogenase (NADP+).
1.1.1.43  Phosphogluconate 2-dehydrogenase.
1.1.1.44  Phosphogluconate dehydrogenase (decarboxylating).
1.1.1.45  L-gulonate 3-dehydrogenase.
1.1.1.46  L-arabinose 1-dehydrogenase.
1.1.1.47  Glucose 1-dehydrogenase.
1.1.1.48  Galactose 1-dehydrogenase.
1.1.1.49  Glucose-6-phosphate 1-dehydrogenase.
1.1.1.50  3-alpha-hydroxysteroid dehydrogenase (B-specific).
1.1.1.51  3(or 17)beta-hydroxysteroid dehydrogenase.
1.1.1.52  3-alpha-hydroxycholanate dehydrogenase.
1.1.1.53  3-alpha(or 20-beta)-hydroxysteroid dehydrogenase.
1.1.1.54  Allyl-alcohol dehydrogenase.
1.1.1.55  Lactaldehyde reductase (NADPH).
1.1.1.56  Ribitol 2-dehydrogenase.
1.1.1.57  Fructuronate reductase.
1.1.1.58  Tagaturonate reductase.
1.1.1.59  3-hydroxypropionate dehydrogenase.
1.1.1.60  2-hydroxy-3-oxopropionate reductase.
1.1.1.61  4-hydroxybutyrate dehydrogenase.
1.1.1.62  Estradiol 17-beta-dehydrogenase.
1.1.1.63  Testosterone 17-beta-dehydrogenase.
1.1.1.64  Testosterone 17-beta-dehydrogenase (NADP+).
1.1.1.65  Pyridoxine 4-dehydrogenase.
1.1.1.66  Omega-hydroxydecanoate dehydrogenase.
1.1.1.67  Mannitol 2-dehydrogenase.
1.1.1.69  Gluconate 5-dehydrogenase.
1.1.1.71  Alcohol dehydrogenase (NAD(P)+).
1.1.1.72  Glycerol dehydrogenase (NADP+).
1.1.1.73  Octanol dehydrogenase.
1.1.1.75  (R)-aminopropanol dehydrogenase.
1.1.1.76  (S,S)-butanediol dehydrogenase.
1.1.1.77  Lactaldehyde reductase.
1.1.1.78  D-lactaldehyde dehydrogenase.
1.1.1.79  Glyoxylate reductase (NADP+).
1.1.1.80  Isopropanol dehydrogenase (NADP+).
1.1.1.81  Hydroxypyruvate reductase.

1.1.1.82  Malate dehydrogenase (NADP+).
1.1.1.83  D-malate dehydrogenase (decarboxylating).
1.1.1.84  Dimethylmalate dehydrogenase.
1.1.1.85  3-isopropylmalate dehydrogenase.
1.1.1.86  Ketol-acid reductoisomerase.
1.1.1.87  Homoisocitrate dehydrogenase.
1.1.1.88  Hydroxymethylglutaryl-CoA reductase.
1.1.1.90  Aryl-alcohol dehydrogenase.
1.1.1.91  Aryl-alcohol dehydrogenase (NADP+).
1.1.1.92  Oxaloglycolate reductase (decarboxylating).
1.1.1.93  Tartrate dehydrogenase.
1.1.1.94  Glycerol-3-phosphate dehydrogenase (NAD(P)+).
1.1.1.95  Phosphoglycerate dehydrogenase.
1.1.1.96  Diiodophenylpyruvate reductase.
1.1.1.97  3-hydroxybenzyl-alcohol dehydrogenase.
1.1.1.98  (R)-2-hydroxy-fatty-acid dehydrogenase.
1.1.1.99  (S)-2-hydroxy-fatty-acid dehydrogenase.
1.1.1.100  3-oxoacyl-acyl-carrier-protein reductase.
1.1.1.101  Acylglycerone-phosphate reductase.
1.1.1.102  3-dehydrosphinganine reductase.
1.1.1.103  L-threonine 3-dehydrogenase.
1.1.1.104  4-oxoproline reductase.
1.1.1.105  Retinol dehydrogenase.
1.1.1.106  Pantoate 4-dehydrogenase.
1.1.1.107  Pyridoxal 4-dehydrogenase.
1.1.1.108  Carnitine 3-dehydrogenase.
1.1.1.110  Indolelactate dehydrogenase.
1.1.1.111  3-(imidazol-5-yl)lactate dehydrogenase.
1.1.1.112  Indanol dehydrogenase.
1.1.1.113  L-xylose 1-dehydrogenase.
1.1.1.114  Apiose 1-reductase.
1.1.1.115  Ribose 1-dehydrogenase (NADP+).
1.1.1.116  D-arabinose 1-dehydrogenase.
1.1.1.117  D-arabinose 1-dehydrogenase (NAD(P)+).
1.1.1.118  Glucose 1-dehydrogenase (NAD+).
1.1.1.119  Glucose 1-dehydrogenase (NADP+).
1.1.1.120  Galactose 1-dehydrogenase (NADP+).
1.1.1.121  Aldose 1-dehydrogenase.
1.1.1.122  D-threo-aldose 1-dehydrogenase.
1.1.1.123  Sorbose 5-dehydrogenase (NADP+).
1.1.1.124  Fructose 5-dehydrogenase (NADP+).
1.1.1.125  2-deoxy-D-gluconate 3-dehydrogenase.
1.1.1.126  2-dehydro-3-deoxy-D-gluconate 6-dehydrogenase.
1.1.1.127  2-dehydro-3-deoxy-D-gluconate 5-dehydrogenase.
1.1.1.128  L-idonate 2-dehydrogenase.
1.1.1.129  L-threonate 3-dehydrogenase.
1.1.1.130  3-dehydro-L-gulonate 2-dehydrogenase.
1.1.1.131  Mannuronate reductase.
1.1.1.132  GDP-mannose 6-dehydrogenase.
1.1.1.133  dTDP-4-dehydrorhamnose reductase.
1.1.1.134  dTDP-6-deoxy-L-talose 4-dehydrogenase.
1.1.1.135  GDP-6-deoxy-D-talose 4-dehydrogenase.
1.1.1.136  UDP-N-acetylglucosamine 6-dehydrogenase.
1.1.1.137  Ribitol-5-phosphate 2-dehydrogenase.
1.1.1.138  Mannitol 2-dehydrogenase (NADP+).
1.1.1.140  Sorbitol-6-phosphate 2-dehydrogenase.
1.1.1.141  15-hydroxyprostaglandin dehydrogenase (NAD+).
1.1.1.142  D-pinitol dehydrogenase.
1.1.1.143  Sequoyitol dehydrogenase.
1.1.1.144  Perillyl-alcohol dehydrogenase.
1.1.1.145  3-beta-hydroxy-delta(5)-steroid dehydrogenase.
1.1.1.146  11-beta-hydroxysteroid dehydrogenase.
1.1.1.147  16-alpha-hydroxysteroid dehydrogenase.
1.1.1.148  Estradiol 17-alpha-dehydrogenase.
1.1.1.149  20-alpha-hydroxysteroid dehydrogenase.
1.1.1.150  21-hydroxysteroid dehydrogenase (NAD+).
1.1.1.152  3-alpha-hydroxy-5-beta-androstane-17-one 3-alpha-dehydrogenase.
1.1.1.153  Sepiapterin reductase.
1.1.1.154  Ureidoglycolate dehydrogenase.
1.1.1.155  Homoisocitrate dehydrogenase.
1.1.1.156  Glycerol 2-dehydrogenase (NADP+).
1.1.1.157  3-hydroxybutyryl-CoA dehydrogenase.
1.1.1.158  UDP-N-acetylmuramate dehydrogenase.
1.1.1.159  7-alpha-hydroxysteroid dehydrogenase.
1.1.1.160  Dihydrobunolol dehydrogenase.
1.1.1.161  Cholestanetetraol 26-dehydrogenase.
1.1.1.162  Erythrulose reductase.
1.1.1.163  Cyclopentanol dehydrogenase.
1.1.1.164  Hexadecanol dehydrogenase.
1.1.1.165  2-alkyn-1-ol dehydrogenase.
1.1.1.166  Hydroxycyclohexanecarboxylate dehydrogenase.
1.1.1.167  Hydroxymalonate dehydrogenase.
1.1.1.168  2-dehydropantolactone reductase (A-specific).
1.1.1.169  2-dehydropantoate 2-reductase.
1.1.1.170  Sterol-4-alpha-carboxylate 3-dehydrogenase (decarboxylating).
1.1.1.172  2-oxoadipate reductase.
1.1.1.173  L-rhamnose 1-dehydrogenase.
1.1.1.174  Cyclohexane-1,2-diol dehydrogenase.
1.1.1.175  D-xylose 1-dehydrogenase.
1.1.1.176  12-alpha-hydroxysteroid dehydrogenase.
1.1.1.177  Glycerol-3-phosphate 1-dehydrogenase (NADP+).
1.1.1.178  3-hydroxy-2-methylbutyryl-CoA dehydrogenase.
1.1.1.179  D-xylose 1-dehydrogenase (NADP+).
1.1.1.181  Cholest-5-ene-3-beta,7-alpha-diol 3-beta-dehydrogenase.
1.1.1.183  Geraniol dehydrogenase.
1.1.1.184  Carbonyl reductase (NADPH).
1.1.1.185  L-glycol dehydrogenase.
1.1.1.186  dTDP-galactose 6-dehydrogenase.
1.1.1.187  GDP-4-dehydro-D-rhamnose reductase.
1.1.1.188  Prostaglandin-F synthase.
1.1.1.189  Prostaglandin-E(2) 9-reductase.
1.1.1.190  Indole-3-acetaldehyde reductase (NADH).
1.1.1.191  Indole-3-acetaldehyde reductase (NADPH).
1.1.1.192  Long-chain-alcohol dehydrogenase.
1.1.1.193  5-amino-6-(5-phosphoribosylamino)uracil reductase.
1.1.1.194  Coniferyl-alcohol dehydrogenase.
1.1.1.195  Cinnamyl-alcohol dehydrogenase.
1.1.1.196  15-hydroxyprostaglandin-D dehydrogenase (NADP+).
1.1.1.197  15-hydroxyprostaglandin dehydrogenase (NADP+).
1.1.1.198  (+)-borneol dehydrogenase.
1.1.1.199  (S)-usnate reductase.
1.1.1.200  Aldose-6-phosphate reductase (NADPH).
1.1.1.201  7-beta-hydroxysteroid dehydrogenase (NADP+).
1.1.1.202  1,3-propanediol dehydrogenase.
1.1.1.203  Uronate dehydrogenase.
1.1.1.205  IMP dehydrogenase.
1.1.1.206  Tropine dehydrogenase.
1.1.1.207  (-)-menthol dehydrogenase.
1.1.1.208  (+)-neomenthol dehydrogenase.

TABLE 2-continued TABLE 2-continued EC Numbers wi h the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 209 3(or 17)-alpha-hydroxysteroid 266 dTDP-4-dehydro-6-deoxyglucose dehydrogenase. reductase. 3-beta(or 20-alpha)-hydroxysteroid .267 1-deoxy-D-xylulose-5-phosphate dehydrogenase. reductoisomerase. Long-chain-3-hydroxyacyl-CoA 268 2-(R)-hydroxypropyl-CoM dehydrogenase. dehydrogenase. 3-oxoacyl-acyl-carrier-protein 269 2-(S)-hydroxypropyl-CoM reductase (NADH). dehydrogenase. 3-alpha-hydroxysteroid dehydrogenase 1.270 3-keto-steroid reductase. (A-specific). 1.271 GDP-L-fucose synthase. 2-dehydropantolactone reductase (B- 1.272 (R)-2-hydroxyacid dehydrogenase. specific). 1.273 VelloSimine dehydrogenase. Gluconate 2-dehydrogenase. 1274 2,5-didehydrogluconate reductase. g Farnesol dehydrogenase. 1275 (+)-trans-carveol dehydrogenase. Benzyl-2-methyl-hydroxybutyrate 1.276 Serine 3-dehydrogenase. dehydrogenase. 277 3-beta-hydroxy-5-beta-steroid Morphine 6-dehydrogenase. dehydrogenase. Dihydrokaempferol 4-reductase. 278 3-beta-hydroxy-5-alpha-steroid 6-pyruvoyltetrahydropterin 2'-reductase. dehydrogenase. Vomifoliol 4'-dehydrogenase. 279 (R)-3-hydroxyacid-ester dehydrogenase. (R)-4-hydroxyphenylactate 1.28O (S)-3-hydroxyacid-ester dehydrogenase. dehydrogenase. 281 GDP-4-dehydro-6-deoxy-D-mannose 1.223 Isopiperitenol dehydrogenase. (C8Se. .1.224 Mannose-6-phosphate 6-reductase. 1.1.282 Quinateishikimate dehydrogenase. 1.225 Chlordecome reductase. .1.2.2 Mannitol dehydrogenase (cytochrome). 226 4-hydroxycyclohexanecarboxylate .1.2.3 L-lactate dehydrogenase (cytochrome). dehydrogenase. .1.2.4 D-lactate dehydrogenase (cytochrome). 1.227 (-)-borneol dehydrogenase. 2.5 D-lactate dehydrogenase (cytochrome c 1228 (+)-Sabinol dehydrogenase. 229 Diethyl 2-methyl-3-oxosuccinate 1.3.3 Malate oxidase. (CC8Se. .1.3.4 Glucose oxidase. 
230 3-alpha-hydroxyglycyrrhetinate 13.5 Hexose oxidase. dehydrogenase. .1.3.6 Cholesterol oxidase. 231 15-hydroxyprostaglandin-I 1.3.7 Aryl-alcohol oxidase. dehydrogenase (NADP+). 13.8 L-gulonolactone oxidase. 232 15-hydroxylicosatetraenoate 13.9 Galactose oxidase. dehydrogenase. 1.3.10 Pyranose oxidase. 1.233 N-acylmannosamine 1-dehydrogenase. .1.3.11 L-Sorbose oxidase. 1234 Flavanone 4-reductase. .1.3.12 Pyridoxine 4-oxidase. 1.235 8-oxocoformycin reductase. 1.3.13 Alcohol oxidase. 1.236 Tropinone reductase. .1.3.14 Catechol oxidase (dimerizing). 1.237 Hydroxyphenylpyruvate reductase. 1.3.15 (S)-2-hydroxy-acid oxidase. 1.238 12-beta-hydroxysteroid dehydrogenase. .1.3.16 Ecdysone oxidase. 239 3-alpha-(17-beta)-hydroxysteroid 1.3.17 oxidase. dehydrogenase (NAD+). 1.3.18 Secondary-alcohol oxidase. .1.240 N-acetylhexosamine 1-dehydrogenase. 1.3.19 4-hydroxymandelate oxidase. .1.241 6-endo-hydroxycineole dehydrogenase. 1.3.20 Long-chain-alcohol oxidase. 1243 Carveol dehydrogenase. .1.3.21 Glycerol-3-phosphate oxidase. .1.244 Methanol dehydrogenase. 1.323 Thiamine oxidase. 1.245 Cyclohexanol dehydrogenase. .1.3.24 L-galactonolactone oxidase. .1.246 Pterocarpin synthase. 1.3.25 Cellobiose oxidase. 1.247 Codeinone reductase (NADPH). 1.3.27 Hydroxyphytanate oxidase. 1.248 Salutaridine reductase (NADPH). 1.328 Nucleoside oxidase. 1.2SO D-arabinitol 2-dehydrogenase. 1.329 N-acylhexosamine oxidase. 251 Galactitol-1-phosphate 5 13.30 Polyvinyl-alcohol oxidase. dehydrogenase. 1.3.37 D-arabinono-1,4-lactone oxidase. 1.252 Tetrahydroxynaphthalene reductase. 13.38 Vanillyl-alcohol oxidase. 1.254 (S)-carnitine 3-dehydrogenase. 13.39 Nucleoside oxidase (H(2)O(2)-forming) 1.255 Mannitol dehydrogenase. .1.3.40 D-mannitol oxidase. 1.256 Fluoren-9-oldehydrogenase. .1.3.41 Xylitol oxidase. 257 4-(hydroxymethyl)benzenesulfonate .1.4.1 Vitamin-K-epoxide reductase dehydrogenase. (warfarin-sensitive). 1.258 6-hydroxyhexanoate dehydrogenase. 
1.1.1.259 3-hydroxypimeloyl-CoA dehydrogenase.
1.1.1.260 Sulcatone reductase.
1.1.1.261 Glycerol-1-phosphate dehydrogenase (NAD(P)+).
1.1.1.262 4-hydroxythreonine-4-phosphate dehydrogenase.
1.1.1.263 1,5-anhydro-D-fructose reductase.
1.1.1.264 L-idonate 5-dehydrogenase.
1.1.1.265 3-methylbutanal reductase.
1.1.4.2 Vitamin-K-epoxide reductase (warfarin-insensitive).
1.1.5.2 Quinoprotein glucose dehydrogenase.
1.1.99.1 Choline dehydrogenase.
1.1.99.2 2-hydroxyglutarate dehydrogenase.
1.1.99.3 Gluconate 2-dehydrogenase (acceptor).
1.1.99.4 Dehydrogluconate dehydrogenase.
1.1.99.5 Glycerol-3-phosphate dehydrogenase.
1.1.99.6 D-2-hydroxy-acid dehydrogenase.
1.1.99.7 Lactate-malate transhydrogenase.

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 1998 Alcohol dehydrogenase (acceptor). .41 Glutamate-5-semialdehyde dehydrogenase. 1.99.9 Pyridoxine 5-dehydrogenase. .42 Hexadecanal dehydrogenase (acylating). 1.99.10 Glucose dehydrogenase (acceptor). 43 Formate dehydrogenase (NADP+). 1.99.11 Fructose 5-dehydrogenase. .44 Cinnamoyl-CoA reductase. 1.99.12 Sorbose dehydrogenase. 45 4-carboxy-2-hydroxymuconate-6- 1.99.13 Glucoside 3-dehydrogenase. semialdehyde dehydrogenase. 1.99.14 Glycolate dehydrogenase. 2. .46 dehydrogenase. 1.99.16 Malate dehydrogenase (acceptor). 2. 47 4-trimethylammoniobutyraldehyde 1.99.18 Cellobiose dehydrogenase (acceptor). dehydrogenase. 1.99.19 Uracil dehydrogenase. 2. 48 Long-chain-aldehyde dehydrogenase. 1.99.2O Alkan-1-oldehydrogenase (acceptor). .2.1.49 2-oxoaldehyde dehydrogenase 1.99.21 D-Sorbitol dehydrogenase (acceptor). (NADP+). 1.99.22 Glycerol dehydrogenase (acceptor). 2. SO Long-chain-fatty-acyl-CoA reductase. 99.23 Polyvinyl-alcohol dehydrogenase 2. S1 Pyruvate dehydrogenase (NADP+). (acceptor). 2.1.52 Oxoglutarate dehydrogenase 99.24 Hydroxyacid-oxoacid (NADP+). transhydrogenase. 2.1.53 4-hydroxyphenylacetaldehyde 99.25 Quinate dehydrogenase dehydrogenase. (pyrroloquinoline-quinone). 2.1.54 Gamma-guanidinobutyraldehyde 99.26 3-hydroxycyclohexanone dehydrogenase. dehydrogenase. 2.1.57 Butanal dehydrogenase. 99.27 (R)-pantolactone dehydrogenase 2.1.58 Phenylglyoxylate dehydrogenase (flavin). (acylating). 99.28 Glucose-fructose oxidoreductase. 2.1.59 Glyceraldehyde-3-phosphate 1.99.29 Pyranose dehydrogenase (acceptor). dehydrogenase (NAD(P)(+)) (phosphorylating). 99.30 2-oxo-acid reductase. .2.1.60 5-carboxymethyl-2-hydroxymuconic Formaldehyde dehydrogenase semialdehyde dehydrogenase. (glutathione). 
.2.1.61 4-hydroxymuconic-semialdehyde Formate dehydrogenase. dehydrogenase. Aldehyde dehydrogenase (NAD+). .2.1.62 4-formylbenzenesulfonate Aldehyde dehydrogenase (NADP+). dehydrogenase. Aldehyde dehydrogenase (NAD(P)+). 2.1.63 6-oxohexanoate dehydrogenase. Benzaldehyde dehydrogenase .2.1.64 4-hydroxybenzaldehyde (NADP+). dehydrogenase. Betaine-aldehyde dehydrogenase. 2.1.65 Salicylaldehyde dehydrogenase. Glyceraldehyde-3-phosphate dehydrogenase .2.1.66 Mycothiol-dependent formaldehyde (NADP+). dehydrogenase. 10 Acetaldehyde dehydrogenase (acetylating). 2.1.67 Vanillin dehydrogenase. .11 Aspartate-semialdehyde dehydrogenase. 2.1.68 Coniferyl-aldehyde dehydrogenase. .12 Glyceraldehyde-3-phosphate dehydrogenase 21.69 Fluoroacetaldehyde dehydrogenase. (phosphorylating). .2.2.1 Formate dehydrogenase (cytochrome). 13 Glyceraldehyde-3-phosphate dehydrogenase .2.2.2 Pyruvate dehydrogenase (cytochrome). (NADP(+)) (phosphorylating). .2.2.3 Formate dehydrogenase (cytochrome c 1S Malonate-semialdehyde dehydrogenase. 553). 16 Succinate-semialdehyde dehydrogenase .2.2.4 Carbon-monoxide dehydrogenase (NAD(P)+). (cytochrome b-561). 17 Glyoxylate dehydrogenase (acylating). .2.3.1 Aldehyde oxidase. 18 Malonate-semialdehyde dehydrogenase 2.3.3 Pyruvate oxidase. (acetylating). .2.3.4 Oxalate oxidase. 19 Aminobutyraldehyde dehydrogenase. 2.3.5 Glyoxylate oxidase. 20 Glutarate-semialdehyde dehydrogenase. 23.6 Pyruvate oxidase (CoA-acetylating). .21 Glycolaldehyde dehydrogenase. 2.3.7 (ndole-3-acetaldehyde oxidase. 22 Lactaldehyde dehydrogenase. 23.8 Pyridoxal oxidase. 23 2-oxoaldehyde dehydrogenase (NAD+). 23.9 Aryl-aldehyde oxidase. .24 Succinate-semialdehyde dehydrogenase. .2.3.11 Retinal oxidase. 25 2-oxoisovalerate dehydrogenase (acylating). 2.3.13 4-hydroxyphenylpyruvate oxidase. 26 2,5-dioxovalerate dehydrogenase. .2.4.1 Pyruvate dehydrogenase (acetyl 27 Methylmalonate-semialdehyde transferring). dehydrogenase (acylating). 
1.2.1.28 Benzaldehyde dehydrogenase (NAD+).
1.2.1.29 Aryl-aldehyde dehydrogenase.
1.2.1.30 Aryl-aldehyde dehydrogenase (NADP+).
1.2.1.31 L-aminoadipate-semialdehyde dehydrogenase.
1.2.1.32 Aminomuconate-semialdehyde dehydrogenase.
1.2.1.33 (R)-dehydropantoate dehydrogenase.
1.2.1.36 Retinal dehydrogenase.
1.2.1.38 N-acetyl-gamma-glutamyl-phosphate reductase.
1.2.1.39 Phenylacetaldehyde dehydrogenase.
1.2.1.40 3-alpha,7-alpha,12-alpha-trihydroxycholestan-26-al 26-oxidoreductase.
1.2.4.2 Oxoglutarate dehydrogenase (succinyl-transferring).
1.2.4.4 3-methyl-2-oxobutanoate dehydrogenase (2-methylpropanoyl-transferring).
1.2.7.1 Pyruvate synthase.
1.2.7.2 2-oxobutyrate synthase.
1.2.7.3 2-oxoglutarate synthase.
1.2.7.4 Carbon-monoxide dehydrogenase (ferredoxin).
1.2.7.5 Aldehyde ferredoxin oxidoreductase.
1.2.7.6 Glyceraldehyde-3-phosphate dehydrogenase (ferredoxin).
1.2.7.7 3-methyl-2-oxobutanoate dehydrogenase (ferredoxin).
1.2.7.8 Indolepyruvate ferredoxin oxidoreductase.

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme ECNumbers wi h the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.7.9 2-oxoglutarate ferredoxin oxidoreductase. 3.153 (3S4R)-3,4-dihydroxycyclohexa-1,5- 2.99.2 Carbon-monoxide dehydrogenase iene-1,4-dicarboxylate dehydrogenase. (acceptor). 3.154 Precorrin-6A reductase. 2.99.3 Aldehyde dehydrogenase (pyrroloquinoline 3.156 Cis-2,3-dihydrobiphenyl-2,3-diol quinone). enydrogenase. 2.99.4 Formaldehyde dismutase. 3.1.57 Phloroglucinol reductase. 2.99.5 Formylmethanofuran dehydrogenase. 3.1.58 2,3-dihydroxy-2,3-dihydro-p-cumate 2.99.6 Carboxylate reductase. enydrogenase. 2.99.7 Aldehyde dehydrogenase (FAD 3.1.59 ,6-dihydroxy-5-methylcyclohexa-2,4- independent). ienecarboxylate dehydrogenase. .1 Dihydrouracil dehydrogenase (NAD+). 3.1.60 Dibenzothiophene dihydrodiol .2 Dihydropyrimidine dehydrogenase enydrogenase. (NADP+). .3.1.61 Terephthalate 1,2-cis-dihydrodiol Cortisone beta-reductase. enydrogenase. Cortisone alpha-reductase. 3.162 Pimeloyl-CoA dehydrogenase. Cucurbitacin delta(23)-reductase. 3.1.63 2,4-dichlorobenzoyl-CoA reductase. Fumarate reductase (NADH). .3.1.64 Phthalate 4.5-cis-dihydrodiol Meso-tartrate dehydrogenase. enydrogenase. Acyl-CoA dehydrogenase (NADP+). 3.1.65 5,6-dihydroxy-3-methyl-2-oxo-1,2,5,6- Enoyl-acyl-carrier-protein reductase etrahydroquinoline dehydrogenase. (NADH). 3.1.66 Cis-dihydroethylcatechol 3 10 Enoyl-acyl-carrier-protein reductase enydrogenase. (NADPH, B-specific). 3.1.67 Cis-1,2-dihydroxy-4-methylcyclohexa .11 2-coumarate reductase. 3,5-diene-1-carboxylate dehydrogenase. .12 Prephenate dehydrogenase. 3.1.68 2-dihydroxy-6-methylcyclohexa-3,5- 13 Prephenate dehydrogenase (NADP+). ienecarboxylate dehydrogenase. .14 Orotate reductase (NADH). 69 Zeatin reductase. 1S Orotate reductase (NADPH). 70 Delta (14)-sterol reductase. 16 Beta-nitroacrylate reductase. 
1.3.1.17 3-methyleneoxindole reductase.
1.3.1.18 Kynurenate-7,8-dihydrodiol dehydrogenase.
1.3.1.19 Cis-1,2-dihydrobenzene-1,2-diol dehydrogenase.
1.3.1.20 Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase.
1.3.1.21 7-dehydrocholesterol reductase.
1.3.1.22 Cholestenone 5-alpha-reductase.
1.3.1.23 Cholestenone 5-beta-reductase.
1.3.1.24 Biliverdin reductase.
1.3.1.25 1,6-dihydroxycyclohexa-2,4-diene-1-carboxylate dehydrogenase.
1.3.1.26 Dihydrodipicolinate reductase.
1.3.1.27 2-hexadecenal reductase.
1.3.1.28 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase.
1.3.1.29 Cis-1,2-dihydro-1,2-dihydroxynaphthalene dehydrogenase.
1.3.1.30 Progesterone 5-alpha-reductase.
1.3.1.31 2-enoate reductase.
1.3.1.32 Maleylacetate reductase.
1.3.1.33 Protochlorophyllide reductase.
1.3.1.34 2,4-dienoyl-CoA reductase (NADPH).
1.3.1.35 Phosphatidylcholine desaturase.
1.3.1.36 Geissoschizine dehydrogenase.
1.3.1.37 Cis-2-enoyl-CoA reductase (NADPH).
1.3.1.38 Trans-2-enoyl-CoA reductase (NADPH).
1.3.1.39 Enoyl-acyl-carrier-protein reductase (NADPH, A-specific).
1.3.1.40 2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoate reductase.
1.3.1.41 Xanthommatin reductase.
1.3.1.71 Delta(24(24(1)))-sterol reductase.
1.3.1.72 Delta(24)-sterol reductase.
1.3.1.73 1,2-dihydrovomilenine reductase.
1.3.1.74 2-alkenal reductase.
1.3.1.75 Divinyl chlorophyllide a 8-vinyl-reductase.
1.3.1.76 Precorrin-2 dehydrogenase.
1.3.2.3 Galactonolactone dehydrogenase.
1.3.3.1 Dihydroorotate oxidase.
1.3.3.2 Lathosterol oxidase.
1.3.3.3 Coproporphyrinogen oxidase.
1.3.3.4 Protoporphyrinogen oxidase.
1.3.3.5 Bilirubin oxidase.
1.3.3.6 Acyl-CoA oxidase.
1.3.3.7 Dihydrouracil oxidase.
1.3.3.8 Tetrahydroberberine oxidase.
1.3.3.9 Secologanin synthase.
1.3.3.10 Tryptophan alpha,beta-oxidase.
1.3.5.1 Succinate dehydrogenase (ubiquinone).
1.3.7.1 6-hydroxynicotinate reductase.
1.3.7.2 15,16-dihydrobiliverdin:ferredoxin oxidoreductase.
1.3.7.3 Phycoerythrobilin:ferredoxin oxidoreductase.
1.3.7.4 Phytochromobilin:ferredoxin oxidoreductase.
1.3.7.5 Phycocyanobilin:ferredoxin oxidoreductase.
1.3.99.1 Succinate dehydrogenase.
1.3.99.2 Butyryl-CoA dehydrogenase.
1.3.99.3 Acyl-CoA dehydrogenase.
1.3.99.4 3-oxosteroid 1-dehydrogenase.
1.3.1.42 12-oxophytodienoate reductase.
1.3.1.43 Cyclohexadienyl dehydrogenase.
1.3.1.44 Trans-2-enoyl-CoA reductase (NAD+).
1.3.1.45 2'-hydroxyisoflavone reductase.
1.3.1.46 Biochanin-A reductase.
1.3.1.47 Alpha-santonin 1,2-reductase.
1.3.1.48 15-oxoprostaglandin 13-oxidase.
1.3.1.49 Cis-3,4-dihydrophenanthrene-3,4-diol dehydrogenase.
1.3.1.51 2'-hydroxydaidzein reductase.
1.3.1.52 2-methyl-branched-chain-enoyl-CoA reductase.
1.3.99.5 3-oxo-5-alpha-steroid 4-dehydrogenase.
1.3.99.6 3-oxo-5-beta-steroid 4-dehydrogenase.
1.3.99.7 Glutaryl-CoA dehydrogenase.
1.3.99.8 2-furoyl-CoA dehydrogenase.
1.3.99.10 Isovaleryl-CoA dehydrogenase.
1.3.99.11 Dihydroorotate dehydrogenase.
1.3.99.12 2-methylacyl-CoA dehydrogenase.
1.3.99.13 Long-chain-acyl-CoA dehydrogenase.
1.3.99.14 Cyclohexanone dehydrogenase.
1.3.99.15 Benzoyl-CoA reductase.
1.3.99.16 Isoquinoline 1-oxidoreductase.
1.3.99.17 Quinoline 2-oxidoreductase.
1.3.99.18 Quinaldate 4-oxidoreductase.

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.99.19 Quinoline-4-carboxylate 2 Methylenetetrahydrofolate reductase oxidoreductase. (NADPH). 3.99.2O 4-hydroxybenzoyl-CoA reductase. .21 Delta(1)-piperideine-2-carboxylate 3.99.21 (R)-benzylsuccinyl-CoA reductase. dehydrogenase. 22 Strombine dehydrogenase. . .23 Tauropine dehydrogenase. . .24 N(5)-(carboxyethyl)ornithine synthase. Glutamate dehydrogenase (NAD(P)+). 25 Thiomorpholine-carboxylate Glutamate dehydrogenase (NADP+). dehydrogenase. L-amino-acid dehydrogenase. 26 Beta-alanopine dehydrogenase. Serine 2-dehydrogenase. s 27 1,2-dehydroreticulinium reductase Valine dehydrogenase (NADP+). (NADPH). . 28 Opine dehydrogenase. . 29 FMN reductase. L-erythro-3,5-diaminohexanoate 30 Flavin reductase. dehydrogenase. 31 Berberine reductase. 2,4-diaminopentanoate dehydrogenase. 32 Vomillenine reductase. Glutamate synthase (NADPH). .33 Pteridine reductase. Glutamate synthase (NADH). S. 34 6,7-dihydropteridine reductase. dehydrogenase. 53.1 Sarcosine oxidase. Diaminopimelate dehydrogenase. S.3.2 N-methyl-L-amino-acid oxidase. N-methylalanine dehydrogenase. 53.4 N(6)-methyl-lysine oxidase. Lysine 6-dehydrogenase. S3.5 (S)-6-hydroxynicotine oxidase. . 53.6 (R)-6-hydroxynicotine oxidase. Phenylalanine dehydrogenase. 53.7 L-pipecolate oxidase. Glycine dehydrogenase (cytochrome). S.3.10 Dimethylglycine oxidase. D-aspartate oxidase. 5.3.11 Polyamine oxidase. L-amino-acid oxidase. 5.312 Dihydrobenzophenanthridine oxidase. D-amino-acid oxidase. .5.4.1 Pyrimidodiazepine synthase. (flavin-containing). S.S.1 Electron-transferring-flavoprotein Pyridoxamine-phosphate oxidase. dehydrogenase. Amine oxidase (copper-containing). 5.8.1 Dimethylamine dehydrogenase. D-glutamate oxidase. 5.8.2 Trimethylamine dehydrogenase. Ethanolamine oxidase. 
5.99.1 Sarcosine dehydrogenase. . S.99.2 Dimethylglycine dehydrogenase. L-glutamate oxidase. S.99.3 L-pipecolate dehydrogenase. . 5.99.4 Nicotine dehydrogenase. Protein-lysine 6-oxidase. S.99.5 Methylglutamate dehydrogenase. L-lysine oxidase. 5.99.6 Spermidine dehydrogenase. D-glutamate(D-aspartate) oxidase. S.99.8 Proline dehydrogenase. L-aspartate oxidase. S.99.9 Methylenetetrahydromethanopterin . dehydrogenase. Glycine dehydrogenase (decarboxylating). 5.99.11 5,10-methylenetetrahydromethanopterin Glutamate synthase (ferredoxin). reductase. D-amino-acid dehydrogenase. S.99.12 Cytokinin dehydrogenase. . .6.1.1 NAD(P)(+) transhydrogenase (B-specific). . .6.1.2 NAD(P)(+) transhydrogenase (AB Aralkylamine dehydrogenase. specific). Glycine dehydrogenase (cyanide .6.2.2 Cytochrome-b5 reductase. orming). .6.2.4 NADPH--hemoprotein reductase. Pyrroline-2-carboxylate reductase. 6.25 NADPH--cytochrome-c2 reductase. Pyrroline-5-carboxylate reductase. .6.2.6 Leghemoglobin reductase. Dihydrofolate reductase. 6.3.1 NAD(P)H oxidase. Methylenetetrahydrofolate dehydrogenase 6.5.3 NADH dehydrogenase (ubiquinone). (NADP+). 6.5.4 Monodehydroascorbate reductase (NADH). Formyltetrahydrofolate dehydrogenase. 6.S.S NADPH:duinone reductase. Saccharopine dehydrogenase (NAD+, L 65.6 p-benzoquinone reductase (NADPH). ysine-forming). 6.5.7 2-hydroxy-1,4-benzoquinone reductase. Saccharopine dehydrogenase (NADP+, L 6.6.9 Trimethylamine-N-oxide reductase. ysine-forming). 6.99.1 NADPH dehydrogenase. Saccharopine dehydrogenase (NAD+, L 6.99.2 NAD(P)H dehydrogenase (quinone). glutamate-forming). 6.99.3 NADH dehydrogenase. 10 Saccharopine dehydrogenase (NADP+, 6.99.5 NADH dehydrogenase (quinone). L-glutamate-forming). 6.99.6 NADPH dehydrogenase (quinone). .7.1.1 Nitrate reductase (NADH). .11 D-Octopine dehydrogenase. .7.1.2 .12 -pyrroline-5-carboxylate dehydrogenase. Nitrate reductase (NAD(P)H). .7.1.3 Nitrate reductase (NADPH). 
1.5.1.15 Methylenetetrahydrofolate dehydrogenase (NAD+).
1.5.1.16 D-lysopine dehydrogenase.
1.5.1.17 Alanopine dehydrogenase.
1.5.1.18 Ephedrine dehydrogenase.
1.5.1.19 D-nopaline dehydrogenase.
1.7.1.4 Nitrite reductase (NAD(P)H).
1.7.1.5 Hyponitrite reductase.
1.7.1.6 Azobenzene reductase.
1.7.1.7 GMP reductase.
1.7.1.9 Nitroquinoline-N-oxide reductase.
1.7.1.10 Hydroxylamine reductase (NADH).

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. .7.1.11 4 .10.2.1 L-ascorbate-cytochrome-b5 reductase. (dimethylamino)phenylazoxybenzene reductase. .10.2.2 Ubiquinol-cytochrome-c reductase. .7.1.12 N-hydroxy-2-acetamidofluorene 10.3.1 Catechol oxidase. reductase. 10.3.2 7.2.1 Nitrite reductase (NO-forming). 10.3.3 L-ascorbate oxidase. 7.2.2 Nitrite reductase (cytochrome; .10.3.4 O-aminophenol oxidase. ammonia-forming). 10.3.5 3-hydroxyanthranilate oxidase. 7.2.3 Trimethylamine-N-oxide reductase 10.3.6 Rifamycin-B oxidase. (cytochrome c). 10.99. 1 Plastoquinol-plastocyanin reductase. 7.3.1 Nitroethane oxidase. .11. NADH peroxidase. 7.3.2 Acetylindoxyl oxidase. NADPH peroxidase. 7.3.3 Urate oxidase. Fatty-acid peroxidase. 7.34 Hydroxylamine oxidase. Cytochrome-c peroxidase. 7.3.5 3-aci-nitropropanoate oxidase. Catalase. 7.7.1 Ferredoxin-nitrite reductase. Peroxidase. 7.7.2 Ferredoxin-nitrate reductase. odide peroxidase. 7.99.1 Hydroxylamine reductase. Glutathione peroxidase. 7.99.4 Nitrate reductase. Chloride peroxidase. 7.99.5 5,10-methylenetetrahydrofolate L-ascorbate peroxidase. reductase (FADH(2)). Phospholipid-hydroperoxide glutathione 7.99.6 Nitrous-oxide reductase. peroxidase. 7.99.7 Nitric-oxide reductase. Manganese peroxidase. 7.99.8 Hydroxylamine oxidoreductase. Diarylpropane peroxidase. .8.1.2 (NADPH). Hydrogen dehydrogenase. 8.13 Hypotaurine dehydrogenase. Hydrogen dehydrogenase (NADP+). .8.1.4 Dihydrolipoyl dehydrogenase. Cytochrome-c3 hydrogenase. 8.15 2-oxopropyl-CoM reductase Hydrogen:Quinone oxidoreductase. (carboxylating). 12.72 Ferredoxin hydrogenase. .8.1.6 Cystine reductase. 12.98. 1 Coenzyme F420 hydrogenase. 8.17 Glutathione-disulfide reductase. 12.98. 2 5,10-methenyltetrahydromethanopterin 81.8 Protein-disulfide reductase. hydrogenase. 
.8.19 Thioredoxin-disulfide reductase. 12.98. 3 Methanosarcina-phenazine hydrogenase. 8.1.10 CoA-. 12.99. 6 Hydrogenase (acceptor). .8.1.11 Asparagusate reductase. 3. Catechol 1,2-dioxygenase. .8.1.12 Trypanothione-disulfide reductase. Catechol 2,3-dioxygenase. 8.1.13 Bis-gamma-glutamylcystine reductase. Protocatechuate 3,4-dioxygenase. .8.1.14 CoA-disulfide reductase. Gentisate 1,2-dioxygenase. 8.1.15 Mycothione reductase. Homogentisate 1,2-dioxygenase. .8.2.1 . 3-hydroxyanthranilate 3,4-dioxygenase. .8.2.2 Thiosulfate dehydrogenase. Protocatechuate 4,5-dioxygenase. 8.31 . 2,5-dihydroxypyridine 5,6-dioxygenase. 8.32 Thiol oxidase. 7,8-dihydroxykynurenate 8,8a 8.33 Glutathione oxidase. dioxygenase. .8.3.4 Methanethiol oxidase. Tryptophan 2,3-dioxygenase. 8.3.5 Prenylcysteine oxidase. . .8.4.1 Glutathione-homocystine Ascorbate 2,3-dioxygenase. transhydrogenase. 2,3-dihydroxybenzoate 3,4-dioxygenase. .8.4.2 Protein-disulfide reductase 3,4-dihydroxyphenylacetate 2,3- (glutathione). dioxygenase. .84.3 Glutathione--CoA-glutathione 16 3-carboxyethylcatechol 2,3-dioxygenase. transhydrogenase. 17 Indole 2,3-dioxygenase. .8.44 Glutathione-cystine transhydrogenase. 18 Sulfur dioxygenase. 84.5 Methionine-S-oxide reductase. 19 dioxygenase. .8.4.6 Protein-methionine-S-oxide reductase. 2O . 8.4.7 Enzyme-thiol transhydrogenase 22 Caffeate 3,4-dioxygenase. (glutathione-disulfide). .23 2,3-dihydroxyindole 2,3-dioxygenase. .84.8 Phosphoadenylyl-sulfate reductase .24 Quercetin 2,3-dioxygenase. (thioredoxin). 25 3,4-dihydroxy-9,10-secoandrosta 84.9 Adenylyl-sulfate reductase 1,3,5(10)-triene-9,17-dione 4,5-dioxygenase. (glutathione). 26 Peptide-tryptophan 2,3-dioxygenase. .8.4.10 Adenylyl-sulfate reductase 27 4-hydroxyphenylpyruvate dioxygenase. (thioredoxin). 28 2,3-dihydroxybenzoate 2,3-dioxygenase. 85.1 Glutathione dehydrogenase (ascorbate). 29 Stizolobate synthase. 8.7.1 Sulfite reductase (ferredoxin). 30 Stizolobinate synthase. 8.98.1 CoB-CoM heterodisulfide reductase. 
1.8.99.1 Sulfite reductase.
1.8.99.2 Adenylyl-sulfate reductase.
1.8.99.3 Hydrogensulfite reductase.
1.9.3.1 Cytochrome-c oxidase.
1.9.6.1 Nitrate reductase (cytochrome).
1.9.99.1 Iron--cytochrome-c reductase.
1.10.1.1 Trans-acenaphthene-1,2-diol dehydrogenase.
1.13.11.31 Arachidonate 12-lipoxygenase.
1.13.11.32 2-nitropropane dioxygenase.
1.13.11.33 Arachidonate 15-lipoxygenase.
1.13.11.34 Arachidonate 5-lipoxygenase.
1.13.11.35 Pyrogallol 1,2-.
1.13.11.36 Chloridazon-.
1.13.11.37 Hydroxyquinol 1,2-dioxygenase.
1.13.11.38 1-hydroxy-2-naphthoate 1,2-dioxygenase.
1.13.11.39 Biphenyl-2,3-diol 1,2-dioxygenase.

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 140 Arachidonate 8-lipoxygenase. .14.13.2 4-hydroxybenzoate 3 1.41 2,4-dihydroxyacetophenone dioxygenase. monooxygenase. 1.42 Indoleamine-pyrrole 2,3-dioxygenase. 4. 3.3 4-hydroxyphenylacetate 3 1.43 Lignostilbene alpha-beta-dioxygenase. monooxygenase. 1.44 Linoleate diol synthase. 3.4 Melilotate 3-monooxygenase. 1.45 Linoleate 11-lipoxygenase. 3.5 Imidazoleacetate 4-monooxygenase. 1.46 4-hydroxymandelate synthase. 3.6 Orcinol 2-monooxygenase. 1.47 3-hydroxy-4-oxoquinoline 2,4- 3.7 Phenol 2-monooxygenase. dioxygenase. 3.8 Dimethylaniline monooxygenase (N- 148 3-hydroxy-2-methylquinolin-4-one 2,4- oxide-forming). dioxygenase. 6.4 Tryptophan 5-monooxygenase. 1.49 Chlorite O(2)-lyase. 6.5 Glyceryl-ether monooxygenase. 1...SO Acetylacetone-cleaving enzyme. 6.6 Mandelate 4-monooxygenase. 2.1 Arginine 2-monooxygenase. 7. Dopamine beta-monooxygenase. 2.2 Lysine 2-monooxygenase. 7.3 Peptidylglycine monooxygenase. 2.3 Tryptophan 2-monooxygenase. 7.4 Aminocyclopropanecarboxylate oxidase. 2.4 Lactate 2-monooxygenase. 8. Monophenol monooxygenase. 2.5 Renilla-luciferin 2-monooxygenase. 8.2 CMP-N-acetylneuraminate 2.6 Cypridina-luciferin 2-monooxygenase. monooxygenase. 2.7 Photinus-luciferin 4-monooxygenase (ATP Stearoyl-CoA 9-desaturase. hydrolyzing). Acyl-acyl-carrier-protein desaturase. Watasenia-luciferin 2-monooxygenase. Linoleoyl-CoA desaturase. Phenylalanine 2-monooxygenase. Deacetoxycephalosporin-C synthase. s Methylphenyltetrahydropyridine N (S)-stylopine synthase. monooxygenase. (S)-cheilanthifoline synthase. Apo-beta-carotenoid-14.13'-dioxygenase. Berbamunine synthase. Oplophorus-luciferin 2-monooxygenase. Salutaridine synthase. . (S)-canadine synthase. Tryptophan 2'-dioxygenase. Prostaglandin-endoperoxide synthase. Gamma-butyrobetaine dioxygenase. 
Kynurenine 7,8-hydroxylase. Procollagen-proline dioxygenase. Heme oxygenase (decyclizing). Pyrimidine-deoxynucleoside 2'- Progesterone monooxygenase. dioxygenase. Squalene monooxygenase. Procollagen-lysine 5-dioxygenase. Steroid 17-alpha-monooxygenase. Thymine dioxygenase. Steroid 21-monooxygenase. Procollagen-proline 3-dioxygenase. Estradiol 6-beta-monooxygenase. Trimethyllysine dioxygenase. Androst-4-ene-3,17-dione Naringenin 3-dioxygenase. monooxygenase. Pyrimidine-deoxynucleoside 1'- Progesterone 11-alpha-monooxygenase. dioxygenase. 4-methoxybenzoate monooxygenase (O- Hyoscyamine (6S)-dioxygenase. demethylating). Gibberellin-44 dioxygenase. Plasmanylethanolamine desaturase. Gibberellin 2-beta-dioxygenase. Phylloquinone monooxygenase (2,3- 6-beta-hydroxyhyoscyamine epoxidizing). epoxidase. 14.99.21 Latia-luciferin monooxygenase 1S Gibberellin 3-beta-dioxygenase. (demethylating). 16 Peptide-aspartate beta-dioxygenase. 14.99.22 Ecdysone 20-monooxygenase. 17 Taurine dioxygenase. 14.99.23 3-hydroxybenzoate 2-monooxygenase. 18 Phytanoyl-CoA dioxygenase. 14.99.24 Steroid 9-alpha-monooxygenase. 19 Leucocyanidin oxygenase. 14.99.26 2-hydroxypyridine 5-monooxygenase. 20 Desacetoxyvindoline 4-hydroxylase. 14.99.27 uglone 3-monooxygenase. .21 Clavaminate synthase. 14.99.28 Linalool 8-monooxygenase. 2.1 Anthranilate 1,2-dioxygenase 14.99.29 Deoxyhypusine monooxygenase. (deaminating, decarboxylating). 14.99.30 Caroteine 7,8-desaturase. 2.3 Benzene 1,2-dioxygenase. 14.99.31 Myristoyl-CoA 11-(E) desaturase. 2.4 3-hydroxy-2- 14.99.32 Myristoyl-CoA 11-(Z) desaturase. methylpyridinecarboxylate dioxygenase. 14.99.33 Delta (12)-fatty acid 2.5 5-pyridoxate dioxygenase. dehydrogenase. 2.7 Phthalate 4,5-dioxygenase. 14.99.34 Monoprenyl isoflavone epoxidase. 2.8 4-Sulfobenzoate 3,4-dioxygenase. 14.99.35 Thiophene-2-carbonyl-CoA 2.9 4-chlorophenylacetate 3,4- monooxygenase. dioxygenase. 14.99.36 Beta-caroteine 15,15'- 2.10 Benzoate 1,2-dioxygenase. monooxygenase. 
1.14.12.11 Toluene dioxygenase.
1.14.12.12 Naphthalene 1,2-dioxygenase.
1.14.12.13 2-chlorobenzoate 1,2-dioxygenase.
1.14.12.14 2-aminobenzenesulfonate 2,3-dioxygenase.
1.14.12.15 Terephthalate 1,2-dioxygenase.
1.14.12.16 2-hydroxyquinoline 5,6-dioxygenase.
1.14.12.17 Nitric oxide dioxygenase.
1.14.12.18 Biphenyl 2,3-dioxygenase.
1.14.13.1 Salicylate 1-monooxygenase.
1.14.99.37 Taxadiene 5-alpha-hydroxylase.
1.15.1.1 .
1.15.1.2 Superoxide reductase.
1.16.1.1 Mercury(II) reductase.
1.16.1.2 Diferric-transferrin reductase.
1.16.1.3 Aquacobalamin reductase.
1.16.1.4 Cob(II)alamin reductase.
1.16.1.5 Aquacobalamin reductase (NADPH).

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. .16.1.6 Cyanocobalamin reductase .11 Magnesium protoporphyrin DX (cyanide-eliminating). methyltransferase. 16.17 Ferric-chelate reductase. .1.12 Methionine S-methyltransferase. .16.1.8 reductase. 3. 1.13 Methionine synthase. .16.3.1 Ferroxidase. .14 5-methyltetrahydropteroyltriglutamate-- .16.8.1 Cob(II)yrinic acid a,c-diamide S-methyltransferase. reductase. 2. 1.15 Fatty-acid O-methyltransferase. 17.1.1 CDP-4-dehydro-6-deoxyglucose 16 Methylene-fatty-acyl-phospholipid reductase. synthase. 17.1.2 4-hydroxy-3-methylbut-2-enyl 17 Phosphatidylethanolamine N diphosphate reductase. methyltransferase. 17.13 Leucoanthocyanidin reductase. 18 Polysaccharide O .17.1.4 Xanthine dehydrogenase. methyltransferase. 17.1.5 Nicotinate dehydrogenase. 19 Trimethylsulfonium-- 17.3.1 Pteridine oxidase. etrahydrofolate N-methyltransferase. 17.3.2 . 1.20 Glycine N-methyltransferase. 17.3.3 6-hydroxynicotinate 3. .21 Methylamine-glutamate N dehydrogenase. methyltransferase. .17.4.1 Ribonucleoside-diphosphate .1.22 Carnosine N-methyltransferase. reductase. 1.25 Phenol O-methyltransferase. 17.4.2 Ribonucleoside-triphosphate .1.26 odophenol O-methyltransferase. reductase. 1.27 Tyramine N-methyltransferase. 17.4.3 4-hydroxy-3-methylbut-2-en-1-yl 28 Phenylethanolamine N diphosphate synthase. melnyltransferase. 17.5. Phenylacetyl-CoA dehydrogenase. 2. 29 RNA (cytosine-5-)- 17.99.1 4-cresol dehydrogenase melnyltransferase. (hydroxylating). 31 RNA (guanine-N(1)-)- 17.99.2 Ethylbenzene hydroxylase. melnyltransferase. .18.1. Rubredoxin--NAD(+) reductase. 32 RNA (guanine-N(2)-)- .18.1.2 Ferredoxin-NADP(+) reductase. methyltransIerase. 18.13 Ferredoxin-NAD(+) reductase. .33 RNA (guanine-N(7)-)- .18.1.4 Rubredoxin--NAD(P)(+) reductase. melnyltransferase. .18.6. . 
34 RNA (guanosine-2'-O-)- 19.6. Nitrogenase (flavodoxin). melnyltransferase. .20.1. Phosphonate dehydrogenase. 1.35 RNA (uracil-5-)-methyltransferase. .20.4. Arsenate reductase (). 36 RNA (adenine-N(1)-)- .20.4.2 Methylarsonate reductase. melnyltransferase. 2O.98.1 Arsenate reductase (). 37 DNA (cytosine-5-)- 2O.99.1 Arsenate reductase (donor). melnyltransferase. .21.3. Sopenicillin-N Synthase. 38 O-demethylpuromycin O .21.3.2 Columbamine oxidase. melnyltransferase. 21.3.3 . 1.39 nositol 3-methyltransferase. .21.3.4 Sulochrin oxidase ((+)-bisdechlorogeodin 1.40 nositol 1-methyltransferase. orming). .1.41 Sterol 24-C-methyltransferase. 21.3.5 Sulochrin oxidase ((-)-bisdechlorogeodin .1.42 Luteolin O-methyltransferase. orming). 1.43 Histone-lysine N-methyltransferase. 213.6 . .1.44 Dimethylhistidine N .21.4.1 D-proline reductase (dithiol). methyltransferase. .21.4.2 Glycine reductase. 1.45 . .21.4.3 Sarcosine reductase. 3. .1.46 soflavone 4'-O-methyltransferase. .21.4.4 Betaine reductase. 1.47 (ndolepyruvate C 21.99.1 Beta-cyclopiazonate dehydrogenase. methyltransferase. 97.1.1 Chlorate reductase. .1.48 rRNA (adenine-N(6)-)- 97.1.2 Pyrogallol hydroxytransferase. methyltransferase. 97.1.3 Sulfur reductase. 1.49 Amine N-methyltransferase. 97.14 Formate acetyltransferase activating 2. 1...SO Loganate O-methyltransferase. enzyme. S1 rRNA (guanine-N(1)-)- 97.1.8 Tetrachloroethene reductive dehalogenase. methyltransferase. 97.1.9 Selenate reductase. 2. 52 rRNA (guanine-N(2)-)- 97.1.10 Thyroxine 5'-deiodinase. methyltransferase. 97.1.11 Thyroxine 5-deiodinase. 1.53 Putrescine N-methyltransferase. ENZYME: 2. . . . 1.54 Deoxycytidylate C-methyltransferase. 1.55 tRNA (adenine-N(6)-)-methyltransferase. 2.1.1.1 Nicotinamide N-methyltransferase. 1.56 mRNA (guanine-N(7)-)-methyltransferase. 2.1.1.2 Guanidinoacetate N-methyltransferase. 57 mRNA (nucleoside-2'-O-)- 2.1.1.3 Thetin-homocysteine S-methyltransferase. methyltransferase. 2.1.1.4 Acetylserotonin O-methyltransferase. 2. 
2.1.1.5 Betaine-homocysteine S-methyltransferase.
2.1.1.6 Catechol O-methyltransferase.
2.1.1.7 Nicotinate N-methyltransferase.
2.1.1.8 Histamine N-methyltransferase.
2.1.1.9 Thiol S-methyltransferase.
2.1.1.10 Homocysteine S-methyltransferase.
2.1.1.59 Cytochrome c-lysine N-methyltransferase.
2.1.1.60 Calmodulin-lysine N-methyltransferase.
2.1.1.61 tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase.
2.1.1.62 mRNA (2'-O-methyladenosine-N(6)-)-methyltransferase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.1.1.63 Methylated-DNA-protein-cysteine S-methyltransferase.
2.1.1.64 3-demethylubiquinone-9 3-O-methyltransferase.
2.1.1.65 Licodione 2'-O-methyltransferase.
2.1.1.66 rRNA (adenosine-2'-O-)-methyltransferase.
2.1.1.67 Thiopurine S-methyltransferase.
2.1.1.68 Caffeate O-methyltransferase.
2.1.1.69 5-hydroxyfuranocoumarin 5-O-methyltransferase.
2.1.1.70 8-hydroxyfuranocoumarin 8-O-methyltransferase.
2.1.1.71 Phosphatidyl-N-methylethanolamine N-methyltransferase.
2.1.1.72 Site-specific DNA-methyltransferase (adenine-specific).
2.1.1.74 Methylenetetrahydrofolate--tRNA-(uracil-5-)-methyltransferase (FADH(2)-oxidizing).
2.1.1.75 Apigenin 4'-O-methyltransferase.
2.1.1.76 Quercetin 3-O-methyltransferase.
2.1.1.77 Protein-L-isoaspartate(D-aspartate) O-methyltransferase.
2.1.1.78 Isoorientin 3'-O-methyltransferase.
2.1.1.79 Cyclopropane-fatty-acyl-phospholipid synthase.
2.1.1.80 Protein-glutamate O-methyltransferase.
2.1.1.82 3-methylquercitin 7-O-methyltransferase.
2.1.1.83 3,7-dimethylquercitin 4'-O-methyltransferase.
2.1.1.84 Methylquercetagetin 6-O-methyltransferase.
2.1.1.85 Protein-histidine N-methyltransferase.
2.1.1.86 Tetrahydromethanopterin S-methyltransferase.
2.1.1.87 Pyridine N-methyltransferase.
2.1.1.88 8-hydroxyquercitin 8-O-methyltransferase.
2.1.1.89 Tetrahydrocolumbamine 2-O-methyltransferase.
2.1.1.90 Methanol-5-hydroxybenzimidazolylcobamide Co-methyltransferase.
2.1.1.91 Isobutyraldoxime O-methyltransferase.
2.1.1.92 Bergaptol O-methyltransferase.
2.1.1.93 Xanthotoxol O-methyltransferase.
2.1.1.94 11-O-demethyl-17-O-deacetylvindoline O-methyltransferase.
2.1.1.95 Tocopherol O-methyltransferase.
2.1.1.96 Thioether S-methyltransferase.
2.1.1.97 3-hydroxyanthranilate 4-C-methyltransferase.
2.1.1.98 Diphthine synthase.
2.1.1.99 16-methoxy-2,3-dihydro-3-hydroxytabersonine N-methyltransferase.
2.1.1.100 Protein-S-isoprenylcysteine O-methyltransferase.
2.1.1.101 Macrocin O-methyltransferase.
2.1.1.102 Demethylmacrocin O-methyltransferase.
2.1.1.103 Phosphoethanolamine N-methyltransferase.
2.1.1.104 Caffeoyl-CoA O-methyltransferase.
2.1.1.105 N-benzoyl-4-hydroxyanthranilate 4-O-methyltransferase.
2.1.1.106 Tryptophan 2-C-methyltransferase.
2.1.1.107 Uroporphyrin-III C-methyltransferase.
2.1.1.108 6-hydroxymellein O-methyltransferase.
2.1.1.109 Demethylsterigmatocystin 6-O-methyltransferase.
2.1.1.110 Sterigmatocystin 7-O-methyltransferase.
2.1.1.111 Anthranilate N-methyltransferase.
2.1.1.112 Glucuronoxylan 4-O-methyltransferase.
2.1.1.113 Site-specific DNA-methyltransferase (cytosine-N(4)-specific).
2.1.1.114 Hexaprenyldihydroxybenzoate methyltransferase.
2.1.1.115 (RS)-1-benzyl-1,2,3,4-tetrahydroisoquinoline N-methyltransferase.
2.1.1.116 3'-hydroxy-N-methyl-(S)-coclaurine 4'-O-methyltransferase.
2.1.1.117 (S)-scoulerine 9-O-methyltransferase.
2.1.1.118 Columbamine O-methyltransferase.
2.1.1.119 10-hydroxydihydrosanguinarine 10-O-methyltransferase.
2.1.1.120 12-hydroxydihydrochelirubine 12-O-methyltransferase.
2.1.1.121 6-O-methylnorlaudanosoline 5'-O-methyltransferase.
2.1.1.122 (S)-tetrahydroprotoberberine N-methyltransferase.
2.1.1.123 Cytochrome-c-methionine S-methyltransferase.
2.1.1.124 Cytochrome-c-arginine N-methyltransferase.
2.1.1.125 Histone-arginine N-methyltransferase.
2.1.1.126 Myelin basic protein-arginine N-methyltransferase.
2.1.1.127 Ribulose-bisphosphate carboxylase-lysine N-methyltransferase.
2.1.1.128 (RS)-norcoclaurine 6-O-methyltransferase.
2.1.1.129 Inositol 4-methyltransferase.
2.1.1.130 Precorrin-2 C(20)-methyltransferase.
2.1.1.131 Precorrin-3B C(17)-methyltransferase.
2.1.1.132 Precorrin-6Y C(5,15)-methyltransferase (decarboxylating).
2.1.1.133 Precorrin-4 C(11)-methyltransferase.
2.1.1.136 Chlorophenol O-methyltransferase.
2.1.1.137 Arsenite methyltransferase.
2.1.1.139 3'-demethylstaurosporine O-methyltransferase.
2.1.1.140 (S)-coclaurine-N-methyltransferase.
2.1.1.141 Jasmonate O-methyltransferase.
2.1.1.142 Cycloartenol 24-C-methyltransferase.
2.1.1.143 24-methylenesterol C-methyltransferase.
2.1.1.144 Trans-aconitate 2-methyltransferase.
2.1.1.145 Trans-aconitate 3-methyltransferase.
2.1.1.146 (Iso)eugenol O-methyltransferase.
2.1.1.147 Corydaline synthase.
2.1.1.148 Thymidylate synthase (FAD).
2.1.1.149 Myricetin O-methyltransferase.
2.1.1.150 Isoflavone 7-O-methyltransferase.
2.1.1.151 Cobalt-factor II C(20)-methyltransferase.
2.1.1.152 Precorrin-6A synthase (deacetylating).
2.1.2.1 Glycine hydroxymethyltransferase.
2.1.2.2 Phosphoribosylglycinamide formyltransferase.
2.1.2.3 Phosphoribosylaminoimidazolecarboxamide formyltransferase.
2.1.2.4 Glycine formimidoyltransferase.
2.1.2.5 Glutamate formimidoyltransferase.
2.1.2.7 D-alanine 2-hydroxymethyltransferase.
2.1.2.8 Deoxycytidylate 5-hydroxymethyltransferase.
2.1.2.9 Methionyl-tRNA formyltransferase.
2.1.2.10 Aminomethyltransferase.
2.1.2.11 3-methyl-2-oxobutanoate hydroxymethyltransferase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.1.3.1 Methylmalonyl-CoA carboxytransferase.
2.1.3.2 Aspartate carbamoyltransferase.
2.1.3.3 Ornithine carbamoyltransferase.
2.1.3.5 Oxamate carbamoyltransferase.
2.1.3.6 Putrescine carbamoyltransferase.
2.1.3.7 3-hydroxymethylcephem carbamoyltransferase.
2.1.3.8 Lysine carbamoyltransferase.
2.1.4.1 Glycine amidinotransferase.
2.1.4.2 Scyllo-inosamine-4-phosphate amidinotransferase.
2.2.1.1 Transketolase.
2.2.1.2 Transaldolase.
2.2.1.3 Formaldehyde transketolase.
2.2.1.4 Acetoin-ribose-5-phosphate transaldolase.
2.2.1.5 2-hydroxy-3-oxoadipate synthase.
2.2.1.6 Acetolactate synthase.
2.2.1.7 1-deoxy-D-xylulose-5-phosphate synthase.
2.2.1.8 Fluorothreonine transaldolase.
2.3.1.1 Amino-acid N-acetyltransferase.
2.3.1.2 Imidazole N-acetyltransferase.
2.3.1.3 Glucosamine N-acetyltransferase.
2.3.1.4 Glucosamine 6-phosphate N-acetyltransferase.
2.3.1.5 Arylamine N-acetyltransferase.
2.3.1.6 Choline O-acetyltransferase.
2.3.1.7 Carnitine O-acetyltransferase.
2.3.1.8 Phosphate acetyltransferase.
2.3.1.9 Acetyl-CoA C-acetyltransferase.
2.3.1.10 Hydrogen-sulfide S-acetyltransferase.
2.3.1.11 Thioethanolamine S-acetyltransferase.
2.3.1.12 Dihydrolipoyllysine-residue acetyltransferase.
2.3.1.13 Glycine N-acyltransferase.
2.3.1.14 Glutamine N-phenylacetyltransferase.
2.3.1.15 Glycerol-3-phosphate O-acyltransferase.
2.3.1.16 Acetyl-CoA C-acyltransferase.
2.3.1.17 Aspartate N-acetyltransferase.
2.3.1.18 Galactoside O-acetyltransferase.
2.3.1.19 Phosphate butyryltransferase.
2.3.1.20 Diacylglycerol O-acyltransferase.
2.3.1.21 Carnitine O-palmitoyltransferase.
2.3.1.22 2-acylglycerol O-acyltransferase.
2.3.1.23 1-acylglycerophosphocholine O-acyltransferase.
2.3.1.24 Sphingosine N-acyltransferase.
2.3.1.25 Plasmalogen synthase.
2.3.1.26 Sterol O-acyltransferase.
2.3.1.27 Cortisol O-acetyltransferase.
2.3.1.28 Chloramphenicol O-acetyltransferase.
2.3.1.29 Glycine C-acetyltransferase.
2.3.1.30 Serine O-acetyltransferase.
2.3.1.31 Homoserine O-acetyltransferase.
2.3.1.32 Lysine N-acetyltransferase.
2.3.1.33 Histidine N-acetyltransferase.
2.3.1.34 D-tryptophan N-acetyltransferase.
2.3.1.35 Glutamate N-acetyltransferase.
2.3.1.36 D-amino-acid N-acetyltransferase.
2.3.1.37 5-aminolevulinate synthase.
2.3.1.38 Acyl-carrier-protein S-acetyltransferase.
2.3.1.39 Acyl-carrier-protein S-malonyltransferase.
2.3.1.40 Acyl-acyl-carrier-protein-phospholipid O-acyltransferase.
2.3.1.41 3-oxoacyl-acyl-carrier-protein synthase.
2.3.1.42 Glycerone-phosphate O-acyltransferase.
2.3.1.43 Phosphatidylcholine-sterol O-acyltransferase.
2.3.1.44 N-acetylneuraminate 4-O-acetyltransferase.
2.3.1.45 N-acetylneuraminate 7-O(or 9-O)-acetyltransferase.
2.3.1.46 Homoserine O-succinyltransferase.
2.3.1.47 8-amino-7-oxononanoate synthase.
2.3.1.48 Histone acetyltransferase.
2.3.1.49 Deacetyl-citrate-(pro-3S)-lyase S-acetyltransferase.
2.3.1.50 Serine C-palmitoyltransferase.
2.3.1.51 1-acylglycerol-3-phosphate O-acyltransferase.
2.3.1.52 2-acylglycerol-3-phosphate O-acyltransferase.
2.3.1.53 Phenylalanine N-acetyltransferase.
2.3.1.54 Formate C-acetyltransferase.
2.3.1.56 Aromatic-hydroxylamine O-acetyltransferase.
2.3.1.57 Diamine N-acetyltransferase.
2.3.1.58 2,3-diaminopropionate N-oxalyltransferase.
2.3.1.59 Gentamicin 2'-N-acetyltransferase.
2.3.1.60 Gentamicin 3'-N-acetyltransferase.
2.3.1.61 Dihydrolipoyllysine-residue succinyltransferase.
2.3.1.62 2-acylglycerophosphocholine O-acyltransferase.
2.3.1.63 1-alkylglycerophosphocholine O-acyltransferase.
2.3.1.64 Agmatine N(4)-coumaroyltransferase.
2.3.1.65 Glycine N-choloyltransferase.
2.3.1.66 Leucine N-acetyltransferase.
2.3.1.67 1-alkylglycerophosphocholine O-acetyltransferase.
2.3.1.68 Glutamine N-acyltransferase.
2.3.1.69 Monoterpenol O-acetyltransferase.
2.3.1.70 CDP-acylglycerol O-arachidonoyltransferase.
2.3.1.71 Glycine N-benzoyltransferase.
2.3.1.72 Indoleacetylglucose--inositol O-acyltransferase.
2.3.1.73 Diacylglycerol--sterol O-acyltransferase.
2.3.1.74 Naringenin-chalcone synthase.
2.3.1.75 Long-chain-alcohol O-fatty-acyltransferase.
2.3.1.76 Retinol O-fatty-acyltransferase.
2.3.1.77 Triacylglycerol--sterol O-acyltransferase.
2.3.1.78 Heparan-alpha-glucosaminide N-acetyltransferase.
2.3.1.79 Maltose O-acetyltransferase.
2.3.1.80 Cysteine-S-conjugate N-acetyltransferase.
2.3.1.81 Aminoglycoside N(3)-acetyltransferase.
2.3.1.82 Aminoglycoside N(6')-acetyltransferase.
2.3.1.83 Phosphatidylcholine--dolichol O-acyltransferase.
2.3.1.84 Alcohol O-acetyltransferase.
2.3.1.85 Fatty-acid synthase.
2.3.1.86 Fatty-acyl-CoA synthase.
2.3.1.87 Aralkylamine N-acetyltransferase.
2.3.1.88 Peptide alpha-N-acetyltransferase.
2.3.1.89 Tetrahydrodipicolinate N-acetyltransferase.
2.3.1.90 Beta-glucogallin O-galloyltransferase.
2.3.1.91 Sinapoylglucose-choline O-sinapoyltransferase.
2.3.1.92 Sinapoylglucose-malate O-sinapoyltransferase.
2.3.1.93 13-hydroxylupinine O-tigloyltransferase.
2.3.1.94 Erythronolide synthase.
2.3.1.95 Trihydroxystilbene synthase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.3.1.96 Glycoprotein N-palmitoyltransferase.
2.3.1.97 Glycylpeptide N-tetradecanoyltransferase.
2.3.1.98 Chlorogenate-glucarate O-hydroxycinnamoyltransferase.
2.3.1.99 Quinate O-hydroxycinnamoyltransferase.
2.3.1.100 Myelin-proteolipid O-palmitoyltransferase.
2.3.1.101 Formylmethanofuran--tetrahydromethanopterin N-formyltransferase.
2.3.1.102 N(6)-hydroxylysine O-acetyltransferase.
2.3.1.103 Sinapoylglucose--sinapoylglucose O-sinapoyltransferase.
2.3.1.104 1-alkenylglycerophosphocholine O-acyltransferase.
2.3.1.105 Alkylglycerophosphate 2-O-acetyltransferase.
2.3.1.106 Tartronate O-hydroxycinnamoyltransferase.
2.3.1.107 17-O-deacetylvindoline O-acetyltransferase.
2.3.1.108 Alpha-tubulin N-acetyltransferase.
2.3.1.109 Arginine N-succinyltransferase.
2.3.1.110 Tyramine N-feruloyltransferase.
2.3.1.111 Mycocerosate synthase.
2.3.1.112 D-tryptophan N-malonyltransferase.
2.3.1.113 Anthranilate N-malonyltransferase.
2.3.1.114 3,4-dichloroaniline N-malonyltransferase.
2.3.1.115 Isoflavone-7-O-beta-glucoside 6"-O-malonyltransferase.
2.3.1.116 Flavonol-3-O-beta-glucoside O-malonyltransferase.
2.3.1.117 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase.
2.3.1.118 N-hydroxyarylamine O-acetyltransferase.
2.3.1.119 Icosanoyl-CoA synthase.
2.3.1.121 1-alkenylglycerophosphoethanolamine O-acyltransferase.
2.3.1.122 Trehalose O-mycolyltransferase.
2.3.1.123 Dolichol O-acyltransferase.
2.3.1.125 1-alkyl-2-acetylglycerol O-acyltransferase.
2.3.1.126 Isocitrate O-dihydroxycinnamoyltransferase.
2.3.1.127 Ornithine N-benzoyltransferase.
2.3.1.128 Ribosomal-protein-alanine N-acetyltransferase.
2.3.1.129 Acyl-[acyl-carrier-protein]--UDP-N-acetylglucosamine O-acyltransferase.
2.3.1.130 Galactarate O-hydroxycinnamoyltransferase.
2.3.1.131 Glucarate O-hydroxycinnamoyltransferase.
2.3.1.132 Glucarolactone O-hydroxycinnamoyltransferase.
2.3.1.133 Shikimate O-hydroxycinnamoyltransferase.
2.3.1.134 Galactolipid O-acyltransferase.
2.3.1.135 Phosphatidylcholine--retinol O-acyltransferase.
2.3.1.136 Polysialic-acid O-acetyltransferase.
2.3.1.137 Carnitine O-octanoyltransferase.
2.3.1.138 Putrescine N-hydroxycinnamoyltransferase.
2.3.1.139 Ecdysone O-acyltransferase.
2.3.1.140 Rosmarinate synthase.
2.3.1.141 Galactosylacylglycerol O-acyltransferase.
2.3.1.142 Glycoprotein O-fatty-acyltransferase.
2.3.1.143 Beta-glucogallin-tetrakisgalloylglucose O-galloyltransferase.
2.3.1.144 Anthranilate N-benzoyltransferase.
2.3.1.145 Piperidine N-piperoyltransferase.
2.3.1.146 Pinosylvin synthase.
2.3.1.147 Glycerophospholipid arachidonoyl-transferase (CoA-independent).
2.3.1.148 Glycerophospholipid acyltransferase (CoA-dependent).
2.3.1.149 Platelet-activating factor acetyltransferase.
2.3.1.150 Salutaridinol 7-O-acetyltransferase.
2.3.1.151 Benzophenone synthase.
2.3.1.152 Alcohol O-cinnamoyltransferase.
2.3.1.153 Anthocyanin 5-aromatic acyltransferase.
2.3.1.154 Propionyl-CoA C(2)-trimethyltridecanoyltransferase.
2.3.1.155 Acetyl-CoA C-myristoyltransferase.
2.3.1.156 Phloroisovalerophenone synthase.
2.3.1.157 Glucosamine-1-phosphate N-acetyltransferase.
2.3.1.158 Phospholipid:diacylglycerol acyltransferase.
2.3.1.159 Acridone synthase.
2.3.1.160 Vinorine synthase.
2.3.1.161 Lovastatin nonaketide synthase.
2.3.1.162 Taxadien-5-alpha-ol O-acetyltransferase.
2.3.1.163 10-hydroxytaxane O-acetyltransferase.
2.3.1.164 Isopenicillin-N N-acyltransferase.
2.3.1.165 6-methylsalicylic acid synthase.
2.3.1.166 2-alpha-hydroxytaxane 2-O-benzoyltransferase.
2.3.1.167 10-deacetylbaccatin III 10-O-acetyltransferase.
2.3.1.168 Dihydrolipoyllysine-residue (2-methylpropanoyl)transferase.
2.3.1.169 CO-methylating acetyl-CoA synthase.
2.3.2.1 D-glutamyltransferase.
2.3.2.2 Gamma-glutamyltransferase.
2.3.2.3 Lysyltransferase.
2.3.2.4 Gamma-glutamylcyclotransferase.
2.3.2.5 Glutaminyl-peptide cyclotransferase.
2.3.2.6 Leucyltransferase.
2.3.2.7 Aspartyltransferase.
2.3.2.8 Arginyltransferase.
2.3.2.9 Agaritine gamma-glutamyltransferase.
2.3.2.10 UDP-N-acetylmuramoylpentapeptide-lysine N(6)-alanyltransferase.
2.3.2.11 Alanylphosphatidylglycerol synthase.
2.3.2.12 Peptidyltransferase.
2.3.2.13 Protein-glutamine gamma-glutamyltransferase.
2.3.2.14 D-alanine gamma-glutamyltransferase.
2.3.2.15 Glutathione gamma-glutamylcysteinyltransferase.
2.3.3.1 Citrate (Si)-synthase.
2.3.3.2 Decylcitrate synthase.
2.3.3.3 Citrate (Re)-synthase.
2.3.3.4 Decylhomocitrate synthase.
2.3.3.5 2-methylcitrate synthase.
2.3.3.6 2-ethylmalate synthase.
2.3.3.7 3-ethylmalate synthase.
2.3.3.8 ATP citrate synthase.
2.3.3.9 Malate synthase.
2.3.3.10 Hydroxymethylglutaryl-CoA synthase.
2.3.3.11 2-hydroxyglutarate synthase.
2.3.3.12 3-propylmalate synthase.
2.3.3.13 2-isopropylmalate synthase.
2.3.3.14 Homocitrate synthase.
2.3.3.15 Sulfoacetaldehyde acetyltransferase.
2.4.1.1 Phosphorylase.
2.4.1.2 Dextrin dextranase.
2.4.1.4 Amylosucrase.
2.4.1.5 Dextransucrase.
2.4.1.7 Sucrose phosphorylase.
2.4.1.8 Maltose phosphorylase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.4.1.9 Inulosucrase.
2.4.1.10 Levansucrase.
2.4.1.11 Glycogen (starch) synthase.
2.4.1.12 Cellulose synthase (UDP-forming).
2.4.1.13 Sucrose synthase.
2.4.1.14 Sucrose-phosphate synthase.
2.4.1.15 Alpha,alpha-trehalose-phosphate synthase (UDP-forming).
2.4.1.16 Chitin synthase.
2.4.1.17 Glucuronosyltransferase.
2.4.1.18 1,4-alpha-glucan branching enzyme.
2.4.1.19 Cyclomaltodextrin glucanotransferase.
2.4.1.20 Cellobiose phosphorylase.
2.4.1.21 Starch synthase.
2.4.1.22 Lactose synthase.
2.4.1.23 Sphingosine beta-galactosyltransferase.
2.4.1.24 1,4-alpha-glucan 6-alpha-glucosyltransferase.
2.4.1.25 4-alpha-glucanotransferase.
2.4.1.26 DNA alpha-glucosyltransferase.
2.4.1.27 DNA beta-glucosyltransferase.
2.4.1.28 Glucosyl-DNA beta-glucosyltransferase.
2.4.1.29 Cellulose synthase (GDP-forming).
2.4.1.30 1,3-beta-oligoglucan phosphorylase.
2.4.1.31 Laminaribiose phosphorylase.
2.4.1.32 Glucomannan 4-beta-mannosyltransferase.
2.4.1.33 Alginate synthase.
2.4.1.34 1,3-beta-glucan synthase.
2.4.1.35 Phenol beta-glucosyltransferase.
2.4.1.36 Alpha,alpha-trehalose-phosphate synthase (GDP-forming).
2.4.1.37 Fucosylgalactoside 3-alpha-galactosyltransferase.
2.4.1.38 Beta-N-acetylglucosaminylglycopeptide beta-1,4-galactosyltransferase.
2.4.1.39 Steroid N-acetylglucosaminyltransferase.
2.4.1.40 Glycoprotein-fucosylgalactoside alpha-N-acetylgalactosaminyltransferase.
2.4.1.41 Polypeptide N-acetylgalactosaminyltransferase.
2.4.1.43 Polygalacturonate 4-alpha-galacturonosyltransferase.
2.4.1.44 Lipopolysaccharide 3-alpha-galactosyltransferase.
2.4.1.45 2-hydroxyacylsphingosine 1-beta-galactosyltransferase.
2.4.1.46 1,2-diacylglycerol 3-beta-galactosyltransferase.
2.4.1.47 N-acylsphingosine galactosyltransferase.
2.4.1.48 Heteroglycan alpha-mannosyltransferase.
2.4.1.49 Cellodextrin phosphorylase.
2.4.1.50 Procollagen galactosyltransferase.
2.4.1.52 Poly(glycerol-phosphate) alpha-glucosyltransferase.
2.4.1.53 Poly(ribitol-phosphate) beta-glucosyltransferase.
2.4.1.54 Undecaprenyl-phosphate mannosyltransferase.
2.4.1.56 Lipopolysaccharide N-acetylglucosaminyltransferase.
2.4.1.57 Phosphatidylinositol alpha-mannosyltransferase.
2.4.1.58 Lipopolysaccharide glucosyltransferase I.
2.4.1.60 Abequosyltransferase.
2.4.1.62 Ganglioside galactosyltransferase.
2.4.1.63 Linamarin synthase.
2.4.1.64 Alpha,alpha-trehalose phosphorylase.
2.4.1.65 3-galactosyl-N-acetylglucosaminide 4-alpha-L-fucosyltransferase.
2.4.1.66 Procollagen glucosyltransferase.
2.4.1.67 Galactinol--raffinose galactosyltransferase.
2.4.1.68 Glycoprotein 6-alpha-L-fucosyltransferase.
2.4.1.69 Galactoside 2-alpha-L-fucosyltransferase.
2.4.1.70 Poly(ribitol-phosphate) N-acetylglucosaminyltransferase.
2.4.1.71 Arylamine glucosyltransferase.
2.4.1.73 Lipopolysaccharide glucosyltransferase II.
2.4.1.74 Glycosaminoglycan galactosyltransferase.
2.4.1.75 UDP-galacturonosyltransferase.
2.4.1.78 Phosphopolyprenol glucosyltransferase.
2.4.1.79 Galactosylgalactosylglucosylceramide beta-D-acetylgalactosaminyltransferase.
2.4.1.80 Ceramide glucosyltransferase.
2.4.1.81 Flavone 7-O-beta-glucosyltransferase.
2.4.1.82 Galactinol--sucrose galactosyltransferase.
2.4.1.83 Dolichyl-phosphate beta-D-mannosyltransferase.
2.4.1.85 Cyanohydrin beta-glucosyltransferase.
2.4.1.86 Glucosaminylgalactosylglucosylceramide beta-galactosyltransferase.
2.4.1.87 N-acetyllactosaminide 3-alpha-galactosyltransferase.
2.4.1.88 Globoside alpha-N-acetylgalactosaminyltransferase.
2.4.1.90 N-acetyllactosamine synthase.
2.4.1.91 Flavonol 3-O-glucosyltransferase.
2.4.1.92 (N-acetylneuraminyl)-galactosylglucosylceramide N-acetylgalactosaminyltransferase.
2.4.1.94 Protein N-acetylglucosaminyltransferase.
2.4.1.95 Bilirubin-glucuronoside glucuronosyltransferase.
2.4.1.96 Sn-glycerol-3-phosphate 1-galactosyltransferase.
2.4.1.97 1,3-beta-D-glucan phosphorylase.
2.4.1.99 Sucrose:sucrose fructosyltransferase.
2.4.1.100 2,1-fructan:2,1-fructan 1-fructosyltransferase.
2.4.1.101 Alpha-1,3-mannosyl-glycoprotein 2-beta-N-acetylglucosaminyltransferase.
2.4.1.102 Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase.
2.4.1.103 Alizarin 2-beta-glucosyltransferase.
2.4.1.104 O-dihydroxycoumarin 7-O-glucosyltransferase.
2.4.1.105 Vitexin beta-glucosyltransferase.
2.4.1.106 Isovitexin beta-glucosyltransferase.
2.4.1.109 Dolichyl-phosphate-mannose--protein mannosyltransferase.
2.4.1.110 tRNA-queuosine beta-mannosyltransferase.
2.4.1.111 Coniferyl-alcohol glucosyltransferase.
2.4.1.112 Alpha-1,4-glucan-protein synthase (UDP-forming).
2.4.1.113 Alpha-1,4-glucan-protein synthase (ADP-forming).
2.4.1.114 2-coumarate O-beta-glucosyltransferase.
2.4.1.115 Anthocyanidin 3-O-glucosyltransferase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.4.1.116 Cyanidin-3-rhamnosylglucoside 5-O-glucosyltransferase.
2.4.1.117 Dolichyl-phosphate beta-glucosyltransferase.
2.4.1.118 Cytokinin 7-beta-glucosyltransferase.
2.4.1.119 Dolichyl-diphosphooligosaccharide--protein glycotransferase.
2.4.1.120 Sinapate 1-glucosyltransferase.
2.4.1.121 Indole-3-acetate beta-glucosyltransferase.
2.4.1.122 Glycoprotein-N-acetylgalactosamine 3-beta-galactosyltransferase.
2.4.1.123 Inositol 3-alpha-galactosyltransferase.
2.4.1.125 Sucrose-1,6-alpha-glucan 3(6)-alpha-glucosyltransferase.
2.4.1.126 Hydroxycinnamate 4-beta-glucosyltransferase.
2.4.1.127 Monoterpenol beta-glucosyltransferase.
2.4.1.128 Scopoletin glucosyltransferase.
2.4.1.129 Peptidoglycan glycosyltransferase.
2.4.1.130 Dolichyl-phosphate-mannose--glycolipid alpha-mannosyltransferase.
2.4.1.131 Glycolipid 2-alpha-mannosyltransferase.
2.4.1.132 Glycolipid 3-alpha-mannosyltransferase.
2.4.1.133 Xylosylprotein 4-beta-galactosyltransferase.
2.4.1.134 Galactosylxylosylprotein 3-beta-galactosyltransferase.
2.4.1.135 Galactosylgalactosylxylosylprotein 3-beta-glucuronosyltransferase.
2.4.1.136 Gallate 1-beta-glucosyltransferase.
2.4.1.137 Sn-glycerol-3-phosphate 2-alpha-galactosyltransferase.
2.4.1.138 Mannotetraose 2-alpha-N-acetylglucosaminyltransferase.
2.4.1.139 Maltose synthase.
2.4.1.140 Alternansucrase.
2.4.1.141 N-acetylglucosaminyldiphosphodolichol N-acetylglucosaminyltransferase.
2.4.1.142 Chitobiosyldiphosphodolichol beta-mannosyltransferase.
2.4.1.143 Alpha-1,6-mannosyl-glycoprotein 2-beta-N-acetylglucosaminyltransferase.
2.4.1.144 Beta-1,4-mannosyl-glycoprotein 4-beta-N-acetylglucosaminyltransferase.
2.4.1.145 Alpha-1,3-mannosyl-glycoprotein 4-beta-N-acetylglucosaminyltransferase.
2.4.1.146 Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,3-N-acetylglucosaminyltransferase.
2.4.1.147 Acetylgalactosaminyl-O-glycosyl-glycoprotein beta-1,3-N-acetylglucosaminyltransferase.
2.4.1.148 Acetylgalactosaminyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase.
2.4.1.149 N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase.
2.4.1.150 N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase.
2.4.1.152 4-galactosyl-N-acetylglucosaminide 3-alpha-L-fucosyltransferase.
2.4.1.153 Dolichyl-phosphate alpha-N-acetylglucosaminyltransferase.
2.4.1.154 Globotriosylceramide beta-1,6-N-acetylgalactosaminyl-transferase.
2.4.1.155 Alpha-1,6-mannosyl-glycoprotein 6-beta-N-acetylglucosaminyltransferase.
2.4.1.156 Indolylacetyl-myo-inositol galactosyltransferase.
2.4.1.157 1,2-diacylglycerol 3-glucosyltransferase.
2.4.1.158 13-hydroxydocosanoate 13-beta-glucosyltransferase.
2.4.1.159 Flavonol-3-O-glucoside L-rhamnosyltransferase.
2.4.1.160 Pyridoxine 5'-O-beta-D-glucosyltransferase.
2.4.1.161 Oligosaccharide 4-alpha-D-glucosyltransferase.
2.4.1.162 Aldose beta-D-fructosyltransferase.
2.4.1.163 Beta-galactosyl-N-acetylglucosaminylgalactosylglucosyl-ceramide beta-1,3-acetylglucosaminyltransferase.
2.4.1.164 Galactosyl-N-acetylglucosaminylgalactosylglucosyl-ceramide beta-1,6-N-acetylglucosaminyltransferase.
2.4.1.165 N-acetylneuraminylgalactosylglucosylceramide beta-1,4-N-acetylgalactosaminyltransferase.
2.4.1.166 Raffinose-raffinose alpha-galactosyltransferase.
2.4.1.167 Sucrose 6(F)-alpha-galactosyltransferase.
2.4.1.168 Xyloglucan 4-glucosyltransferase.
2.4.1.170 Isoflavone 7-O-glucosyltransferase.
2.4.1.171 Methyl-ONN-azoxymethanol beta-D-glucosyltransferase.
2.4.1.172 Salicyl-alcohol beta-D-glucosyltransferase.
2.4.1.173 Sterol 3-beta-glucosyltransferase.
2.4.1.174 Glucuronylgalactosylproteoglycan 4-beta-N-acetylgalactosaminyltransferase.
2.4.1.175 Glucuronosyl-N-acetylgalactosaminyl-proteoglycan 4-beta-N-acetylgalactosaminyltransferase.
2.4.1.176 Gibberellin beta-D-glucosyltransferase.
2.4.1.177 Cinnamate beta-D-glucosyltransferase.
2.4.1.178 Hydroxymandelonitrile glucosyltransferase.
2.4.1.179 Lactosylceramide beta-1,3-galactosyltransferase.
2.4.1.180 Lipopolysaccharide N-acetylmannosaminouronosyltransferase.
2.4.1.181 Hydroxyanthraquinone glucosyltransferase.
2.4.1.182 Lipid-A-disaccharide synthase.
2.4.1.183 Alpha-1,3-glucan synthase.
2.4.1.184 Galactolipid galactosyltransferase.
2.4.1.185 Flavanone 7-O-beta-glucosyltransferase.
2.4.1.186 Glycogenin glucosyltransferase.
2.4.1.187 N-acetylglucosaminyldiphosphoundecaprenol N-acetyl-beta-D-mannosaminyltransferase.
2.4.1.188 N-acetylglucosaminyldiphosphoundecaprenol glucosyltransferase.
2.4.1.189 Luteolin 7-O-glucuronosyltransferase.
2.4.1.190 Luteolin-7-O-glucuronide 7-O-glucuronosyltransferase.
2.4.1.191 Luteolin-7-O-diglucuronide 4'-O-glucuronosyltransferase.
2.4.1.192 Nuatigenin 3-beta-glucosyltransferase.
2.4.1.193 Sarsapogenin 3-beta-glucosyltransferase.
2.4.1.194 4-hydroxybenzoate 4-O-beta-D-glucosyltransferase.
2.4.1.195 Thiohydroximate beta-D-glucosyltransferase.
2.4.1.196 Nicotinate glucosyltransferase.
2.4.1.197 High-mannose-oligosaccharide beta-1,4-N-acetylglucosaminyltransferase.
2.4.1.198 Phosphatidylinositol N-acetylglucosaminyltransferase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.4.1.199 Beta-mannosylphosphodecaprenol--mannooligosaccharide 6-mannosyltransferase.
2.4.1.201 Alpha-1,6-mannosyl-glycoprotein 4-beta-N-acetylglucosaminyltransferase.
2.4.1.202 2,4-dihydroxy-7-methoxy-2H-1,4-benzoxazin-3(4H)-one 2-D-glucosyltransferase.
2.4.1.203 Trans-zeatin O-beta-D-glucosyltransferase.
2.4.1.205 Galactogen 6-beta-galactosyltransferase.
2.4.1.206 Lactosylceramide 1,3-N-acetyl-beta-D-glucosaminyltransferase.
2.4.1.207 Xyloglucan:xyloglucosyltransferase.
2.4.1.208 Diglucosyl diacylglycerol synthase.
2.4.1.209 Cis-p-coumarate glucosyltransferase.
2.4.1.210 Limonoid glucosyltransferase.
2.4.1.211 1,3-beta-galactosyl-N-acetylhexosamine phosphorylase.
2.4.1.212 Hyaluronan synthase.
2.4.1.213 Glucosylglycerol-phosphate synthase.
2.4.1.214 Glycoprotein 3-alpha-L-fucosyltransferase.
2.4.1.215 Cis-zeatin O-beta-D-glucosyltransferase.
2.4.1.216 Trehalose 6-phosphate phosphorylase.
2.4.1.217 Mannosyl-3-phosphoglycerate synthase.
2.4.1.218 Hydroquinone glucosyltransferase.
2.4.1.219 Vomilenine glucosyltransferase.
2.4.1.220 Indoxyl-UDPG glucosyltransferase.
2.4.1.221 Peptide-O-fucosyltransferase.
2.4.1.222 O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase.
2.4.1.223 Glucuronyl-galactosyl-proteoglycan 4-alpha-N-acetylglucosaminyltransferase.
2.4.1.224 Glucuronosyl-N-acetylglucosaminyl-proteoglycan 4-alpha-N-acetylglucosaminyltransferase.
2.4.1.225 N-acetylglucosaminyl-proteoglycan 4-beta-glucuronosyltransferase.
2.4.1.226 N-acetylgalactosaminyl-proteoglycan 3-beta-glucuronosyltransferase.
2.4.1.227 Undecaprenyldiphospho-muramoylpentapeptide beta-N-acetylglucosaminyltransferase.
2.4.1.228 Lactosylceramide 4-alpha-galactosyltransferase.
2.4.1.229 Skp1-protein-hydroxyproline N-acetylglucosaminyltransferase.
2.4.1.230 Kojibiose phosphorylase.
2.4.1.231 Alpha,alpha-trehalose phosphorylase (configuration-retaining).
2.4.1.232 Initiation-specific alpha-1,6-mannosyltransferase.
2.4.2.1 Purine-nucleoside phosphorylase.
2.4.2.2 Pyrimidine-nucleoside phosphorylase.
2.4.2.3 Uridine phosphorylase.
2.4.2.4 Thymidine phosphorylase.
2.4.2.5 Nucleoside ribosyltransferase.
2.4.2.6 Nucleoside deoxyribosyltransferase.
2.4.2.7 Adenine phosphoribosyltransferase.
2.4.2.8 Hypoxanthine phosphoribosyltransferase.
2.4.2.9 Uracil phosphoribosyltransferase.
2.4.2.10 Orotate phosphoribosyltransferase.
2.4.2.11 Nicotinate phosphoribosyltransferase.
2.4.2.12 Nicotinamide phosphoribosyltransferase.
2.4.2.14 Amidophosphoribosyltransferase.
2.4.2.15 Guanosine phosphorylase.
2.4.2.16 Urate-ribonucleotide phosphorylase.
2.4.2.17 ATP phosphoribosyltransferase.
2.4.2.18 Anthranilate phosphoribosyltransferase.
2.4.2.19 Nicotinate-nucleotide diphosphorylase (carboxylating).
2.4.2.20 Dioxotetrahydropyrimidine phosphoribosyltransferase.
2.4.2.21 Nicotinate-nucleotide--dimethylbenzimidazole phosphoribosyltransferase.
2.4.2.22 Xanthine phosphoribosyltransferase.
2.4.2.23 Deoxyuridine phosphorylase.
2.4.2.24 1,4-beta-D-xylan synthase.
2.4.2.25 Flavone apiosyltransferase.
2.4.2.26 Protein xylosyltransferase.
2.4.2.27 dTDP-dihydrostreptose-streptidine-6-phosphate dihydrostreptosyltransferase.
2.4.2.28 S-methyl-5-thioadenosine phosphorylase.
2.4.2.29 Queuine tRNA-ribosyltransferase.
2.4.2.30 NAD(+) ADP-ribosyltransferase.
2.4.2.31 NAD(P)(+)--arginine ADP-ribosyltransferase.
2.4.2.32 Dolichyl-phosphate D-xylosyltransferase.
2.4.2.33 Dolichyl-xylosyl-phosphate--protein xylosyltransferase.
2.4.2.34 Indolylacetylinositol arabinosyltransferase.
2.4.2.35 Flavonol-3-O-glycoside xylosyltransferase.
2.4.2.36 NAD(+)--diphthamide ADP-ribosyltransferase.
2.4.2.37 NAD(+)--dinitrogen-reductase ADP-D-ribosyltransferase.
2.4.2.38 Glycoprotein 2-beta-D-xylosyltransferase.
2.4.2.39 Xyloglucan 6-xylosyltransferase.
2.4.2.40 Zeatin O-beta-D-xylosyltransferase.
2.4.99.1 Beta-galactoside alpha-2,6-sialyltransferase.
2.4.99.2 Monosialoganglioside sialyltransferase.
2.4.99.3 Alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferase.
2.4.99.4 Beta-galactoside alpha-2,3-sialyltransferase.
2.4.99.5 Galactosyldiacylglycerol alpha-2,3-sialyltransferase.
2.4.99.6 N-acetyllactosaminide alpha-2,3-sialyltransferase.
2.4.99.7 (Alpha-N-acetylneuraminyl-2,3-beta-galactosyl-1,3)-N-acetyl-galactosaminide 6-alpha-sialyltransferase.
2.4.99.8 Alpha-N-acetylneuraminate alpha-2,8-sialyltransferase.
2.4.99.9 Lactosylceramide alpha-2,3-sialyltransferase.
2.4.99.10 Neolactotetraosylceramide alpha-2,3-sialyltransferase.
2.4.99.11 Lactosylceramide alpha-2,6-N-sialyltransferase.
2.5.1.1 Dimethylallyltranstransferase.
2.5.1.2 Thiamine pyridinylase.
2.5.1.3 Thiamine-phosphate diphosphorylase.
2.5.1.4 Adenosylmethionine cyclotransferase.
2.5.1.5 Galactose-6-sulfurylase.
2.5.1.6 Methionine adenosyltransferase.
2.5.1.7 UDP-N-acetylglucosamine 1-carboxyvinyltransferase.
2.5.1.8 tRNA isopentenyltransferase.
2.5.1.9 Riboflavin synthase.
2.5.1.10 Geranyltranstransferase.
2.5.1.11 Trans-octaprenyltranstransferase.
2.5.1.15 Dihydropteroate synthase.
2.5.1.16 Spermidine synthase.
2.5.1.17 Cob(I)yrinic acid a,c-diamide adenosyltransferase.
2.5.1.18 Glutathione transferase.
2.5.1.19 3-phosphoshikimate 1-carboxyvinyltransferase.
2.5.1.20 Rubber cis-polyprenylcistransferase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.5.1.21 Farnesyl-diphosphate farnesyltransferase.
2.5.1.22 Spermine synthase.
2.5.1.23 Sym-norspermidine synthase.
2.5.1.24 Discadenine synthase.
2.5.1.25 tRNA-uridine aminocarboxypropyltransferase.
2.5.1.26 Alkylglycerone-phosphate synthase.
2.5.1.27 Adenylate dimethylallyltransferase.
2.5.1.28 Dimethylallylcistransferase.
2.5.1.29 Farnesyltranstransferase.
2.5.1.30 Trans-hexaprenyltranstransferase.
2.5.1.31 Di-trans,poly-cis-decaprenylcistransferase.
2.5.1.32 Geranylgeranyl-diphosphate geranylgeranyltransferase.
2.5.1.33 Trans-pentaprenyltranstransferase.
2.5.1.34 Tryptophan dimethylallyltransferase.
2.5.1.35 Aspulvinone dimethylallyltransferase.
2.5.1.36 Trihydroxypterocarpan dimethylallyltransferase.
2.5.1.38 Isonocardicin synthase.
2.5.1.39 4-hydroxybenzoate nonaprenyltransferase.
2.5.1.41 Phosphoglycerol geranylgeranyltransferase.
2.5.1.42 Geranylgeranylglycerol-phosphate geranylgeranyltransferase.
2.5.1.43 Nicotianamine synthase.
2.5.1.44 Homospermidine synthase.
2.5.1.45 Homospermidine synthase (spermidine-specific).
2.5.1.46 Deoxyhypusine synthase.
2.5.1.47 Cysteine synthase.
2.5.1.48 Cystathionine gamma-synthase.
2.5.1.49 O-acetylhomoserine aminocarboxypropyltransferase.
2.5.1.50 Zeatin 9-aminocarboxyethyltransferase.
2.5.1.51 Beta-pyrazolylalanine synthase.
2.5.1.52 L-mimosine synthase.
2.5.1.53 Uracilylalanine synthase.
2.5.1.54 3-deoxy-7-phosphoheptulonate synthase.
2.5.1.55 3-deoxy-8-phosphooctulonate synthase.
2.5.1.56 N-acetylneuraminate synthase.
2.5.1.57 N-acylneuraminate-9-phosphate synthase.
2.5.1.58 Protein farnesyltransferase.
2.5.1.59 Protein geranylgeranyltransferase type I.
2.5.1.60 Protein geranylgeranyltransferase type II.
2.5.1.61 Hydroxymethylbilane synthase.
2.5.1.62 Chlorophyll synthase.
2.5.1.63 Adenosyl-fluoride synthase.
2.5.1.64 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase.
2.6.1.1 Aspartate transaminase.
2.6.1.2 Alanine transaminase.
2.6.1.3 Cysteine transaminase.
2.6.1.4 Glycine transaminase.
2.6.1.5 Tyrosine transaminase.
2.6.1.6 Leucine transaminase.
2.6.1.7 Kynurenine-oxoglutarate transaminase.
2.6.1.8 2,5-diaminovalerate transaminase.
2.6.1.9 Histidinol-phosphate transaminase.
2.6.1.11 Acetylornithine transaminase.
2.6.1.12 Alanine-oxo-acid transaminase.
2.6.1.13 Ornithine-oxo-acid transaminase.
2.6.1.14 Asparagine-oxo-acid transaminase.
2.6.1.15 Glutamine-pyruvate transaminase.
2.6.1.16 Glutamine-fructose-6-phosphate transaminase (isomerizing).
2.6.1.17 Succinyldiaminopimelate transaminase.
2.6.1.18 Beta-alanine-pyruvate transaminase.
2.6.1.19 4-aminobutyrate transaminase.
2.6.1.21 D-alanine transaminase.
2.6.1.22 (S)-3-amino-2-methylpropionate transaminase.
2.6.1.23 4-hydroxyglutamate transaminase.
2.6.1.24 Diiodotyrosine transaminase.
2.6.1.26 Thyroid-hormone transaminase.
2.6.1.27 Tryptophan transaminase.
2.6.1.28 Tryptophan-phenylpyruvate transaminase.
2.6.1.29 Diamine transaminase.
2.6.1.30 Pyridoxamine-pyruvate transaminase.
2.6.1.31 Pyridoxamine--oxaloacetate transaminase.
2.6.1.32 Valine-3-methyl-2-oxovalerate transaminase.
2.6.1.33 dTDP-4-amino-4,6-dideoxy-D-glucose transaminase.
2.6.1.34 UDP-2-acetamido-4-amino-2,4,6-trideoxyglucose transaminase.
2.6.1.35 Glycine-oxaloacetate transaminase.
2.6.1.36 L-lysine 6-transaminase.
2.6.1.37 2-aminoethylphosphonate-pyruvate transaminase.
2.6.1.38 Histidine transaminase.
2.6.1.39 2-aminoadipate transaminase.
2.6.1.40 (R)-3-amino-2-methylpropionate--pyruvate transaminase.
2.6.1.41 D-methionine-pyruvate transaminase.
2.6.1.42 Branched-chain-amino-acid transaminase.
2.6.1.43 Aminolevulinate transaminase.
2.6.1.44 Alanine-glyoxylate transaminase.
2.6.1.45 Serine-glyoxylate transaminase.
2.6.1.46 Diaminobutyrate--pyruvate transaminase.
2.6.1.47 Alanine-oxomalonate transaminase.
2.6.1.48 5-aminovalerate transaminase.
2.6.1.49 Dihydroxyphenylalanine transaminase.
2.6.1.50 Glutamine-scyllo-inositol transaminase.
2.6.1.51 Serine-pyruvate transaminase.
2.6.1.52 Phosphoserine transaminase.
2.6.1.54 Pyridoxamine-phosphate transaminase.
2.6.1.55 Taurine-2-oxoglutarate transaminase.
2.6.1.56 1D-1-guanidino-3-amino-1,3-dideoxy-scyllo-inositol transaminase.
2.6.1.57 Aromatic-amino-acid transaminase.
2.6.1.58 Phenylalanine(histidine) transaminase.
2.6.1.59 dTDP-4-amino-4,6-dideoxygalactose transaminase.
2.6.1.60 Aromatic-amino-acid-glyoxylate transaminase.
2.6.1.62 Adenosylmethionine-8-amino-7-oxononanoate transaminase.
2.6.1.63 Kynurenine-glyoxylate transaminase.
2.6.1.64 Glutamine-phenylpyruvate transaminase.
2.6.1.65 N(6)-acetyl-beta-lysine transaminase.
2.6.1.66 Valine-pyruvate transaminase.
2.6.1.67 2-aminohexanoate transaminase.
2.6.1.68 Ornithine(lysine) transaminase.
2.6.1.70 Aspartate--phenylpyruvate transaminase.
2.6.1.71 Lysine-pyruvate 6-transaminase.
2.6.1.72 D-4-hydroxyphenylglycine transaminase.
2.6.1.73 Methionine-glyoxylate transaminase.
2.6.1.74 Cephalosporin-C transaminase.
2.6.1.75 Cysteine-conjugate transaminase.
2.6.1.76 Diaminobutyrate-2-oxoglutarate transaminase.
2.6.1.77 Taurine-pyruvate aminotransferase.
2.6.3.1 Oximinotransferase.
2.6.99.1 dATP(dGTP)--DNA purinetransferase.
2.7.1.1 Hexokinase.
2.7.1.2 Glucokinase.
2.7.1.3 Ketohexokinase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.
2.7.1.4 Fructokinase.
2.7.1.5 Rhamnulokinase.
2.7.1.6 Galactokinase.
2.7.1.7 Mannokinase.
2.7.1.8 Glucosamine kinase.
2.7.1.10 Phosphoglucokinase.
2.7.1.11 6-phosphofructokinase.
2.7.1.12 Gluconokinase.
2.7.1.13 Dehydrogluconokinase.
2.7.1.14 Sedoheptulokinase.
2.7.1.15 Ribokinase.
2.7.1.16 Ribulokinase.
2.7.1.17 Xylulokinase.
2.7.1.18 Phosphoribokinase.
2.7.1.19 Phosphoribulokinase.
2.7.1.20 Adenosine kinase.
2.7.1.21 Thymidine kinase.
2.7.1.22 Ribosylnicotinamide kinase.
2.7.1.23 NAD(+) kinase.
2.7.1.24 Dephospho-CoA kinase.
2.7.1.25 Adenylyl-sulfate kinase.
2.7.1.26 Riboflavin kinase.
2.7.1.27 Erythritol kinase.
2.7.1.28 Triokinase.
2.7.1.29 Glycerone kinase.
2.7.1.30 Glycerol kinase.
2.7.1.31 Glycerate kinase.
2.7.1.32 Choline kinase.
2.7.1.33 Pantothenate kinase.
2.7.1.34 Pantetheine kinase.
2.7.1.35 Pyridoxal kinase.
2.7.1.36 Mevalonate kinase.
2.7.1.37 Protein kinase.
2.7.1.38 Phosphorylase kinase.
2.7.1.39 Homoserine kinase.
2.7.1.40 Pyruvate kinase.
2.7.1.41 Glucose-1-phosphate phosphodismutase.
2.7.1.42 Riboflavin phosphotransferase.
2.7.1.43 Glucuronokinase.
2.7.1.44 Galacturonokinase.
2.7.1.45 2-dehydro-3-deoxygluconokinase.
2.7.1.46 L-arabinokinase.
2.7.1.47 D-ribulokinase.
2.7.1.48 Uridine kinase.
2.7.1.49 Hydroxymethylpyrimidine kinase.
2.7.1.50 Hydroxyethylthiazole kinase.
2.7.1.51 L-fuculokinase.
2.7.1.52 Fucokinase.
2.7.1.53 L-xylulokinase.
2.7.1.54 D-arabinokinase.
2.7.1.55 Allose kinase.
2.7.1.56 1-phosphofructokinase.
2.7.1.58 2-dehydro-3-deoxygalactonokinase.
2.7.1.59 N-acetylglucosamine kinase.
2.7.1.60 N-acylmannosamine kinase.
2.7.1.61 Acyl-phosphate--hexose phosphotransferase.
2.7.1.62 Phosphoramidate--hexose phosphotransferase.
2.7.1.63 Polyphosphate--glucose phosphotransferase.
2.7.1.64 Inositol 3-kinase.
2.7.1.65 Scyllo-inosamine 4-kinase.
2.7.1.66 Undecaprenol kinase.
2.7.1.67 1-phosphatidylinositol 4-kinase.
2.7.1.68 1-phosphatidylinositol-4-phosphate 5-kinase.
2.7.1.69 Protein-N(pi)-phosphohistidine--sugar phosphotransferase.
2.7.1.71 Shikimate kinase.
2.7.1.72 Streptomycin 6-kinase.
2.7.1.73 Inosine kinase.
2.7.1.74 Deoxycytidine kinase.
2.7.1.76 Deoxyadenosine kinase.
2.7.1.77 Nucleoside phosphotransferase.
2.7.1.78 Polynucleotide 5'-hydroxy-kinase.
2.7.1.79 Diphosphate-glycerol phosphotransferase.
2.7.1.80 Diphosphate-serine phosphotransferase.
2.7.1.81 Hydroxylysine kinase.
2.7.1.82 Ethanolamine kinase.
2.7.1.83 Pseudouridine kinase.
2.7.1.84 Alkylglycerone kinase.
2.7.1.85 Beta-glucoside kinase.
2.7.1.86 NADH kinase.
2.7.1.87 Streptomycin 3"-kinase.
2.7.1.88 Dihydrostreptomycin-6-phosphate 3'-alpha-kinase.
2.7.1.89 Thiamine kinase.
2.7.1.90 Diphosphate-fructose-6-phosphate 1-phosphotransferase.
2.7.1.91 Sphinganine kinase.
2.7.1.92 5-dehydro-2-deoxygluconokinase.
2.7.1.93 Alkylglycerol kinase.
2.7.1.94 Acylglycerol kinase.
2.7.1.95 Kanamycin kinase.
2.7.1.99 Pyruvate dehydrogenase (lipoamide) kinase.
2.7.1.100 S-methyl-5-thioribose kinase.
2.7.1.101 Tagatose kinase.
2.7.1.102 Hamamelose kinase.
2.7.1.103 Viomycin kinase.
2.7.1.104 Diphosphate-protein phosphotransferase.
2.7.1.105 6-phosphofructo-2-kinase.
2.7.1.106 Glucose-1,6-bisphosphate synthase.
2.7.1.107 Diacylglycerol kinase.
2.7.1.108 Dolichol kinase.
2.7.1.109 Hydroxymethylglutaryl-CoA reductase (NADPH) kinase.
2.7.1.110 Dephospho-reductase kinase kinase.
2.7.1.112 Protein-tyrosine kinase.
2.7.1.113 Deoxyguanosine kinase.
2.7.1.114 AMP--thymidine kinase.
2.7.1.115 3-methyl-2-oxobutanoate dehydrogenase (lipoamide) kinase.
2.7.1.116 Isocitrate dehydrogenase (NADP+) kinase.
2.7.1.117 Myosin light-chain kinase.
2.7.1.118 ADP--thymidine kinase.
2.7.1.119 Hygromycin-B kinase.
2.7.1.120 Caldesmon kinase.
2.7.1.121 Phosphoenolpyruvate-glycerone phosphotransferase.
2.7.1.122 Xylitol kinase.
2.7.1.123 Calcium/calmodulin-dependent protein kinase.
2.7.1.124 Tyrosine 3-monooxygenase kinase.
2.7.1.125 Rhodopsin kinase.
2.7.1.126 Beta-adrenergic-receptor kinase.
2.7.1.127 Inositol-trisphosphate 3-kinase.
2.7.1.128 Acetyl-CoA carboxylase kinase.
2.7.1.129 Myosin heavy-chain kinase.
2.7.1.130 Tetraacyldisaccharide 4'-kinase.
2.7.1.131 Low-density lipoprotein receptor kinase.
2.7.1.132 Tropomyosin kinase.
2.7.1.134 Inositol-tetrakisphosphate 1-kinase.
2.7.1.135 Tau-protein kinase.
2.7.1.136 Macrolide 2'-kinase.
2.7.1.137 Phosphatidylinositol 3-kinase.
2.7.1.138 Ceramide kinase.
2.7.1.140 Inositol-tetrakisphosphate 5-kinase.
2.7.1.141 RNA-polymerase-subunit kinase.
2.7.1.142 Glycerol-3-phosphate--glucose phosphotransferase.
2.7.1.143 Diphosphate-purine nucleoside kinase.
2.7.1.144 Tagatose-6-phosphate kinase.
2.7.1.145 Deoxynucleoside kinase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
2.7.1.146 ADP-specific phosphofructokinase.
2.7.1.147 ADP-specific glucokinase.
2.7.1.148 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase.
2.7.1.149 1-phosphatidylinositol-5-phosphate 4-kinase.
2.7.1.150 1-phosphatidylinositol-3-phosphate 5-kinase.
2.7.1.151 Inositol-polyphosphate multikinase.
2.7.1.153 Phosphatidylinositol-4,5-bisphosphate 3-kinase.
2.7.1.154 Phosphatidylinositol-4-phosphate 3-kinase.
2.7.1.155 Diphosphoinositol-pentakisphosphate kinase.
2.7.1.156 Adenosylcobinamide kinase.
2.7.2.1 Acetate kinase.
2.7.2.2 Carbamate kinase.
2.7.2.3 Phosphoglycerate kinase.
2.7.2.4 Aspartate kinase.
2.7.2.6 Formate kinase.
2.7.2.7 Butyrate kinase.
2.7.2.8 Acetylglutamate kinase.
2.7.2.10 Phosphoglycerate kinase (GTP).
2.7.2.11 Glutamate 5-kinase.
2.7.2.12 Acetate kinase (diphosphate).
2.7.2.13 Glutamate 1-kinase.
2.7.2.14 Branched-chain-fatty-acid kinase.
2.7.3.1 Guanidinoacetate kinase.
2.7.3.2 Creatine kinase.
2.7.3.3 Arginine kinase.
2.7.3.4 Taurocyamine kinase.
2.7.3.5 Lombricine kinase.
2.7.3.6 Hypotaurocyamine kinase.
2.7.3.7 Opheline kinase.
2.7.3.8 Ammonia kinase.
2.7.3.9 Phosphoenolpyruvate--protein phosphotransferase.
2.7.3.10 Agmatine kinase.
2.7.3.11 Protein-histidine pros-kinase.
2.7.3.12 Protein-histidine tele-kinase.
2.7.4.1 Polyphosphate kinase.
2.7.4.2 Phosphomevalonate kinase.
2.7.4.3 Adenylate kinase.
2.7.4.4 Nucleoside-phosphate kinase.
2.7.4.6 Nucleoside-diphosphate kinase.
2.7.4.7 Phosphomethylpyrimidine kinase.
2.7.4.8 Guanylate kinase.
2.7.4.9 dTMP kinase.
2.7.4.10 Nucleoside-triphosphate--adenylate kinase.
2.7.4.11 (Deoxy)adenylate kinase.
2.7.4.12 T(2)-induced deoxynucleotide kinase.
2.7.4.13 (Deoxy)nucleoside-phosphate kinase.
2.7.4.14 Cytidylate kinase.
2.7.4.15 Thiamine-diphosphate kinase.
2.7.4.16 Thiamine-phosphate kinase.
2.7.4.17 3-phosphoglyceroyl-phosphate--polyphosphate phosphotransferase.
2.7.4.18 Farnesyl-diphosphate kinase.
2.7.4.19 5-methyldeoxycytidine-5'-phosphate kinase.
2.7.4.20 Dolichyl-diphosphate-polyphosphate phosphotransferase.
2.7.4.21 Inositol-hexakisphosphate kinase.
2.7.6.1 Ribose-phosphate diphosphokinase.
2.7.6.2 Thiamine diphosphokinase.
2.7.6.3 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine diphosphokinase.
2.7.6.4 Nucleotide diphosphokinase.
2.7.6.5 GTP diphosphokinase.
2.7.7.1 Nicotinamide-nucleotide adenylyltransferase.
2.7.7.2 FMN adenylyltransferase.
2.7.7.3 Pantetheine-phosphate adenylyltransferase.
2.7.7.4 Sulfate adenylyltransferase.
2.7.7.5 Sulfate adenylyltransferase (ADP).
2.7.7.6 DNA-directed RNA polymerase.
2.7.7.7 DNA-directed DNA polymerase.
2.7.7.8 Polyribonucleotide nucleotidyltransferase.
2.7.7.9 UTP-glucose-1-phosphate uridylyltransferase.
2.7.7.10 UTP-hexose-1-phosphate uridylyltransferase.
2.7.7.11 UTP--xylose-1-phosphate uridylyltransferase.
2.7.7.12 UDP-glucose--hexose-1-phosphate uridylyltransferase.
2.7.7.13 Mannose-1-phosphate guanylyltransferase.
2.7.7.14 Ethanolamine-phosphate cytidylyltransferase.
2.7.7.15 Choline-phosphate cytidylyltransferase.
2.7.7.18 Nicotinate-nucleotide adenylyltransferase.
2.7.7.19 Polynucleotide adenylyltransferase.
2.7.7.21 tRNA cytidylyltransferase.
2.7.7.22 Mannose-1-phosphate guanylyltransferase (GDP).
2.7.7.23 UDP-N-acetylglucosamine diphosphorylase.
2.7.7.24 Glucose-1-phosphate thymidylyltransferase.
2.7.7.25 tRNA adenylyltransferase.
2.7.7.27 Glucose-1-phosphate adenylyltransferase.
2.7.7.28 Nucleoside-triphosphate-aldose-1-phosphate nucleotidyltransferase.
2.7.7.30 Fucose-1-phosphate guanylyltransferase.
2.7.7.31 DNA nucleotidylexotransferase.
2.7.7.32 Galactose-1-phosphate thymidylyltransferase.
2.7.7.33 Glucose-1-phosphate cytidylyltransferase.
2.7.7.34 Glucose-1-phosphate guanylyltransferase.
2.7.7.35 Ribose-5-phosphate adenylyltransferase.
2.7.7.36 Aldose-1-phosphate adenylyltransferase.
2.7.7.37 Aldose-1-phosphate nucleotidyltransferase.
2.7.7.38 3-deoxy-manno-octulosonate cytidylyltransferase.
2.7.7.39 Glycerol-3-phosphate cytidylyltransferase.
2.7.7.40 D-ribitol-5-phosphate cytidylyltransferase.
2.7.7.41 Phosphatidate cytidylyltransferase.
2.7.7.42 Glutamate-ammonia-ligase adenylyltransferase.
2.7.7.43 N-acylneuraminate cytidylyltransferase.
2.7.7.44 Glucuronate-1-phosphate uridylyltransferase.
2.7.7.45 Guanosine-triphosphate guanylyltransferase.
2.7.7.46 Gentamicin 2"-nucleotidyltransferase.
2.7.7.47 Streptomycin 3'-adenylyltransferase.
2.7.7.48 RNA-directed RNA polymerase.
2.7.7.49 RNA-directed DNA polymerase.
2.7.7.50 mRNA guanylyltransferase.
2.7.7.51 Adenylylsulfate--ammonia adenylyltransferase.
2.7.7.52 RNA uridylyltransferase.
2.7.7.53 ATP adenylyltransferase.
2.7.7.54 Phenylalanine adenylyltransferase.
2.7.7.55 Anthranilate adenylyltransferase.
2.7.7.56 tRNA nucleotidyltransferase.
2.7.7.57 N-methylphosphoethanolamine cytidylyltransferase.
2.7.7.58 (2,3-dihydroxybenzoyl)adenylate synthase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
2.7.7.59 Protein PII uridylyltransferase.
2.7.7.60 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase.
2.7.7.61 Holo-ACP synthase.
2.7.7.62 Adenosylcobinamide-phosphate guanylyltransferase.
2.7.8.1 Ethanolaminephosphotransferase.
2.7.8.2 Diacylglycerol cholinephosphotransferase.
2.7.8.3 Ceramide cholinephosphotransferase.
2.7.8.4 Serine-phosphoethanolamine synthase.
2.7.8.5 CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase.
2.7.8.6 Undecaprenyl-phosphate galactose phosphotransferase.
2.7.8.7 Holo-acyl-carrier-protein synthase.
2.7.8.8 CDP-diacylglycerol--serine O-phosphatidyltransferase.
2.7.8.9 Phosphomannan mannosephosphotransferase.
2.7.8.10 Sphingosine cholinephosphotransferase.
2.7.8.11 CDP-diacylglycerol--inositol 3-phosphatidyltransferase.
2.7.8.12 CDP-glycerol glycerophosphotransferase.
2.7.8.13 Phospho-N-acetylmuramoyl-pentapeptide-transferase.
2.7.8.14 CDP-ribitol ribitolphosphotransferase.
2.7.8.15 UDP-N-acetylglucosamine--dolichyl-phosphate N-acetylglucosaminephosphotransferase.
2.7.8.17 UDP-N-acetylglucosamine-lysosomal-enzyme N-acetylglucosaminephosphotransferase.
2.7.8.18 UDP-galactose-UDP-N-acetylglucosamine galactose phosphotransferase.
2.7.8.19 UDP-glucose--glycoprotein glucose phosphotransferase.
2.7.8.20 Phosphatidylglycerol-membrane-oligosaccharide glycerophosphotransferase.
2.7.8.21 Membrane-oligosaccharide glycerophosphotransferase.
2.7.8.22 1-alkenyl-2-acylglycerol choline phosphotransferase.
2.7.8.23 Carboxyvinyl-carboxyphosphonate phosphorylmutase.
2.7.8.24 Phosphatidylcholine synthase.
2.7.8.25 Triphosphoribosyl-dephospho-CoA synthase.
2.7.8.26 Adenosylcobinamide-GDP ribazoletransferase.
2.7.9.1 Pyruvate, phosphate dikinase.
2.7.9.2 Pyruvate, water dikinase.
2.7.9.3 Selenide, water dikinase.
2.7.9.4 Alpha-glucan, water dikinase.
2.8.1.1 Thiosulfate sulfurtransferase.
2.8.1.2 3-mercaptopyruvate sulfurtransferase.
2.8.1.3 Thiosulfate--thiol sulfurtransferase.
2.8.1.4 tRNA sulfurtransferase.
2.8.1.5 Thiosulfate--dithiol sulfurtransferase.
2.8.1.6 Biotin synthase.
2.8.1.7 Cysteine desulfurase.
2.8.2.1 Aryl sulfotransferase.
2.8.2.2 Alcohol sulfotransferase.
2.8.2.3 Amine sulfotransferase.
2.8.2.4 Estrone sulfotransferase.
2.8.2.5 Chondroitin 4-sulfotransferase.
2.8.2.6 Choline sulfotransferase.
2.8.2.7 UDP-N-acetylgalactosamine-4-sulfate sulfotransferase.
2.8.2.8 Heparan sulfate-glucosamine N-sulfotransferase.
2.8.2.9 Tyrosine-ester sulfotransferase.
2.8.2.10 Renilla-luciferin sulfotransferase.
2.8.2.11 Galactosylceramide sulfotransferase.
2.8.2.13 Psychosine sulfotransferase.
2.8.2.14 Bile-salt sulfotransferase.
2.8.2.15 Steroid sulfotransferase.
2.8.2.16 Thiol sulfotransferase.
2.8.2.17 Chondroitin 6-sulfotransferase.
2.8.2.18 Cortisol sulfotransferase.
2.8.2.19 Triglucosylalkylacylglycerol sulfotransferase.
2.8.2.20 Protein-tyrosine sulfotransferase.
2.8.2.21 Keratan sulfotransferase.
2.8.2.22 Arylsulfate sulfotransferase.
2.8.2.23 Heparan sulfate-glucosamine 3-sulfotransferase 1.
2.8.2.24 Desulfoglucosinolate sulfotransferase.
2.8.2.25 Flavonol 3-sulfotransferase.
2.8.2.26 Quercetin-3-sulfate 3'-sulfotransferase.
2.8.2.27 Quercetin-3-sulfate 4'-sulfotransferase.
2.8.2.28 Quercetin-3,3'-bissulfate 7-sulfotransferase.
2.8.2.29 Heparan sulfate-glucosamine 3-sulfotransferase 2.
2.8.2.30 Heparan sulfate-glucosamine 3-sulfotransferase 3.
2.8.3.1 Propionate CoA-transferase.
2.8.3.2 Oxalate CoA-transferase.
2.8.3.3 Malonate CoA-transferase.
2.8.3.5 3-oxoacid CoA-transferase.
2.8.3.6 3-oxoadipate CoA-transferase.
2.8.3.7 Succinate-citramalate CoA-transferase.
2.8.3.8 Acetate CoA-transferase.
2.8.3.9 Butyrate--acetoacetate CoA-transferase.
2.8.3.10 Citrate CoA-transferase.
2.8.3.11 Citramalate CoA-transferase.
2.8.3.12 Glutaconate CoA-transferase.
2.8.3.13 Succinate--hydroxymethylglutarate CoA-transferase.
2.8.3.14 5-hydroxypentanoate CoA-transferase.
2.8.3.15 Succinyl-CoA:(R)-benzylsuccinate CoA-transferase.
2.8.3.16 Formyl-CoA transferase.
2.8.3.17 Cinnamoyl-CoA:phenyllactate CoA-transferase.
2.8.4.1 Coenzyme-B sulfoethylthiotransferase.
2.9.1.1 L-seryl-tRNA(Sec) selenium transferase.
ENZYME: 3. . . .
3.1.1.1 Carboxylesterase.
3.1.1.2 Arylesterase.
3.1.1.3 Triacylglycerol lipase.
3.1.1.4 Phospholipase A(2).
3.1.1.5 Lysophospholipase.
3.1.1.6 Acetylesterase.
3.1.1.7 Acetylcholinesterase.
3.1.1.8 Cholinesterase.
3.1.1.10 Tropinesterase.
3.1.1.11 Pectinesterase.
3.1.1.13 Sterol esterase.
3.1.1.14 Chlorophyllase.
3.1.1.15 L-arabinonolactonase.
3.1.1.17 Gluconolactonase.
3.1.1.19 Uronolactonase.
3.1.1.20 Tannase.
3.1.1.21 Retinyl-palmitate esterase.
3.1.1.22 Hydroxybutyrate-dimer hydrolase.
3.1.1.23 Acylglycerol lipase.
3.1.1.24 3-oxoadipate enol-lactonase.
3.1.1.25 1,4-lactonase.
3.1.1.26 Galactolipase.
3.1.1.27 4-pyridoxolactonase.
3.1.1.28 Acylcarnitine hydrolase.
3.1.1.29 Aminoacyl-tRNA hydrolase.
3.1.1.30 D-arabinonolactonase.
3.1.1.31 6-phosphogluconolactonase.
3.1.1.32 Phospholipase A(1).

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
3.1.1.33 6-acetylglucose deacetylase.
3.1.1.34 Lipoprotein lipase.
3.1.1.35 Dihydrocoumarin hydrolase.
3.1.1.36 Limonin-D-ring-lactonase.
3.1.1.37 Steroid-lactonase.
3.1.1.38 Triacetate-lactonase.
3.1.1.39 Actinomycin lactonase.
3.1.1.40 Orsellinate-depside hydrolase.
3.1.1.41 Cephalosporin-C deacetylase.
3.1.1.42 Chlorogenate hydrolase.
3.1.1.43 Alpha-amino-acid esterase.
3.1.1.44 4-methyloxaloacetate esterase.
3.1.1.45 Carboxymethylenebutenolidase.
3.1.1.46 Deoxylimonate A-ring-lactonase.
3.1.1.47 1-alkyl-2-acetylglycerophosphocholine esterase.
3.1.1.48 Fusarinine-C ornithinesterase.
3.1.1.49 Sinapine esterase.
3.1.1.50 Wax-ester hydrolase.
3.1.1.51 Phorbol-diester hydrolase.
3.1.1.52 Phosphatidylinositol deacylase.
3.1.1.53 Sialate O-acetylesterase.
3.1.1.54 Acetoxybutynylbithiophene deacetylase.
3.1.1.55 Acetylsalicylate deacetylase.
3.1.1.56 Methylumbelliferyl-acetate deacetylase.
3.1.1.57 2-pyrone-4,6-dicarboxylate lactonase.
3.1.1.58 N-acetylgalactosaminoglycan deacetylase.
3.1.1.59 Juvenile-hormone esterase.
3.1.1.60 Bis(2-ethylhexyl)phthalate esterase.
3.1.1.61 Protein-glutamate methylesterase.
3.1.1.63 11-cis-retinyl-palmitate hydrolase.
3.1.1.64 All-trans-retinyl-palmitate hydrolase.
3.1.1.65 L-rhamnono-1,4-lactonase.
3.1.1.66 5-(3,4-diacetoxybut-1-ynyl)-2,2'-bithiophene deacetylase.
3.1.1.67 Fatty-acyl-ethyl-ester synthase.
3.1.1.68 Xylono-1,4-lactonase.
3.1.1.70 Cetraxate benzylesterase.
3.1.1.71 Acetylalkylglycerol acetylhydrolase.
3.1.1.72 Acetylxylan esterase.
3.1.1.73 Feruloyl esterase.
3.1.1.74 Cutinase.
3.1.1.75 Poly(3-hydroxybutyrate) depolymerase.
3.1.1.76 Poly(3-hydroxyoctanoate) depolymerase.
3.1.1.77 Acyloxyacyl hydrolase.
3.1.1.78 Polyneuridine-aldehyde esterase.
3.1.1.79 Hormone-sensitive lipase.
3.1.2.1 Acetyl-CoA hydrolase.
3.1.2.2 Palmitoyl-CoA hydrolase.
3.1.2.3 Succinyl-CoA hydrolase.
3.1.2.4 3-hydroxyisobutyryl-CoA hydrolase.
3.1.2.5 Hydroxymethylglutaryl-CoA hydrolase.
3.1.2.6 Hydroxyacylglutathione hydrolase.
3.1.2.7 Glutathione thiolesterase.
3.1.2.10 Formyl-CoA hydrolase.
3.1.2.11 Acetoacetyl-CoA hydrolase.
3.1.2.12 S-formylglutathione hydrolase.
3.1.2.13 S-succinylglutathione hydrolase.
3.1.2.14 Oleoyl-acyl-carrier-protein hydrolase.
3.1.2.15 Ubiquitin thiolesterase.
3.1.2.16 Citrate-(pro-3S)-lyase thiolesterase.
3.1.2.17 (S)-methylmalonyl-CoA hydrolase.
3.1.2.18 ADP-dependent short-chain-acyl-CoA hydrolase.
3.1.2.19 ADP-dependent medium-chain-acyl-CoA hydrolase.
3.1.2.20 Acyl-CoA hydrolase.
3.1.2.21 Dodecanoyl-acyl-carrier-protein hydrolase.
3.1.2.22 Palmitoyl-protein hydrolase.
3.1.2.23 4-hydroxybenzoyl-CoA thioesterase.
3.1.2.24 2-(2-hydroxyphenyl)benzenesulfinate hydrolase.
3.1.2.25 Phenylacetyl-CoA hydrolase.
3.1.3.1 Alkaline phosphatase.
3.1.3.2 Acid phosphatase.
3.1.3.3 Phosphoserine phosphatase.
3.1.3.4 Phosphatidate phosphatase.
3.1.3.5 5'-nucleotidase.
3.1.3.6 3'-nucleotidase.
3.1.3.7 3'(2'),5'-bisphosphate nucleotidase.
3.1.3.8 3-phytase.
3.1.3.9 Glucose-6-phosphatase.
3.1.3.10 Glucose-1-phosphatase.
3.1.3.11 Fructose-bisphosphatase.
3.1.3.12 Trehalose-phosphatase.
3.1.3.13 Bisphosphoglycerate phosphatase.
3.1.3.14 Methylphosphothioglycerate phosphatase.
3.1.3.15 Histidinol-phosphatase.
3.1.3.16 Phosphoprotein phosphatase.
3.1.3.17 Phosphorylase phosphatase.
3.1.3.18 Phosphoglycolate phosphatase.
3.1.3.19 Glycerol-2-phosphatase.
3.1.3.20 Phosphoglycerate phosphatase.
3.1.3.21 Glycerol-1-phosphatase.
3.1.3.22 Mannitol-1-phosphatase.
3.1.3.23 Sugar-phosphatase.
3.1.3.24 Sucrose-phosphatase.
3.1.3.25 Inositol-1(or 4)-monophosphatase.
3.1.3.26 4-phytase.
3.1.3.27 Phosphatidylglycerophosphatase.
3.1.3.28 ADP-phosphoglycerate phosphatase.
3.1.3.29 N-acylneuraminate-9-phosphatase.
3.1.3.31 Nucleotidase.
3.1.3.32 Polynucleotide 3'-phosphatase.
3.1.3.33 Polynucleotide 5'-phosphatase.
3.1.3.34 Deoxynucleotide 3'-phosphatase.
3.1.3.35 Thymidylate 5'-phosphatase.
3.1.3.36 Phosphoinositide 5-phosphatase.
3.1.3.37 Sedoheptulose-bisphosphatase.
3.1.3.38 3-phosphoglycerate phosphatase.
3.1.3.39 Streptomycin-6-phosphatase.
3.1.3.40 Guanidinodeoxy-scyllo-inositol-4-phosphatase.
3.1.3.41 4-nitrophenylphosphatase.
3.1.3.42 [Glycogen-synthase-D] phosphatase.
3.1.3.43 Pyruvate dehydrogenase (lipoamide)-phosphatase.
3.1.3.44 Acetyl-CoA carboxylase phosphatase.
3.1.3.45 3-deoxy-manno-octulosonate-8-phosphatase.
3.1.3.46 Fructose-2,6-bisphosphate 2-phosphatase.
3.1.3.47 Hydroxymethylglutaryl-CoA reductase (NADPH)-phosphatase.
3.1.3.48 Protein-tyrosine-phosphatase.
3.1.3.49 Pyruvate kinase-phosphatase.
3.1.3.50 Sorbitol-6-phosphatase.
3.1.3.51 Dolichyl-phosphatase.
3.1.3.52 3-methyl-2-oxobutanoate dehydrogenase (lipoamide)-phosphatase.
3.1.3.53 Myosin light-chain-phosphatase.
3.1.3.54 Fructose-2,6-bisphosphate 6-phosphatase.
3.1.3.55 Caldesmon-phosphatase.
3.1.3.56 Inositol-polyphosphate 5-phosphatase.
3.1.3.57 Inositol-1,4-bisphosphate 1-phosphatase.
3.1.3.58 Sugar-terminal-phosphatase.
3.1.3.59 Alkylacetylglycerophosphatase.
3.1.3.60 Phosphoenolpyruvate phosphatase.
3.1.3.62 Multiple inositol-polyphosphate phosphatase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
3.1.3.63 2-carboxy-D-arabinitol-1-phosphatase.
3.1.3.64 Phosphatidylinositol-3-phosphatase.
3.1.3.66 Phosphatidylinositol-3,4-bisphosphate 4-phosphatase.
3.1.3.67 Phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase.
3.1.3.68 2-deoxyglucose-6-phosphatase.
3.1.3.69 Glucosylglycerol 3-phosphatase.
3.1.3.70 Mannosyl-3-phosphoglycerate phosphatase.
3.1.3.71 2-phosphosulfolactate phosphatase.
3.1.3.72 5-phytase.
3.1.3.73 Alpha-ribazole phosphatase.
3.1.4.1 Phosphodiesterase I.
3.1.4.2 Glycerophosphocholine phosphodiesterase.
3.1.4.3 Phospholipase C.
3.1.4.4 Phospholipase D.
3.1.4.11 Phosphoinositide phospholipase C.
3.1.4.12 Sphingomyelin phosphodiesterase.
3.1.4.13 Serine-ethanolaminephosphate phosphodiesterase.
3.1.4.14 Acyl-carrier-protein phosphodiesterase.
3.1.4.15 Adenylyl-glutamate-ammonia ligase hydrolase.
3.1.4.16 2',3'-cyclic-nucleotide 2'-phosphodiesterase.
3.1.4.17 3',5'-cyclic-nucleotide phosphodiesterase.
3.1.4.35 3',5'-cyclic-GMP phosphodiesterase.
3.1.4.37 2',3'-cyclic-nucleotide 3'-phosphodiesterase.
3.1.4.38 Glycerophosphocholine cholinephosphodiesterase.
3.1.4.39 Alkylglycerophosphoethanolamine phosphodiesterase.
3.1.4.40 CMP-N-acylneuraminate phosphodiesterase.
3.1.4.41 Sphingomyelin phosphodiesterase D.
3.1.4.42 Glycerol-1,2-cyclic-phosphate 2-phosphodiesterase.
3.1.4.43 Glycerophosphoinositol inositolphosphodiesterase.
3.1.4.44 Glycerophosphoinositol glycerophosphodiesterase.
3.1.4.45 N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase.
3.1.4.46 Glycerophosphodiester phosphodiesterase.
3.1.4.48 Dolichylphosphate-glucose phosphodiesterase.
3.1.4.49 Dolichylphosphate-mannose phosphodiesterase.
3.1.4.50 Glycosylphosphatidylinositol phospholipase D.
3.1.4.51 Glucose-1-phospho-D-mannosylglycoprotein phosphodiesterase.
3.1.5.1 dGTPase.
3.1.6.1 Arylsulfatase.
3.1.6.2 Steryl-sulfatase.
3.1.6.3 Glycosulfatase.
3.1.6.4 N-acetylgalactosamine-6-sulfatase.
3.1.6.6 Choline-sulfatase.
3.1.6.7 Cellulose-polysulfatase.
3.1.6.8 Cerebroside-sulfatase.
3.1.6.9 Chondro-4-sulfatase.
3.1.6.10 Chondro-6-sulfatase.
3.1.6.11 Disulfoglucosamine-6-sulfatase.
3.1.6.12 N-acetylgalactosamine-4-sulfatase.
3.1.6.13 Iduronate-2-sulfatase.
3.1.6.14 N-acetylglucosamine-6-sulfatase.
3.1.6.15 N-sulfoglucosamine-3-sulfatase.
3.1.6.16 Monomethyl-sulfatase.
3.1.6.17 D-lactate-2-sulfatase.
3.1.6.18 Glucuronate-2-sulfatase.
3.1.7.1 Prenyl-diphosphatase.
3.1.7.2 Guanosine-3',5'-bis(diphosphate) 3'-diphosphatase.
3.1.7.3 Monoterpenyl-diphosphatase.
3.1.8.1 Aryldialkylphosphatase.
3.1.8.2 Diisopropyl-fluorophosphatase.
3.1.11.1 Exodeoxyribonuclease I.
3.1.11.2 Exodeoxyribonuclease III.
3.1.11.3 Exodeoxyribonuclease (lambda-induced).
3.1.11.4 Exodeoxyribonuclease (phage SP3-induced).
3.1.11.5 Exodeoxyribonuclease V.
3.1.11.6 Exodeoxyribonuclease VII.
3.1.13.1 Exoribonuclease II.
3.1.13.2 Exoribonuclease H.
3.1.13.3 Oligonucleotidase.
3.1.13.4 Poly(A)-specific ribonuclease.
3.1.14.1 Yeast ribonuclease.
3.1.15.1 Venom exonuclease.
3.1.16.1 Spleen exonuclease.
3.1.21.1 Deoxyribonuclease I.
3.1.21.2 Deoxyribonuclease IV (phage-T(4)-induced).
3.1.21.3 Type I site-specific deoxyribonuclease.
3.1.21.4 Type II site-specific deoxyribonuclease.
3.1.21.5 Type III site-specific deoxyribonuclease.
3.1.21.6 CC-preferring endodeoxyribonuclease.
3.1.21.7 Deoxyribonuclease V.
3.1.22.1 Deoxyribonuclease II.
3.1.22.2 Aspergillus deoxyribonuclease K(1).
3.1.22.4 Crossover junction endodeoxyribonuclease.
3.1.22.5 Deoxyribonuclease X.
3.1.25.1 Deoxyribonuclease (pyrimidine dimer).
3.1.26.1 Physarum polycephalum ribonuclease.
3.1.26.2 Ribonuclease alpha.
3.1.26.3 Ribonuclease III.
3.1.26.4 Ribonuclease H.
3.1.26.5 Ribonuclease P.
3.1.26.6 Ribonuclease IV.
3.1.26.7 Ribonuclease P4.
3.1.26.8 Ribonuclease M5.
3.1.26.9 Ribonuclease (poly-(U)-specific).
3.1.26.10 Ribonuclease IX.
3.1.26.11 Ribonuclease Z.
3.1.27.1 Ribonuclease T(2).
3.1.27.2 Bacillus subtilis ribonuclease.
3.1.27.3 Ribonuclease T(1).
3.1.27.4 Ribonuclease U(2).
3.1.27.5 Pancreatic ribonuclease.
3.1.27.6 Enterobacter ribonuclease.
3.1.27.7 Ribonuclease F.
3.1.27.8 Ribonuclease V.
3.1.27.9 tRNA-intron endonuclease.
3.1.27.10 rRNA endonuclease.
3.1.30.1 Aspergillus nuclease S(1).
3.1.30.2 Serratia marcescens nuclease.
3.1.31.1 Micrococcal nuclease.
3.2.1.1 Alpha-amylase.
3.2.1.2 Beta-amylase.
3.2.1.3 Glucan 1,4-alpha-glucosidase.
3.2.1.4 Cellulase.
3.2.1.6 Endo-1,3(4)-beta-glucanase.
3.2.1.7 Inulinase.
3.2.1.8 Endo-1,4-beta-xylanase.
3.2.1.10 Oligo-1,6-glucosidase.
3.2.1.11 Dextranase.
3.2.1.14 Chitinase.
3.2.1.15 Polygalacturonase.
3.2.1.17 Lysozyme.
3.2.1.18 Exo-alpha-sialidase.
3.2.1.20 Alpha-glucosidase.
3.2.1.21 Beta-glucosidase.
3.2.1.22 Alpha-galactosidase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
3.2.1.23 Beta-galactosidase.
3.2.1.24 Alpha-mannosidase.
3.2.1.25 Beta-mannosidase.
3.2.1.26 Beta-fructofuranosidase.
3.2.1.28 Alpha,alpha-trehalase.
3.2.1.31 Beta-glucuronidase.
3.2.1.32 Xylan endo-1,3-beta-xylosidase.
3.2.1.33 Amylo-alpha-1,6-glucosidase.
3.2.1.35 Hyaluronoglucosaminidase.
3.2.1.36 Hyaluronoglucuronidase.
3.2.1.37 Xylan 1,4-beta-xylosidase.
3.2.1.38 Beta-D-fucosidase.
3.2.1.39 Glucan endo-1,3-beta-D-glucosidase.
3.2.1.40 Alpha-L-rhamnosidase.
3.2.1.41 Pullulanase.
3.2.1.42 GDP-glucosidase.
3.2.1.43 Beta-L-rhamnosidase.
3.2.1.44 Fucoidanase.
3.2.1.45 Glucosylceramidase.
3.2.1.46 Galactosylceramidase.
3.2.1.47 Galactosylgalactosylglucosylceramidase.
3.2.1.48 Sucrose alpha-glucosidase.
3.2.1.49 Alpha-N-acetylgalactosaminidase.
3.2.1.50 Alpha-N-acetylglucosaminidase.
3.2.1.51 Alpha-L-fucosidase.
3.2.1.52 Beta-N-acetylhexosaminidase.
3.2.1.53 Beta-N-acetylgalactosaminidase.
3.2.1.54 Cyclomaltodextrinase.
3.2.1.55 Alpha-N-arabinofuranosidase.
3.2.1.56 Glucuronosyl-disulfoglucosamine glucuronidase.
3.2.1.57 Isopullulanase.
3.2.1.58 Glucan 1,3-beta-glucosidase.
3.2.1.59 Glucan endo-1,3-alpha-glucosidase.
3.2.1.60 Glucan 1,4-alpha-maltotetraohydrolase.
3.2.1.61 Mycodextranase.
3.2.1.62 Glycosylceramidase.
3.2.1.63 1,2-alpha-L-fucosidase.
3.2.1.64 2,6-beta-fructan 6-levanbiohydrolase.
3.2.1.65 Levanase.
3.2.1.66 Quercitrinase.
3.2.1.67 Galacturan 1,4-alpha-galacturonidase.
3.2.1.68 Isoamylase.
3.2.1.70 Glucan 1,6-alpha-glucosidase.
3.2.1.71 Glucan endo-1,2-beta-glucosidase.
3.2.1.72 Xylan 1,3-beta-xylosidase.
3.2.1.73 Licheninase.
3.2.1.74 Glucan 1,4-beta-glucosidase.
3.2.1.75 Glucan endo-1,6-beta-glucosidase.
3.2.1.76 L-iduronidase.
3.2.1.77 Mannan 1,2-(1,3)-alpha-mannosidase.
3.2.1.78 Mannan endo-1,4-beta-mannosidase.
3.2.1.80 Fructan beta-fructosidase.
3.2.1.81 Agarase.
3.2.1.82 Exo-poly-alpha-galacturonosidase.
3.2.1.83 Kappa-carrageenase.
3.2.1.84 Glucan 1,3-alpha-glucosidase.
3.2.1.85 6-phospho-beta-galactosidase.
3.2.1.86 6-phospho-beta-glucosidase.
3.2.1.87 Capsular-polysaccharide endo-1,3-alpha-galactosidase.
3.2.1.88 Beta-L-arabinosidase.
3.2.1.89 Arabinogalactan endo-1,4-beta-galactosidase.
3.2.1.91 Cellulose 1,4-beta-cellobiosidase.
3.2.1.92 Peptidoglycan beta-N-acetylmuramidase.
3.2.1.93 Alpha,alpha-phosphotrehalase.
3.2.1.94 Glucan 1,6-alpha-isomaltosidase.
3.2.1.95 Dextran 1,6-alpha-isomaltotriosidase.
3.2.1.96 Mannosyl-glycoprotein endo-beta-N-acetylglucosaminidase.
3.2.1.97 Glycopeptide alpha-N-acetylgalactosaminidase.
3.2.1.98 Glucan 1,4-alpha-maltohexaosidase.
3.2.1.99 Arabinan endo-1,5-alpha-L-arabinosidase.
3.2.1.100 Mannan 1,4-mannobiosidase.
3.2.1.101 Mannan endo-1,6-alpha-mannosidase.
3.2.1.102 Blood-group-substance endo-1,4-beta-galactosidase.
3.2.1.103 Keratan-sulfate endo-1,4-beta-galactosidase.
3.2.1.104 Steryl-beta-glucosidase.
3.2.1.105 Strictosidine beta-glucosidase.
3.2.1.106 Mannosyl-oligosaccharide glucosidase.
3.2.1.107 Protein-glucosylgalactosylhydroxylysine glucosidase.
3.2.1.108 Lactase.
3.2.1.109 Endogalactosaminidase.
3.2.1.110 Mucinaminylserine mucinaminidase.
3.2.1.111 1,3-alpha-L-fucosidase.
3.2.1.112 2-deoxyglucosidase.
3.2.1.113 Mannosyl-oligosaccharide 1,2-alpha-mannosidase.
3.2.1.114 Mannosyl-oligosaccharide 1,3-1,6-alpha-mannosidase.
3.2.1.115 Branched-dextran exo-1,2-alpha-glucosidase.
3.2.1.116 Glucan 1,4-alpha-maltotriohydrolase.
3.2.1.117 Amygdalin beta-glucosidase.
3.2.1.118 Prunasin beta-glucosidase.
3.2.1.119 Vicianin beta-glucosidase.
3.2.1.120 Oligoxyloglucan beta-glycosidase.
3.2.1.121 Polymannuronate hydrolase.
3.2.1.122 Maltose-6-phosphate glucosidase.
3.2.1.123 Endoglycosylceramidase.
3.2.1.124 3-deoxy-2-octulosonidase.
3.2.1.125 Raucaffricine beta-glucosidase.
3.2.1.126 Coniferin beta-glucosidase.
3.2.1.127 1,6-alpha-L-fucosidase.
3.2.1.128 Glycyrrhizinate beta-glucuronidase.
3.2.1.129 Endo-alpha-sialidase.
3.2.1.130 Glycoprotein endo-alpha-1,2-mannosidase.
3.2.1.131 Xylan alpha-1,2-glucuronosidase.
3.2.1.132 Chitosanase.
3.2.1.133 Glucan 1,4-alpha-maltohydrolase.
3.2.1.134 Difructose-anhydride synthase.
3.2.1.135 Neopullulanase.
3.2.1.136 Glucuronoarabinoxylan endo-1,4-beta-xylanase.
3.2.1.137 Mannan exo-1,2-1,6-alpha-mannosidase.
3.2.1.139 Alpha-glucuronidase.
3.2.1.140 Lacto-N-biosidase.
3.2.1.141 4-alpha-D-{(1->4)-alpha-D-glucano}trehalose trehalohydrolase.
3.2.1.142 Limit dextrinase.
3.2.1.143 Poly(ADP-ribose) glycohydrolase.
3.2.1.144 3-deoxyoctulosonase.
3.2.1.145 Galactan 1,3-beta-galactosidase.
3.2.1.146 Beta-galactofuranosidase.
3.2.1.147 Thioglucosidase.
3.2.1.148 Ribosylhomocysteinase.
3.2.1.149 Beta-primeverosidase.
3.2.1.150 Oligoxyloglucan reducing-end-specific cellobiohydrolase.
3.2.1.151 Xyloglucan-specific endo-beta-1,4-glucanase.
3.2.2.1 Purine nucleosidase.
3.2.2.2 Inosine nucleosidase.
3.2.2.3 Uridine nucleosidase.
3.2.2.4 AMP nucleosidase.
3.2.2.5 NAD(+) nucleosidase.
3.2.2.6 NAD(P)(+) nucleosidase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
3.2.2.7 Adenosine nucleosidase.
3.2.2.8 Ribosylpyrimidine nucleosidase.
3.2.2.9 Adenosylhomocysteine nucleosidase.
3.2.2.10 Pyrimidine-5'-nucleotide nucleosidase.
3.2.2.11 Beta-aspartyl-N-acetylglucosaminidase.
3.2.2.12 Inosinate nucleosidase.
3.2.2.13 1-methyladenosine nucleosidase.
3.2.2.14 NMN nucleosidase.
3.2.2.15 DNA-deoxyinosine glycosylase.
3.2.2.16 Methylthioadenosine nucleosidase.
3.2.2.17 Deoxyribodipyrimidine endonucleosidase.
3.2.2.19 Protein ADP-ribosylarginine hydrolase.
3.2.2.20 DNA-3-methyladenine glycosylase I.
3.2.2.21 DNA-3-methyladenine glycosylase II.
3.2.2.22 rRNA N-glycosylase.
3.2.2.23 DNA-formamidopyrimidine glycosylase.
3.2.2.24 ADP-ribosyl-dinitrogen reductase hydrolase.
3.3.1.1 Adenosylhomocysteinase.
3.3.1.2 Adenosylmethionine hydrolase.
3.3.2.1 Isochorismatase.
3.3.2.2 Alkenylglycerophosphocholine hydrolase.
3.3.2.3 Epoxide hydrolase.
3.3.2.4 Trans-epoxysuccinate hydrolase.
3.3.2.5 Alkenylglycerophosphoethanolamine hydrolase.
3.3.2.6 Leukotriene-A(4) hydrolase.
3.3.2.7 Hepoxilin-epoxide hydrolase.
3.3.2.8 Limonene-1,2-epoxide hydrolase.
3.4.11.1 Leucyl aminopeptidase.
3.4.11.2 Membrane alanyl aminopeptidase.
3.4.11.3 Cystinyl aminopeptidase.
3.4.11.4 Tripeptide aminopeptidase.
3.4.11.5 Prolyl aminopeptidase.
3.4.11.6 Aminopeptidase B.
3.4.11.7 Glutamyl aminopeptidase.
3.4.11.9 Xaa-Pro aminopeptidase.
3.4.11.10 Bacterial leucyl aminopeptidase.
3.4.11.13 Clostridial aminopeptidase.
3.4.11.14 Cytosol alanyl aminopeptidase.
3.4.11.15 Aminopeptidase Y.
3.4.11.16 Xaa-Trp aminopeptidase.
3.4.11.17 Tryptophanyl aminopeptidase.
3.4.11.18 Methionyl aminopeptidase.
3.4.11.19 D-stereospecific aminopeptidase.
3.4.11.20 Aminopeptidase Ey.
3.4.11.21 Aspartyl aminopeptidase.
3.4.11.22 Aminopeptidase I.
3.4.11.23 PepB aminopeptidase.
3.4.13.3 Xaa-His dipeptidase.
3.4.13.4 Xaa-Arg dipeptidase.
3.4.13.5 Xaa-methyl-His dipeptidase.
3.4.13.7 Glu-Glu dipeptidase.
3.4.13.9 Xaa-Pro dipeptidase.
3.4.13.12 Met-Xaa dipeptidase.
3.4.13.17 Non-stereospecific dipeptidase.
3.4.13.18 Cytosol nonspecific dipeptidase.
3.4.13.19 Membrane dipeptidase.
3.4.13.20 Beta-Ala-His dipeptidase.
3.4.13.21 Dipeptidase E.
3.4.14.1 Dipeptidyl-peptidase I.
3.4.14.2 Dipeptidyl-peptidase II.
3.4.14.4 Dipeptidyl-peptidase III.
3.4.14.5 Dipeptidyl-peptidase IV.
3.4.14.6 Dipeptidyl-dipeptidase.
3.4.14.9 Tripeptidyl-peptidase I.
3.4.14.10 Tripeptidyl-peptidase II.
3.4.14.11 Xaa-Pro dipeptidyl-peptidase.
3.4.15.1 Peptidyl-dipeptidase A.
3.4.15.4 Peptidyl-dipeptidase B.
3.4.15.5 Peptidyl-dipeptidase Dcp.
3.4.16.2 Lysosomal Pro-X carboxypeptidase.
3.4.16.4 Serine-type D-Ala-D-Ala carboxypeptidase.
3.4.16.5 Carboxypeptidase C.
3.4.16.6 Carboxypeptidase D.
3.4.17.1 Carboxypeptidase A.
3.4.17.2 Carboxypeptidase B.
3.4.17.3 Lysine carboxypeptidase.
3.4.17.4 Gly-X carboxypeptidase.
3.4.17.6 Alanine carboxypeptidase.
3.4.17.8 Muramoylpentapeptide carboxypeptidase.
3.4.17.10 Carboxypeptidase E.
3.4.17.11 Glutamate carboxypeptidase.
3.4.17.12 Carboxypeptidase M.
3.4.17.13 Muramoyltetrapeptide carboxypeptidase.
3.4.17.14 Zinc D-Ala-D-Ala carboxypeptidase.
3.4.17.15 Carboxypeptidase A2.
3.4.17.16 Membrane Pro-X carboxypeptidase.
3.4.17.17 Tubulinyl-Tyr carboxypeptidase.
3.4.17.18 Carboxypeptidase T.
3.4.17.19 Carboxypeptidase Taq.
3.4.17.20 Carboxypeptidase U.
3.4.17.21 Glutamate carboxypeptidase II.
3.4.17.22 Metallocarboxypeptidase D.
3.4.18.1 Cathepsin X.
3.4.19.1 Acylaminoacyl-peptidase.
3.4.19.2 Peptidyl-glycinamidase.
3.4.19.3 Pyroglutamyl-peptidase I.
3.4.19.5 Beta-aspartyl-peptidase.
3.4.19.6 Pyroglutamyl-peptidase II.
3.4.19.7 N-formylmethionyl-peptidase.
3.4.19.9 Gamma-glutamyl hydrolase.
3.4.19.11 Gamma-D-glutamyl-meso-diaminopimelate peptidase.
3.4.19.12 Ubiquitinyl hydrolase 1.
3.4.21.1 Chymotrypsin.
3.4.21.2 Chymotrypsin C.
3.4.21.3 Metridin.
3.4.21.4 Trypsin.
3.4.21.5 Thrombin.
3.4.21.6 Coagulation factor Xa.
3.4.21.7 Plasmin.
3.4.21.9 Enteropeptidase.
3.4.21.10 Acrosin.
3.4.21.12 Alpha-lytic endopeptidase.
3.4.21.19 Glutamyl endopeptidase.
3.4.21.20 Cathepsin G.
3.4.21.21 Coagulation factor VIIa.
3.4.21.22 Coagulation factor IXa.
3.4.21.25 Cucumisin.
3.4.21.26 Prolyl oligopeptidase.
3.4.21.27 Coagulation factor XIa.
3.4.21.32 Brachyurin.
3.4.21.34 Plasma kallikrein.
3.4.21.35 Tissue kallikrein.
3.4.21.36 Pancreatic elastase.
3.4.21.37 Leukocyte elastase.
3.4.21.38 Coagulation factor XIIa.
3.4.21.39 Chymase.
3.4.21.41 Complement subcomponent C1r.
3.4.21.42 Complement subcomponent C1s.
3.4.21.43 Classical-complement-pathway C3/C5 convertase.
3.4.21.45 Complement factor I.
3.4.21.46 Complement factor D.
3.4.21.47 Alternative-complement-pathway C3/C5 convertase.
3.4.21.48 Cerevisin.
3.4.21.49 Hypodermin C.
3.4.21.50 Lysyl endopeptidase.
3.4.21.53 Endopeptidase La.
3.4.21.54 Gamma-renin.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass.
3.4.21.55 Venombin AB.
3.4.21.57 Leucyl endopeptidase.
3.4.21.59 Tryptase.
3.4.21.60 Scutelarin.
3.4.21.61 Kexin.
3.4.21.62 Subtilisin.
3.4.21.63 Oryzin.
3.4.21.64 Endopeptidase K.
3.4.21.65 Thermomycolin.
3.4.21.66 Thermitase.
3.4.21.67 Endopeptidase So.
3.4.21.68 T-plasminogen activator.
3.4.21.69 Protein C (activated).
3.4.21.70 Pancreatic endopeptidase E.
3.4.21.71 Pancreatic elastase II.
3.4.21.72 IgA-specific serine endopeptidase.
3.4.21.73 U-plasminogen activator.
3.4.21.74 Venombin A.
3.4.21.75 Furin.
3.4.21.76 Myeloblastin.
3.4.21.77 Semenogelase.
3.4.21.78 Granzyme A.
3.4.21.79 Granzyme B.
3.4.21.80 Streptogrisin A.
3.4.21.81 Streptogrisin B.
3.4.21.82 Glutamyl endopeptidase II.
3.4.21.83 Oligopeptidase B.
3.4.21.84 Limulus clotting factor C.
3.4.21.85 Limulus clotting factor B.
3.4.21.86 Limulus clotting enzyme.
3.4.21.87 Omptin.
3.4.21.88 Repressor lexA.
3.4.21.89 Signal peptidase I.
3.4.21.90 Togavirin.
3.4.21.91 Flavivirin.
3.4.21.92 Endopeptidase Clp.
3.4.21.93 Proprotein convertase 1.
3.4.21.94 Proprotein convertase 2.
3.4.21.95 Snake venom factor V activator.
3.4.21.96 Lactocepin.
3.4.21.97 Assemblin.
3.4.21.98 Hepacivirin.
3.4.21.99 Spermosin.
3.4.21.100 Pseudomonalisin.
3.4.21.101 Xanthomonalisin.
3.4.21.102 C-terminal processing peptidase.
3.4.21.103 Physarolisin.
3.4.22.1 Cathepsin B.
3.4.22.2 Papain.
3.4.22.3 Ficain.
3.4.22.6 Chymopapain.
3.4.22.7 Asclepain.
3.4.22.8 Clostripain.
3.4.22.10 Streptopain.
3.4.22.14 Actinidain.
3.4.22.15 Cathepsin L.
3.4.22.16 Cathepsin H.
3.4.22.24 Cathepsin T.
3.4.22.25 Glycyl endopeptidase.
3.4.22.26 Cancer procoagulant.
3.4.22.27 Cathepsin S.
3.4.22.28 Picornain 3C.
3.4.22.29 Picornain 2A.
3.4.22.30 Caricain.
3.4.22.31 Ananain.
3.4.22.32 Stem bromelain.
3.4.22.33 Fruit bromelain.
3.4.22.34 Legumain.
3.4.22.35 Histolysain.
3.4.22.36 Caspase-1.
3.4.22.37 Gingipain R.
3.4.22.38 Cathepsin K.
3.4.22.39 Adenain.
3.4.22.40 Bleomycin hydrolase.
3.4.22.41 Cathepsin F.
3.4.22.42 Cathepsin O.
3.4.22.43 Cathepsin V.
3.4.22.44 Nuclear-inclusion-a endopeptidase.
3.4.22.45 Helper-component proteinase.
3.4.22.46 L-peptidase.
3.4.22.47 Gingipain K.
3.4.22.48 Staphopain.
3.4.22.49 Separase.
3.4.22.50 V-cath endopeptidase.
3.4.22.51 Cruzipain.
3.4.22.52 Calpain-1.
3.4.22.53 Calpain-2.
3.4.23.1 Pepsin A.
3.4.23.2 Pepsin B.
3.4.23.3 Gastricsin.
3.4.23.4 Chymosin.
3.4.23.5 Cathepsin D.
3.4.23.12 Nepenthesin.
3.4.23.15 Renin.
3.4.23.16 HIV-1 retropepsin.
3.4.23.17 Pro-opiomelanocortin converting enzyme.
3.4.23.18 Aspergillopepsin I.
3.4.23.19 Aspergillopepsin II.
3.4.23.20 Penicillopepsin.
3.4.23.21 Rhizopuspepsin.
3.4.23.22 Endothiapepsin.
3.4.23.23 Mucorpepsin.
3.4.23.24 Candidapepsin.
3.4.23.25 Saccharopepsin.
3.4.23.26 Rhodotorulapepsin.
3.4.23.28 Acrocylindropepsin.
3.4.23.29 Polyporopepsin.
3.4.23.30 Pycnoporopepsin.
3.4.23.31 Scytalidopepsin A.
3.4.23.32 Scytalidopepsin B.
3.4.23.34 Cathepsin E.
3.4.23.35 Barrierpepsin.
3.4.23.36 Signal peptidase II.
3.4.23.38 Plasmepsin I.
3.4.23.39 Plasmepsin II.
3.4.23.40 Phytepsin.
3.4.23.41 Yapsin 1.
3.4.23.42 Thermopsin.
3.4.23.43 Prepilin peptidase.
3.4.23.44 Nodavirus endopeptidase.
3.4.23.45 Memapsin 1.
3.4.23.46 Memapsin 2.
3.4.23.47 HIV-2 retropepsin.
3.4.23.48 Plasminogen activator Pla.
3.4.24.1 Atrolysin A.
3.4.24.3 Microbial collagenase.
3.4.24.6 Leucolysin.
3.4.24.7 Interstitial collagenase.
3.4.24.11 Neprilysin.
3.4.24.12 Envelysin.
3.4.24.13 IgA-specific metalloendopeptidase.
3.4.24.14 Procollagen N-endopeptidase.
3.4.24.15 Thimet oligopeptidase.
3.4.24.16 Neurolysin.
3.4.24.17 Stromelysin 1.
3.4.24.18 Meprin A.
3.4.24.19 Procollagen C-endopeptidase.
3.4.24.20 Peptidyl-Lys metalloendopeptidase.
3.4.24.21 Astacin.
3.4.24.22 Stromelysin 2.
3.4.24.23 Matrilysin.
3.4.24.24 Gelatinase A.
3.4.24.25 Vibriolysin.
3.4.24.26 Pseudolysin.
3.4.24.27 Thermolysin.
3.4.24.28 Bacillolysin.

3.4.24.29 Aureolysin.
3.4.24.30 Coccolysin.
3.4.24.31 Mycolysin.
3.4.24.32 Beta-lytic metalloendopeptidase.
3.4.24.33 Peptidyl-Asp metalloendopeptidase.
3.4.24.34 Neutrophil collagenase.
3.4.24.35 Gelatinase B.
3.4.24.36 Leishmanolysin.
3.4.24.37 Saccharolysin.
3.4.24.38 Gametolysin.
3.4.24.39 Deuterolysin.
3.4.24.40 Serralysin.
3.4.24.41 Atrolysin B.
3.4.24.42 Atrolysin C.
3.4.24.43 Atroxase.
3.4.24.44 Atrolysin E.
3.4.24.45 Atrolysin F.
3.4.24.46 Adamalysin.
3.4.24.47 Horrilysin.
3.4.24.48 Ruberlysin.
3.4.24.49 Bothropasin.
3.4.24.50 Bothrolysin.
3.4.24.51 Ophiolysin.
3.4.24.52 Trimerelysin I.
3.4.24.53 Trimerelysin II.
3.4.24.54 Mucrolysin.
3.4.24.55 Pitrilysin.
3.4.24.56 Insulysin.
3.4.24.57 O-sialoglycoprotein endopeptidase.
3.4.24.58 Russellysin.
3.4.24.59 Mitochondrial intermediate peptidase.
3.4.24.60 Dactylysin.
3.4.24.61 Nardilysin.
3.4.24.62 Magnolysin.
3.4.24.63 Meprin B.
3.4.24.64 Mitochondrial processing peptidase.
3.4.24.65 Macrophage elastase.
3.4.24.66 Choriolysin L.
3.4.24.67 Choriolysin H.
3.4.24.68 Tentoxilysin.
3.4.24.69 Bontoxilysin.
3.4.24.70 Oligopeptidase A.
3.4.24.71 Endothelin-converting enzyme 1.
3.4.24.72 Fibrolase.
3.4.24.73 Jararhagin.
3.4.24.74 Fragilysin.
3.4.24.75 .
3.4.24.76 Flavastacin.
3.4.24.77 Snapalysin.
3.4.24.78 GPR endopeptidase.
3.4.24.79 Pappalysin-1.
3.4.24.80 Membrane-type matrix metalloproteinase-1.
3.4.24.81 ADAM10 endopeptidase.
3.4.24.82 ADAMTS-4 endopeptidase.
3.4.24.83 Anthrax lethal factor endopeptidase.
3.4.24.84 Ste24 endopeptidase.
3.4.24.85 S2P endopeptidase.
3.4.24.86 ADAM17 endopeptidase.
3.4.25.1 Proteasome endopeptidase complex.
3.5.1.1 Asparaginase.
3.5.1.2 Glutaminase.
3.5.1.3 Omega-amidase.
3.5.1.4 Amidase.
3.5.1.5 .
3.5.1.6 Beta-ureidopropionase.
3.5.1.7 Ureidosuccinase.
3.5.1.8 Formylaspartate deformylase.
3.5.1.9 Arylformamidase.
3.5.1.10 Formyltetrahydrofolate deformylase.
3.5.1.11 Penicillin amidase.
3.5.1.12 .
3.5.1.13 Aryl-acylamidase.
3.5.1.14 Aminoacylase.
3.5.1.15 .
3.5.1.16 Acetylornithine deacetylase.
3.5.1.17 Acyl-lysine deacylase.
3.5.1.18 Succinyl-diaminopimelate desuccinylase.
3.5.1.19 Nicotinamidase.
3.5.1.20 Citrullinase.
3.5.1.21 N-acetyl-beta-alanine deacetylase.
3.5.1.22 Pantothenase.
3.5.1.23 .
3.5.1.24 Choloylglycine hydrolase.
3.5.1.25 N-acetylglucosamine-6-phosphate deacetylase.
3.5.1.26 N(4)-(beta-N-acetylglucosaminyl)-L-.
3.5.1.27 N-formylmethionylaminoacyl-tRNA deformylase.
3.5.1.28 N-acetylmuramoyl-L-alanine amidase.
3.5.1.29 2-(acetamidomethylene)succinate hydrolase.
3.5.1.30 5-aminopentanamidase.
3.5.1.31 Formylmethionine deformylase.
3.5.1.32 Hippurate hydrolase.
3.5.1.33 N-acetylglucosamine deacetylase.
3.5.1.35 D-.
3.5.1.36 N-methyl-2-oxoglutaramate hydrolase.
3.5.1.38 Glutamin-(asparagin-)ase.
3.5.1.39 Alkylamidase.
3.5.1.40 Acylagmatine amidase.
3.5.1.41 Chitin deacetylase.
3.5.1.42 Nicotinamide-nucleotide amidase.
3.5.1.43 Peptidyl-glutaminase.
3.5.1.44 Protein-glutamine glutaminase.
3.5.1.46 6-aminohexanoate-dimer hydrolase.
3.5.1.47 N-acetyldiaminopimelate deacetylase.
3.5.1.48 Acetylspermidine deacetylase.
3.5.1.49 Formamidase.
3.5.1.50 Pentanamidase.
3.5.1.51 4-acetamidobutyryl-CoA deacetylase.
3.5.1.52 Peptide-N(4)-(N-acetyl-beta-glucosaminyl)asparagine amidase.
3.5.1.53 N-carbamoylputrescine amidase.
3.5.1.54 Allophanate hydrolase.
3.5.1.55 Long-chain-fatty-acyl-glutamate deacylase.
3.5.1.56 N,N-dimethylformamidase.
3.5.1.57 Tryptophanamidase.
3.5.1.58 N-benzyloxycarbonylglycine hydrolase.
3.5.1.59 N-carbamoylsarcosine amidase.
3.5.1.60 N-(long-chain-acyl)ethanolamine deacylase.
3.5.1.61 Mimosinase.
3.5.1.62 Acetylputrescine deacetylase.
3.5.1.63 4-acetamidobutyrate deacetylase.
3.5.1.64 N(alpha)-benzyloxycarbonylleucine hydrolase.
3.5.1.65 Theanine hydrolase.
3.5.1.66 2-(hydroxymethyl)-3-(acetamidomethylene)succinate hydrolase.
3.5.1.67 4-methyleneglutaminase.
3.5.1.68 N-formylglutamate deformylase.
3.5.1.69 Glycosphingolipid deacylase.
3.5.1.70 Aculeacin-A deacylase.
3.5.1.71 N-feruloylglycine deacylase.
3.5.1.72 D-benzoylarginine-4-nitroanilide amidase.
3.5.1.73 Carnitinamidase.
3.5.1.74 Chenodeoxycholoyltaurine hydrolase.
3.5.1.75 Urethanase.
3.5.1.76 Arylalkyl acylamidase.
3.5.1.77 N-carbamoyl-D-amino acid hydrolase.
3.5.1.78 Glutathionylspermidine amidase.
3.5.1.79 Phthalyl amidase.
3.5.1.81 N-acyl-D-amino-acid deacylase.

3.5.1.82 N-acyl-D-glutamate deacylase.
3.5.1.83 N-acyl-D-aspartate deacylase.
3.5.1.84 Biuret .
3.5.1.85 (S)-N-acetyl-1-phenylethylamine hydrolase.
3.5.1.86 Mandelamide amidase.
3.5.1.87 N-carbamoyl-L-amino-acid hydrolase.
3.5.1.88 Peptide deformylase.
3.5.1.89 N-acetylglucosaminylphosphatidylinositol deacetylase.
3.5.1.90 Adenosylcobinamide hydrolase.
3.5.2.1 .
3.5.2.2 Dihydropyrimidinase.
3.5.2.3 .
3.5.2.4 Carboxymethylhydantoinase.
3.5.2.5 Allantoinase.
3.5.2.6 Beta-lactamase.
3.5.2.7 Imidazolonepropionase.
3.5.2.9 5-oxoprolinase (ATP-hydrolyzing).
3.5.2.10 .
3.5.2.11 L-lysine-lactamase.
3.5.2.12 6-aminohexanoate-cyclic-dimer hydrolase.
3.5.2.13 2,5-dioxopiperazine hydrolase.
3.5.2.14 N-methylhydantoinase (ATP-hydrolyzing).
3.5.2.15 Cyanuric acid amidohydrolase.
3.5.2.16 Maleimide hydrolase.
3.5.2.17 Hydroxyisourate hydrolase.
3.5.3.1 .
3.5.3.2 Guanidinoacetase.
3.5.3.3 .
3.5.3.4 Allantoicase.
3.5.3.5 Formimidoylaspartate deiminase.
3.5.3.6 .
3.5.3.7 Guanidinobutyrase.
3.5.3.8 Formimidoylglutamase.
3.5.3.9 Allantoate deiminase.
3.5.3.10 D-arginase.
3.5.3.11 .
3.5.3.12 Agmatine deiminase.
3.5.3.13 Formimidoylglutamate deiminase.
3.5.3.14 Amidinoaspartase.
3.5.3.15 Protein-arginine deiminase.
3.5.3.16 Methylguanidinase.
3.5.3.17 Guanidinopropionase.
3.5.3.18 Dimethylargininase.
3.5.3.19 Ureidoglycolate hydrolase.
3.5.3.20 Diguanidinobutanase.
3.5.3.21 Methylenediurea deaminase.
3.5.3.22 Proclavaminate amidinohydrolase.
3.5.4.1 Cytosine deaminase.
3.5.4.2 Adenine deaminase.
3.5.4.3 .
3.5.4.4 .
3.5.4.5 .
3.5.4.6 AMP deaminase.
3.5.4.7 ADP deaminase.
3.5.4.8 Aminoimidazolase.
3.5.4.9 Methenyltetrahydrofolate cyclohydrolase.
3.5.4.10 IMP cyclohydrolase.
3.5.4.11 Pterin deaminase.
3.5.4.12 dCMP deaminase.
3.5.4.13 dCTP deaminase.
3.5.4.14 Deoxycytidine deaminase.
3.5.4.15 Guanosine deaminase.
3.5.4.16 GTP cyclohydrolase I.
3.5.4.17 Adenosine-phosphate deaminase.
3.5.4.18 ATP deaminase.
3.5.4.19 Phosphoribosyl-AMP cyclohydrolase.
3.5.4.20 Pyrithiamine deaminase.
3.5.4.21 Creatinine deaminase.
3.5.4.22 1-pyrroline-4-hydroxy-2-carboxylate deaminase.
3.5.4.23 Blasticidin-S deaminase.
3.5.4.24 Sepiapterin deaminase.
3.5.4.25 GTP cyclohydrolase II.
3.5.4.26 Diaminohydroxyphosphoribosylaminopyrimidine deaminase.
3.5.4.27 Methenyltetrahydromethanopterin cyclohydrolase.
3.5.4.28 S-adenosylhomocysteine deaminase.
3.5.4.29 GTP cyclohydrolase IIa.
3.5.4.30 dCTP deaminase (dUMP-forming).
3.5.5.1 Nitrilase.
3.5.5.2 Ricinine nitrilase.
3.5.5.4 Cyanoalanine nitrilase.
3.5.5.5 Arylacetonitrilase.
3.5.5.6 Bromoxynil nitrilase.
3.5.5.7 Aliphatic nitrilase.
3.5.5.8 Thiocyanate hydrolase.
3.5.99.1 Riboflavinase.
3.5.99.2 .
3.5.99.3 Hydroxydechloroatrazine ethylaminohydrolase.
3.5.99.4 N-isopropylammelide isopropylaminohydrolase.
3.5.99.5 2-aminomuconate deaminase.
3.5.99.6 Glucosamine-6-phosphate deaminase.
3.5.99.7 1-aminocyclopropane-1-carboxylate deaminase.
3.6.1.1 Inorganic diphosphatase.
3.6.1.2 Trimetaphosphatase.
3.6.1.3 Adenosinetriphosphatase.
3.6.1.5 .
3.6.1.6 Nucleoside-diphosphatase.
3.6.1.7 Acylphosphatase.
3.6.1.8 ATP diphosphatase.
3.6.1.9 Nucleotide diphosphatase.
3.6.1.10 Endopolyphosphatase.
3.6.1.11 Exopolyphosphatase.
3.6.1.12 dCTP diphosphatase.
3.6.1.13 ADP-ribose diphosphatase.
3.6.1.14 Adenosine-tetraphosphatase.
3.6.1.15 Nucleoside-triphosphatase.
3.6.1.16 CDP-glycerol diphosphatase.
3.6.1.17 Bis(5'-nucleosyl)-tetraphosphatase (asymmetrical).
3.6.1.18 FAD diphosphatase.
3.6.1.19 Nucleoside-triphosphate diphosphatase.
3.6.1.20 5'-acylphosphoadenosine hydrolase.
3.6.1.21 ADP-sugar diphosphatase.
3.6.1.22 NAD+ diphosphatase.
3.6.1.23 dUTP diphosphatase.
3.6.1.24 Nucleoside phosphoacylhydrolase.
3.6.1.25 Triphosphatase.
3.6.1.26 CDP-diacylglycerol diphosphatase.
3.6.1.27 Undecaprenyl-diphosphatase.
3.6.1.28 Thiamine-triphosphatase.
3.6.1.29 Bis(5'-adenosyl)-triphosphatase.
3.6.1.30 M(7)G(5')pppN diphosphatase.
3.6.1.31 Phosphoribosyl-ATP diphosphatase.
3.6.1.39 Thymidine-triphosphatase.
3.6.1.40 Guanosine-5'-triphosphate,3'-diphosphate diphosphatase.
3.6.1.41 Bis(5'-nucleosyl)-tetraphosphatase (symmetrical).
3.6.1.42 Guanosine-diphosphatase.
3.6.1.43 Dolichyldiphosphatase.
3.6.1.44 Oligosaccharide-diphosphodolichol diphosphatase.
3.6.1.45 UDP-sugar diphosphatase.
3.6.1.52 Diphosphoinositol-polyphosphate diphosphatase.
3.6.2.1 Adenylylsulfatase.

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.6.2.2 Phosphoadenylylsulfatase. 3.6 4.9 ATPase. 36.31 Phospholipid-translocating ATPase. 3.6 4.10 Non-chaperonin molecular chaperone 3.6.32 Magnesium-importing ATPase. ATPase. 3.6.3.3 Cadmium-exporting ATPase. 3.6 .4.11 Nucleoplasmin ATPase. 3.6.3.4 Copper-exporting ATPase. 3.6 5.1 Heterotrimeric G-protein GTPase. 3.6.3.5 Zinc-exporting ATPase. 3.6 S.2 Small monomeric GTPase. 36.36 Proton-exporting ATPase. 3.6 S.3 Protein-synthesizing GTPase. 3.6.3.7 Sodium-exporting ATPase. 3.6 5.4 Signal-recognition-particle GTPase. 36.38 Calcium-transporting ATPase. 3.6 5.5 GTPase. 3.63.9 Sodium/potassium-exchanging 3.6 5.6 Tubulin GTPase. ATPase. 3.7. Oxaloacetase. 3.6.3.10 Hydrogenipotassium-exchanging 3.7. Fumarylacetoacetase. ATPase. 3.7. Kynureninase. 3.6.3.11 Chloride-transporting ATPase. 3.7. Phloretin hydrolase. 36.312 Potassium-transporting ATPase. 3.7. Acylpyruvate hydrolase. 3.6.3.14 H(+)-transporting two-sector ATPase. 3.7. Acetylpyruvate hydrolase. 3.6.3.15 Sodium-transporting two-sector 3.7. Beta-diketone hydrolase. ATPase. 3.7. 2,6-dioxo-6-phenylhexa-3-enoate 36.316 Arsenite-transporting ATPase. hydrolase. 3.6.3.17 Monosaccharide-transporting ATPase. 3.7.1. 2-hydroxymuconate-semialdehyde 3.6.3.18 Oligosaccharide-transporting ATPase. hydrolase. 3.6.3.19 Maltose-transporting ATPase. 3.7. O Cyclohexane-1,3-dione hydrolase. 3.6.32O Glycerol-3-phosphate-transporting 38. Alkylhalidase. ATPase. 38. (S)-2-haloacid dehalogenase. 3.6.321 Polar-amino-acid-transporting 38. Haloacetate dehalogenase. ATPase. 38. Haloalkane dehalogenase. 3.6.322 Nonpolar-amino-acid-transporting 38. 4-chlorobenzoate dehalogenase. ATPase. 38. 4-chlorobenzoyl-CoA dehalogenase. 3.6.323 Oligopeptide-transporting ATPase. 38. Atrazine chlorohydrolase. 
3.6.324 Nickel-transporting ATPase. 38. (R)-2-haloacid dehalogenase. 3.6.3.25 Sulfate-transporting ATPase. 3.81. O 2-haloacid dehalogenase (configuration 3.6.326 Nitrate-transporting ATPase. inverting). 3.6.327 Phosphate-transporting ATPase. 1 1 2-haloacid dehalogenase (configuration 3.6.328 Phosphonate-transporting ATPase. retaining). 3.6.329 Molybdate-transporting ATPase. Phosphoamidase. 3.6.3.30 Fe(3+)-transporting ATPase. 0.1.1 N-sulfoglucosamine Sulfohydrolase. 3.6.3.31 Polyamine-transporting ATPase. O.12 Cyclamate Sulfohydrolase. 3.6.3.32 Quaternary-amine-transporting 1.1.1 Phosphonoacetaldehyde hydrolase. ATPase. 1.1.2 Phosphonoacetate hydrolase. 3.6.3.33 Vitamin B12-transporting ATPase. 2.1.1 Trithionate hydrolase. 3.6.3.34 Iron-chelate-transporting ATPase. 3.11 UDP-Sulfoquinovose synthase. 3.6.3.35 Manganese-transporting ATPase. ENZYME: 4 - 3.6.3.36 Taurine-transporting ATPase. 3.6.3.37 Guanine-transporting ATPase. Pyruvate decarboxylase. 36.338 Capsular-polysaccharide-transporting Oxalate decarboxylase. ATPase. Oxaloacetate decarboxylase. 3.6.3.39 Lipopolysaccharide-transporting Acetoacetate decarboxylase. ATPase. Acetolactate decarboxylase. 3.6.340 Teichoic-acid-transporting ATPase. Aconitate decarboxylase. 3.6.3.41 Heme-transporting ATPase. Benzoylformate decarboxylase. 3.6.3.42 Beta-glucan-transporting ATPase. Oxalyl-CoA decarboxylase. 3.6.3.43 Peptide-transporting ATPase. Malonyl-CoA decarboxylase. 3.6.3.44 Xenobiotic-transporting ATPase. Aspartate 1-decarboxylase. 3.6.345 Steroid-transporting ATPase. Aspartate 4-decarboxylase. 3.6.3.46 Cadmium-transporting ATPase. Valine decarboxylase. 3.6.3.47 Fatty-acyl-CoA-transporting ATPase. Glutamate decarboxylase. 3.6.3.48 Alpha-factor-transporting ATPase. Hydroxyglutamate decarboxylase. 3.6.3.49 Channel-conductance-controlling Ornithine decarboxylase. ATPase. Lysine decarboxylase. 3.6.3SO Protein-secreting ATPase. Arginine decarboxylase. 
3.6.351 Mitochondrial protein-transporting Diaminopimelate decarboxylase. ATPase. Phosphoribosylaminoimidazole 3.6.352 Chloroplast protein-transporting carboxylase. ATPase. .1.22 Histidine decarboxylase. 3.6.353 Ag(+)-exporting ATPase. 1.23 Orotidine-5'-phosphate decarboxylase. 3.6.4.1 Myosin ATPase. .1.24 Aminobenzoate decarboxylase. 3.6.4.2 ATPase. 1.25 Tyrosine decarboxylase. 36.4.3 Microtubule-severing ATPase. 1.28 Aromatic-L-amino-acid decarboxylase. 3.6.4.4 Plus-end-directed ATPase. 1.29 Sulfinoalanine decarboxylase. 3.64.5 Minus-end-directed kinesin ATPase. 1.30 Pantothenoylcysteine decarboxylase. 3.6.46 Vesicle-fusing ATPase. 1.31 Phosphoenolpyruvate carboxylase. 36.4.7 Peroxisome-assembly ATPase. 32 Phosphoenolpyruvate carboxykinase 36.4.8 Proteasome ATPase. (GTP). US 2012/0266329 A1 Oct. 18, 2012 54

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 1.33 Diphosphomevalonate decarboxylase. .2.21 2-dehydro-3-deoxy-6- .1.34 Dehydro-L-gulonate decarboxylase. phosphogalactonate aldolase. 1.35 UDP-glucuronate decarboxylase. .2.22 Fructose-6-phosphate phosphoketolase. : 36 Phosphopantothenoylcysteine 2.23 3-deoxy-D-manno-octuloSonate aldolase. decarboxylase. .2.24 Dimethylaniline-N-oxide aldolase. 1.37 Uroporphyrinogen decarboxylase. 2.25 Dihydroneopterin aldolase. 38 Phosphoenolpyruvate carboxykinase 2.26 Phenylserine aldolase. (diphosphate). 2.27 Sphinganine-1-phosphate aldolase. 1.39 Ribulose-bisphosphate carboxylase. 2.28 2-dehydro-3-deoxy-D-pentonate 1.40 Hydroxypyruvate decarboxylase. aldolase. .1.41 Methylmalonyl-CoA decarboxylase. 4. 2.29 5-dehydro-2-deoxyphosphogluconate .1.42 Carnitine decarboxylase. aldolase. 1.43 Phenylpyruvate decarboxylase. 2.30 17-alpha-hydroxyprogesterone aldolase. .1.44 4-carboxymuconolactone 2.32 Trimethylamine-oxide aldolase. decarboxylase. 2.33 Fucosterol-epoxide lyase. 4. 1.45 Aminocarboxymuconate-semialdehyde .2.34 4-(2-carboxyphenyl)-2-oxobut-3-enoate decarboxylase. aldolase. .1.46 O-pyrocatechuate decarboxylase. 2.35 Propioin synthase. 1.47 Tartronate-semialdehyde synthase. 2.36 Lactate aldolase. .1.48 Indole-3-glycerol-phosphate synthase. 2.37 Acetone-cyanohydrin lyase. : 1.49 Phosphoenolpyruvate carboxykinase 2.38 Benzoin aldolase. (ATP). 2.39 Hydroxynitrilase. 1...SO Adenosylmethionine decarboxylase. .2.40 Tagatose-bisphosphate aldolase. S1 3-hydroxy-2-methylpyridine-4,5- .2.41 Vanillin synthase. dicarboxylate 4-decarboxylase. .3.1 Isocitrate lyase. 1.52 6-methylsalicylate decarboxylase. 3.3 N-acetylneuraminate lyase. 1.53 Phenylalanine decarboxylase. .3.4 Hydroxymethylglutaryl-CoA lyase. 1.54 Dihydroxyfumarate decarboxylase. 3.6 Citrate (pro-3S)-lyase. 
1.55 4,5-dihydroxyphthalate decarboxylase. 3.13 Oxalomalate lyase. 1.56 3-oxolaurate decarboxylase. 3.14 3-hydroxyaspartate aldolase. 1.57 Methionine decarboxylase. 3.16 4-hydroxy-2-oxoglutarate aldolase. 1.58 Orsellinate decarboxylase. 3.17 4-hydroxy-4-methyl-2-oxoglutarate 1.59 Gallate decarboxylase. aldolase. 1.60 Stipitatonate decarboxylase. 3.22 Citramalate lyase. 1.61 4-hydroxybenzoate decarboxylase. 3.24 Malyl-CoA lyase. .1.62 Gentisate decarboxylase. 3.25 Citramalyl-CoA lyase. 1.63 Protocatechuate decarboxylase. : 3.26 3-hydroxy-3-isohexenylglutaryl-CoA .64 2,2-dialkylglycine decarboxylase lyase. (pyruvate). 3.27 Anthranilate synthase. 1.65 Phosphatidylserine decarboxylase. 3.30 Methylisocitrate lyase. 1.66 Uracil-5-carboxylate decarboxylase. 3.32 2,3-dimethylmalate lyase. 1.67 UDP-galacturonate decarboxylase. 3.34 Citryl-CoA lyase. : 68 5-oxopent-3-ene-1,2,5-tricarboxylate 3.35 (1-hydroxycyclohexan-1-yl)acetyl-CoA decarboxylase. lyase. 1.69 3,4-dihydroxyphthalate decarboxylase. 3.36 Naphthoate synthase. 1.70 Glutaconyl-CoA decarboxylase. 3.38 Aminodeoxychorismate lyase. 1.71 2-oxoglutarate decarboxylase. 99.1 Tryptophanase. : 72 Branched-chain-2-oxoacid 99.2 Tyrosinephenol-lyase. decarboxylase. 99.3 Deoxyribodipyrimidine photo-lyase. 1.73 Tartrate decarboxylase. 99.5 Octadecanal decarbonylase. 1.74 Indolepyruvate decarboxylase. 99.11 Benzylsuccinate synthase. 4. .75 5-guanidino-2-oxopentanoate Carbonate . decarboxylase. Fumarate hydratase. 1.76 Arylmalonate decarboxylase. Aconitate hydratase. 1.77 4-oxalocrotonate decarboxylase. Citrate dehydratase. 1.78 Acetylenedicarboxylate decarboxylase. Arabinonate dehydratase. 1.79 Sulfopyruvate decarboxylase. Galactonate dehydratase. 18O 4-hydroxyphenylpyruvate decarboxylase. Altronate dehydratase. 1.81 Threonine-phosphate decarboxylase. Mannonate dehydratase. .2.2 Ketotetrose-phosphate aldolase. Dihydroxy-acid dehydratase. .2.4 Deoxyribose-phosphate aldolase. 3-dehydroquinate dehydratase. 2.5 Threonine aldolase. 
Phosphopyruvate hydratase. 2.9 Phosphoketolase. Phosphogluconate dehydratase. 2.10 Mandelonitrile lyase. Enoyl-CoA hydratase. .2.11 Hydroxymandelonitrile lyase. Methylglutaconyl-CoA hydratase. .2.12 2-dehydropantoate aldolase. Imidazoleglycerol-phosphate dehydratase. 2.13 Fructose-bisphosphate aldolase. . .2.14 2-dehydro-3-deoxy-phosphogluconate CyStathionine beta-synthase. aldolase. Porphobilinogen synthase. 2.17 L-fuculose-phosphate aldolase. L-arabinonate dehydratase. 2.18 2-dehydro-3-deoxy-L-pentonate aldolase. Acetylenecarboxylate hydratase. 2.19 Rhamnulose-1-phosphate aldolase. Propanediol dehydratase. : 2.2O 2-dehydro-3-deoxyglucarate aldolase. Glycerol dehydratase. US 2012/0266329 A1 Oct. 18, 2012 55

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 4.2. 31 Maleate hydratase. 4.2.1.98 6-alpha-hydroxyprogesterone 4.2. 32 L(+)-tartrate dehydratase. dehydratase. 4.2. .33 3-isopropylmalate dehydratase. 4.2.1.99 2-methylisocitrate dehydratase. 4.2. 34 (S)-2-methylmalate dehydratase. 4.2.1.1OO Cyclohexa-1,5-dienecarbonyl-CoA 4.2. 35 (R)-2-methylmalate dehydratase. hydratase. 4.2. 36 Homoaconitate hydratase. 4.2.1.101 Trans-feruloyl-CoA hydratase. 4.2. 39 Gluconate dehydratase. 4.2.1.1.03 Cyclohexyl-isocyanide hydratase. 4.2. 40 Glucarate dehydratase. 4.2.1.104 Cyanate hydratase. 4.2. 41 5-dehydro-4-deoxyglucarate 4.2.2.1 . dehydratase. 4.2.2.2 Pectate lyase. 4.2. 42 Galactarate dehydratase. 4.2.2.3 Poly(beta-D-mannuronate) lyase. 4.2. 43 2-dehydro-3-deoxy-L-arabinonate 4.2.2.4 Chondroitin ABC lyase. dehydratase. 4.2.2.5 Chondroitin AC lyase. 4.2. .44 Myo-inosose-2 dehydratase. 4.2.2.6 Oligogalacturonide lyase. 4.2. 45 CDP-glucose 4,6-dehydratase. 4.22.7 Heparin lyase. 4.2. 46 dTDP-glucose 4,6-dehydratase. 4.2.2.8 Heparin-sulfate lyase. 4.2. 47 GDP-mannose 4,6-dehydratase. 4.2.2.9 Pectate disaccharide-lyase. 4.2. 48 D-glutamate cyclase. 4.2.2.10 Pectin lyase. 4.2. 49 Urocanate hydratase. 4.2.2.11 Poly(alpha-L-guluronate) lyase. 4.2. SO Pyrazolylalanine synthase. 4.2.2.12 Xanthan lyase. 4.2. S1 Prephenate dehydratase. 4.2.2.13 EXO-(1->4)-alpha-D-glucan lyase. 4.2. 52 Dihydrodipicolinate synthase. 4.2.2.14 Glucuronan lyase. 4.2. 53 Oleate hydratase. 4.2.2.15 Anhydrosialidase. 4.2. 54 Lactoyl-CoA dehydratase. 4.2.2.16 Levan fructotransferase (DFA-IV 4.2. 55 3-hydroxybutyryl-CoA dehydratase. orming). 4.2. S6 taconyl-CoA hydratase. 4.2.2.17 (nulin fructotransferase (DFA-I-forming). 4.2. 57 Sohexenylglutaconyl-CoA 4.2.2.18 (nulin fructotransferase (DFA-III-forming). hydratase. 4.2.3.1 . 4.2. 
Crotonoyl-acyl-carrier-protein 4.2.3.2 Ethanolamine-phosphate phospho-lyase. hydratase. 4.2.3.3 Methylglyoxal synthase. 4.2. 59 3-hydroxyoctanoyl-acyl-carrier 4.2.3.4 3-dehydroquinate synthase. protein dehydratase. 4.23.5 Chorismate synthase. 4.2. 60 3-hydroxy decanoyl-acyl-carrier 4.2.3.6 Trichodiene synthase. protein dehydratase. 4.23.7 Pentalenene synthase. 4.2. 61 3-hydroxypalmitoyl-acyl-carrier 4.238 Casbene synthase. protein dehydratase. 4.23.9 Aristolochene synthase. 4.2. 62 5-alpha-hydroxysteroid dehydratase. 4.23.10 (-)-endo-fenchol synthase. 4.2. .65 3-cyanoalanine hydratase. 4.2.3.11 Sabinene-hydrate synthase. 4.2. 66 Cyanide hydratase. 4.2.3.12 6-pyruvoyltetrahydropterin 4.2. .67 D-fuconate dehydratase. synthase. 4.2. 68 L-fuconate dehydratase. 4.2.3.13 (+)-delta-cadinene synthase. 4.2. 69 Cyanamide hydratase. 4.2.3.14 Pinene synthase. 4.2. 70 Pseudouridylate synthase. 4.23.15 Myrcene synthase. 4.2. 73 Protoaphin-aglucone dehydratase 4.2.3.16 (4S)-limonene synthase. (cyclizing). 4.23.17 Taxadiene synthase. 4.2. .74 Long-chain-enoyl-CoA hydratase. 4.2.3.18 Abietadiene synthase. 4.2. .75 Uroporphyrinogen-III synthase. 4.23.19 Ent-kaurene synthase. 4.2. .76 UDP-glucose 4,6-dehydratase. 4.23.2O (+)-limonene synthase. 4.2. 77 Trans-L-3-hydroxyproline 4.2.3.21 Vetispiradiene synthase. dehydratase. 4.2.99.12 Carboxymethyloxysuccinate lyase. 4.2. .78 (S)-norcoclaurine synthase. 4.2.99.18 DNA-(apurinic or apyrimidinic 4.2. .79 2-methylcitrate dehydratase. site) lyase. 4.2. 80 2-oxopent-4-enoate hydratase. 4.2.99.19 2-hydroxypropyl-CoM lyase. 4.2. 81 D(-)-tartrate dehydratase. 4.3. Aspartate ammonia-lyase. 4.2. 82 Xylonate dehydratase. 4.3. Methylaspartate ammonia-lyase. 4.2. .83 4-oxalmesaconate hydratase. 4.3. Histidine ammonia-lyase. 4.2. 84 . 4.3. Formimidoyltetrahydrofolate 4.2. .85 Dimethylmaleate hydratase. cyclodeaminase. 4.2. 86 6-dehydroprogesterone hydratase. 4.3. Phenylalanine ammonia-lyase. 4.2. 87 Octopamine dehydratase. 4.3. 
Beta-alanyl-CoA ammonia-lyase. 4.2. 88 Synephrine dehydratase. 4.3. Ethanolamine ammonia-lyase. 4.2. 89 Carnitine dehydratase. 4.3. Glucosaminate ammonia-lyase. 4.2. 90 4.3. Serine-Sulfate ammonia-lyase. L-rhamnonate dehydratase. 4.3.1. 4.2. 91 Carboxycyclohexadienyl dehydratase. Dihydroxyphenylalanine ammonia yase. 4.2. 92 Hydroperoxide dehydratase. 4.3.1.12 . 4.2. .93 ATP-dependent NAD(P)H-hydrate 4.3.1.13 Carbamoyl-serine ammonia-lyase. dehydratase. 4.3.1.14 3-aminobutyryl-CoA ammonia 4.2. .94 Scytalone dehydratase. yase. 4.2. 95 Kievitone hydratase. 4.3.1.15 Diaminopropionate ammonia-lyase. 4.2. .96 4a-hydroxytetrahydrobiopterin 4.3.1.16 Threo-3-hydroxyaspartate dehydratase. ammonia-lyase. 4.2. 97 Phaseollidin hydratase. 4.3.1.17 L-serine ammonia-lyase. US 2012/0266329 A1 Oct. 18, 2012 56

TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 4.3.1.18 D-Serine ammonia-lyase. .2.2 . 4.3.1.19 Threonine ammonia-lyase. 2.3 3-hydroxybutyryl-CoA epimerase. 4.3.1.20 Erythro-3-hydroxyaspartate .2.4 Acetoin racemase. ammonia-lyase. 2.5 Tartrate epimerase. 4.3.2.1 Argininosuccinate lyase. .2.6 Isocitrate epimerase. 4.3.2.2 . .3.1 Ribulose-phosphate 3-epimerase. 4.3.2.3 Ureidoglycolate lyase. 3.2 UDP-glucose 4-epimerase. 4.3.2.4 Purine imidazole-ring cyclase. 3.3 Aldose 1-epimerase. 4.3.25 Peptidylamidoglycolate lyase. .3.4 L-ribulose-phosphate 4-epimerase. 4.3.3.1 3-ketovalidoxylamine C-N-lyase. 3.5 UDP-arabinose 4-epimerase. 4.3.3.2 . 3.6 UDP-glucuronate 4-epimerase. 4.3.3.3 Deacetylisoipecoside synthase. 3.7 UDP-N-acetylglucosamine 4 4.3.3.4 Deacetylipecoside synthase. epimerase. 4.4.1.1 CyStathionine gamma-lyase. 3.8 N-acylglucosamine 2-epimerase. 4.4.1.2 Homocysteine desulfhydrase. 3.9 N-acylglucosamine-6-phosphate 2 4.4.1.3 Dimethylpropiothetin dethiomethylase. epimerase. 4.4.1.4 Alliin lyase. 3.10 CDP-abequose epimerase. 4.4.1.5 Lactoylglutathione lyase. 3.11 Cellobiose epimerase. 4.4.1.6 S-alkylcysteine lyase. 3.12 UDP-glucuronate 5'-epimerase. 4.4.1.8 CyStathionine beta-lyase. 3.13 dTDP-4-dehydrorhamnose 3,5- 4.4.1.9 L-3-cyanoalanine synthase. epimerase. 4.4.1.10 Cysteine lyase. 3.14 UDP-N-acetylglucosamine 2 4.4.1.11 Methionine gamma-lyase. epimerase. 4.4.1.13 Cysteine-S-conjugate beta-lyase. 3.15 Glucose-6-phosphate 1-epimerase. 4.4.1.14 -aminocyclopropane-1-carboxylate 3.16 UDP-glucosamine 4-epimerase. synthase. 3.17 Heparosan-N-Sulfate-glucuronate 5 4.4.1.15 D-cysteine desulfhydrase. epimerase. 4.4.1.16 Selenocysteine lyase. 3.18 GDP-mannose 3,5-epimerase. 4.4.1.17 Holocytochrome-c synthase. 3.19 Chondroitin-glucuronate 5 4.4.1.19 Phosphosulfolactate synthase. epimerase. 
4.4.1.20 Leukotriene-C(4) synthase. 3.2O ADP-glyceroman no-heptose 6 4.5.1.1 DDT-dehydrochlorinase. epimerase. 4.51.2 3-chloro-D-alanine dehydrochlorinase. 3.21 Maltose epimerase. 4.51.3 Dichloromethane dehalogenase. 99.1 Methylmalonyl-CoA epimerase. 4.5.1.4 L-2-amino-4-chloropent-4-enoate 99.2 16-hydroxysteroid epimerase. dehydrochlorinase. 99.3 Allantoin racemase. 4.51.5 S-carboxymethylcysteine synthase. 99 4 Alpha-methylacyl-CoA racemase. 4.6.1.1 Adenylate cyclase. Maleate isomerase. 4.6.1.2 Guanylate cyclase. Maleylacetoacetate isomerase. 4.6.1.6 Cytidylate cyclase. Retinal isomerase. 4.6.1.12 2-C-methyl-D-erythritol 2,4- Maleylpyruvate isomerase. cyclodiphosphate synthase. Linoleate isomerase. 4.6.1.13 Phosphatidylinositol diacylglycerol-lyase. Furylfuramide isomerase. 4.6.1.14 Glycosylphosphatidylinositol Retinol isomerase. diacylglycerol-lyase. Peptidylprolyl isomerase. 4.6.1.15 FAD-AMP lyase (cyclizing). Farnesol 2-isomerase. 4.99.1.1 Ferrochelatase. 2-chloro-4-carboxymethylenebut-2-en-1,4- 4.99.1.2 Alkylmercury lyase. olide isomerase. 4.99.13 Sirohydrochlorin cobaltochelatase. 5.2.1. 4-hydroxyphenylacetaldehyde-oxime 4.99.14 Sirohydrochlorin ferrochelatase. isomerase. 4.99.15 Aliphatic aldoxime dehydratase. 5.3. Triose-phosphate isomerase. 4.99.1.6 Indoleacetaldoxime dehydratase. 5.3. Arabinose isomerase. ENZYME: 5. . . . 5.3. L-arabinose isomerase. 5.3. Xylose isomerase. 5.1.1.1 Alanine racemase. 5.3. Ribose-5-phosphate isomerase. 5.1.1.2 Methionine racemase. 5.3. Mannose isomerase. 5.1.1.3 Glutamate racemase. 5.3. Mannose-6-phosphate isomerase. 5.1.1.4 Proline racemase. 5.3.1. Glucose-6-phosphate isomerase. 5.1.1.5 Lysine racemase. 5.3.1. Glucuronate isomerase. 5.1.16 Threonine racemase. 5.3.1. Arabinose-5-phosphate isomerase. 5.1.1.7 Diaminopimelate epimerase. 5.3.1. L-rhamnose isomerase. 5.1.1.8 4-hydroxyproline epimerase. 5.3.1. D-lyxose ketol-isomerase. 5.1.19 Arginine racemase. 5.3.1. 1-(5-phosphoribosyl)-5-((5- 5.1.1.10 Amino-acid racemase. 
phosphoribosylamino)methylideneamino)imidazole-4- 5.1.1.11 Phenylalanine racemase (ATP carboxamide isomerase. hydrolyzing). 5.3. 17 4-deoxy-L-threo-5-hexoSulose-uronate 5.1.1.12 Ornithine racemase. ketol-isomerase. 5.1.1.13 Aspartate racemase. 5.3. 2O Ribose isomerase. 5.1.1.14 Nocardicin-A epimerase. 5.3. .21 Corticosteroid side-chain-isomerase. 5.1.1.15 2-aminohexano-6-lactam racemase. 5.3. 22 Hydroxypyruvate isomerase. 5.1.1.16 Protein-serine epimerase. 5.3. .23 S-methyl-5-thioribose-1-phosphate 5.1.1.17 Isopenicillin-N epimerase. isomerase. 5.1.2.1 Lactate racemase. 5.3. .24 Phosphoribosylanthranilate isomerase. US 2012/0266329 A1 Oct. 18, 2012 57

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.

5.3.1.25  L-fucose isomerase.
5.3.1.26  Galactose-6-phosphate isomerase.
5.3.2.1  Phenylpyruvate tautomerase.
5.3.2.2  Oxaloacetate tautomerase.
5.3.3.1  Steroid delta-isomerase.
5.3.3.2  Isopentenyl-diphosphate delta-isomerase.
5.3.3.3  Vinylacetyl-CoA delta-isomerase.
5.3.3.4  Muconolactone delta-isomerase.
5.3.3.5  Cholestenol delta-isomerase.
5.3.3.6  Methylitaconate delta-isomerase.
5.3.3.7  Aconitate delta-isomerase.
5.3.3.8  Dodecenoyl-CoA delta-isomerase.
5.3.3.9  Prostaglandin-A(1) delta-isomerase.
5.3.3.10  5-carboxymethyl-2-hydroxymuconate delta-isomerase.
5.3.3.11  Isopiperitenone delta-isomerase.
5.3.3.12  Dopachrome isomerase.
5.3.3.13  Polyenoic fatty acid isomerase.
5.3.4.1  Protein disulfide-isomerase.
5.3.99.2  Prostaglandin-D synthase.
5.3.99.3  Prostaglandin-E synthase.
5.3.99.4  Prostaglandin-I synthase.
5.3.99.5  Thromboxane-A synthase.
5.3.99.6  Allene-oxide cyclase.
5.3.99.7  Styrene-oxide isomerase.
5.4.1.1  Lysolecithin acylmutase.
5.4.1.2  Precorrin-8X methylmutase.
5.4.2.1  Phosphoglycerate .
5.4.2.2  Phosphoglucomutase.
5.4.2.3  Phosphoacetylglucosamine mutase.
5.4.2.4  Bisphosphoglycerate mutase.
5.4.2.5  Phosphoglucomutase (glucose ).
5.4.2.6  Beta-.
5.4.2.7  Phosphopentomutase.
5.4.2.8  Phosphomannomutase.
5.4.2.9  Phosphoenolpyruvate mutase.
5.4.2.10  Phosphoglucosamine mutase.
5.4.3.2  Lysine 2,3-aminomutase.
5.4.3.3  Beta-lysine 5,6-aminomutase.
5.4.3.4  D-lysine 5,6-aminomutase.
5.4.3.5  D-ornithine 4,5-aminomutase.
5.4.3.6  Tyrosine 2,3-aminomutase.
5.4.3.7  Leucine 2,3-aminomutase.
5.4.3.8  Glutamate-1-semialdehyde 2,1-aminomutase.
5.4.4.1  (Hydroxyamino)benzene mutase.
5.4.4.2  Isochorismate synthase.
5.4.4.3  3-(hydroxyamino)phenol mutase.
5.4.99.1  Methylaspartate mutase.
5.4.99.2  Methylmalonyl-CoA mutase.
5.4.99.3  2-acetolactate mutase.
5.4.99.4  2-methyleneglutarate mutase.
5.4.99.5  Chorismate mutase.
5.4.99.7  .
5.4.99.8  Cycloartenol synthase.
5.4.99.9  UDP-galactopyranose mutase.
5.4.99.11  Isomaltulose synthase.
5.4.99.12  tRNA-pseudouridine synthase I.
5.4.99.13  Isobutyryl-CoA mutase.
5.4.99.14  4-carboxymethyl-4-methylbutenolide mutase.
5.4.99.15  (1->4)-alpha-D-glucan 1-alpha-D-glucosylmutase.
5.4.99.16  Maltose alpha-D-glucosyltransferase.
5.4.99.17  Squalene-hopene cyclase.
5.5.1.1  Muconate cycloisomerase.
5.5.1.2  3-carboxy-cis,cis-muconate cycloisomerase.
5.5.1.3  Tetrahydroxypteridine cycloisomerase.
5.5.1.4  Inositol-3-phosphate synthase.
5.5.1.5  Carboxy-cis,cis-muconate cyclase.
5.5.1.6  Chalcone isomerase.
5.5.1.7  Chloromuconate cycloisomerase.
5.5.1.8  Geranyl-diphosphate cyclase.
5.5.1.9  Cycloeucalenol cycloisomerase.
5.5.1.10  Alpha-pinene-oxide decyclase.
5.5.1.11  Dichloromuconate cycloisomerase.
5.5.1.12  Copalyl diphosphate synthase.
5.5.1.13  Ent-copalyl diphosphate synthase.
5.99.1.1  Thiocyanate isomerase.
5.99.1.2  DNA topoisomerase.
5.99.1.3  DNA topoisomerase (ATP-hydrolyzing).

ENZYME: 6. . . .
(The EC numbers of the following entries are missing or truncated in the source text.)

Tyrosine--tRNA ligase.
Tryptophan--tRNA ligase.
Threonine--tRNA ligase.
Leucine--tRNA ligase.
Isoleucine--tRNA ligase.
Lysine--tRNA ligase.
Alanine--tRNA ligase.
Valine--tRNA ligase.
Methionine--tRNA ligase.
Serine--tRNA ligase.
Aspartate--tRNA ligase.
D-alanine-poly(phosphoribitol) ligase.
Glycine--tRNA ligase.
Proline--tRNA ligase.
Cysteine--tRNA ligase.
Glutamate--tRNA ligase.
Glutamine--tRNA ligase.
Arginine--tRNA ligase.
Phenylalanine--tRNA ligase.
Histidine--tRNA ligase.
Asparagine--tRNA ligase.
Aspartate--tRNA(Asn) ligase.
Glutamate--tRNA(Gln) ligase.
Lysine--tRNA(Pyl) ligase.
Acetate--CoA ligase.
Butyrate--CoA ligase.
Long-chain-fatty-acid--CoA ligase.
Succinate--CoA ligase (GDP-forming).
Succinate--CoA ligase (ADP-forming).
6.2.1.  Glutarate--CoA ligase.
6.2.1.  Cholate--CoA ligase.
6.2.1.  Oxalate--CoA ligase.
6.2.1.  Malate--CoA ligase.
6.2.1.  Acid--CoA ligase (GDP-forming).
6.2.1.  -CoA ligase.
6.2.1.  4-coumarate--CoA ligase.
6.2.1.  Acetate--CoA ligase (ADP-forming).
6.2.1.  6-carboxyhexanoate--CoA ligase.
6.2.1.  Arachidonate--CoA ligase.
6.2.1.  Acetoacetate--CoA ligase.
6.2.1.  Propionate--CoA ligase.
6.2.1.  Citrate--CoA ligase.
6.2.1.  Long-chain-fatty-acid--luciferin-component ligase.
6.2.1.  Long-chain-fatty-acid-acyl-carrier-protein ligase.
6.2.1.22  Citrate (pro-3S)-lyase ligase.
6.2.1.23  Dicarboxylate--CoA ligase.
6.2.1.24  Phytanate--CoA ligase.
6.2.1.25  Benzoate--CoA ligase.
6.2.1.26  O-succinylbenzoate--CoA ligase.
6.2.1.27  4-hydroxybenzoate--CoA ligase.
6.2.1.28  3-alpha,7-alpha-dihydroxy-5-beta-cholestanate--CoA ligase.
6.2.1.29  3-alpha,7-alpha,12-alpha-trihydroxy-5-beta-cholestanate--CoA ligase.
6.2.1.30  Phenylacetate--CoA ligase.
6.2.1.31  2-furoate--CoA ligase.
6.2.1.32  Anthranilate--CoA ligase.

TABLE 2-continued
EC Numbers with the corresponding name given to each enzyme class, subclass and sub-subclass.

6.2.1.33  4-chlorobenzoate--CoA ligase.
6.2.1.34  Trans-feruloyl-CoA synthase.
6.3.1.1  Aspartate--ammonia ligase.
6.3.1.2  Glutamate--ammonia ligase.
6.3.1.4  Aspartate--ammonia ligase (ADP-forming).
6.3.1.5  NAD(+) synthase.
6.3.1.6  Glutamate--ethylamine ligase.
6.3.1.7  4-methyleneglutamate--ammonia ligase.
6.3.1.8  Glutathionylspermidine synthase.
6.3.1.9  Trypanothione synthase.
6.3.1.10  Adenosylcobinamide-phosphate synthase.
6.3.2.1  Pantoate--beta-alanine ligase.
6.3.2.2  Glutamate--cysteine ligase.
6.3.2.3  Glutathione synthase.
6.3.2.4  D-alanine--D-alanine ligase.
6.3.2.5  Phosphopantothenate--cysteine ligase.
6.3.2.6  Phosphoribosylaminoimidazolesuccinocarboxamide synthase.
6.3.2.7  UDP-N-acetylmuramoyl-L-alanyl-D-glutamate--L-lysine ligase.
6.3.2.8  UDP-N-acetylmuramate--L-alanine ligase.
6.3.2.9  UDP-N-acetylmuramoylalanine--D-glutamate ligase.
6.3.2.10  UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase.
6.3.2.11  Carnosine synthase.
6.3.2.12  Dihydrofolate synthase.
6.3.2.13  UDP-N-acetylmuramoylalanyl-D-glutamate--2,6-diaminopimelate ligase.
6.3.2.14  2,3-dihydroxybenzoate--serine ligase.
6.3.2.16  D-alanine--alanyl-poly(glycerolphosphate) ligase.
6.3.2.17  Tetrahydrofolylpolyglutamate synthase.
6.3.2.18  Gamma-glutamylhistamine synthase.
6.3.2.19  Ubiquitin--protein ligase.
6.3.2.20  Indoleacetate--lysine synthetase.
6.3.2.21  Ubiquitin--calmodulin ligase.
6.3.2.22  Diphthine--ammonia ligase.
6.3.2.23  Homoglutathione synthase.
6.3.2.24  Tyrosine--arginine ligase.
6.3.2.25  Tubulin--tyrosine ligase.
6.3.2.26  N-(5-amino-5-carboxypentanoyl)-L-cysteinyl-D-valine synthase.
6.3.2.27  Aerobactin synthase.
6.3.3.1  Phosphoribosylformylglycinamidine cyclo-ligase.
6.3.3.2  5-formyltetrahydrofolate cyclo-ligase.
6.3.3.3  Dethiobiotin synthase.
6.3.3.4  (Carboxyethyl)arginine beta-lactam-synthase.
6.3.4.1  GMP synthase.
6.3.4.2  CTP synthase.
6.3.4.3  Formate--tetrahydrofolate ligase.
6.3.4.4  Adenylosuccinate synthase.
6.3.4.5  Argininosuccinate synthase.
6.3.4.6  carboxylase.
6.3.4.7  Ribose-5-phosphate--ammonia ligase.
6.3.4.8  Imidazoleacetate--phosphoribosyldiphosphate ligase.
6.3.4.9  Biotin--methylmalonyl-CoA-carboxytransferase ligase.
6.3.4.10  Biotin--propionyl-CoA-carboxylase (ATP-hydrolyzing) ligase.
6.3.4.11  Biotin--methylcrotonoyl-CoA-carboxylase ligase.
6.3.4.12  Glutamate--methylamine ligase.
6.3.4.13  Phosphoribosylamine--glycine ligase.
6.3.4.14  Biotin carboxylase.
6.3.4.15  Biotin--acetyl-CoA-carboxylase ligase.
6.3.4.16  Carbamoyl-phosphate synthase (ammonia).
6.3.4.17  Formate--dihydrofolate ligase.
6.3.5.1  NAD(+) synthase (glutamine-hydrolyzing).
6.3.5.2  GMP synthase (glutamine-hydrolyzing).
6.3.5.3  Phosphoribosylformylglycinamidine synthase.
6.3.5.4  Asparagine synthase (glutamine-hydrolyzing).
6.3.5.5  Carbamoyl-phosphate synthase (glutamine-hydrolyzing).
6.3.5.6  Asparaginyl-tRNA synthase (glutamine-hydrolyzing).
6.3.5.7  Glutaminyl-tRNA synthase (glutamine-hydrolyzing).
6.3.5.8  Aminodeoxychorismate synthase.
6.3.5.9  Hydrogenobyrinic acid a,c-diamide synthase (glutamine-hydrolyzing).
6.3.5.10  Adenosylcobyric acid synthase (glutamine-hydrolyzing).
6.4.1.1  Pyruvate carboxylase.
6.4.1.2  Acetyl-CoA carboxylase.
6.4.1.3  Propionyl-CoA carboxylase.
6.4.1.4  Methylcrotonoyl-CoA carboxylase.
6.4.1.5  Geranoyl-CoA carboxylase.
6.4.1.6  Acetone carboxylase.
6.5.1.1  DNA ligase (ATP).
6.5.1.2  DNA ligase (NAD+).
6.5.1.3  RNA ligase (ATP).
6.5.1.4  RNA-3'-phosphate cyclase.
6.6.1.1  Magnesium chelatase.
6.6.1.2  Cobaltochelatase.
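As an illustration of how the four-field EC numbers listed in Table 2 are organized (class, subclass, sub-subclass, and a serial number), the following minimal Python sketch splits an EC number into its fields. The names ECNumber, parse_ec, class_name and EC_CLASS_NAMES are hypothetical and are not part of the specification; the class names given for classes 5 and 6 are assumed from the standard Enzyme Commission scheme. Because the parser insists on exactly four numeric fields, it rejects damaged or truncated numbers instead of silently accepting them.

```python
from typing import NamedTuple

# Top-level EC classes covered by this part of Table 2
# (assumed from the standard Enzyme Commission scheme).
EC_CLASS_NAMES = {5: "Isomerases", 6: "Ligases"}


class ECNumber(NamedTuple):
    """Hypothetical container for the four fields of an EC number."""
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int


def parse_ec(text: str) -> ECNumber:
    """Split a string such as '6.5.1.1' into its four integer fields."""
    fields = text.strip().split(".")
    if len(fields) != 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"not a four-field EC number: {text!r}")
    return ECNumber(*(int(f) for f in fields))


def class_name(ec: ECNumber) -> str:
    """Return the top-level class name, or 'unknown' if it is not listed above."""
    return EC_CLASS_NAMES.get(ec.enzyme_class, "unknown")


if __name__ == "__main__":
    ec = parse_ec("6.5.1.1")  # listed in Table 2 as DNA ligase (ATP)
    print(ec, class_name(ec))
```

Grouping parsed entries by their first three fields would recover the sub-subclass blocks (for example 6.3.2 or 6.3.5) that the table lists consecutively.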

0239 Table 3 summarizes exemplary functions of exemplary enzymes of the invention; these enzyme functions were determined using sequence identity comparison analysis using closest BLAST hits to the exemplary polypeptides and polynucleotides of the invention.

0240 The invention also provides isolated and recombinant nucleic acids encoding polypeptides, e.g., SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, etc., and all additional nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897 (the exemplary polynucleotides of the invention). The invention also provides isolated and recombinant polypeptides, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898 (the exemplary polypeptides of the invention).

0241 In another embodiment, the polypeptides of the invention can be expressed in any expression system, in vitro or in vivo, e.g., any microorganism or other cell system (e.g., eukaryotic, such as yeast or mammalian cells) using procedures known in the art. In other aspects, the polypeptides of the invention can be immobilized on a solid support prior to use in the methods of the invention. Methods for immobilizing enzymes on solid supports are commonly known in the art, for example J. Mol. Cat. B: Enzymatic 6 (1999) 29-39; Chivata et al. Biocatalysis: Immobilized cells and enzymes, J. Mol. Cat. 37 (1986) 1-24; Sharma et al., Immobilized Biomaterials Techniques and Applications, Angew. Chem. Int. Ed. Engl. 21 (1982) 837-54; Laskin (Ed.), Enzymes and Immobilized Cells in Biotechnology.

DEFINITIONS

0242 A "coding sequence of" or a "sequence encodes" a particular polypeptide or protein, is a nucleic acid sequence which is transcribed and translated into a polypeptide or protein when placed under the control of appropriate regulatory sequences.

0243 A promoter sequence is "operably linked to" a coding sequence when RNA polymerase which initiates transcription at the promoter will transcribe the coding sequence into mRNA.

0244 The phrase "substantially identical" in the context of two nucleic acids or polypeptides, refers to two or more sequences that have, e.g., at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more nucleotide or amino acid residue (sequence) identity, when compared and aligned for maximum correspondence, as measured using one of the known sequence comparison algorithms or by visual inspection. In alternative aspects, the substantial identity exists over a region of at least about 100 or more residues and most commonly the sequences are substantially identical over at least about 150 to 200 or more residues. In some aspects, the sequences are substantially identical over the entire length of the coding regions.

0245 Additionally a "substantially identical" amino acid sequence is a sequence that differs from a reference sequence by one or more conservative or non-conservative amino acid substitutions, deletions, or insertions. In one aspect, the substitution occurs at a site that is not the active site of the molecule, or, alternatively the substitution occurs at a site that is the active site of the molecule, provided that the polypeptide essentially retains its functional (enzymatic) properties. A conservative amino acid substitution, for example, substitutes one amino acid for another of the same class (e.g., substitution of one hydrophobic amino acid, such as isoleucine, valine, leucine, or methionine, for another, or substitution of one polar amino acid for another, such as substitution of arginine for lysine, glutamic acid for aspartic acid or glutamine for asparagine). One or more amino acids can be deleted, for example, from a polypeptide, resulting in modification of the structure of the polypeptide, without significantly altering its biological activity. For example, amino- or carboxyl-terminal amino acids that are not required for a polypeptide, enzyme, protein, e.g. structural or binding protein, biological activity can be removed. Modified polypeptide sequences of the invention can be assayed for enzyme, structural or binding activity by any number of methods, including contacting the modified polypeptide sequence with a substrate and determining whether the modified polypeptide decreases the amount of specific substrate in the assay or increases the bioproducts of the reaction of a functional polypeptide, enzyme, protein, e.g. structural or binding protein, with the substrate. Assays for enzyme activity are well known in the art.

0246 "Fragments" as used herein are a portion of a naturally occurring protein which can exist in at least two different conformations. Fragments can have the same or substantially the same amino acid sequence as the naturally occurring protein. Fragments which have different three dimensional structures as the naturally occurring protein are also included. An example of this is a "pro-form" molecule, such as a low activity proprotein that can be modified by cleavage to produce a mature enzyme with significantly higher activity.

0247 The term "variant" refers to polynucleotides or polypeptides of the invention modified at one or more base pairs, codons, introns, exons, or amino acid residues (respectively) yet still retain the biological activity of a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention. Variants can be produced by any number of means including methods such as, for example, error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, GSSM and any combination thereof.

0248 The term "saturation mutagenesis", "Gene Site Saturation Mutagenesis", or "GSSM" includes a method that uses degenerate oligonucleotide primers to introduce point mutations into a polynucleotide, as described in detail, below.

0249 The term "optimized directed evolution system" or "optimized directed evolution" includes a method for reassembling fragments of related nucleic acid sequences, e.g., related genes, and explained in detail, below.

0250 The term "synthetic ligation reassembly" or "SLR" includes a method of ligating oligonucleotide fragments in a non-stochastic fashion, and explained in detail, below.

0251 Nucleic Acids

0252 The invention provides nucleic acids (e.g., the exemplary SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, etc., including all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897), including expression cassettes such as expression vectors, encoding polypeptides (e.g., enzymes) of the invention. The invention also includes methods for discovering new polypeptide (e.g., enzyme) sequences using the nucleic acids of the invention. The invention also includes methods for inhibiting the expression of enzymes, genes, transcripts and polypeptides using the nucleic acids of the invention. Also provided are methods for modifying the nucleic acids of the invention by, e.g., synthetic ligation reassembly, optimized directed evolution system and/or saturation mutagenesis.

0253 The nucleic acids of the invention can be made, isolated and/or manipulated by, e.g., cloning and expression of cDNA libraries, amplification of message or genomic DNA by PCR, and the like. For example, exemplary sequences of the invention were initially derived from environmental sources.

0254 In one aspect, the invention provides nucleic acids, and the polypeptides encoded by them, with a common novelty in that they are derived from a common source, e.g., an environmental or a bacterial source.

0255 In practicing the methods of the invention, homologous genes can be modified by manipulating a template nucleic acid, as described herein. The invention can be practiced in conjunction with any method or protocol or device known in the art, which are well described in the scientific and patent literature.

0256 The phrases "nucleic acid" or "nucleic acid sequence" as used herein refer to an oligonucleotide, nucleotide, polynucleotide, or to a fragment of any of these, to DNA or RNA of genomic or synthetic origin which may be single-stranded or double-stranded and may represent a sense or antisense (complementary) strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material, natural or synthetic in origin. The phrases "nucleic acid" or "nucleic acid sequence" includes oligonucleotide, nucleotide, polynucleotide, or to a fragment of any of these, to DNA or RNA (e.g., mRNA, rRNA, tRNA, iRNA) of genomic or synthetic origin which may be single-stranded or double-stranded and may represent a sense or antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material, natural or synthetic in origin, including, e.g., iRNA, ribonucleoproteins (e.g., double stranded iRNAs, e.g., iRNPs). The term encompasses nucleic acids, i.e., oligonucleotides, containing known analogues of natural nucleotides. The term also encompasses nucleic-acid-like structures with synthetic backbones, see e.g., Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197; Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense Nucleic Acid Drug Dev 6:153-156. "Oligonucleotide" includes either a single stranded polydeoxynucleotide or two complementary polydeoxynucleotide strands which may be chemically synthesized. Such synthetic oligonucleotides have no 5' phosphate and thus will not ligate to another oligonucleotide without adding a phosphate with an ATP in the presence of a kinase. A synthetic oligonucleotide can ligate to a fragment that has not been dephosphorylated.

0257 A "coding sequence of" or a "nucleotide sequence encoding" a particular polypeptide or protein, is a nucleic acid sequence which is transcribed and translated into a polypeptide or protein when placed under the control of appropriate regulatory sequences. The term "gene" means the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as, where applicable, intervening sequences (introns) between individual coding segments (exons). "Operably linked" as used herein refers to a functional relationship between two or more nucleic acid (e.g., DNA) segments. Typically, it refers to the functional relationship of transcriptional regulatory sequence to a transcribed sequence. For example, a promoter is operably linked to a coding sequence, such as a nucleic acid of the invention, if it stimulates or modulates the transcription of the coding sequence in an appropriate host cell or other expression system. Generally, promoter transcriptional regulatory sequences that are operably linked to a transcribed sequence are physically contiguous to the transcribed sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such as enhancers, need not be physically contiguous or located in close proximity to the coding sequences whose transcription they enhance.

0258 As used herein, the term "promoter" includes all sequences capable of driving transcription of a coding sequence in a cell, e.g., a plant cell. Thus, promoters used in the constructs of the invention include cis-acting transcriptional control elements and regulatory sequences that are involved in regulating or modulating the timing and/or rate of transcription of a gene. For example, a promoter can be a cis-acting transcriptional control element, including an enhancer, a promoter, a transcription terminator, an origin of replication, a chromosomal integration sequence, 5' and 3' untranslated regions, or an intronic sequence, which are involved in transcriptional regulation. These cis-acting sequences typically interact with proteins or other biomolecules to carry out (turn on/off, regulate, modulate, etc.) transcription. "Constitutive" promoters are those that drive expression continuously under most environmental conditions and states of development or cell differentiation. "Inducible" or "regulatable" promoters direct expression of the nucleic acid of the invention under the influence of environmental conditions or developmental conditions. Examples of environmental conditions that may affect transcription by inducible promoters include anaerobic conditions, elevated temperature, drought, or the presence of light.

0259 "Plasmids" can be commercially available, publicly available on an unrestricted basis, or can be constructed from available plasmids in accord with published procedures. Equivalent plasmids to those described herein are known in the art and will be apparent to the ordinarily skilled artisan.

0260 In one aspect, the term "recombinant" means that the nucleic acid is adjacent to a "backbone" nucleic acid to which it is not adjacent in its natural environment. Additionally, to be "enriched" the nucleic acids will represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. Typically, the enriched nucleic acids represent 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. More typically, the enriched nucleic acids represent 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In one aspect, the enriched nucleic acids represent 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules.
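The enrichment levels recited above (5%, 15%, 50% and 90% or more of the nucleic acid inserts in a population of backbone molecules) amount to simple arithmetic on insert counts. The following Python sketch is illustrative only: the function names and the example counts are hypothetical, and the specification does not prescribe any particular way of counting inserts.

```python
# Enrichment levels recited in the text, expressed as fractions of all inserts.
ENRICHMENT_THRESHOLDS = (0.05, 0.15, 0.50, 0.90)


def enrichment_fraction(inserts_of_interest: int, total_inserts: int) -> float:
    """Fraction of inserts in the backbone population that are the insert of interest."""
    if total_inserts <= 0:
        raise ValueError("total_inserts must be positive")
    if not 0 <= inserts_of_interest <= total_inserts:
        raise ValueError("inserts_of_interest must lie between 0 and total_inserts")
    return inserts_of_interest / total_inserts


def thresholds_met(fraction: float) -> list[float]:
    """Return every recited enrichment level that the fraction reaches or exceeds."""
    return [t for t in ENRICHMENT_THRESHOLDS if fraction >= t]


if __name__ == "__main__":
    # Hypothetical counts: 100 of 200 inserts carry the nucleic acid of interest.
    fraction = enrichment_fraction(100, 200)
    print(fraction, thresholds_met(fraction))
```

In this hypothetical case the population reaches the 5%, 15% and 50% levels but not the 90% level.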

0261 One aspect of the invention is an isolated nucleic acid comprising one of the sequences of the invention, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive bases of a nucleic acid of the invention. The isolated nucleic acids may comprise DNA, including cDNA, genomic DNA and synthetic DNA. The DNA may be double-stranded or single-stranded and if single stranded may be the coding strand or non-coding (anti-sense) strand. Alternatively, the isolated nucleic acids may comprise RNA.

0262 The isolated nucleic acids of the invention may be used to prepare one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides of the invention. Accordingly, another aspect of the invention is an isolated nucleic acid which encodes one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides of the invention. The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of the invention or may be different coding sequences which encode one of the polypeptides of the invention having at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides of the invention, as a result of the redundancy or degeneracy of the genetic code. The genetic code is well known to those of skill in the art and can be obtained, e.g., on page 214 of B. Lewin, Genes VI, Oxford University Press, 1997.

0263 The isolated nucleic acid which encodes one of the polypeptides of the invention may include, but is not limited to: only the coding sequence of a nucleic acid of the invention and additional coding sequences, such as leader sequences or proprotein sequences and non-coding sequences, such as introns or non-coding sequences 5' and/or 3' of the coding sequence. Thus, as used herein, the term "polynucleotide encoding a polypeptide" encompasses a polynucleotide which includes only the coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequence.

0264 Alternatively, the nucleic acid sequences of the invention may be mutagenized using conventional techniques, such as site directed mutagenesis, or other techniques familiar to those skilled in the art, to introduce silent changes into the polynucleotides of the invention. As used herein, "silent changes" include, for example, changes which do not alter the amino acid sequence encoded by the polynucleotide. Such changes may be desirable in order to increase the level of the polypeptide produced by host cells containing a vector encoding the polypeptide by introducing codons or codon pairs which occur frequently in the host organism.

0265 The invention also relates to polynucleotides which have nucleotide changes which result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptides of the invention. Such nucleotide changes may be introduced using techniques such as site directed mutagenesis, random chemical mutagenesis, exonuclease III deletion and other recombinant DNA techniques. Alternatively, such nucleotide changes may be naturally occurring allelic variants which are isolated by identifying nucleic acids which specifically hybridize to probes comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of the invention (or the sequences complementary thereto) under conditions of high, moderate, or low stringency as provided herein.

0266 General Techniques

0267 The nucleic acids used to practice this invention, whether RNA, iRNA, antisense nucleic acid, cDNA, genomic DNA, vectors, viruses or hybrids thereof, may be isolated from a variety of sources, genetically engineered, amplified, and/or expressed/generated recombinantly. Recombinant polypeptides generated from these nucleic acids can be individually isolated or cloned and tested for a desired activity. Any recombinant expression system can be used, including bacterial, mammalian, yeast, insect or plant cell expression systems.

0268 Alternatively, these nucleic acids can be synthesized in vitro by well-known chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc. 105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. Pat. No. 4,458,066.

0269 Techniques for the manipulation of nucleic acids, such as, e.g., subcloning, labeling probes (e.g., random-primer labeling using Klenow polymerase, nick translation, amplification), sequencing, hybridization and the like are well described in the scientific and patent literature, see, e.g., Sambrook, ed., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed. John Wiley & Sons, Inc., New York (1997); LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY: HYBRIDIZATION WITH NUCLEIC ACID PROBES, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993).

0270 Another useful means of obtaining and manipulating nucleic acids used to practice the methods of the invention is to clone from genomic samples, and, if desired, screen and re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones. Sources of nucleic acid used in the methods of the invention include genomic or cDNA libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S. Pat. Nos. 5,721,118; 6,025,155; human artificial chromosomes, see, e.g., Rosenfeld (1997) Nat. Genet. 15:333-335; yeast artificial chromosomes (YAC); bacterial artificial chromosomes (BAC); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics 50:306-316; P1-derived vectors (PACs), see, e.g., Kern (1997) Biotechniques 23:120-124; cosmids, recombinant viruses, phages or plasmids.

0271 In one aspect, a nucleic acid encoding a polypeptide of the invention is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof.

0272 The invention provides fusion proteins and nucleic acids encoding them. A polypeptide of the invention can be fused to a heterologous peptide or polypeptide, such as N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified purification. Peptides and polypeptides of the invention can also be synthesized and expressed as fusion proteins with one or more additional domains linked thereto for, e.g., producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like. Detection and purification facilitating domains include, e.g., metal chelating peptides such as polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle Wash.). A cleavable linker sequence such as Factor Xa or enterokinase (Invitrogen, San Diego Calif.) can be included between a purification domain and the motif-comprising peptide or polypeptide to facilitate purification. For example, an expression vector can include an epitope-encoding nucleic acid sequence linked to six histidine residues followed by a thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry 34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues facilitate detection and purification while the enterokinase cleavage site provides a means for purifying the epitope from the remainder of the fusion protein. Technology pertaining to vectors encoding fusion proteins and application of fusion proteins are well described in the scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol. 12:441-53.

0273 Transcriptional and Translational Control Sequences

0274 The invention provides nucleic acid (e.g., DNA) sequences of the invention operatively linked to expression (e.g., transcriptional or translational) control sequence(s), e.g., promoters or enhancers, to direct or modulate RNA synthesis/expression. The expression control sequence can be in an expression vector. Exemplary bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Exemplary eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein I.

0275 Promoters suitable for expressing a polypeptide in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR promoter, the lambda PL promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used. Promoters suitable for expressing the polypeptide or fragment thereof in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR promoter, the lambda PL promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK) and the acid phosphatase promoter. Fungal promoters include the α-factor promoter. Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late SV40 promoter, LTRs from retroviruses and the mouse metallothionein-I promoter. Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be used.

0276 Tissue-Specific Promoters

0277 The invention provides expression cassettes that can be expressed in a tissue-specific manner, e.g., that can express a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention in a tissue-specific manner. The invention also provides plants or seeds that express a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention in a tissue-specific manner. The tissue-specificity can be seed specific, stem specific, leaf specific, root specific, fruit specific and the like.

0278 The term "expression cassette" as used herein refers to a nucleotide sequence which is capable of affecting expression of a structural gene (i.e., a protein coding sequence, such as a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention) in a host compatible with such sequences. Expression cassettes include at least a promoter operably linked with the polypeptide coding sequence; and, optionally, with other sequences, e.g., transcription termination signals. Additional factors necessary or helpful in effecting expression may also be used, e.g., enhancers, alpha-factors. Thus, expression cassettes also include plasmids, expression vectors, recombinant viruses, any form of recombinant "naked DNA" vector, and the like. A "vector" comprises a nucleic acid which can infect, transfect, transiently or permanently transduce a cell. It will be recognized that a vector can be a naked nucleic acid, or a nucleic acid complexed with protein or lipid. The vector optionally comprises viral or bacterial nucleic acids and/or proteins, and/or membranes (e.g., a cell membrane, a viral lipid envelope, etc.). Vectors include, but are not limited to replicons (e.g., RNA replicons, bacteriophages) to which fragments of DNA may be attached and become replicated. Vectors thus include, but are not limited to RNA, autonomous self-replicating circular or linear DNA or RNA (e.g., plasmids, viruses, and the like, see, e.g., U.S. Pat. No. 5,217,879), and include both the expression and non-expression plasmids. Where a recombinant microorganism or cell culture is described as hosting an "expression vector" this includes both extra-chromosomal circular and linear DNA and DNA that has been incorporated into the host chromosome(s). Where a vector is being maintained by a host cell, the vector may either be stably replicated by the cells during mitosis as an autonomous structure, or is incorporated within the host's genome.

0279 "Tissue-specific" promoters are transcriptional control elements that are only active in particular cells or tissues or organs, e.g., in plants or animals. Tissue-specific regulation may be achieved by certain intrinsic factors which ensure that genes encoding proteins specific to a given tissue are expressed. Such factors are known to exist in mammals and plants so as to allow for specific tissues to develop.

0280 The term "plant" includes whole plants, plant parts (e.g., leaves, stems, flowers, roots, etc.), plant protoplasts, seeds and plant cells and progeny of same. The class of plants which can be used in the method of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), as well as gymnosperms. It includes plants of a variety of ploidy levels, including polyploid, diploid, haploid and hemizygous states. As used herein, the term "transgenic plant" includes plants or plant cells into which a heterologous nucleic acid sequence has been inserted, e.g., the nucleic acids and various recombinant constructs (e.g., expression cassettes) of the invention.

0281 In one aspect, a constitutive promoter such as the CaMV 35S promoter can be used for expression in specific parts of the plant or seed or throughout the plant. For example, for overexpression, a plant promoter fragment can be employed which will direct expression of a nucleic acid in some or all tissues of a plant, e.g., a regenerated plant. Such promoters are referred to herein as "constitutive" promoters

and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill. Such genes include, e.g., ACT11 from Arabidopsis (Huang (1996) Plant Mol. Biol. 33:125-139); Cat3 from Arabidopsis (GenBank No. U43147, Zhong (1996) Mol. Gen. Genet. 251:196-203); the gene encoding stearoyl-acyl carrier protein desaturase from Brassica napus (GenBank No. X74782, Solocombe (1994) Plant Physiol. 104:1167-1176); GPc1 from maize (GenBank No. X15596; Martinez (1989) J. Mol. Biol. 208:551-565); the Gpc2 from maize (GenBank No. U45855, Manjunath (1997) Plant Mol. Biol. 33:97-112); plant promoters described in U.S. Pat. Nos. 4,962,028; 5,633,440.

[0282] The invention uses tissue-specific or constitutive promoters derived from viruses which can include, e.g., the tobamovirus subgenomic promoter (Kumagai (1995) Proc. Natl. Acad. Sci. USA 92:1679-1683); the rice tungro bacilliform virus (RTBV), which replicates only in phloem cells in infected rice plants, with its promoter which drives strong phloem-specific reporter gene expression; the cassava vein mosaic virus (CVMV) promoter, with highest activity in vascular elements, in leaf mesophyll cells, and in root tips (Verdaguer (1996) Plant Mol. Biol. 31:1129-1139).

[0283] Alternatively, the plant promoter may direct expression of a polypeptide, enzyme, protein, e.g. structural or binding protein-expressing nucleic acid in a specific tissue, organ or cell type (i.e. tissue-specific promoters) or may be otherwise under more precise environmental or developmental control or under the control of an inducible promoter. Examples of environmental conditions that may affect transcription include anaerobic conditions, elevated temperature, the presence of light, or sprayed with chemicals/hormones. For example, the invention incorporates the drought-inducible promoter of maize (Busk (1997) supra); the cold, drought, and high salt inducible promoter from potato (Kirch (1997) Plant Mol. Biol. 33:897-909).

[0284] Tissue-specific promoters can promote transcription only within a certain time frame of developmental stage within that tissue. See, e.g., Blazquez (1998) Plant Cell 10:791-800, characterizing the Arabidopsis LEAFY gene promoter. See also Cardon (1997) Plant J 12:367-77, describing the transcription factor SPL3, which recognizes a conserved sequence motif in the promoter region of the A. thaliana floral meristem identity gene AP1; and Mandel (1995) Plant Molecular Biology, Vol. 29, pp. 995-1004, describing the meristem promoter eIF4. Tissue specific promoters which are active throughout the life cycle of a particular tissue can be used. In one aspect, the nucleic acids of the invention are operably linked to a promoter active primarily only in cotton fiber cells. In one aspect, the nucleic acids of the invention are operably linked to a promoter active primarily during the stages of cotton fiber cell elongation, e.g., as described by Rinehart (1996) supra. The nucleic acids can be operably linked to the Fbl2A gene promoter to be preferentially expressed in cotton fiber cells (Ibid). See also, John (1997) Proc. Natl. Acad. Sci. USA 89:5769-5773; John, et al., U.S. Pat. Nos. 5,608,148 and 5,602,321, describing cotton fiber-specific promoters and methods for the construction of transgenic cotton plants. Root-specific promoters may also be used to express the nucleic acids of the invention. Examples of root-specific promoters include the promoter from the alcohol dehydrogenase gene (DeLisle (1990) Int. Rev. Cytol. 123:39-60). Other promoters that can be used to express the nucleic acids of the invention include, e.g., ovule-specific, embryo-specific, endosperm-specific, integument-specific, seed coat-specific promoters, or some combination thereof; a leaf-specific promoter (see, e.g., Busk (1997) Plant J. 11:1285-1295, describing a leaf-specific promoter in maize); the ORF13 promoter from Agrobacterium rhizogenes (which exhibits high activity in roots, see, e.g., Hansen (1997) supra); a maize pollen specific promoter (see, e.g., Guerrero (1990) Mol. Gen. Genet. 224:161-168); a tomato promoter active during fruit ripening, senescence and abscission of leaves and, to a lesser extent, of flowers can be used (see, e.g., Blume (1997) Plant J. 12:731-746); a pistil-specific promoter from the potato SK2 gene (see, e.g., Ficker (1997) Plant Mol. Biol. 35:425-431); the Blec4 gene from pea, which is active in epidermal tissue of vegetative and floral shoot apices of transgenic alfalfa making it a useful tool to target the expression of foreign genes to the epidermal layer of actively growing shoots or fibers; the ovule-specific BEL1 gene (see, e.g., Reiser (1995) Cell 83:735-742, GenBank No. U39944); and/or, the promoter in Klee, U.S. Pat. No. 5,589,583, describing a plant promoter region that is capable of conferring high levels of transcription in meristematic tissue and/or rapidly dividing cells.

[0285] Alternatively, plant promoters which are inducible upon exposure to plant hormones, such as auxins, are used to express the nucleic acids of the invention. For example, the invention can use the auxin-response elements E1 promoter fragment (AuxREs) in the soybean (Glycine max L.) (Liu (1997) Plant Physiol. 115:397-407); the auxin-responsive Arabidopsis GST6 promoter (also responsive to salicylic acid and hydrogen peroxide) (Chen (1996) Plant J. 10:955-966); the auxin-inducible parC promoter from tobacco (Sakai (1996) 37:906-913); a plant biotin response element (Streit (1997) Mol. Plant Microbe Interact. 10:933-937); and, the promoter responsive to the stress hormone abscisic acid (Sheen (1996) Science 274:1900-1902).

[0286] The nucleic acids of the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents which can be applied to the plant, such as herbicides or antibiotics. For example, the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, can be used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem. Coding sequence can be under the control of, e.g., a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324). Using chemically- (e.g., hormone- or pesticide-) induced promoters, i.e., promoter responsive to a chemical which can be applied to the transgenic plant in the field, expression of a polypeptide of the invention can be induced at a particular stage of development of the plant. Thus, the invention also provides for transgenic plants containing an inducible gene encoding for polypeptides of the invention whose host range is limited to target plant species, such as corn, rice, barley, wheat, potato or other crops, inducible at any stage of development of the crop.
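
The promoter classes discussed above (constitutive, tissue-specific, inducible) can be pictured as a small lookup table. The following Python sketch is illustrative only and is not part of the disclosure: the `Promoter` record, the `PROMOTERS` list and the `select_promoters` helper are hypothetical names, and the table holds only a handful of promoters named in the text, with the tissue or inducing agent the text attributes to each.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Promoter:
    """Hypothetical record for one promoter named in the text."""
    name: str
    kind: str                     # "constitutive", "tissue-specific" or "inducible"
    target: Optional[str] = None  # tissue, or inducing agent; None if constitutive


# A few entries taken from the passage above; the selection is illustrative.
PROMOTERS = [
    Promoter("CaMV 35S", "constitutive"),
    Promoter("maize pollen-specific", "tissue-specific", "pollen"),
    Promoter("potato SK2", "tissue-specific", "pistil"),
    Promoter("BEL1", "tissue-specific", "ovule"),
    Promoter("maize In2-2", "inducible", "benzenesulfonamide herbicide safener"),
    Promoter("Arabidopsis GST6", "inducible", "auxin"),
    Promoter("tobacco parC", "inducible", "auxin"),
]


def select_promoters(kind: str, target: Optional[str] = None) -> List[str]:
    """Return names of promoters of the given kind, optionally filtered by target."""
    return [
        p.name
        for p in PROMOTERS
        if p.kind == kind and (target is None or p.target == target)
    ]


if __name__ == "__main__":
    print(select_promoters("inducible", "auxin"))
    print(select_promoters("tissue-specific"))
```

As paragraph [0287] notes, a tissue-specific promoter may also drive some expression outside its target tissue, so a table of this kind records preferential rather than exclusive expression.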

[0287] One of skill will recognize that a tissue-specific plant promoter may drive expression of operably linked sequences in tissues other than the target tissue. Thus, a tissue-specific promoter is one that drives expression preferentially in the target tissue or cell type, but may also lead to some expression in other tissues as well.

[0288] The nucleic acids of the invention can also be operably linked to plant promoters which are inducible upon exposure to chemical reagents. These reagents include, e.g., herbicides, synthetic auxins, or antibiotics which can be applied, e.g., sprayed, onto transgenic plants. Inducible expression of the polypeptide, enzyme, protein, e.g. structural or binding protein-producing nucleic acids of the invention will allow the grower to select plants with the optimal polypeptide, enzyme, protein, e.g. structural or binding protein, expression and/or activity. The development of plant parts can thus be controlled. In this way the invention provides the means to facilitate the harvesting of plants and plant parts. For example, in various embodiments, the maize In2-2 promoter, activated by benzenesulfonamide herbicide safeners, is used (De Veylder (1997) Plant Cell Physiol. 38:568-577); application of different herbicide safeners induces distinct gene expression patterns, including expression in the root, hydathodes, and the shoot apical meristem. Coding sequences of the invention are also under the control of a tetracycline-inducible promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. (oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic acid-responsive element (Stange (1997) Plant J. 11:1315-1324).

[0289] In some aspects, proper polypeptide expression may require a polyadenylation region at the 3'-end of the coding region. The polyadenylation region can be derived from the natural gene, from a variety of other plant (or animal or other) genes, or from genes in the Agrobacterial T-DNA.

[0290] Expression Vectors and Cloning Vehicles

[0291] The invention provides expression vectors and cloning vehicles comprising nucleic acids of the invention, e.g., sequences encoding the polypeptide, enzyme, protein, e.g. structural or binding proteins of the invention. Expression vectors and cloning vehicles of the invention can comprise viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral DNA (e.g., vaccinia, adenovirus, fowlpox virus, pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for specific hosts of interest (such as bacillus, Aspergillus and yeast). Vectors of the invention can include chromosomal, non-chromosomal and synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. Exemplary vectors include: bacterial: pQE vectors (Qiagen), pBLUESCRIPT plasmids, pNH vectors, lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540, pRIT2T (Pharmacia); Eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). However, any other plasmid or other vector may be used so long as they are replicable and viable in the host. Low copy number or high copy number vectors may be employed with the present invention.

[0292] The expression vector can comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. Mammalian expression vectors can comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences. In some aspects, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed genetic elements.

[0293] In one aspect, the expression vectors contain one or more selectable marker genes to permit selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, and the S. cerevisiae TRP1 gene. Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers.

[0294] Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells can also contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA that can be from about 10 to about 300 bp in length. They can act on a promoter to increase its transcription. Exemplary enhancers include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and the adenovirus enhancers.

[0295] A nucleic acid sequence can be inserted into a vector by a variety of procedures. In general, the sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. A variety of cloning techniques are known in the art, e.g., as described in Ausubel and Sambrook. Such procedures and others are deemed to be within the scope of those skilled in the art.

[0296] The vector can be in the form of a plasmid, a viral particle, or a phage. Other vectors include chromosomal, non-chromosomal and synthetic DNA sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by, e.g., Sambrook.

[0297] Particular bacterial vectors which can be used include the commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega Biotec, Madison, Wis., USA) pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174 pBLUESCRIPT II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, DR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene) pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is replicable and viable in the host cell.

[0298] The nucleic acids of the invention can be expressed in expression cassettes, vectors or viruses and transiently or stably expressed in plant cells and seeds. One exemplary transient expression system uses episomal expression systems, e.g., cauliflower mosaic virus (CaMV) viral RNA generated in the nucleus by transcription of an episomal mini-chromosome containing supercoiled DNA, see, e.g., Covey
(1990) Proc. Natl. Acad. Sci. USA 87:1633-1637. Alternatively, coding sequences, i.e., all or sub-fragments of sequences of the invention can be inserted into a plant host cell genome becoming an integral part of the host chromosomal DNA. Sense or antisense transcripts can be expressed in this manner. A vector comprising the sequences (e.g., promoters or coding regions) from nucleic acids of the invention can comprise a marker gene that confers a selectable phenotype on a plant cell or a seed. For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or Basta.

[0299] Expression vectors capable of expressing nucleic acids and proteins in plants are well known in the art, and can include, e.g., vectors from Agrobacterium spp., potato virus X (see, e.g., Angell (1997) EMBO J. 16:3675-3684), tobacco mosaic virus (see, e.g., Casper (1996) Gene 173:69-73), tomato bushy stunt virus (see, e.g., Hillman (1989) Virology 169:42-50), tobacco etch virus (see, e.g., Dolja (1997) Virology 234:243-252), bean golden mosaic virus (see, e.g., Mori (1993) Microbiol Immunol. 37:471-476), cauliflower mosaic virus (see, e.g., Cecchini (1997) Mol. Plant Microbe Interact. 10:1094-1101), maize Ac/Ds transposable element (see, e.g., Rubin (1997) Mol. Cell. Biol. 17:6294-6302; Kunze (1996) Curr. Top. Microbiol. Immunol. 204:161-194), and the maize suppressor-mutator (Spm) transposable element (see, e.g., Schlappi (1996) Plant Mol. Biol. 32:717-725); and derivatives thereof.

[0300] In one aspect, the expression vector can have two replication systems to allow it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector can contain at least one sequence homologous to the host cell genome. It can contain two homologous sequences which flank the expression construct. The integrating vector can be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.

[0301] Expression vectors of the invention may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed, e.g., genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers can also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

[0302] The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct RNA synthesis. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. Promoter regions can be selected from any desired gene using chloramphenicol transferase (CAT) vectors or other vectors with selectable markers. In addition, the expression vectors in one aspect contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

[0303] Mammalian expression vectors may also comprise an origin of replication, any necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences and 5' flanking nontranscribed sequences. In some aspects, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

[0304] Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells may also contain enhancers to increase expression levels. Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in length that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin and the adenovirus enhancers.

[0305] In addition, the expression vectors typically contain one or more selectable marker genes to permit selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli and the S. cerevisiae TRP1 gene.

[0306] In some aspects, the nucleic acid encoding one of the polypeptides of the invention, or fragments comprising at least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated polypeptide or fragment thereof. Optionally, the nucleic acid can encode a fusion polypeptide in which one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof is fused to heterologous peptides or polypeptides, such as N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified purification.

[0307] The appropriate DNA sequence may be inserted into the vector by a variety of procedures. In general, the DNA sequence is ligated to the desired position in the vector following digestion of the insert and the vector with appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. A variety of cloning techniques are disclosed in Ausubel et al. Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual 2nd Ed., Cold Spring Harbor Laboratory Press (1989). Such procedures and others are deemed to be within the scope of those skilled in the art.

[0308] The vector may be, for example, in the form of a plasmid, a viral particle, or a phage. Other vectors include chromosomal, nonchromosomal and synthetic DNA sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus and pseudorabies. A variety of cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor, N.Y., (1989).

[0309] Host Cells and Transformed Cells

[0310] The invention also provides a transformed cell comprising a nucleic acid sequence of the invention, e.g., a sequence encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention, or a vector of the invention. The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, such as bacterial cells, fungal cells, yeast cells, mammalian cells, insect cells, or plant cells. Exemplary bacterial cells include E. coli, Streptomyces, Bacillus subtilis, Bacillus cereus, Salmonella typhimurium and various species within the genera Streptomyces and Staphylococcus. Exemplary insect cells include Drosophila S2 and Spodoptera Sf9. Exemplary animal cells include CHO, COS or Bowes melanoma or any mouse or human cell line. The selection of an appropriate host is within the abilities of those skilled in the art. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Weising (1988) Ann. Rev. Genet. 22:421-477; U.S. Pat. No. 5,750,870.

[0311] The vector can be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).

[0312] In one aspect, the nucleic acids or vectors of the invention are introduced into the cells for screening, thus, the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type. Exemplary methods include CaPO4 precipitation, liposome fusion, lipofection (e.g., LIPOFECTIN™), electroporation, viral infection, etc. The candidate nucleic acids may stably integrate into the genome of the host cell (for example, with retroviral introduction) or may exist either transiently or stably in the cytoplasm (i.e. through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). As many pharmaceutically important screens require human or model mammalian cell targets, retroviral vectors capable of transfecting such targets can be used.

[0313] Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the invention. Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.

[0314] Cells can be harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide. If desired, high performance liquid chromatography (HPLC) can be employed for final purification steps.

[0315] The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Depending upon the host employed in a recombinant production procedure, the polypeptides produced by host cells containing the vector may be glycosylated or may be non-glycosylated. Polypeptides of the invention may or may not also include an initial methionine amino acid residue.

[0316] Cell-free translation systems can also be employed to produce a polypeptide of the invention. Cell-free translation systems can use mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some aspects, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.

[0317] The expression vectors can contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

[0318] Host cells containing the polynucleotides of interest, e.g., nucleic acids of the invention, can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying genes. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression and will be apparent to the ordinarily skilled artisan. The clones which are identified as having the specified enzyme activity may then be sequenced to identify the polynucleotide sequence encoding an enzyme having the enhanced activity.

[0319] The invention provides a method for overexpressing a recombinant polypeptide, enzyme, protein, e.g. structural or binding protein, in a cell comprising expressing a vector comprising a nucleic acid of the invention, e.g., a nucleic acid comprising a nucleic acid sequence with at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity to an exemplary sequence of the invention over a region of at least about 100 residues, wherein the sequence identities are determined by analysis with a sequence comparison algorithm or by visual inspection, or, a nucleic acid that hybridizes under stringent conditions to a nucleic acid sequence of the invention. The overexpression can be effected by any means, e.g., use of a high activity promoter, a dicistronic vector or by gene amplification of the vector.

[0320] The nucleic acids of the invention can be expressed, or overexpressed, in any in vitro or in vivo expression system. Any cell culture systems can be employed to express, or over-express, recombinant protein, including bacterial, insect, yeast, fungal or mammalian cultures. Over-expression can be effected by appropriate choice of promoters, enhancers, vectors (e.g., use of replicon vectors, dicistronic vectors (see, e.g., Gurtu (1996) Biochem. Biophys. Res. Commun. 229:295-8)), media, culture systems and the like. In one aspect, gene amplification using selection markers, e.g., (see, e.g., Sanders (1987) Dev. Biol. Stand. 66:55-63), in cell systems are used to overexpress the polypeptides of the invention.

[0321] The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, mammalian cells, insect cells, or plant cells. As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces, Bacillus subtilis, Bacillus cereus, Salmonella typhimurium and various species within the genera Streptomyces and Staphylococcus, fungal cells, such as yeast, insect cells such as Drosophila S2 and Spodoptera Sf9, animal cells such as CHO, COS or Bowes melanoma and adenoviruses. The selection of an appropriate host is within the abilities of those skilled in the art.

[0322] The vector may be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).

[0323] Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the invention. Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.

[0324] Cells are typically harvested by centrifugation, disrupted by physical or chemical means and the resulting crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phos

[0326] The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Depending upon the host employed in a recombinant production procedure, the polypeptides produced by host cells containing the vector may be glycosylated or may be non-glycosylated. Polypeptides of the invention may or may not also include an initial methionine amino acid residue.

[0327] Alternatively, the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof can be synthetically produced by conventional peptide synthesizers. In other aspects, fragments or portions of the polypeptides may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides.

[0328] Cell-free translation systems can also be employed to produce one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof using mRNAs transcribed from a DNA construct comprising a promoter operably linked to a nucleic acid encoding the polypeptide or fragment thereof. In some aspects, the DNA construct may be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or fragment thereof.

[0329] Amplification of Nucleic Acids

[0330] In practicing the invention, nucleic acids encoding the polypeptides of the invention, or modified nucleic acids, can be reproduced by, e.g., amplification. The invention provides amplification primer sequence pairs for amplifying nucleic acids encoding polypeptides (e.g., enzymes) of the invention. In one aspect, the primer pairs are capable of amplifying nucleic acid sequences of the invention, e.g., including the exemplary SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, etc., including all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897, or a subsequence thereof, etc. One of skill in the art can design amplification primer sequence pairs for any part of or the full length of these sequences.

[0331] In one aspect, the invention provides a nucleic acid amplified by a primer pair of the invention, e.g., a primer pair as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of the invention, and about the first (the 5') 15, 16, 17, 18, 19,
phocellulose chromatography, hydrophobic interaction chro 20, 21, 22, 23, 24, or 25 or more residues of the complemen matography, affinity chromatography, hydroxylapatite chro tary strand. matography and lectin chromatography. Protein refolding 0332 The invention provides an amplification primer steps can be used, as necessary, in completing configuration sequence pair for amplifying a nucleic acid encoding a of the polypeptide. If desired, high performance liquid chro polypeptide having an enzyme, structural or binding activity, matography (HPLC) can be employed for final purification wherein the primer pair is capable of amplifying a nucleic steps. acid comprising a sequence of the invention, or fragments or 0325 Various mammalian cell culture systems can also be subsequences thereof. One or each member of the amplifica employed to express recombinant protein. Examples of mam tion primer sequence pair can comprise an oligonucleotide malian expression systems include the COS-7 lines of mon comprising at least about 10 to 50 or more consecutive bases key kidney fibroblasts (described by Gluzman, Cell, 23:175, of the sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 1981) and other cell lines capable of expressing proteins from 22, 23, 24, or 25 or more consecutive bases of the sequence. a compatible vector, such as the C127, 3T3, CHO, HeLa and The invention provides amplification primer pairs, wherein BHK cell lines. the primer pair comprises a first member having a sequence as US 2012/0266329 A1 Oct. 18, 2012

set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of the invention, and a second member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of the complementary strand of the first member. The invention provides a polypeptide, enzyme, protein, e.g. structural or binding protein, generated by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. The invention provides methods of making a polypeptide, enzyme, protein, e.g. structural or binding protein, by amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental library.

[0333] Amplification reactions can also be used to quantify the amount of nucleic acid in a sample (such as the amount of message in a cell sample), label the nucleic acid (e.g., to apply it to an array or a blot), detect the nucleic acid, or quantify the amount of a specific nucleic acid in a sample. In one aspect of the invention, message isolated from a cell or a cDNA library is amplified.

[0334] The skilled artisan can select and design suitable oligonucleotide amplification primers. Amplification methods are also well known in the art, and include, e.g., polymerase chain reaction, PCR (see, e.g., PCR PROTOCOLS, A GUIDE TO METHODS AND APPLICATIONS, ed. Innis, Academic Press, N.Y. (1990) and PCR STRATEGIES (1995), ed. Innis, Academic Press, Inc., N.Y.); ligase chain reaction (LCR) (see, e.g., Wu (1989) Genomics 4:560; Landegren (1988) Science 241:1077; Barringer (1990) Gene 89:117); transcription amplification (see, e.g., Kwoh (1989) Proc. Natl. Acad. Sci. USA 86:1173); and, self-sustained sequence replication (see, e.g., Guatelli (1990) Proc. Natl. Acad. Sci. USA 87:1874); Q Beta replicase amplification (see, e.g., Smith (1997) J. Clin. Microbiol. 35:1477-1491), automated Q-beta replicase amplification assay (see, e.g., Burg (1996) Mol. Cell. Probes 10:257-271) and other RNA polymerase mediated techniques (e.g., NASBA, Cangene, Mississauga, Ontario); see also Berger (1987) Methods Enzymol. 152:307-316; Sambrook; Ausubel; U.S. Pat. Nos. 4,683,195 and 4,683,202; Sooknanan (1995) Biotechnology 13:563-564.

[0335] Determining the Degree of Sequence Identity

[0336] The invention provides nucleic acids comprising sequences having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary nucleic acid of the invention (e.g., SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, etc., including all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID NO:s from SEQ ID NO:1 through SEQ ID NO:26,897, and nucleic acids encoding SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898) over a region of at least about 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550 or more, residues. The invention provides polypeptides comprising sequences having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an exemplary polypeptide of the invention. The extent of sequence identity (homology) may be determined using any computer program and associated parameters, including those described herein, such as BLAST 2.2.2 or FASTA version 3.0t78, with the default parameters.

[0337] As used herein, the terms “computer,” “computer program” and “processor” are used in their broadest general contexts and incorporate all such devices, as described in detail, below.

[0338] Nucleic acid sequences of the invention can comprise at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive nucleotides of an exemplary sequence of the invention and sequences substantially identical thereto. Homologous sequences and fragments of nucleic acid sequences of the invention can refer to a sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity (homology) to these sequences. Homology (sequence identity) may be determined using any of the computer programs and parameters described herein, including FASTA version 3.0t78 with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid sequences of the invention. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. It will be appreciated that the nucleic acid sequences of the invention can be represented in the traditional single character format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd Ed., W. H. Freeman & Co., New York) or in any other format which records the identity of the nucleotides in a sequence.

[0339] Various sequence comparison programs identified elsewhere in this patent specification are particularly contemplated for use in this aspect of the invention. Protein and/or nucleic acid sequence homologies may be evaluated using any of the variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA and CLUSTALW (see, e.g., Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85(8):2444-2448, 1988; Altschul et al., J. Mol. Biol. 215(3):403-410, 1990; Thompson, Nucleic Acids Res. 22(2):4673-4680, 1994; Higgins et al., Methods Enzymol. 266:383-402, 1996; Altschul et al., J. Mol. Biol. 215(3):403-410, 1990; Altschul et al., Nature Genetics 3:266-272, 1993).

[0340] Homology or identity is often measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). Such software matches similar sequences by assigning degrees of homology to various deletions, substitutions and other modifications. The terms

“homology” and “identity” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same when compared and aligned for maximum correspondence over a comparison window or designated region as measured using any number of sequence comparison algorithms or by manual alignment and visual inspection.

[0341] For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.

[0342] A “comparison window”, as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequence for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection. Other algorithms for determining homology or identity include, for example, in addition to a BLAST program (Basic Local Alignment Search Tool at the National Center for Biotechnology Information), ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple Sequence Alignment), ASSET (Aligned Segment Statistical Evaluation Tool), BANDS, BESTSCOR, BIOSCAN (Biological Sequence Comparative Analysis Node), BLIMPS (BLocks IMProved Searcher), FASTA, Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, CONSENSUS, LCONSENSUS, WCONSENSUS, Smith-Waterman algorithm, DARWIN, Las Vegas algorithm, FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch, DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis Package), GAP (Global Alignment Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence Comparison), LALIGN (Local Sequence Alignment), LCP (Local Content Program), MACAW (Multiple Alignment Construction & Analysis Workbench), MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern-Induced Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and WHAT-IF. Such alignment programs can also be used to screen genome databases to identify polynucleotide sequences having substantially identical sequences. A number of genome databases are available, for example, a substantial portion of the human genome is available as part of the Human Genome Sequencing Project (Gibbs, 1995). At least twenty-one other genomes have already been sequenced, including, for example, M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997) and yeast (S. cerevisiae) (Mewes et al., 1997) and D. melanogaster (Adams et al., 2000). Significant progress has also been made in sequencing the genomes of model organisms, such as mouse, C. elegans and Arabidopsis sp. Several databases containing genomic information annotated with some functional information are maintained by different organizations and may be accessible via internet.

[0343] One example of a useful algorithm is the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410, 1990 and Altschul et al., Nuc. Acids Res. 25:3389-3402, 1997, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10 and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915, 1992), alignments (B) of 50, expectation (E) of 10, M=5, N=-4 and a comparison of both strands.

[0344] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873, 1993). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, in another aspect less than about 0.01, and in yet another aspect less than about 0.001.

[0345] In one aspect, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool (“BLAST”). In particular, five specific BLAST programs are used to perform the following tasks:

[0346] (1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database;

[0347] (2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database;

[0348] (3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database;

[0349] (4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and

[0350] (5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

[0351] The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as “high-scoring segment pairs,” between a query amino or nucleic acid sequence and a test sequence which is in one aspect obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are in one aspect identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. In one aspect, the scoring matrix used is the BLOSUM62 matrix (Gonnet (1992) Science 256:1443-1445; Henikoff and Henikoff (1993) Proteins 17:49-61). In another aspect, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation). BLAST programs are accessible through the U.S. National Library of Medicine.

[0352] The parameters used with the above algorithms may be adapted depending on the sequence length and degree of homology studied. In some aspects, the parameters may be the default parameters used by the algorithms in the absence of instructions from the user.

[0353] Computer Systems and Computer Program Products

[0354] To determine and identify sequence identities, structural homologies, motifs and the like in silico, a nucleic acid or polypeptide sequence of the invention can be stored, recorded, and manipulated on any medium which can be read and accessed by a computer.

[0355] Accordingly, the invention provides computers, computer systems, computer readable media, computer program products and the like having recorded or stored thereon the nucleic acid and polypeptide sequences of the invention. As used herein, the words “recorded” and “stored” refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the nucleic acid and/or polypeptide sequences of the invention.

[0356] The polypeptides of the invention include the polypeptide sequences of the invention, e.g., the exemplary sequences of the invention, and sequences substantially identical thereto, and fragments of any of the preceding sequences. Substantially identical, or homologous, polypeptide sequences refer to a polypeptide sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity (homology) to an exemplary sequence of the invention.

[0357] Homology (sequence identity) may be determined using any of the computer programs and parameters described herein. A nucleic acid or polypeptide sequence of the invention can be stored, recorded and manipulated on any medium which can be read and accessed by a computer. As used herein, the words “recorded” and “stored” refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the nucleic acid sequences of the invention, one or more of the polypeptide sequences of the invention. Another aspect of the invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, or 20 or more nucleic acid or polypeptide sequences of the invention.

[0358] Another aspect of the invention is a computer readable medium having recorded thereon one or more of the nucleic acid sequences of the invention. Another aspect of the invention is a computer readable medium having recorded thereon one or more of the polypeptide sequences of the invention. Another aspect of the invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, or 20 or more of the nucleic acid or polypeptide sequences as set forth above.

[0359] Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to those skilled in the art.

[0360] Aspects of the invention include systems (e.g., internet based systems), particularly computer systems which store and manipulate the sequence information described herein. One example of a computer system 100 is illustrated in block diagram form in FIG. 1. As used herein, “a computer system” refers to the hardware components, software components and data storage components used to analyze a nucleotide sequence of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention. The computer system 100 typically includes a processor for processing, accessing and manipulating the sequence data. The processor 105 can be any well-known type of central processing unit, such as, for example, the Pentium III from Intel Corporation, or similar processor from Sun, Motorola, Compaq, AMD or International Business Machines.

[0361] Typically the computer system 100 is a general purpose system that comprises the processor 105 and one or more internal data storage components 110 for storing data and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.

[0362] In one particular aspect, the computer system 100 includes a processor 105 connected to a bus which is connected to a main memory 115 (in one aspect implemented as RAM) and one or more internal data storage devices 110, such as a hard drive and/or other computer readable media having data recorded thereon. In some aspects, the computer system 100 further includes one or more data retrieving devices 118 for reading the data stored on the internal data storage devices 110.
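The six-frame conceptual translation that BLASTX, TBLASTN and TBLASTX are described as using (three reading frames on each strand) can be illustrated with a minimal Python sketch. The sketch is not part of the specification: the function names are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are simply reported as "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon). Assumed here; the specification does not name a code table.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; incomplete trailing codons are dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Hypothetical helper: translate both strands in all three frames."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six translated strings could then be compared against protein sequences, which is the step the listed programs are described as performing.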

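The word-hit extension described above for BLAST (extend a seed in both directions, scoring matches with M and mismatches with N, and stop when the score falls by X from its maximum, reaches zero or below, or a sequence ends) can be sketched for nucleotide sequences as follows. This is a simplified, ungapped illustration under stated assumptions, not the BLAST implementation: the function names are hypothetical, the seed is assumed to be an exact word match, and the default `x_drop` value is an arbitrary choice for the example.

```python
def _extend(query, subject, q, s, step, start_score, match, mismatch, x_drop):
    """Extend in one direction (step = +1 or -1) from positions q and s.

    Returns (score gained over start_score, number of residues kept).
    """
    score = best = start_score
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt when the score has dropped by X from its maximum,
        # or has gone to zero or below.
        if best - score >= x_drop or score <= 0:
            break
        q += step
        s += step
    return best - start_score, best_len


def extend_word_hit(query, subject, q_pos, s_pos, word_len=11,
                    match=5, mismatch=-4, x_drop=20):
    """Hypothetical sketch: grow an exact word hit into a longer ungapped segment pair."""
    # Assumption: the caller found this seed as an exact word match.
    assert query[q_pos:q_pos + word_len] == subject[s_pos:s_pos + word_len]
    seed_score = word_len * match
    gain_right, len_right = _extend(query, subject, q_pos + word_len,
                                    s_pos + word_len, +1, seed_score,
                                    match, mismatch, x_drop)
    gain_left, len_left = _extend(query, subject, q_pos - 1, s_pos - 1, -1,
                                  seed_score, match, mismatch, x_drop)
    return {
        "score": seed_score + gain_right + gain_left,
        "query_start": q_pos - len_left,
        "query_end": q_pos + word_len + len_right,  # exclusive
        "subject_start": s_pos - len_left,
    }


if __name__ == "__main__":
    hit = extend_word_hit("TTACGTACGGTT", "GGACGTACGGCC", 2, 2,
                          word_len=4, x_drop=8)
    print(hit)
```

The defaults for `word_len`, `match` and `mismatch` mirror the BLASTN values quoted in the text (W of 11, M=5, N=-4); a larger `x_drop` lets an extension continue through more mismatches before it is abandoned, at the cost of speed.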
[0363] The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, or a modem capable of connection to a remote data storage system (e.g., via internet) etc. In some aspects, the internal data storage device 110 is a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer system 100 may advantageously include or be programmed by appropriate software for reading the control logic and/or the data from the data storage component once inserted in the data retrieving device.

[0364] The computer system 100 includes a display 120 which is used to display output to a computer user. It should also be noted that the computer system 100 can be linked to other computer systems 125a-c in a network or wide area network to provide centralized access to the computer system 100.

[0365] Software for accessing and processing the nucleotide sequences of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, (such as search tools, compare tools and modeling tools etc.) may reside in main memory 115 during execution.

[0366] In some aspects, the computer system 100 may further comprise a sequence comparison algorithm for comparing a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, stored on a computer readable medium to a reference nucleotide or polypeptide sequence(s) stored on a computer readable medium. A “sequence comparison algorithm” refers to one or more programs which are implemented (locally or remotely) on the computer system 100 to compare a nucleotide sequence with other nucleotide sequences and/or compounds stored within a data storage means. For example, the sequence comparison algorithm may compare the nucleotide sequences of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, stored on a computer readable medium to reference sequences stored on a computer readable medium to identify homologies or structural motifs.

[0367] FIG. 2 is a flow diagram illustrating one aspect of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. The database of sequences can be a private database stored within the computer system 100, or a public database such as GENBANK that is available through the Internet.

[0368] The process 200 begins at a start state 201 and then moves to a state 202 wherein the new sequence to be compared is stored to a memory in a computer system 100. As discussed above, the memory could be any type of memory, including RAM or an internal storage device.

[0369] The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the database is read into a memory on the computer. A comparison is then performed at a state 210 to determine if the first sequence is the same as the second sequence. It is important to note that this step is not limited to performing an exact comparison between the new sequence and the first sequence in the database. Well-known methods are known to those of skill in the art for comparing two nucleotide or protein sequences, even if they are not identical. For example, gaps can be introduced into one sequence in order to raise the homology level between the two tested sequences. The parameters that control whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the computer system.

[0370] Once a comparison of the two sequences has been performed at the state 210, a determination is made at a decision state 212 whether the two sequences are the same. Of course, the term “same” is not limited to sequences that are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as “same” in the process 200.

[0371] If a determination is made that the two sequences are the same, the process 200 moves to a state 214 wherein the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 218 wherein a determination is made whether more sequences exist in the database. If no more sequences exist in the database, then the process 200 terminates at an end state 220. However, if more sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new sequence is aligned and compared with every sequence in the database.

[0372] It should be noted that if a determination had been made at the decision state 212 that the sequences were not homologous, then the process 200 would move immediately to the decision state 218 in order to determine if any other sequences were available in the database for comparison.

[0373] Accordingly, one aspect of the invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, a data storage device having retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to a nucleic acid sequence of the invention, or a polypeptide sequence of the invention and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the above described nucleic acid code of a nucleic acid sequence of the invention, or a polypeptide sequence of the invention, or it may identify structural motifs in sequences which are compared to these nucleic acid codes and polypeptide codes. In some aspects, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30 or 40 or more of the nucleic acid sequences of the invention, or the polypeptide sequences of the invention.

[0374] Another aspect of the invention is a method for determining the level of homology between a nucleic acid sequence of the invention, or a polypeptide sequence of the invention and a reference nucleotide sequence. The method includes reading the nucleic acid code or the polypeptide code and the reference nucleotide or polypeptide sequence through the use of a computer program which determines homology levels and determining homology between the nucleic acid code or polypeptide code and the reference nucleotide or polypeptide sequence with the computer program. The computer program may be any of a number of computer programs for determining homology levels, including those specifically enumerated herein (e.g., BLAST2N with the default parameters or with any modified parameters). The method may be implemented using the computer systems described above. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30 or 40 or more of the above described nucleic acid sequences of the invention, or the polypeptide sequences of the invention through use of the computer program and determining homology between the nucleic acid codes or polypeptide codes and reference nucleotide sequences or polypeptide sequences.

[0375] FIG. 3 is a flow diagram illustrating one aspect of a process 250 in a computer for determining whether two sequences are homologous. The process 250 begins at a start state 252 and then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored to a memory at a state 256. The process 250 then moves to a state 260 wherein the first character in the first sequence is read and then to a state 262 wherein the first character of the second sequence is read. It should be understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein sequence, then it is in one aspect in the single letter amino acid code so that the first and second sequences can be easily compared.

[0376] A determination is then made at a decision state 264 whether the two characters are the same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in the first and second sequences are read. A determination is then made whether the next characters are the same. If they are, then the process 250 continues this loop until two characters are not the same. If a determination is made that the next two characters are not the same, the process 250 moves to a decision state 274 to determine whether there are any more characters in either sequence to read.

[0377] If there are not any more characters to read, then the process 250 moves to a state 276 wherein the level of homology between the first and second sequences is displayed to the user.

nucleotide polymorphisms. The method may be implemented by the computer systems described above and the method illustrated in FIG. 3. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 40 or more of the nucleic acid sequences of the invention and the reference nucleotide sequences through the use of the computer program and identifying differences between the nucleic acid codes and the reference nucleotide sequences with the computer program.

[0380] In other aspects the computer based system may further comprise an identifier for identifying features within a nucleic acid sequence of the invention or a polypeptide sequence of the invention.

[0381] An “identifier” refers to one or more programs which identifies certain features within a nucleic acid sequence of the invention, or a polypeptide sequence of the invention. In one aspect, the identifier may comprise a program which identifies an open reading frame in a nucleic acid sequence of the invention.

[0382] FIG. 4 is a flow diagram illustrating one aspect of an identifier process 300 for detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a database of sequence features is opened. Such a database would include a list of each feature's attributes along with the name of the feature. For example, a feature name could be “Initiation Codon” and the attribute would be “ATG”. Another example would be the feature name “TAATAA Box” and the feature attribute would be “TAATAA”. An example of such a database is produced by the University of Wisconsin Genetics Computer Group. Alternatively, the features may be structural polypeptide
The level of homology is determined by calculating the motifs such as alpha helices, beta sheets, or functional proportion of characters between the sequences that were the polypeptide motifs such as enzymatic active sites, helix-turn same out of the total number of sequences in the first helix motifs or other motifs known to those skilled in the art. sequence. Thus, if every character in a first 100 nucleotide 0383. Once the database of features is opened at the state sequence aligned with a every character in a second sequence, 306, the process 300 moves to a state 308 wherein the first the homology level would be 100%. feature is read from the database. A comparison of the 0378. Alternatively, the computer program may be a com attribute of the first feature with the first sequence is then puter program which compares the nucleotide sequences of a made at a state 310. A determination is then made at a decision nucleic acid sequence as set forth in the invention, to one or state 316 whether the attribute of the feature was found in the more reference nucleotide sequences in order to determine first sequence. If the attribute was found, then the process 300 whether the nucleic acid code of the invention, differs from a moves to a state 318 wherein the name of the found feature is reference nucleic acid sequence at one or more positions. displayed to the user. Optionally Such a program records the length and identity of 0384 The process 300 then moves to a decision state 320 inserted, deleted or substituted nucleotides with respect to the whereina determination is made whether move features exist sequence of either the reference polynucleotide or a nucleic in the database. If no more features do exist, then the process acid sequence of the invention. In one aspect, the computer 300 terminates at an end state 324. 
However, if more features program may be a program which determines whether a do exist in the database, then the process 300 reads the next nucleic acid sequence of the invention, contains a single sequence feature at a state 326 and loops back to the state 310 nucleotide polymorphism (SNP) with respect to a reference wherein the attribute of the next feature is compared against nucleotide sequence. the first sequence. It should be noted, that if the feature 0379 Accordingly, another aspect of the invention is a attribute is not found in the first sequence at the decision state method for determining whether a nucleic acid sequence of 316, the process 300 moves directly to the decision state 320 the invention, differs at one or more nucleotides from a ref in order to determine if any more features exist in the data erence nucleotide sequence comprising the steps of reading base. the nucleic acid code and the reference nucleotide sequence 0385 Accordingly, another aspect of the invention is a through use of a computer program which identifies differ method of identifying a feature within a nucleic acid ences between nucleic acid sequences and identifying differ sequence of the invention, or a polypeptide sequence of the ences between the nucleic acid code and the reference nucle invention, comprising reading the nucleic acid code(s) or otide sequence with the computer program. In some aspects, polypeptide code(s) through the use of a computer program the computer program is a program which identifies single which identifies features therein and identifying features US 2012/0266329 A1 Oct. 18, 2012 within the nucleic acid code(s) with the computer program. In (0389) Hybridization of Nucleic Acids one aspect, computer program comprises a computer pro 0390 The invention provides isolated or recombinant gram which identifies open reading frames. 
The method may nucleic acids that hybridize under Stringent conditions to an be performed by reading a single sequence or at least 2, 5, 10. exemplary sequence of the invention (e.g., SEQ ID NO:3, 15, 20, 25, 30, or 40 of the nucleic acid sequences of the SEQID NO:5, SEQID NO:7, SEQID NO:9). The stringent invention, or the polypeptide sequences of the invention, conditions can be highly stringent conditions, medium strin through the use of the computer program and identifying gent conditions and/or low stringent conditions, including the features within the nucleic acid codes or polypeptide codes high and reduced stringency conditions described herein. In with the computer program. one aspect, it is the stringency of the wash conditions that set 0386 A nucleic acid sequence of the invention, or a forth the conditions which determine whether a nucleic acid polypeptide sequence of the invention, may be stored and is within the scope of the invention, as discussed below. manipulated in a variety of data processor programs in a 0391) “Hybridization” refers to the process by which a variety of formats. For example, a nucleic acid sequence of nucleic acid strand joins with a complementary strand the invention, or a polypeptide sequence of the invention, may through base pairing. Hybridization reactions can be sensitive be stored as text in a word processing file, such as Microsoft and selective so that a particular sequence of interest can be WORDTM or WORDPERFECTTM or as an ASCII file in a identified even in Samples in which it is present at low con variety of database programs familiar to those of skill in the centrations. Suitably stringent conditions can be defined by, art, such as DB2TM, SYBASETM, or ORACLETM. 
In addition, for example, the concentrations of Salt or formamide in the many computer programs and databases may be used as prehybridization and hybridization solutions, or by the sequence comparison algorithms, identifiers, or sources of hybridization temperature and are well known in the art. In reference nucleotide sequences or polypeptide sequences to particular, stringency can be increased by reducing the con be compared to a nucleic acid sequence of the invention, or a centration of salt, increasing the concentration of formamide, polypeptide sequence of the invention. The following list is or raising the hybridization temperature. In alternative intended not to limit the invention but to provide guidance to aspects, nucleic acids of the invention are defined by their programs and databases which are useful with the nucleic ability to hybridize under various stringency conditions (e.g., acid sequences of the invention, or the polypeptide sequences high, medium, and low), as set forth herein. of the invention. 0392 For example, hybridization under high stringency 0387. The programs and databases which may be used conditions could occur in about 50% formamide at about 37° include, but are not limited to: MacPattern (EMBL), Discov C. to 42°C. Hybridization could occur under reduced strin eryBase (Molecular Applications Group), GeneMine (Mo gency conditions in about 35% to 25% formamide at about lecular Applications Group), Look (Molecular Applications 30° C. to 35° C. In one aspect, hybridization occurs under Group), MacLook (Molecular Applications Group), BLAST high Stringency conditions, e.g., at 42°C. in 50% formamide, and BLAST2 (NCBI), BLASTN and BLASTX (Altschulet 5xSSPE, 0.3% SDS and 200 n/ml sheared and denatured al, J. Mol. Biol. 215: 403, 1990), FASTA (Pearson and Lip salmon sperm DNA. Hybridization could occur under these man, Proc. Natl. Acad. Sci. USA, 85: 2444, 1988), FASTDB reduced stringency conditions, but in 35% formamide at a (Brutlaget al. 
Comp. App. Biosci. 6:237-245, 1990), Catalyst reduced temperature of 35°C. The temperature range corre (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular sponding to a particular level of stringency can be further Simulations Inc.), Cerius’.DBAccess (Molecular Simula narrowed by calculating the purine to pyrimidine ratio of the tions Inc.), HypoCien (Molecular Simulations Inc.), Insight nucleic acid of interest and adjusting the temperature accord II, (Molecular Simulations Inc.), Discover (Molecular Simu ingly. Variations on the above ranges and conditions are well lations Inc.), CHARMm (Molecular Simulations Inc.), Felix known in the art. (Molecular Simulations Inc.), DelPhi, (Molecular Simula 0393. In alternative aspects, nucleic acids of the invention tions Inc.), QuanteMM, (Molecular Simulations Inc.), as defined by their ability to hybridize under stringent condi Homology (Molecular Simulations Inc.), Modeler (Molecu tions can be between about five residues and the full length of lar Simulations Inc.), ISIS (Molecular Simulations Inc.), nucleic acid of the invention; e.g., they can be at least 5, 10. Quanta/Protein Design (Molecular Simulations Inc.), 15, 20, 25, 30, 35, 40, 50,55, 60, 65,70, 75, 80,90, 100, 150, WebLab (Molecular Simulations Inc.), WebLab Diversity 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, Explorer (Molecular Simulations Inc.), Gene Explorer (Mo 800, 850,900,950, 1000, or more, residues in length. Nucleic lecular Simulations Inc.), SeqPold (Molecular Simulations acids shorter than full length are also included. These nucleic Inc.), the MDL. 
Available Chemicals Directory database, the acids can be useful as, e.g., hybridization probes, labeling MDL Drug Data Report data base, the Comprehensive probes, PCR oligonucleotide probes, iRNA (single or double Medicinal Chemistry database, Derwents's World Drug Stranded), antisense or sequences encoding antibody binding Index database, the BioByteMasterFile database, the Gen peptides (epitopes), motifs, active sites and the like. bank database and the Genseqn database. Many other pro 0394. In one aspect, nucleic acids of the invention are grams and databases would be apparent to one of skill in the defined by their ability to hybridize under high stringency art given the present disclosure. comprises conditions of about 50% formamide at about 37° 0388 Motifs which may be detected using the above pro C. to 42°C. In one aspect, nucleic acids of the invention are grams include sequences encoding leucine Zippers, helix defined by their ability to hybridize under reduced stringency turn-helix motifs, glycosylation sites, ubiquitination sites, comprising conditions in about 35% to 25% formamide at alpha helices and beta sheets, signal sequences encoding sig about 30° C. to 35° C. nal peptides which direct the secretion of the encoded pro 0395 Alternatively, nucleic acids of the invention are teins, sequences implicated intranscription regulation Such as defined by their ability to hybridize under high stringency homeoboxes, acidic stretches, enzymatic active sites, Sub comprising conditions at 42°C. in 50% formamide, 5xSSPE, strate binding sites and enzymatic cleavage sites. 0.3% SDS, and a repetitive sequence blocking nucleic acid, US 2012/0266329 A1 Oct. 18, 2012 74 such as cot-1 or salmon sperm DNA (e.g., 200 n/ml sheared decreasing homology to the detectable probe, less stringent and denatured salmon sperm DNA). In one aspect, nucleic conditions may be used. 
For example, the hybridization tem acids of the invention are defined by their ability to hybridize perature may be decreased in increments of 5°C. from 68°C. under reduced stringency conditions comprising 35% forma to 42°C. in a hybridization buffer having a Na+ concentration mide at a reduced temperature of 35° C. of approximately 1M. Following hybridization, the filter may 0396. In nucleic acid hybridization reactions, the condi be washed with 2xSSC, 0.5% SDS at the temperature of tions used to achieve a particular level of stringency will vary, hybridization. These conditions are considered to be “mod depending on the nature of the nucleic acids being hybridized. erate' conditions above 50° C. and “low” conditions below For example, the length, degree of complementarity, nucle 50° C. A specific example of “moderate’ hybridization con otide sequence composition (e.g., GC v. AT content) and ditions is when the above hybridization is conducted at 55° C. nucleic acid type (e.g., RNA v. DNA) of the hybridizing A specific example of “low stringency' hybridization condi regions of the nucleic acids can be considered in selecting tions is when the above hybridization is conducted at 45° C. hybridization conditions. An additional consideration is 0403. Alternatively, the hybridization may be carried out whether one of the nucleic acids is immobilized, for example, in buffers, such as 6xSSC, containing formamide at a tem on a filter. perature of 42°C. In this case, the concentration of forma 0397 Hybridization may be carried out under conditions mide in the hybridization buffer may be reduced in 5% incre of low stringency, moderate stringency or high Stringency. As ments from 50% to 0% to identify clones having decreasing an example of nucleic acid hybridization, a polymer mem levels of homology to the probe. Following hybridization, the brane containing immobilized denatured nucleic acids is first filter may be washed with 6xSSC, 0.5% SDS at 50° C. 
These prehybridized for 30 minutes at 45° C. in a solution consisting conditions are considered to be “moderate' conditions above of 0.9 MNaCl,50mMNaHPO, pH 7.0,5.0 mMNaEDTA, 25% formamide and “low” conditions below 25% forma 0.5% SDS, 10xDenhardt's and 0.5 mg/ml polyriboadenylic mide. A specific example of “moderate’ hybridization con acid. Approximately 2x10 cpm (specific activity 4-9x10 ditions is when the above hybridization is conducted at 30% cpm/ug) of Pend-labeled oligonucleotide probe are then formamide. A specific example of “low stringency” hybrid added to the solution. After 12-16 hours of incubation, the ization conditions is when the above hybridization is con membrane is washed for 30 minutes at room temperature in ducted at 10% formamide. 1xSET (150 mMNaCl, 20 mM Tris hydrochloride, pH 7.8, 1 0404 However, the selection of a hybridization format is mM NaEDTA) containing 0.5% SDS, followed by a 30 not critical—it is the stringency of the wash conditions that minute wash in fresh 1xSET at T-10°C. for the oligonucle set forth the conditions which determine whether a nucleic otide probe. The membrane is then exposed to auto-radio acid is within the scope of the invention. Wash conditions graphic film for detection of hybridization signals. used to identify nucleic acids within the scope of the invention 0398 All of the foregoing hybridizations would be con include, e.g.: a salt concentration of about 0.02 molar at pH 7 sidered to be under conditions of high Stringency. and a temperature of at least about 50° C. or about 55° C. to 0399. Following hybridization, a filter can be washed to about 60° C.; or, a salt concentration of about 0.15 MNaCl at remove any non-specifically bound detectable probe. The 72°C. for about 15 minutes; or, a salt concentration of about stringency used to wash the filters can also be varied depend 0.2xSSC at a temperature of at least about 50° C. or about 55° ing on the nature of the nucleic acids being hybridized, the C. 
to about 60° C. for about 15 to about 20 minutes; or, the length of the nucleic acids being hybridized, the degree of hybridization complex is washed twice with a solution with a complementarity, the nucleotide sequence composition (e.g., salt concentration of about 2xSSC containing 0.1% SDS at GC v. AT content) and the nucleic acid type (e.g., RNA v. room temperature for 15 minutes and then washed twice by DNA). Examples of progressively higher stringency condi 0.1xSSC containing 0.1% SDS at 68° C. for 15 minutes; or, tion washes are as follows: 2xSSC, 0.1% SDS at room tem equivalent conditions. See Sambrook, Tijssen and Ausubel perature for 15 minutes (low stringency); 0.1xSSC, 0.5% for a description of SSC buffer and equivalent conditions. SDS at room temperature for 30 minutes to 1 hour (moderate 04.05 These methods may be used to isolate nucleic acids stringency); 0.1xSSC, 0.5% SDS for 15 to 30 minutes at of the invention. For example, the preceding methods may be between the hybridization temperature and 68°C. (high strin used to isolate nucleic acids having a sequence with at least gency); and 0.15M NaCl for 15 minutes at 72°C. (very high about 97%, at least 95%, at least 90%, at least 85%, at least stringency). A final low stringency wash can be conducted in 80%, at least 75%, at least 70%, at least 65%, at least 60%, at 0.1 xSSC at room temperature. The examples above are least 55%, or at least 50% sequence identity (homology) to a merely illustrative of one set of conditions that can be used to nucleic acid sequence selected from the group consisting of wash filters. One of skill in the art would know that there are one of the sequences of the invention, or fragments compris numerous recipes for different stringency washes. Some ing at least about 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, other examples are given below. 200, 300, 400, or 500 consecutive bases thereof and the 0400. 
In one aspect, hybridization conditions comprise a sequences complementary thereto. Sequence identity (ho wash step comprising a wash for 30 minutes at room tem mology) may be measured using the alignment algorithm. For perature in a solution comprising 1x150 mM NaCl, 20 mM example, the homologous polynucleotides may have a coding Tris hydrochloride, pH 7.8, 1 mM NaEDTA, 0.5% SDS, sequence which is a naturally occurring allelic variant of one followed by a 30 minute wash in fresh solution. of the coding sequences described herein. Such allelic vari 0401 Nucleic acids which have hybridized to the probe ants may have a Substitution, deletion or addition of one or are identified by autoradiography or other conventional tech more nucleotides when compared to the nucleic acids of the niques. invention. Additionally, the above procedures may be used to 0402. The above procedure may be modified to identify isolate nucleic acids which encode polypeptides having at nucleic acids having decreasing levels of homology to the least about 99%. 95%, at least 90%, at least 85%, at least 80%, probe sequence. For example, to obtain nucleic acids of at least 75%, at least 70%, at least 65%, at least 60%, at least US 2012/0266329 A1 Oct. 18, 2012

55%, or at least 50% sequence identity (homology) to a polypeptide of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof as determined using a sequence alignment algorithm (e.g., such as the FASTA version 3.0t78 algorithm with the default parameters).
[0406] Oligonucleotide Probes and Methods for Using Them
[0407] The invention also provides nucleic acid probes that can be used, e.g., for identifying nucleic acids encoding a polypeptide with an enzyme, structural or binding activity or fragments thereof or for identifying polypeptide, enzyme, protein, e.g. structural or binding protein, genes. In one aspect, the probe comprises at least 10 consecutive bases of a nucleic acid of the invention. Alternatively, a probe of the invention can be at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 150 or about 10 to 50, about 20 to 60, about 30 to 70, consecutive bases of a sequence as set forth in a nucleic acid of the invention. The probes identify a nucleic acid by binding and/or hybridization. The probes can be used in arrays of the invention, see discussion below, including, e.g., capillary arrays. The probes of the invention can also be used to isolate other nucleic acids or polypeptides.
[0408] The isolated nucleic acids of the invention, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of the invention, or the sequences complementary thereto may also be used as probes to determine whether a biological sample, such as a soil sample, contains an organism having a nucleic acid sequence of the invention or an organism from which the nucleic acid was obtained. In such procedures, a biological sample potentially harboring the organism from which the nucleic acid was isolated is obtained and nucleic acids are obtained from the sample. The nucleic acids are contacted with the probe under conditions which permit the probe to specifically hybridize to any complementary sequences which are present therein.
[0409] Where necessary, conditions which permit the probe to specifically hybridize to complementary sequences may be determined by placing the probe in contact with complementary sequences from samples known to contain the complementary sequence as well as control sequences which do not contain the complementary sequence. Hybridization conditions, such as the salt concentration of the hybridization buffer, the formamide concentration of the hybridization buffer, or the hybridization temperature, may be varied to identify conditions which allow the probe to hybridize specifically to complementary nucleic acids.
[0410] If the sample contains the organism from which the nucleic acid was isolated, specific hybridization of the probe is then detected. Hybridization may be detected by labeling the probe with a detectable agent such as a radioactive isotope, a fluorescent dye or an enzyme capable of catalyzing the formation of a detectable product.
[0411] Many methods for using the labeled probes to detect the presence of complementary nucleic acids in a sample are familiar to those skilled in the art. These include Southern Blots, Northern Blots, colony hybridization procedures and dot blots. Protocols for each of these procedures are provided in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1997) and Sambrook et al., Molecular Cloning: A Laboratory Manual 2nd Ed., Cold Spring Harbor Laboratory Press (1989).
[0412] Alternatively, more than one probe (at least one of which is capable of specifically hybridizing to any complementary sequences which are present in the nucleic acid sample), may be used in an amplification reaction to determine whether the sample contains an organism containing a nucleic acid sequence of the invention (e.g., an organism from which the nucleic acid was isolated). Typically, the probes comprise oligonucleotides. In one aspect, the amplification reaction may comprise a PCR reaction. PCR protocols are described in Ausubel and Sambrook, supra. Alternatively, the amplification may comprise a ligase chain reaction, 3SR, or strand displacement reaction. (See Barany, F., "The Ligase Chain Reaction in a PCR World", PCR Methods and Applications 1:5-16, 1991; E. Fahy et al., "Self-sustained Sequence Replication (3SR): An Isothermal Transcription-based Amplification System Alternative to PCR", PCR Methods and Applications 1:25-33, 1991; and Walker G. T. et al., "Strand Displacement Amplification—an Isothermal in vitro DNA Amplification Technique", Nucleic Acid Research 20:1691-1696, 1992). In such procedures, the nucleic acids in the sample are contacted with the probes, the amplification reaction is performed and any resulting amplification product is detected. The amplification product may be detected by performing gel electrophoresis on the reaction products and staining the gel with an intercalator such as ethidium bromide. Alternatively, one or more of the probes may be labeled with a radioactive isotope and the presence of a radioactive amplification product may be detected by autoradiography after gel electrophoresis.
[0413] Probes derived from sequences near the ends of the sequences of the invention may also be used in chromosome walking procedures to identify clones containing genomic sequences located adjacent to the sequences of the invention. Such methods allow the isolation of genes which encode additional proteins from the host organism.
[0414] The isolated nucleic acids of the invention, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of the invention, or the sequences complementary thereto may be used as probes to identify and isolate related nucleic acids. In some aspects, the related nucleic acids may be cDNAs or genomic DNAs from organisms other than the one from which the nucleic acid was isolated. For example, the other organisms may be related organisms. In such procedures, a nucleic acid sample is contacted with the probe under conditions which permit the probe to specifically hybridize to related sequences. Hybridization of the probe to nucleic acids from the related organism is then detected using any of the methods described above.
[0415] By varying the stringency of the hybridization conditions used to identify nucleic acids, such as cDNAs or genomic DNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature, Tm, is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly complementary probe. Very stringent conditions are selected to be equal to or about 5° C. lower than the Tm for a particular probe. The melting temperature of the probe may be calculated using the following formulas:
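The melting temperature calculation lends itself to a short worked example. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes the two formulas stated in paragraphs [0416] and [0417]. It further assumes that the logarithm is base 10, that "fraction G+C" is entered as a percentage from 0 to 100 (as in the commonly used form of this equation), and that "0.63% formamide" means 0.63 multiplied by the percent formamide. The function names `gc_percent` and `melting_temperature` are hypothetical.

```python
import math


def gc_percent(probe: str) -> float:
    """Return the G+C content of the probe as a percentage (0 to 100)."""
    seq = probe.upper()
    if not seq:
        raise ValueError("probe must contain at least one base")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def melting_temperature(probe: str, na_molar: float,
                        formamide_percent: float = 0.0) -> float:
    """Estimate Tm in degrees C for a probe of 14 to 70 nucleotides.

    Assumed form of the formulas in paragraphs [0416] and [0417]:
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C)
             - 0.63*(% formamide) - 600/N
    where N is the probe length. With formamide_percent = 0 this reduces
    to the formamide-free formula.
    """
    if na_molar <= 0:
        raise ValueError("Na+ concentration must be positive")
    n = len(probe)  # N, the length of the probe
    tm = 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(probe) - 600.0 / n
    return tm - 0.63 * formamide_percent
```

Under these assumptions, a 20-nucleotide probe with 50% G+C in 1 M Na+ gives a calculated Tm of 72.0° C., and the same probe in 50% formamide gives 40.5° C.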

[0416] For probes between 14 and 70 nucleotides in length the melting temperature (Tm) is calculated using the formula: Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (600/N), where N is the length of the probe.
[0417] If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation: Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (0.63% formamide) - (600/N), where N is the length of the probe.
[0418] Prehybridization may be carried out in 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA or 6×SSC, 5×Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.
[0419] Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above. Where the probe comprises double stranded DNA, it is denatured before addition to the hybridization solution. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25° C. below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10° C. below the Tm. In one aspect, for hybridizations in 6×SSC, the hybridization is conducted at approximately 68° C. Usually, for hybridizations in 50% formamide containing solutions, the hybridization is conducted at approximately 42° C.
[0420] Inhibiting Expression of Polypeptides, Enzymes, Proteins
[0421] The invention provides nucleic acids complementary to (e.g., antisense sequences to) the nucleic acids of the invention, e.g., nucleic acids comprising antisense, iRNA, ribozymes. Nucleic acids of the invention comprising antisense sequences can be capable of inhibiting the transport, splicing or transcription of polypeptide, enzyme, protein, e.g. structural or binding protein genes. The inhibition can be effected through the targeting of genomic DNA or messenger RNA. The transcription or function of targeted nucleic acid can be inhibited, for example, by hybridization and/or cleavage. In one aspect, inhibitors of the invention include oligonucleotides which are able to either bind a polypeptide, enzyme, protein, e.g. structural or binding protein, gene or message, in either case preventing or inhibiting the production or function of a polypeptide, enzyme, protein, e.g. structural or binding protein. The association can be through sequence specific hybridization. Another useful class of inhibitors includes oligonucleotides which cause inactivation or cleavage of a polypeptide, enzyme, protein, e.g. structural or binding protein, message. The oligonucleotide can have enzyme activity which causes such cleavage, such as ribozymes. The oligonucleotide can be chemically modified or conjugated to an enzyme or composition capable of cleaving the complementary nucleic acid. A pool of many different such oligonucleotides can be screened for those with the desired activity. Thus, the invention provides various compositions for the inhibition of a polypeptide, enzyme, protein, e.g. structural or binding protein, expression on a nucleic acid and/or protein level, e.g., antisense, iRNA and ribozymes comprising a polypeptide, enzyme, protein, e.g. structural or binding protein, sequences of the invention and the anti-polypeptide, anti-enzyme, anti-protein, e.g. anti-structural or anti-binding protein antibodies of the invention.
[0422] Inhibition of a polypeptide, enzyme, protein, e.g. structural or binding protein, expression can have a variety of industrial applications. For example, inhibition of a polypeptide, enzyme, protein, e.g. structural or binding protein, expression can slow or prevent spoilage. In one aspect, use of compositions of the invention that inhibit the expression and/or activity of a polypeptide, enzyme, protein, e.g. structural or binding protein, e.g., antibodies, antisense oligonucleotides, ribozymes and RNAi are used to slow or prevent spoilage. Thus, in one aspect, the invention provides methods and compositions comprising application onto a plant or plant product (e.g., a cereal, a grain, a fruit, seed, root, leaf, etc.) antibodies, antisense oligonucleotides, ribozymes and RNAi of the invention to slow or prevent spoilage. These compositions also can be expressed by the plant (e.g., a transgenic plant) or another organism (e.g., a bacterium or other microorganism transformed with a polypeptide, enzyme, protein, e.g. structural or binding protein, gene of the invention).
[0423] The compositions of the invention for the inhibition of a polypeptide, enzyme, protein, e.g. structural or binding protein, expression, e.g., antisense, iRNA (e.g., siRNA, miRNA), ribozymes, antibodies, can be used as pharmaceutical compositions, e.g., as anti-pathogen agents or in other therapies, e.g., as anti-microbials for, e.g., Salmonella, or to neutralize a biological warfare agent, e.g., anthrax.
Antisense Oligonucleotides
[0424] The invention provides antisense oligonucleotides capable of binding a polypeptide, enzyme, protein, e.g. structural or binding protein, message which, in one aspect, can inhibit a polypeptide, enzyme, protein, e.g. structural or binding protein, activity by targeting mRNA. Strategies for designing antisense oligonucleotides are well described in the scientific and patent literature, and the skilled artisan can design such a polypeptide, enzyme, protein, e.g. structural or binding protein, oligonucleotides using the novel reagents of the invention. For example, gene walking/RNA mapping protocols to screen for effective antisense oligonucleotides are well known in the art, see, e.g., Ho (2000) Methods Enzymol. 314:168-183, describing an RNA mapping assay, which is based on standard molecular techniques to provide an easy and reliable method for potent antisense sequence selection. See also Smith (2000) Eur. J. Pharm. Sci. 11:191-198.
[0425] Naturally occurring nucleic acids are used as antisense oligonucleotides. The antisense oligonucleotides can be of any length; for example, in alternative aspects, the antisense oligonucleotides are between about 5 to 100, about 10 to 80, about 15 to 60, about 18 to 40. The optimal length can be determined by routine screening. The antisense oligonucleotides can be present at any concentration. The optimal concentration can be determined by routine screening. A wide variety of synthetic, non-naturally occurring nucleotide and nucleic acid analogues are known which can address this potential problem. For example, peptide nucleic acids (PNAs) containing non-ionic backbones, such as N-(2-aminoethyl)glycine units can be used. Antisense oligonucleotides having phosphorothioate linkages can also be used, as described in WO 97/03211; WO 96/39154; Mata (1997) Toxicol Appl Pharmacol 144:189-197; Antisense Therapeutics, ed. Agrawal (Humana Press, Totowa, N.J., 1996). Antisense oligonucleotides having synthetic DNA backbone analogues provided by the invention can also include phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3'-thioacetal, methylene(methylimino), 3'-N-carbamate, and morpholino carbamate nucleic acids, as described above.
No. 4,987,071. The recitation of these specific motifs is not intended to be limiting. Those skilled in the art will recognize
that a ribozyme of the invention, e.g., an enzymatic RNA 0426 Combinatorial chemistry methodology can be used molecule of this invention, can have a specific Substrate bind to create vast numbers of oligonucleotides that can be rapidly ing site complementary to one or more of the target gene RNA screened for specific oligonucleotides that have appropriate regions. A ribozyme of the invention can have a nucleotide binding affinities and specificities toward any target. Such as sequence within or Surrounding that Substrate binding site the sense and antisense a polypeptide, enzyme, protein, e.g. which imparts an RNA cleaving activity to the molecule. structural or binding protein, sequences of the invention (see, 0430 RNA Interference (RNAi) e.g., Gold (1995).J. of Biol. Chem. 270: 13581-13584). 0431. In one aspect, the invention provides an RNA inhibi 0427. Inhibitory Ribozymes tory molecule, a so-called “RNAi molecule, comprising a 0428 The invention provides ribozymes capable of bind polypeptide, enzyme, protein, e.g. structural or binding pro ing a polypeptide, enzyme, protein, e.g. structural or binding tein, sequence of the invention. The RNAi molecule com protein, message. These ribozymes can inhibit a polypeptide, prises a double-stranded RNA (dsRNA) molecule. The RNAi enzyme, protein, e.g. structural orbinding protein, activity by, can inhibit expression of a polypeptide, enzyme, protein, e.g. e.g., targeting mRNA. Strategies for designing ribozymes and structural or binding protein, gene. In one aspect, the RNAi is selecting the polypeptide, enzyme, protein, e.g. structural or about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex binding protein-specific antisense sequence for targeting are nucleotides in length. 
While the invention is not limited by well described in the scientific and patent literature, and the any particular mechanism of action, the RNAi can enter a cell skilled artisan can design such ribozymes using the novel and cause the degradation of a single-stranded RNA (ssRNA) reagents of the invention. Ribozymes act by binding to a of similar or identical sequences, including endogenous target RNA through the target RNA binding portion of a mRNAs. When a cell is exposed to double-stranded RNA ribozyme which is held in close proximity to an enzymatic (dsRNA), mRNA from the homologous gene is selectively portion of the RNA that cleaves the target RNA. Thus, the degraded by a process called RNA interference (RNAi). A ribozyme recognizes and binds a target RNA through possible basic mechanism behind RNAi, e.g., siRNA for complementary base-pairing, and once bound to the correct inhibiting transcription and/or miRNA to inhibit translation, site, acts enzymatically to cleave and inactivate the target is the breaking of a double-stranded RNA (dsRNA) matching RNA. Cleavage of a target RNA in such a manner will destroy a specific gene sequence into short pieces called short inter its ability to direct synthesis of an encoded protein if the fering RNA, which trigger the degradation of mRNA that cleavage occurs in the coding sequence. After a ribozyme has matches its sequence. In one aspect, the RNAi’s of the inven bound and cleaved its RNA target, it can be released from that tion are used in gene-silencing therapeutics, see, e.g., Shuey RNA to bind and cleave new targets repeatedly. (2002) Drug Discov. Today 7:1040-1046. In one aspect, the 0429. In some circumstances, the enzymatic nature of a invention provides methods to selectively degrade RNA using ribozyme can be advantageous over other technologies. Such the RNAi’s of the invention. The process may be practiced in as antisense technology (where a nucleic acid molecule sim vitro, ex vivo or in vivo. 
In one aspect, the RNAi molecules of ply binds to a nucleic acid target to block its transcription, the invention can be used to generate a loss-of-function muta translation or association with another molecule) as the effec tion in a cell, an organ oran animal. Methods for making and tive concentration of ribozyme necessary to effect atherapeu using RNAi molecules for selectively degrade RNA are well tic treatment can be lower than that of an antisense oligo known in the art, see, e.g., U.S. Pat. Nos. 6,506.559; 6.511, nucleotide. This potential advantage reflects the ability of the 824; 6,515,109; 6,489,127. ribozyme to act enzymatically. Thus, a single ribozyme mol 0432 Modification of Nucleic Acids ecule is able to cleave many molecules of target RNA. In 0433. The invention provides methods of generating vari addition, a ribozyme is typically a highly specific inhibitor, ants of the nucleic acids of the invention, e.g., those encoding with the specificity of inhibition depending not only on the a polypeptide, enzyme, protein, e.g. structural or binding base pairing mechanism of binding, but also on the mecha protein. These methods can be repeated or used in various nism by which the molecule inhibits the expression of the combinations to generate a polypeptide, enzyme, protein, e.g. RNA to which it binds. That is, the inhibition is caused by structural or binding protein, having an altered or different cleavage of the RNA target and so specificity is defined as the activity or an altered or different stability from that of a ratio of the rate of cleavage of the targeted RNA over the rate polypeptide, enzyme, protein, e.g. structural or binding pro of cleavage of non-targeted RNA. This cleavage mechanism tein, encoded by the template nucleic acid. These methods is dependent upon factors additional to those involved in base also can be repeated or used in various combinations, e.g., to pairing. 
Thus, the specificity of action of a ribozyme can be generate variations in gene/message expression, message greater than that of antisense oligonucleotide binding the translation or message stability. In another aspect, the genetic same RNA site. The ribozyme of the invention, e.g., an enzy composition of a cell is altered by, e.g., modification of a matic ribozyme RNA molecule, can be formed in a hammer homologous gene ex vivo, followed by its reinsertion into the head motif, a hairpin motif, as a hepatitis delta virus motif, a cell. group I intron motif and/or an RNaseP-like RNA in associa 0434. A nucleic acid of the invention can be altered by any tion with an RNA guide sequence. Examples of hammerhead means. For example, random or stochastic methods, or, non motifs are described by, e.g., Rossi (1992) Aids Research and stochastic, or "directed evolution, methods, see, e.g., U.S. Human Retroviruses 8:183; hairpin motifs by Hampel (1989) Pat. No. 6,361.974. Methods for random mutation of genes Biochemistry 28:4929, and Hampel (1990) Nuc. Acids Res. are well known in the art, see, e.g., U.S. Pat. No. 5,830,696. 18:299; the hepatitis delta virus motif by Perrotta (1992) For example, mutagens can be used to randomly mutate a Biochemistry 31:16; the RNaseP motif by Guerrier-Takada gene. Mutagens include, e.g., ultraviolet light or gamma irra (1983) Cell 35:849; and the group I intron by Cech U.S. Pat. diation, or a chemical mutagen, e.g., mitomycin, nitrous acid, US 2012/0266329 A1 Oct. 18, 2012 photoactivated psoralens, alone or in combination, to induce binatorial multiple cassette mutagenesis creates all the per DNA breaks amenable to repair by recombination. Other mutations of mutant and wildtype cassettes’ BioTechniques chemical mutagens include, for example, Sodium bisulfite, 18:194-195; Stemmer et al. (1995) “Single-step assembly of nitrous acid, hydroxylamine, hydrazine or formic acid. 
Other a gene and entire plasmid form large numbers of oligodeox mutagens are analogues of nucleotide precursors, e.g., yribonucleotides' Gene, 164:49-53; Stemmer (1995) “The nitrosoguanidine, 5-bromouracil, 2-aminopurine, oracridine. Evolution of Molecular Computation” Science 270: 1510; These agents can be added to a PCR reaction in place of the Stemmer (1995) “Searching Sequence Space” Bio/Technol nucleotide precursor thereby mutating the sequence. Interca ogy 13:549-553; Stemmer (1994) “Rapid evolution of a pro lating agents such as proflavine, acriflavine, quinacrine and tein in vitro by DNA shuffling Nature 370:389-391; and the like can also be used. Stemmer (1994) “DNA shuffling by random fragmentation 0435 Any technique in molecular biology can be used, and reassembly: In vitro recombination for molecular evolu e.g., random PCR mutagenesis, see, e.g., Rice (1992) Proc. tion. Proc. Natl. Acad. Sci., USA 91:10747-10751. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial mul 0437. Mutational methods of generating diversity include, tiple cassette mutagenesis, see, e.g., Crameri (1995) Biotech for example, site-directed mutagenesis (Ling et al. (1997) niques 18:194-196. Alternatively, nucleic acids, e.g., genes, Approaches to DNA mutagenesis: an overview” Anal Bio can be reassembled after random, or "stochastic fragmen chem. 254(2): 157-178; Dale et al. (1996) “Oligonucleotide tation, see, e.g., U.S. Pat. Nos. 6.291,242: 6.287,862; 6,287, directed random mutagenesis using the phosphorothioate 861; 5,955,358; 5,830,721; 5,824,514; 5,811,238; 5,605,793. method Methods Mol. Biol. 57:369-374; Smith (1985) “In In alternative aspects, modifications, additions or deletions vitro mutagenesis' Ann. Rev. Genet. 
19:423-462; Botstein & are introduced by error-prone PCR, shuffling, oligonucle Shortle (1985) “Strategies and applications of in vitro otide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis' Science 229:1193-1201; Carter (1986) “Site mutagenesis, in Vivo mutagenesis, cassette mutagenesis, directed mutagenesis' Biochem. J. 237:1-7: and Kunkel recursive ensemble mutagenesis, exponential ensemble (1987) “The efficiency of oligonucleotide directed mutagen mutagenesis, site-specific mutagenesis, gene reassembly, esis' in Nucleic Acids & Molecular Biology (Eckstein, F. and Gene Site Saturation Mutagenesis (GSSM), synthetic liga Lilley, D. M. J. eds. Springer Verlag, Berlin)); mutagenesis tion reassembly (SLR), recombination, recursive sequence using uracil containing templates (Kunkel (1985) “Rapid and recombination, phosphothioate-modified DNA mutagenesis, efficient site-specific mutagenesis without phenotypic selec uracil-containing template mutagenesis, gapped duplex tion Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. mutagenesis, point mismatch repair mutagenesis, repair-de (1987) “Rapid and efficient site-specific mutagenesis without ficient host strain mutagenesis, chemical mutagenesis, radio phenotypic selection Methods in Enzymol. 154, 367-382; genic mutagenesis, deletion mutagenesis, restriction-selec and Bass et al. (1988) “Mutant Trp repressors with new DNA tion mutagenesis, restriction-purification mutagenesis, binding specificities’ Science 242:240-245); oligonucle artificial gene synthesis, ensemble mutagenesis, chimeric otide-directed mutagenesis (Methods in Enzymol. 100: 468 nucleic acid multimer creation, and/or a combination of these 500 (1983); Methods in Enzymol. 154; 329-350 (1987); and other methods. 
Zoller (1982) “Oligonucleotide-directed mutagenesis using 0436 The following publications describe a variety of M13-derived vectors: an efficient and general procedure for recursive recombination procedures and/or methods which the production of point mutations in any DNA fragment can be incorporated into the methods of the invention: Stem Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) mer (1999) “Molecular breeding of viruses for targeting and “Oligonucleotide-directed mutagenesis of DNA fragments other clinical properties'Tumor Targeting 4:1-4; Ness (1999) cloned into M13 vectors' Methods in Enzymol. 100:468 Nature Biotechnology 17:893-896; Chang (1999)“Evolution 500; and Zoller (1987) Oligonucleotide-directed mutagen of a cytokine using DNA family shuffling Nature Biotech esis: a simple method using two oligonucleotide primers and nology 17:793-797; Minshull (1999) “Protein evolution by a single-stranded DNA template' Methods in Enzymol. 154: molecular breeding Current Opinion in Chemical Biology 329-350); phosphorothioate-modified DNA mutagenesis 3:284-290; Christians (1999) “Directed evolution of thymi (Taylor (1985) “The use of phosphorothioate-modified DNA dine kinase for AZT phosphorylation using DNA family shuf in reactions to prepare nicked DNA'Nucl. fling Nature Biotechnology 17:259-264; Crameri (1998) Acids Res. 13:8749-8764; Taylor (1985) “The rapid genera “DNA shuffling of a family of genes from diverse species tion of oligonucleotide-directed mutations at high frequency accelerates directed evolution' Nature 391:288-291; Crameri using phosphorothioate-modified DNA' Nucl. Acids Res. 
13: (1997) “Molecular evolution of an arsenate detoxification 8765-8787 (1985): Nakamaye (1986) “Inhibition of restric pathway by DNA shuffling.” Nature Biotechnology 15:436 tion endonuclease Nici I cleavage by phosphorothioate groups 438; Zhang (1997) “Directed evolution of an effective fucosi and its application to oligonucleotide-directed mutagenesis' dase from a galactosidase by DNA shuffling and screening Nucl. Acids Res. 14:9679–9698; Sayers (1988) “Y-T Exonu Proc. Natl. Acad. Sci. USA94:4504-4509: Patten et al. (1997) cleases in phosphorothioate-based oligonucleotide-directed Applications of DNA Shuffling to Pharmaceuticals and Vac mutagenesis' Nucl. Acids Res. 16:791-802; and Sayers et al. cines’ Current Opinion in Biotechnology 8:724-733; (1988) "Strand specific cleavage of phosphorothioate-con Crameri et al. (1996) “Construction and evolution of anti taining DNA by reaction with restriction endonucleases in the body-phage libraries by DNA shuffling Nature Medicine presence of ethidium bromide' Nucl. Acids Res. 16: 803 2:100-103; Gates et al. (1996) “Affinity selective isolation of 814); mutagenesis using gapped duplex DNA (Kramer et al. ligands from peptide libraries through display on a lac repres (1984) “The gapped duplex DNA approach to oligonucle sor headpiece dimer' Journal of Molecular Biology 255: otide-directed mutation construction' Nucl. Acids Res. 12: 373-386; Stemmer (1996) “Sexual PCR and Assembly PCR” 9441-9456; Kramer & Fritz (1987) Methods in Enzymol. In: The Encyclopedia of Molecular Biology. VCHPublishers, “Oligonucleotide-directed construction of mutations via New York. pp. 447-457; Crameri and Stemmer (1995)“Com gapped duplex DNA' 154:350-367; Kramer (1988) US 2012/0266329 A1 Oct. 18, 2012 79

"Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations" Nucl. Acids Res. 16:7207; and Fritz (1988) "Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro" Nucl. Acids Res. 16:6987-6999).

[0438] Additional protocols that can be used to practice the invention include point mismatch repair (Kramer (1984) "Point Mismatch Repair" Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) "Improved oligonucleotide site-directed mutagenesis using M13 vectors" Nucl. Acids Res. 13:4431-4443; and Carter (1987) "Improved oligonucleotide-directed mutagenesis using M13 vectors" Methods in Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh (1986) "Use of oligonucleotides to generate large deletions" Nucl. Acids Res. 14:5115), restriction-selection and restriction-purification (Wells et al. (1986) "Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin" Phil. Trans. R. Soc. Lond. A 317:415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) "Total synthesis and cloning of a gene coding for the ribonuclease S protein" Science 223:1299-1301; Sakamar and Khorana (1988) "Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin)" Nucl. Acids Res. 14:6361-6372; Wells et al. (1985) "Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites" Gene 34:315-323; and Grundstrom et al. (1985) "Oligonucleotide-directed mutagenesis by microscale shot-gun gene synthesis" Nucl. Acids Res. 13:3305-3316), double-strand break repair (Mandecki (1986) "Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis" Proc. Natl. Acad. Sci. USA, 83:7177-7181; Arnold (1993) "Protein engineering for unusual environments" Current Opinion in Biotechnology 4:450-455). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for trouble-shooting problems with various mutagenesis methods.

[0439] Protocols that can be used to practice the invention are described, e.g., in U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), "Methods for In Vitro Recombination"; U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998) "Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination"; U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), "DNA Mutagenesis by Random Fragmentation and Reassembly"; U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) "End-Complementary Polymerase Reaction"; U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), "Methods and Compositions for Cellular and Metabolic Engineering"; WO 95/22625, Stemmer and Crameri, "Mutagenesis by Random Fragmentation and Reassembly"; WO 96/33207 by Stemmer and Lipschutz "End Complementary Polymerase Chain Reaction"; WO 97/20078 by Stemmer and Crameri "Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination"; WO 97/35966 by Minshull and Stemmer, "Methods and Compositions for Cellular and Metabolic Engineering"; WO 99/41402 by Punnonen et al. "Targeting of Genetic Vaccine Vectors"; WO 99/41383 by Punnonen et al. "Antigen Library Immunization"; WO 99/41369 by Punnonen et al. "Genetic Vaccine Vector Engineering"; WO 99/41368 by Punnonen et al. "Optimization of Immunomodulatory Properties of Genetic Vaccines"; EP 752008 by Stemmer and Crameri, "DNA Mutagenesis by Random Fragmentation and Reassembly"; EP 0932670 by Stemmer "Evolving Cellular DNA Uptake by Recursive Sequence Recombination"; WO 99/23107 by Stemmer et al., "Modification of Virus Tropism and Host Range by Viral Genome Shuffling"; WO 99/21979 by Apt et al., "Human Papillomavirus Vectors"; WO 98/31837 by del Cardayre et al. "Evolution of Whole Cells and Organisms by Recursive Sequence Recombination"; WO 98/27230 by Patten and Stemmer, "Methods and Compositions for Polypeptide Engineering"; WO 98/27230 by Stemmer et al., "Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and Selection"; WO 00/00632, "Methods for Generating Highly Diverse Libraries"; WO 00/09679, "Methods for Obtaining in Vitro Recombined Polynucleotide Sequence Banks and Resulting Sequences"; WO 98/42832 by Arnold et al., "Recombination of Polynucleotide Sequences Using Random or Defined Primers"; WO 99/29902 by Arnold et al., "Method for Creating Polynucleotide and Polypeptide Sequences"; WO 98/41653 by Vind, "An in Vitro Method for Construction of a DNA Library"; WO 98/41622 by Borchert et al., "Method for Constructing a Library Using DNA Shuffling"; and WO 98/42727 by Pati and Zarling, "Sequence Alterations using Homologous Recombination."

[0440] Protocols that can be used to practice the invention (providing details regarding various diversity generating methods) are described, e.g., in U.S. patent application Ser. No. 09/407,800, "SHUFFLING OF CODON ALTERED GENES" by Patten et al. filed Sep. 28, 1999; "EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION" by del Cardayre et al., U.S. Pat. No. 6,379,964; "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., U.S. Pat. Nos. 6,319,714; 6,368,861; 6,376,246; 6,423,542; 6,426,224 and PCT/US00/01203; "USE OF CODON VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., U.S. Pat. No. 6,436,675; "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202) and, e.g. "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., filed Jul. 18, 2000 (U.S. Ser. No. 09/618,579); "METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and Stemmer, filed Jan. 18, 2000 (PCT/US00/01138); and "SINGLE-STRANDED NUCLEIC ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID FRAGMENT ISOLATION" by Affholter, filed Sep. 6, 2000 (U.S. Ser. No. 09/656,549); and U.S. Pat. Nos. 6,177,263; 6,153,410.

[0441] Non-stochastic, or "directed evolution," methods, including, e.g., saturation mutagenesis, such as Gene Site Saturation Mutagenesis (GSSM), synthetic ligation reassembly (SLR), or a combination thereof, are used to modify the nucleic acids of the invention to generate a polypeptide, enzyme, protein, e.g. structural or binding protein, with new or altered properties (e.g., activity under highly acidic or alkaline conditions, high or low temperatures, and the like). Polypeptides encoded by the modified nucleic acids can be screened for an activity before testing for glucan hydrolysis or other activity. Any testing modality or protocol can be used, e.g., using a capillary array platform. See, e.g., U.S. Pat. Nos. 6,361,974; 6,280,926; 5,939,250.

[0442] Gene Site Saturation Mutagenesis, or, GSSM

[0443] The invention also provides methods for making enzyme using Gene Site Saturation mutagenesis, or, GSSM, as described herein, and also in U.S. Pat. Nos. 6,171,820 and 6,579,258.

[0444] In one aspect, codon primers containing a degenerate N,N,G/T sequence are used to introduce point mutations into a polynucleotide, e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein, or an antibody of the invention, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position, e.g., an amino acid residue in an enzyme active site or ligand binding site targeted to be modified. These oligonucleotides can comprise a contiguous first homologous sequence, a degenerate N,N,G/T sequence, and, optionally, a second homologous sequence. The downstream progeny translational products from the use of such oligonucleotides include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,G/T sequence includes codons for all 20 amino acids. In one aspect, one such degenerate oligonucleotide (comprised of, e.g., one degenerate N,N,G/T cassette) is used for subjecting each original codon in a parental polynucleotide template to a full range of codon substitutions. In another aspect, at least two degenerate cassettes are used, either in the same oligonucleotide or not, for subjecting at least two original codons in a parental polynucleotide template to a full range of codon substitutions. For example, more than one N,N,G/T sequence can be contained in one oligonucleotide to introduce amino acid mutations at more than one site. This plurality of N,N,G/T sequences can be directly contiguous, or separated by one or more additional nucleotide sequence(s). In another aspect, oligonucleotides serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,G/T sequence, to introduce any combination or permutation of amino acid additions, deletions, and/or substitutions.

[0445] In one aspect, simultaneous mutagenesis of two or more contiguous amino acid positions is done using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e. a degenerate (N,N,G/T)n sequence. In another aspect, degenerate cassettes having less degeneracy than the N,N,G/T sequence are used. For example, it may be desirable in some instances to use (e.g. in an oligonucleotide) a degenerate triplet sequence comprised of only one N, where said N can be in the first, second or third position of the triplet. Any other bases including any combinations and permutations thereof can be used in the remaining two positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g. in an oligo) a degenerate N,N,N triplet sequence.

[0446] In one aspect, use of degenerate triplets (e.g., N,N,G/T triplets) allows for systematic and easy generation of a full range of possible natural amino acids (for a total of 20 amino acids) into each and every amino acid position in a polypeptide (in alternative aspects, the methods also include generation of less than all possible substitutions per amino acid residue, or codon, position). For example, for a 100 amino acid polypeptide, 2000 distinct species (i.e. 20 possible amino acids per position × 100 amino acid positions) can be generated. Through the use of an oligonucleotide or set of oligonucleotides containing a degenerate N,N,G/T triplet, 32 individual sequences can code for all 20 possible natural amino acids. Thus, in a reaction vessel in which a parental polynucleotide sequence is subjected to saturation mutagenesis using at least one such oligonucleotide, there are generated 32 distinct progeny polynucleotides encoding 20 distinct polypeptides. In contrast, the use of a non-degenerate oligonucleotide in site-directed mutagenesis leads to only one progeny polypeptide product per reaction vessel. Nondegenerate oligonucleotides can optionally be used in combination with degenerate primers disclosed; for example, nondegenerate oligonucleotides can be used to generate specific point mutations in a working polynucleotide. This provides one means to generate specific silent point mutations, point mutations leading to corresponding amino acid changes, and point mutations that cause the generation of stop codons and the corresponding expression of polypeptide fragments.

[0447] In one aspect, each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide (e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein) molecules such that all 20 natural amino acids are represented at the one specific amino acid position corresponding to the codon position mutagenized in the parental polynucleotide (other aspects use less than all 20 natural combinations). The 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g. cloned into a suitable host, e.g., E. coli host, using, e.g., an expression vector) and subjected to expression screening. When an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide, such as increased glucan hydrolysis activity under alkaline or acidic conditions), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein.

[0448] In one aspect, upon mutagenizing each and every amino acid position in a parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid changes may be identified at more than one amino acid position. One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid, and each of two favorable changes) and 3 positions. Thus, there are 3×3×3 or 27 total possibilities, including 7 that were previously examined: 6 single point mutations (i.e. 2 at each of three positions) and no change at any position.

[0449] In yet another aspect, site-saturation mutagenesis can be used together with shuffling, chimerization, recombination and other mutagenizing processes, along with screening. This invention provides for the use of any mutagenizing process(es), including saturation mutagenesis, in an iterative manner. In one exemplification, the iterative use of any mutagenizing process(es) is used in combination with screening.

[0450] The invention also provides for the use of proprietary codon primers (containing a degenerate N,N,N sequence) to introduce point mutations into a polynucleotide, so as to generate a set of progeny polypeptides in which a full range of single amino acid substitutions is represented at each amino acid position; e.g., with Gene Site Saturation Mutagenesis (GSSM). The oligos used are comprised contiguously of a first homologous sequence, a degenerate N,N,N sequence and in one aspect but not necessarily a second homologous sequence. The downstream progeny translational products from the use of such oligos include all possible amino acid changes at each amino acid site along the polypeptide, because the degeneracy of the N,N,N sequence includes codons for all 20 amino acids.

[0451] In one aspect, one such degenerate oligo (comprised of one degenerate N,N,N cassette) is used for subjecting each original codon in a parental polynucleotide template to a full range of codon substitutions. In another aspect, at least two degenerate N,N,N cassettes are used, either in the same oligo or not, for subjecting at least two original codons in a parental polynucleotide template to a full range of codon substitutions. Thus, more than one N,N,N sequence can be contained in one oligo to introduce amino acid mutations at more than one site. This plurality of N,N,N sequences can be directly contiguous, or separated by one or more additional nucleotide sequence(s). In another aspect, oligos serviceable for introducing additions and deletions can be used either alone or in combination with the codons containing an N,N,N sequence, to introduce any combination or permutation of amino acid additions, deletions and/or substitutions.

[0452] In a particular exemplification, it is possible to simultaneously mutagenize two or more contiguous amino acid positions using an oligo that contains contiguous N,N,N triplets, i.e. a degenerate (N,N,N)n sequence.

[0453] In another aspect, the present invention provides for the use of degenerate cassettes having less degeneracy than the N,N,N sequence. For example, it may be desirable in some instances to use (e.g. in an oligo) a degenerate triplet sequence comprised of only one N, where the N can be in the first, second or third position of the triplet. Any other bases including any combinations and permutations thereof can be used in the remaining two positions of the triplet. Alternatively, it may be desirable in some instances to use (e.g., in an oligo) a degenerate N,N,N triplet sequence, N,N,G/T, or an N,N,G/C triplet sequence.

[0454] It is appreciated, however, that the use of a degenerate triplet (such as N,N,G/T or an N,N,G/C triplet sequence) as disclosed in the instant invention is advantageous for several reasons. In one aspect, this invention pro

nucleotide. This provides a means to generate specific silent point mutations, point mutations leading to corresponding amino acid changes and point mutations that cause the generation of stop codons and the corresponding expression of polypeptide fragments.

[0456] Thus, in one aspect of this invention, each saturation mutagenesis reaction vessel contains polynucleotides encoding at least 20 progeny polypeptide molecules such that all 20 amino acids are represented at the one specific amino acid position corresponding to the codon position mutagenized in the parental polynucleotide. The 32-fold degenerate progeny polypeptides generated from each saturation mutagenesis reaction vessel can be subjected to clonal amplification (e.g., cloned into a suitable E. coli host using an expression vector) and subjected to expression screening. When an individual progeny polypeptide is identified by screening to display a favorable change in property (when compared to the parental polypeptide), it can be sequenced to identify the correspondingly favorable amino acid substitution contained therein.

[0457] It is appreciated that upon mutagenizing each and every amino acid position in a parental polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid changes may be identified at more than one amino acid position. One or more new progeny molecules can be generated that contain a combination of all or part of these favorable amino acid substitutions. For example, if 2 specific favorable amino acid changes are identified in each of 3 amino acid positions in a polypeptide, the permutations include 3 possibilities at each position (no change from the original amino acid and each of two favorable changes) and 3 positions. Thus, there are 3×3×3 or 27 total possibilities, including 7 that were previously examined: 6 single point mutations (i.e., 2 at each of three positions) and no change at any position.

[0458] Thus, in a non-limiting exemplification, this invention provides for the use of saturation mutagenesis in combination with additional mutagenization processes, such as process where two or more related polynucleotides are introduced into a suitable host cell such that a hybrid polynucleotide is generated by recombination and reductive reassortment.

[0459]
In addition to performing mutagenesis along the vides a means to systematically and fairly easily generate the entire sequence of a gene, the instant invention provides that substitution of the full range of possible amino acids (for a mutagenesis can be use to replace each of any number of total of 20 amino acids) into each and every amino acid bases in a polynucleotide sequence, wherein the number of positionina polypeptide. Thus, for a 100 amino acid polypep bases to be mutagenized is in one aspect every integer from 15 tide, the invention provides away to systematically and fairly to 100,000. Thus, instead of mutagenizing every position easily generate 2000 distinct species (i.e., 20 possible amino along a molecule, one can Subject every or a discrete number acids per position times 100 amino acid positions). It is appre of bases (in one aspect a subset totaling from 15 to 100,000) ciated that there is provided, through the use of an oligo to mutagenesis. In one aspect, a separate nucleotide is used containing a degenerate N.N.G/T or an N.N. G/C triplet for mutagenizing each position or group of positions along a sequence, 32 individual sequences that code for 20 possible polynucleotide sequence. A group of 3 positions to be amino acids. Thus, in a reaction vessel in which a parental mutagenized may be a codon. The mutations can be intro polynucleotide sequence is Subjected to Saturation mutagen duced using a mutagenic primer, containing a heterologous esis using one Such oligo, there are generated 32 distinct cassette, also referred to as a mutagenic cassette. Exemplary progeny polynucleotides encoding 20 distinct polypeptides. cassettes can have from 1 to 500 bases. Each nucleotide In contrast, the use of a non-degenerate oligo in site-directed position in Such heterologous cassettes be N.A, C, G, T, A/C, mutagenesis leads to only one progeny polypeptide product A/G, A/T, C/G, C/T, G/T, C/G/T, A/G/T, A/C/T, A/C/G, or E, per reaction vessel. 
where E is any base that is not A, C, G, or T (E can be referred 0455 This invention also provides for the use of nonde to as a designer oligo). generate oligos, which can optionally be used in combination 0460. In a general sense, Saturation mutagenesis is com with degenerate primers disclosed. It is appreciated that in prised of mutagenizing a complete set of mutagenic cassettes Some situations, it is advantageous to use nondegenerate oli (wherein each cassette is in one aspect about 1-500 bases in gos to generate specific point mutations in a working poly length) in defined polynucleotide sequence to be US 2012/0266329 A1 Oct. 18, 2012 mutagenized (wherein the sequence to be mutagenized is in design. This method includes the steps of generating by one aspect from about 15 to 100,000 bases in length). Thus, a design a plurality of specific nucleic acid building blocks group of mutations (ranging from 1 to 100 mutations) is having serviceable mutually compatible ligatable ends, and introduced into each cassette to be mutagenized. A grouping assembling these nucleic acid building blocks, such that a of mutations to be introduced into one cassette can be differ designed overall assembly order is achieved. ent or the same from a second grouping of mutations to be introduced into a second cassette during the application of 0467. The mutually compatible ligatable ends of the one round of Saturation mutagenesis. Such groupings are nucleic acid building blocks to be assembled are considered exemplified by deletions, additions, groupings of particular to be “serviceable' for this type of ordered assembly if they codons and groupings of particular nucleotide cassettes. enable the building blocks to be coupled in predetermined 0461) Defined sequences to be mutagenized include a orders. 
Thus, the overall assembly order in which the nucleic whole gene, pathway, cDNA, an entire open reading frame acid building blocks can be coupled is specified by the design (ORF) and entire promoter, enhancer, repressor/transactiva of the ligatable ends. If more than one assembly step is to be tor, origin of replication, intron, operator, or any polynucle used, then the overall assembly order in which the nucleic otide functional group. Generally, a “defined sequences” for acid building blocks can be coupled is also specified by the this purpose may be any polynucleotide that a 15 base-poly sequential order of the assembly step(s). In one aspect, the nucleotide sequence and polynucleotide sequences of lengths annealed building pieces are treated with an enzyme. Such as between 15 bases and 15,000 bases (this invention specifi a ligase (e.g. T4DNA ligase), to achieve covalent bonding of cally names every integer in between). Considerations in the building pieces. choosing groupings of codons include types of amino acids 0468. In one aspect, the design of the oligonucleotide encoded by a degenerate mutagenic cassette. building blocks is obtained by analyzing a set of progenitor 0462. In one exemplification a grouping of mutations that nucleic acid sequence templates that serve as a basis for can be introduced into a mutagenic cassette, this invention producing a progeny set of finalized chimeric polynucle specifically provides for degenerate codon Substitutions (us otides. These parental oligonucleotide templates thus serve as ing degenerate oligos) that code for 2,3,4,5,6,7,8,9, 10, 11, a source of sequence information that aids in the design of the 12, 13, 14, 15, 16, 17, 18, 19 and 20 amino acids at each nucleic acid building blocks that are to be mutagenized, e.g., position and a library of polypeptides encoded thereby. chimerized or shuffled. 
In one aspect of this method, the 0463 Synthetic Ligation Reassembly (SLR) sequences of a plurality of parental nucleic acidtemplates are 0464. The invention provides a non-stochastic gene modi fication system termed “synthetic ligation reassembly, or aligned in order to select one or more demarcation points. The simply “SLR, a “directed evolution process to generate demarcation points can be located at an area of homology, and polypeptides, e.g., a polypeptide, enzyme, protein, e.g. struc are comprised of one or more nucleotides. These demarcation tural or binding protein, or antibodies of the invention, with points are in one aspect shared by at least two of the progeni new or altered properties. SLR is a method of ligating oligo tor templates. The demarcation points can thereby be used to nucleotide fragments together non-stochastically. This delineate the boundaries of oligonucleotide building blocks method differs from stochastic oligonucleotide shuffling in to be generated in order to rearrange the parental polynucle that the nucleic acid building blocks are not shuffled, concat otides. The demarcation points identified and selected in the enated or chimerized randomly, but rather are assembled progenitor molecules serve as potential chimerization points non-stochastically. See, e.g., U.S. Pat. Nos. 6,773,900; 6,740, in the assembly of the final chimeric progeny molecules. A 506; 6,713,282; 6,635,449; 6,605,449; 6,537,776. demarcation point can be an area of homology (comprised of 0465. In one aspect, SLR comprises the following steps: at least one homologous nucleotide base) shared by at least (a) providing a template polynucleotide, wherein the template two parental polynucleotide sequences. 
Alternatively, a polynucleotide comprises sequence encoding a homologous demarcation point can be an area of homology that is shared gene; (b) providing a plurality of building block polynucle by at least half of the parental polynucleotide sequences, or, it otides, wherein the building block polynucleotides are can be an area of homology that is shared by at least two thirds designed to cross-over reassemble with the template poly of the parental polynucleotide sequences. Even more in one nucleotide at a predetermined sequence, and a building block aspect a serviceable demarcation points is an area of homol polynucleotide comprises a sequence that is a variant of the ogy that is shared by at least three fourths of the parental homologous gene and a sequence homologous to the template polynucleotide sequences, or, it can be shared by at almost all polynucleotide flanking the variant sequence; (c) combining of the parental polynucleotide sequences. In one aspect, a a building block polynucleotide with a template polynucle demarcation point is an area of homology that is shared by all otide such that the building block polynucleotide cross-over of the parental polynucleotide sequences. reassembles with the template polynucleotide to generate 0469. In one aspect, a ligation reassembly process is per polynucleotides comprising homologous gene sequence formed exhaustively in order to generate an exhaustive library variations. of progeny chimeric polynucleotides. In other words, all pos 0466 SLR does not depend on the presence of high levels sible ordered combinations of the nucleic acid building of homology between polynucleotides to be rearranged. blocks are represented in the set of finalized chimeric nucleic Thus, this method can be used to non-stochastically generate acid molecules. At the same time, in another aspect, the libraries (or sets) of progeny molecules comprised of over assembly order (i.e. the order of assembly of each building 10' different chimeras. 
SLR can be used to generate librar block in the 5' to 3 sequence of each finalized chimeric nucleic ies comprised of over 10" different progeny chimeras. acid) in each combination is by design (or non-stochastic) as Thus, aspects of the present invention include non-stochastic described above. Because of the non-stochastic nature of this methods of producing a set of finalized chimeric nucleic acid invention, the possibility of unwanted side products is greatly molecule shaving an overall assembly order that is chosen by reduced. US 2012/0266329 A1 Oct. 18, 2012
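The counting arguments in the saturation mutagenesis discussion above (32 sequences coding for 20 amino acids, 2000 single-substitution species for a 100 amino acid polypeptide, and 27 combinations of which 7 were previously examined) can be checked with a short sketch. It assumes the standard genetic code and the IUPAC symbols K (G/T) and S (G/C); the function names `expand`, `summarize`, `single_substitution_species` and `combination_counts` are hypothetical and are not taken from this publication.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
# (third base varies fastest over "TCAG"). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed IUPAC shorthand for the degenerate positions: N = any base,
# K = G/T (so "NNK" stands for N,N,G/T), S = G/C ("NNS" for N,N,G/C).
DEGENERATE = {"N": "ACGT", "K": "GT", "S": "CG"}


def expand(triplet):
    """Return every concrete codon matched by a degenerate triplet."""
    pools = [DEGENERATE.get(symbol, symbol) for symbol in triplet]
    return ["".join(codon) for codon in product(*pools)]


def summarize(triplet):
    """Return (number of codons, number of amino acids encoded, stop codons)."""
    codons = expand(triplet)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), len(amino_acids), stops


def single_substitution_species(length, residues_per_position=20):
    """Species obtained by placing each residue at each position in turn."""
    return length * residues_per_position


def combination_counts(positions, favorable_per_position):
    """Return (total combinations, already examined, not yet examined)."""
    total = (favorable_per_position + 1) ** positions
    examined = positions * favorable_per_position + 1  # single mutants + parent
    return total, examined, total - examined


if __name__ == "__main__":
    for triplet in ("NNK", "NNS", "NNN"):
        print(triplet, summarize(triplet))
    print(single_substitution_species(100))
    print(combination_counts(positions=3, favorable_per_position=2))
```

Under these assumptions the enumeration gives 32 codons covering all 20 amino acids for both N,N,G/T and N,N,G/C, and it also shows that each of those two sets still contains one stop codon (TAG), compared with three stop codons among the 64 N,N,N codons.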

0470. In another aspect, the ligation reassembly method is performed systematically. For example, the method is performed in order to generate a systematically compartmentalized library of progeny molecules, with compartments that can be screened systematically, e.g. one by one. In other words, this invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, a design can be achieved where specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, these methods allow a potentially very large number of progeny molecules to be examined systematically in smaller groups. Because of its ability to perform chimerizations in a manner that is highly flexible yet exhaustive and systematic as well, particularly when there is a low level of homology among the progenitor molecules, these methods provide for the generation of a library (or set) comprised of a large number of progeny molecules. Because of the non-stochastic nature of the instant ligation reassembly invention, the progeny molecules generated in one aspect comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. The saturation mutagenesis and optimized directed evolution methods also can be used to generate different progeny molecular species. It is appreciated that the invention provides freedom of choice and control regarding the selection of demarcation points, the size and number of the nucleic acid building blocks, and the size and design of the couplings. It is appreciated, furthermore, that the requirement for intermolecular homology is highly relaxed for the operability of this invention. In fact, demarcation points can even be chosen in areas of little or no intermolecular homology. For example, because of codon wobble, i.e. the degeneracy of codons, nucleotide substitutions can be introduced into nucleic acid building blocks without altering the amino acid originally encoded in the corresponding progenitor template. Alternatively, a codon can be altered such that the coding for an original amino acid is altered. This invention provides that such substitutions can be introduced into the nucleic acid building block in order to increase the incidence of intermolecular homologous demarcation points and thus to allow an increased number of couplings to be achieved among the building blocks, which in turn allows a greater number of progeny chimeric molecules to be generated.

0471. In one aspect, the present invention provides a non-stochastic method termed synthetic gene reassembly, that is somewhat related to stochastic shuffling, save that the nucleic acid building blocks are not shuffled or concatenated or chimerized randomly, but rather are assembled non-stochastically.

0472. The synthetic gene reassembly method does not depend on the presence of a high level of homology between polynucleotides to be shuffled. The invention can be used to non-stochastically generate libraries (or sets) of progeny molecules comprised of over 10^100 different chimeras. Conceivably, synthetic gene reassembly can even be used to generate libraries comprised of over 10^1000 different progeny chimeras.

0473. Thus, in one aspect, the invention provides a non-stochastic method of producing a set of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design, which method is comprised of the steps of generating by design a plurality of specific nucleic acid building blocks having serviceable mutually compatible ligatable ends and assembling these nucleic acid building blocks, such that a designed overall assembly order is achieved.

0474. The mutually compatible ligatable ends of the nucleic acid building blocks to be assembled are considered to be "serviceable" for this type of ordered assembly if they enable the building blocks to be coupled in predetermined orders. Thus, in one aspect, the overall assembly order in which the nucleic acid building blocks can be coupled is specified by the design of the ligatable ends and, if more than one assembly step is to be used, then the overall assembly order in which the nucleic acid building blocks can be coupled is also specified by the sequential order of the assembly step(s). In one aspect of the invention, the annealed building pieces are treated with an enzyme, such as a ligase (e.g., T4 DNA ligase), to achieve covalent bonding of the building pieces.

0475. In another aspect, the design of nucleic acid building blocks is obtained upon analysis of the sequences of a set of progenitor nucleic acid templates that serve as a basis for producing a progeny set of finalized chimeric nucleic acid molecules. These progenitor nucleic acid templates thus serve as a source of sequence information that aids in the design of the nucleic acid building blocks that are to be mutagenized, i.e. chimerized or shuffled.

0476. In one exemplification, the invention provides for the chimerization of a family of related genes and their encoded family of related products. In a particular exemplification, the encoded products are enzymes. The polypeptide, enzyme, protein, e.g. structural or binding proteins of the present invention can be mutagenized in accordance with the methods described herein.

0477. Thus, according to one aspect of the invention, the sequences of a plurality of progenitor nucleic acid templates (e.g., polynucleotides of the invention) are aligned in order to select one or more demarcation points, which demarcation points can be located at an area of homology. The demarcation points can be used to delineate the boundaries of nucleic acid building blocks to be generated. Thus, the demarcation points identified and selected in the progenitor molecules serve as potential chimerization points in the assembly of the progeny molecules.

0478. Typically a serviceable demarcation point is an area of homology (comprised of at least one homologous nucleotide base) shared by at least two progenitor templates, but the demarcation point can be an area of homology that is shared by at least half of the progenitor templates, at least two thirds of the progenitor templates, at least three fourths of the progenitor templates and, in one aspect, almost all of the progenitor templates. In still another aspect, a serviceable demarcation point is an area of homology that is shared by all of the progenitor templates.

0479. In one aspect, the gene reassembly process is performed exhaustively in order to generate an exhaustive library. In other words, all possible ordered combinations of the nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid molecules. At the same time, the assembly order (i.e. the order of assembly of each building block in the 5' to 3' sequence of each finalized chimeric nucleic acid) in each combination is by design (or non-stochastic). Because of the non-stochastic nature of the method, the possibility of unwanted side products is greatly reduced.
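The idea of selecting demarcation points in aligned progenitor templates and then building an exhaustive, ordered library can be illustrated with a toy sketch. It is only an illustration under strong assumptions: the three 12-base "templates" are invented, they are treated as already aligned and of equal length, and the helper names (`shared_positions`, `split_at`, `exhaustive_chimeras`) are hypothetical rather than part of the disclosed method.

```python
from itertools import product


def shared_positions(aligned, min_fraction=1.0):
    """Alignment columns where at least `min_fraction` of templates agree."""
    n = len(aligned)
    hits = []
    for index, column in enumerate(zip(*aligned)):
        most_common = max(column.count(base) for base in set(column))
        if most_common / n >= min_fraction:
            hits.append(index)
    return hits


def split_at(sequence, cut_points):
    """Cut a sequence into building blocks; each cut point starts a new block."""
    bounds = [0, *cut_points, len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def exhaustive_chimeras(aligned, cut_points):
    """All ordered combinations: one parent's block per slot, slots kept 5'->3'."""
    blocks_per_parent = [split_at(seq, cut_points) for seq in aligned]
    slots = zip(*blocks_per_parent)  # slot k = k-th block of every parent
    return {"".join(choice) for choice in product(*slots)}


# Invented, pre-aligned toy templates (assumption, for illustration only).
PARENTS = ["ATGAAACCCGGG", "ATGTTTCCCAGG", "ATGAACCCCTGG"]

if __name__ == "__main__":
    print("shared by all templates:     ", shared_positions(PARENTS))
    print("shared by at least half:     ", shared_positions(PARENTS, 0.5))
    library = exhaustive_chimeras(PARENTS, cut_points=[6])
    print(len(library), "distinct chimeras, parents included")
```

With one demarcation point and three parents there are two block slots, so the exhaustive library has at most 3 x 3 = 9 members; in general N parents and k demarcation points give at most N^(k+1) ordered combinations, which is why the library sizes quoted above grow so quickly.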

0480. In another aspect, the method provides that the gene reassembly process is performed systematically, for example to generate a systematically compartmentalized library, with compartments that can be screened systematically, e.g., one by one. In other words, the invention provides that, through the selective and judicious use of specific nucleic acid building blocks, coupled with the selective and judicious use of sequentially stepped assembly reactions, an experimental design can be achieved where specific sets of progeny products are made in each of several reaction vessels. This allows a systematic examination and screening procedure to be performed. Thus, it allows a potentially very large number of progeny molecules to be examined systematically in smaller groups.

0481. Because of its ability to perform chimerizations in a manner that is highly flexible yet exhaustive and systematic as well, particularly when there is a low level of homology among the progenitor molecules, the instant invention provides for the generation of a library (or set) comprised of a large number of progeny molecules. Because of the non-stochastic nature of the instant gene reassembly invention, the progeny molecules generated in one aspect comprise a library of finalized chimeric nucleic acid molecules having an overall assembly order that is chosen by design. In a particular aspect, such a generated library is comprised of greater than 10^3 to greater than 10^1000 different progeny molecular species.

0482. In one aspect, a set of finalized chimeric nucleic acid molecules, produced as described, is comprised of a polynucleotide encoding a polypeptide. According to one aspect, this polynucleotide is a gene, which may be a man-made gene. According to another aspect, this polynucleotide is a gene pathway, which may be a man-made gene pathway. The invention provides that one or more man-made genes generated by the invention may be incorporated into a man-made gene pathway, such as a pathway operable in a eukaryotic organism (including a plant).

0483. In another exemplification, the synthetic nature of the step in which the building blocks are generated allows the design and introduction of nucleotides (e.g., one or more nucleotides, which may be, for example, codons or introns or regulatory sequences) that can later be optionally removed in an in vitro process (e.g., by mutagenesis) or in an in vivo process (e.g., by utilizing the gene splicing ability of a host organism). It is appreciated that in many instances the introduction of these nucleotides may also be desirable for many other reasons in addition to the potential benefit of creating a serviceable demarcation point.

0484. Thus, according to another aspect, the invention provides that a nucleic acid building block can be used to introduce an intron. Thus, the invention provides that functional introns may be introduced into a man-made gene of the invention. The invention also provides that functional introns may be introduced into a man-made gene pathway of the invention. Accordingly, the invention provides for the generation of a chimeric polynucleotide that is a man-made gene containing one (or more) artificially introduced intron(s).

0485. Accordingly, the invention also provides for the generation of a chimeric polynucleotide that is a man-made gene pathway containing one (or more) artificially introduced intron(s). In one aspect, the artificially introduced intron(s) are functional in one or more host cells for gene splicing much in the way that naturally-occurring introns serve functionally in gene splicing. The invention provides a process of producing man-made intron-containing polynucleotides to be introduced into host organisms for recombination and/or splicing.

0486. A man-made gene produced using the invention can also serve as a substrate for recombination with another nucleic acid. Likewise, a man-made gene pathway produced using the invention can also serve as a substrate for recombination with another nucleic acid. In one aspect, the recombination is facilitated by, or occurs at, areas of homology between the man-made, intron-containing gene and a nucleic acid, which serves as a recombination partner. In one aspect, the recombination partner may also be a nucleic acid generated by the invention, including a man-made gene or a man-made gene pathway. Recombination may be facilitated by or may occur at areas of homology that exist at the one (or more) artificially introduced intron(s) in the man-made gene.

0487. The synthetic gene reassembly method of the invention utilizes a plurality of nucleic acid building blocks, each of which in one aspect has two ligatable ends. The two ligatable ends on each nucleic acid building block may be two blunt ends (i.e. each having an overhang of zero nucleotides), or, in one aspect, one blunt end and one overhang, or, in another aspect, two overhangs.

0488. A useful overhang for this purpose may be a 3' overhang or a 5' overhang. Thus, a nucleic acid building block may have a 3' overhang or alternatively a 5' overhang or alternatively two 3' overhangs or alternatively two 5' overhangs. The overall order in which the nucleic acid building blocks are assembled to form a finalized chimeric nucleic acid molecule is determined by purposeful experimental design and is not random.

0489. In one aspect, a nucleic acid building block is generated by chemical synthesis of two single-stranded nucleic acids (also referred to as single-stranded oligos) and contacting them so as to allow them to anneal to form a double-stranded nucleic acid building block.

0490. A double-stranded nucleic acid building block can be of variable size. The sizes of these building blocks can be small or large. Exemplary sizes for building blocks range from 1 base pair (not including any overhangs) to 100,000 base pairs (not including any overhangs). Other exemplary size ranges are also provided, which have lower limits of from 1 bp to 10,000 bp (including every integer value in between) and upper limits of from 2 bp to 100,000 bp (including every integer value in between).

0491. Many methods exist by which a double-stranded nucleic acid building block can be generated that is serviceable for the invention; these are known in the art and can be readily performed by the skilled artisan.

0492. According to one aspect, a double-stranded nucleic acid building block is generated by first generating two single-stranded nucleic acids and allowing them to anneal to form a double-stranded nucleic acid building block. The two strands of a double-stranded nucleic acid building block may be complementary at every nucleotide apart from any that form an overhang, thus containing no mismatches apart from any overhang(s). According to another aspect, the two strands of a double-stranded nucleic acid building block are complementary at fewer than every nucleotide apart from any that form an overhang. Thus, according to this aspect, a double-stranded nucleic acid building block can be used to introduce codon degeneracy. In one aspect the codon degeneracy is introduced using the site-saturation mutagenesis described herein, using one or more N,N,G/T cassettes or alternatively using one or more N,N,N cassettes.
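How mutually compatible ligatable ends can fix a single assembly order may be easier to see in a small model. The sketch below is a simplification under stated assumptions: every building block is described only by its duplex core and by 5' overhangs (3' overhangs and blunt-end couplings are not modeled), two ends are taken to couple only when their overhangs are exact reverse complements, and the `Block` class, the helper functions and the example sequences are all hypothetical.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Block:
    name: str
    left: str   # 5' overhang of the top strand at the left end ("" = blunt)
    core: str   # double-stranded part, top strand written 5'->3'
    right: str  # 5' overhang of the bottom strand at the right end, 5'->3'


def can_couple(upstream, downstream):
    """True if the upstream right overhang anneals to the downstream left one."""
    return bool(upstream.right) and upstream.right == revcomp(downstream.left)


def assembly_order(blocks):
    """Return the one order fixed by the overhang design, or raise ValueError."""
    successors = {}
    for a in blocks:
        partners = [b for b in blocks if b is not a and can_couple(a, b)]
        if len(partners) > 1:
            raise ValueError(f"{a.name} can couple to more than one block")
        if partners:
            successors[a.name] = partners[0]
    targets = {b.name for b in successors.values()}
    starts = [b for b in blocks if b.name not in targets]
    if len(starts) != 1:
        raise ValueError("design does not define a single first block")
    order, current = [], starts[0]
    while current is not None:
        if current in order:
            raise ValueError("overhang design contains a cycle")
        order.append(current)
        current = successors.get(current.name)
    if len(order) != len(blocks):
        raise ValueError("not every block is part of the chain")
    return order


def assembled_top_strand(order):
    """Top strand of the ligated product; each left overhang fills the junction."""
    return "".join(block.left + block.core for block in order)


# Invented example: non-palindromic overhangs, so no block couples to itself.
BLOCKS = [
    Block("C", "GGTA", "TTGTAA", ""),
    Block("A", "", "ATGGCT", "CTTG"),
    Block("B", "CAAG", "GATTCC", "TACC"),
]

if __name__ == "__main__":
    order = assembly_order(BLOCKS)
    print([block.name for block in order])
    print(assembled_top_strand(order))
```

Because each overhang here has exactly one complementary partner, the blocks can only join as A, B, C regardless of the order in which they are listed, which is the sense in which the design of the ligatable ends specifies the assembly order.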

0493. The in vivo recombination method of the invention can be performed blindly on a pool of unknown hybrids or alleles of a specific polynucleotide or sequence. However, it is not necessary to know the actual DNA or RNA sequence of the specific polynucleotide.

0494. The approach of using recombination within a mixed population of genes can be useful for the generation of any useful proteins, for example, interleukin I, antibodies, tPA and growth hormone. This approach may be used to generate proteins having altered specificity or activity. The approach may also be useful for the generation of hybrid nucleic acid sequences, for example, promoter regions, introns, exons, enhancer sequences, 3' untranslated regions or 5' untranslated regions of genes. Thus this approach may be used to generate genes having increased rates of expression. This approach may also be useful in the study of repetitive DNA sequences. Finally, this approach may be useful to mutate ribozymes or aptamers.

0495. In one aspect the invention described herein is directed to the use of repeated cycles of reductive reassortment, recombination and selection which allow for the directed molecular evolution of highly complex linear sequences, such as DNA, RNA or proteins, through recombination.

0496. Optimized Directed Evolution System

0497. The invention provides a non-stochastic gene modification system termed "optimized directed evolution system" to generate polypeptides, e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein, or antibodies of the invention, with new or altered properties. Optimized directed evolution is directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of nucleic acids through recombination. Optimized directed evolution allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events.

0498. A crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence. This method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events.

0499. In addition, this method provides a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems. Previously, if one generated, for example, 10^13 chimeric molecules during a reaction, it would be extremely difficult to test such a high number of chimeric variants for a particular activity. Moreover, a significant portion of the progeny population would have a very high number of crossover events, which resulted in proteins that were less likely to have increased levels of a particular activity. By using these methods, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although one can still generate 10^13 chimeric molecules during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules is reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait.

0500. One method for creating a chimeric progeny polynucleotide sequence is to create oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide in one aspect includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. Alternatively, protocols for practicing these methods of the invention can be found in U.S. Pat. Nos. 6,773,900; 6,740,506; 6,713,282; 6,635,449; 6,605,449; 6,537,776; 6,361,974.

0501. The number of oligonucleotides generated for each parental variant bears a relationship to the total number of resulting crossovers in the chimeric molecule that is ultimately created. For example, three parental nucleotide sequence variants might be provided to undergo a ligation reaction in order to find a chimeric variant having, for example, greater activity at high temperature. As one example, a set of 50 oligonucleotide sequences can be generated corresponding to each portion of each parental variant. Accordingly, during the ligation reassembly process there could be up to 50 crossover events within each of the chimeric sequences. The probability that each of the generated chimeric polynucleotides will contain oligonucleotides from each parental variant in alternating order is very low. If each oligonucleotide fragment is present in the ligation reaction in the same molar quantity, it is likely that in some positions oligonucleotides from the same parental polynucleotide will ligate next to one another and thus not result in a crossover event. If the concentration of each oligonucleotide from each parent is kept constant during any ligation step in this example, there is a 1/3 chance (assuming 3 parents) that an oligonucleotide from the same parental variant will ligate within the chimeric sequence and produce no crossover.

0502. Accordingly, a probability density function (PDF) can be determined to predict the population of crossover events that are likely to occur during each step in a ligation reaction given a set number of parental variants, a number of oligonucleotides corresponding to each variant, and the concentrations of each variant during each step in the ligation reaction. The statistics and mathematics behind determining the PDF are described below. By utilizing these methods, one can calculate such a probability density function, and thus enrich the chimeric progeny population for a predetermined number of crossover events resulting from a particular ligation reaction. Moreover, a target number of crossover events can be predetermined, and the system then programmed to calculate the starting quantities of each parental oligonucleotide during each step in the ligation reaction to result in a probability density function that centers on the predetermined number of crossover events. These methods are directed to the use of repeated cycles of reductive reassortment, recombination and selection that allow for the directed molecular evolution of a nucleic acid encoding a polypeptide through recombination. This system allows generation of a large population of evolved chimeric sequences, wherein the generated population is significantly enriched for sequences that have a predetermined number of crossover events. A crossover event is a point in a chimeric sequence where a shift in sequence occurs from one parental variant to another parental
variant. Such a point is normally at the juncture of where oligonucleotides from two parents are ligated together to form a single sequence. The method allows calculation of the correct concentrations of oligonucleotide sequences so that the final chimeric population of sequences is enriched for the chosen number of crossover events. This provides more control over choosing chimeric variants having a predetermined number of crossover events.

[0503] In addition, these methods provide a convenient means for exploring a tremendous amount of the possible protein variant space in comparison to other systems. By using the methods described herein, the population of chimeric molecules can be enriched for those variants that have a particular number of crossover events. Thus, although one can still generate 10¹³ chimeric molecules during a reaction, each of the molecules chosen for further analysis most likely has, for example, only three crossover events. Because the resulting progeny population can be skewed to have a predetermined number of crossover events, the boundaries on the functional variety between the chimeric molecules are reduced. This provides a more manageable number of variables when calculating which oligonucleotide from the original parental polynucleotides might be responsible for affecting a particular trait.

[0504] In one aspect, the method creates a chimeric progeny polynucleotide sequence by creating oligonucleotides corresponding to fragments or portions of each parental sequence. Each oligonucleotide in one aspect includes a unique region of overlap so that mixing the oligonucleotides together results in a new variant that has each oligonucleotide fragment assembled in the correct order. See also U.S. Ser. No. 09/332,835.

[0505] Determining Crossover Events

[0506] Aspects of the invention include a system and software that receive a desired crossover probability density function (PDF), the number of parent genes to be reassembled, and the number of fragments in the reassembly as inputs. The output of this program is a "fragment PDF" that can be used to determine a recipe for producing reassembled genes, and the estimated crossover PDF of those genes. The processing described herein is in one aspect performed in MATLAB™ (The MathWorks, Natick, Mass.), a programming language and development environment for technical computing.

[0507] Iterative Processes

[0508] In practicing the invention, these processes can be iteratively repeated. For example, a nucleic acid (or, the nucleic acid) responsible for an altered or new polypeptide, enzyme, protein, e.g. structural or binding protein, phenotype is identified, re-isolated, again modified, and re-tested for activity. This process can be iteratively repeated until a desired phenotype is engineered. For example, an entire biochemical anabolic or catabolic pathway can be engineered into a cell, including, e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein, activity.

[0509] Similarly, if it is determined that a particular oligonucleotide has no effect at all on the desired trait (e.g., a new polypeptide, enzyme, protein, e.g. structural or binding protein, phenotype), it can be removed as a variable by synthesizing larger parental oligonucleotides that include the sequence to be removed. Since incorporating the sequence within a larger sequence prevents any crossover events, there will no longer be any variation of this sequence in the progeny polynucleotides. This iterative practice of determining which oligonucleotides are most related to the desired trait, and which are unrelated, allows more efficient exploration of all of the possible protein variants that might provide a particular trait or activity.

[0510] In Vivo Shuffling

[0511] In vivo shuffling of molecules is used in methods of the invention that provide variants of polypeptides of the invention, e.g., antibodies, a polypeptide, enzyme, protein, e.g. structural or binding protein, and the like. In vivo shuffling can be performed utilizing the natural property of cells to recombine multimers. While recombination in vivo has provided the major natural route to molecular diversity, genetic recombination remains a relatively complex process that involves 1) the recognition of homologies; 2) strand cleavage, strand invasion, and metabolic steps leading to the production of recombinant chiasma; and finally 3) the resolution of chiasma into discrete recombined molecules. The formation of the chiasma requires the recognition of homologous sequences.

[0512] In another aspect, the invention includes a method for producing a hybrid polynucleotide from at least a first polynucleotide and a second polynucleotide. The invention can be used to produce a hybrid polynucleotide by introducing at least a first polynucleotide and a second polynucleotide (e.g., one, or both, being an exemplary polypeptide-, enzyme-, protein-, e.g. structural or binding protein-encoding sequence of the invention) which share at least one region of partial sequence homology into a suitable host cell. The regions of partial sequence homology promote processes which result in sequence reorganization producing a hybrid polynucleotide. The term "hybrid polynucleotide", as used herein, is any nucleotide sequence which results from the method of the present invention and contains sequence from at least two original polynucleotide sequences. Such hybrid polynucleotides can result from intermolecular recombination events which promote sequence integration between DNA molecules. In addition, such hybrid polynucleotides can result from intramolecular reductive reassortment processes which utilize repeated sequences to alter a nucleotide sequence within a DNA molecule.

[0513] In vivo reassortment is focused on "inter-molecular" processes collectively referred to as "recombination" which, in bacteria, is generally viewed as a "RecA-dependent" phenomenon. The invention can rely on recombination processes of a host cell to recombine and re-assort sequences, or the cells' ability to mediate reductive processes to decrease the complexity of quasi-repeated sequences in the cell by deletion. This process of "reductive reassortment" occurs by an "intra-molecular", RecA-independent process.

[0514] Therefore, in another aspect of the invention, novel polynucleotides can be generated by the process of reductive reassortment. The method involves the generation of constructs containing consecutive sequences (original encoding sequences), their insertion into an appropriate vector and their subsequent introduction into an appropriate host cell. The reassortment of the individual molecular identities occurs by combinatorial processes between the consecutive sequences in the construct possessing regions of homology, or between quasi-repeated units. The reassortment process recombines and/or reduces the complexity and extent of the repeated sequences and results in the production of novel molecular species. Various treatments may be applied to enhance the rate of reassortment. These could include treatment with ultra-violet light, or DNA damaging chemicals and/or the use of host cell lines displaying enhanced levels of "genetic instability". Thus the reassortment process may involve homologous recombination or the natural property of quasi-repeated sequences to direct their own evolution.

[0515] Repeated or "quasi-repeated" sequences play a role in genetic instability. In the present invention, "quasi-repeats" are repeats that are not restricted to their original unit structure. Quasi-repeated units can be presented as an array of sequences in a construct; consecutive units of similar sequences. Once ligated, the junctions between the consecutive sequences become essentially invisible and the quasi-repetitive nature of the resulting construct is now continuous at the molecular level. The deletion process the cell performs to reduce the complexity of the resulting construct operates between the quasi-repeated sequences. The quasi-repeated units provide a practically limitless repertoire of templates upon which slippage events can occur. The constructs containing the quasi-repeats thus effectively provide sufficient molecular elasticity that deletion (and potentially insertion) events can occur virtually anywhere within the quasi-repetitive units.

[0516] When the quasi-repeated sequences are all ligated in the same orientation, for instance head to tail or vice versa, the cell cannot distinguish individual units. Consequently, the reductive process can occur throughout the sequences. In contrast, when for example, the units are presented head to head, rather than head to tail, the inversion delineates the endpoints of the adjacent unit so that deletion formation will favor the loss of discrete units. Thus, it is preferable with the present method that the sequences are in the same orientation. Random orientation of quasi-repeated sequences will result in the loss of reassortment efficiency, while consistent orientation of the sequences will offer the highest efficiency. However, while having fewer of the contiguous sequences in the same orientation decreases the efficiency, it may still provide sufficient elasticity for the effective recovery of novel molecules. Constructs can be made with the quasi-repeated sequences in the same orientation to allow higher efficiency.

[0517] Sequences can be assembled in a head to tail orientation using any of a variety of methods, including the following:

[0518] a) Primers that include a poly-A head and poly-T tail which when made single-stranded would provide orientation can be utilized. This is accomplished by having the first few bases of the primers made from RNA and hence easily removed by RNaseH.

[0519] b) Primers that include unique restriction cleavage sites can be utilized. Multiple sites, a battery of unique sequences and repeated synthesis and ligation steps would be required.

[0520] c) The inner few bases of the primer could be thiolated and an exonuclease used to produce properly tailed molecules.

[0521] The recovery of the re-assorted sequences relies on the identification of cloning vectors with a reduced repetitive index (RI). The re-assorted encoding sequences can then be recovered by amplification. The products are re-cloned and expressed. The recovery of cloning vectors with reduced RI can be effected by:

[0522] 1) The use of vectors only stably maintained when the construct is reduced in complexity.

[0523] 2) The physical recovery of shortened vectors by physical procedures. In this case, the cloning vector would be recovered using standard plasmid isolation procedures and size fractionated on either an agarose gel, or column with a low molecular weight cut off utilizing standard procedures.

[0524] 3) The recovery of vectors containing interrupted genes which can be selected when insert size decreases.

[0525] 4) The use of direct selection techniques with an expression vector and the appropriate selection.

[0526] Encoding sequences (for example, genes) from related organisms may demonstrate a high degree of homology and encode quite diverse protein products. These types of sequences are particularly useful in the present invention as quasi-repeats. However, while the examples illustrated below demonstrate the reassortment of nearly identical original encoding sequences (quasi-repeats), this process is not limited to such nearly identical repeats.

[0527] The following example demonstrates a method of the invention. Encoding nucleic acid sequences (quasi-repeats) derived from three (3) unique species are described. Each sequence encodes a protein with a distinct set of properties. Each of the sequences differs by a single or a few base pairs at a unique position in the sequence. The quasi-repeated sequences are separately or collectively amplified and ligated into random assemblies such that all possible permutations and combinations are available in the population of ligated molecules. The number of quasi-repeat units can be controlled by the assembly conditions. The average number of quasi-repeated units in a construct is defined as the repetitive index (RI).

[0528] Once formed, the constructs may, or may not be size fractionated on an agarose gel according to published protocols, inserted into a cloning vector and transfected into an appropriate host cell. The cells are then propagated and "reductive reassortment" is effected. The rate of the reductive reassortment process may be stimulated by the introduction of DNA damage if desired. Whether the reduction in RI is mediated by deletion formation between repeated sequences by an "intra-molecular" mechanism, or mediated by recombination-like events through "inter-molecular" mechanisms is immaterial. The end result is a reassortment of the molecules into all possible combinations.

[0529] Optionally, the method comprises the additional step of screening the library members of the shuffled pool to identify individual shuffled library members having the ability to bind or otherwise interact, or catalyze a particular reaction (e.g., such as catalytic domain of an enzyme) with a predetermined macromolecule, such as for example a proteinaceous receptor, an oligosaccharide, virion, or other predetermined compound or structure.

[0530] The polypeptides that are identified from such libraries can be used for therapeutic, diagnostic, research and related purposes (e.g., catalysts, solutes for increasing osmolarity of an aqueous solution and the like) and/or can be subjected to one or more additional cycles of shuffling and/or selection.

[0531] In another aspect, it is envisioned that prior to or during recombination or reassortment, polynucleotides generated by the method of the invention can be subjected to agents or processes which promote the introduction of mutations into the original polynucleotides. The introduction of such mutations would increase the diversity of resulting hybrid polynucleotides and polypeptides encoded therefrom. The agents or processes which promote mutagenesis can include, but are not limited to: (+)-CC-1065, or a synthetic analog such as (+)-CC-1065-(N3-Adenine) (See Sun and Hurley, (1992)); an N-acetylated or deacetylated 4'-fluoro-4-aminobiphenyl adduct capable of inhibiting DNA synthesis (See, for example, van de Poll et al. (1992)); or a N-acetylated or

deacetylated 4-aminobiphenyl adduct capable of inhibiting DNA synthesis (See also, van de Poll et al. (1992), pp. 751-758); trivalent chromium, a trivalent chromium salt, a polycyclic aromatic hydrocarbon (PAH) DNA adduct capable of inhibiting DNA replication, such as 7-bromomethyl-benz[a]anthracene ("BMA"), tris(2,3-dibromopropyl)phosphate ("Tris-BP"), 1,2-dibromo-3-chloropropane ("DBCP"), 2-bromoacrolein (2BA), benzo[a]pyrene-7,8-dihydrodiol-9-10-epoxide ("BPDE"), a platinum(II) halogen salt, N-hydroxy-2-amino-3-methylimidazo[4,5-f]-quinoline ("N-hydroxy-IQ") and N-hydroxy-2-amino-1-methyl-6-phenylimidazo[4,5-f]-pyridine ("N-hydroxy-PhIP"). Exemplary means for slowing or halting PCR amplification consist of UV light (+)-CC-1065 and (+)-CC-1065-(N3-Adenine). Particularly encompassed means are DNA adducts or polynucleotides comprising the DNA adducts from the polynucleotides or polynucleotides pool, which can be released or removed by a process including heating the solution comprising the polynucleotides prior to further processing.

[0532] In another aspect the invention is directed to a method of producing recombinant proteins having biological activity by treating a sample comprising double-stranded template polynucleotides encoding a wild-type protein under conditions according to the invention which provide for the production of hybrid or re-assorted polynucleotides.

[0533] Producing Sequence Variants

[0534] The invention also provides additional methods for making sequence variants of the nucleic acid (e.g., polypeptide, enzyme, protein, e.g. structural or binding protein) sequences of the invention. The invention also provides additional methods for isolating a polypeptide, enzyme, protein, e.g. structural or binding protein, using the nucleic acids and polypeptides of the invention. In one aspect, the invention provides for variants of a polypeptide, enzyme, protein, e.g. structural or binding protein, coding sequence (e.g., a gene, cDNA or message) of the invention, which can be altered by any means, including, e.g., random or stochastic methods, or, non-stochastic, or "directed evolution," methods, as described above.

[0535] The isolated variants may be naturally occurring. Variants can also be created in vitro. Variants may be created using genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives may be created using chemical synthesis or modification procedures. Other methods of making variants are also familiar to those skilled in the art. These include procedures in which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids which encode polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such procedures, a large number of variant sequences having one or more nucleotide differences with respect to the sequence obtained from the natural isolate are generated and characterized. These nucleotide differences can result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates.

[0536] For example, variants may be created using error prone PCR. In error prone PCR, PCR is performed under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Error prone PCR is described, e.g., in Leung (1989) Technique 1:11-15 and Caldwell (1992) PCR Methods Applic. 2:28-33. Briefly, in such procedures, nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer, MgCl2, MnCl2, Taq polymerase and an appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR product. For example, the reaction may be performed using 20 fmoles of nucleic acid to be mutagenized, 30 pmole of each PCR primer, a reaction buffer comprising 50 mM KCl, 10 mM Tris HCl (pH 8.3) and 0.01% gelatin, 7 mM MgCl2, 0.5 mM MnCl2, 5 units of Taq polymerase, 0.2 mM dGTP, 0.2 mM dATP, 1 mM dCTP and 1 mM dTTP. PCR may be performed for 30 cycles of 94° C. for 1 min, 45° C. for 1 min, and 72° C. for 1 min. However, it will be appreciated that these parameters may be varied as appropriate. The mutagenized nucleic acids are cloned into an appropriate vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated.

[0537] Variants may also be created using oligonucleotide directed mutagenesis to generate site-specific mutations in any cloned DNA of interest. Oligonucleotide mutagenesis is described, e.g., in Reidhaar-Olson (1988) Science 241:53-57. Briefly, in such procedures a plurality of double stranded oligonucleotides bearing one or more mutations to be introduced into the cloned DNA are synthesized and inserted into the cloned DNA to be mutagenized. Clones containing the mutagenized DNA are recovered and the activities of the polypeptides they encode are assessed.

[0538] Another method for generating variants is assembly PCR. Assembly PCR involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described in, e.g., U.S. Pat. No. 5,965,408.

[0539] Still another method of generating variants is sexual PCR mutagenesis. In sexual PCR mutagenesis, forced homologous recombination occurs between DNA molecules of different but highly related DNA sequence in vitro, as a result of random fragmentation of the DNA molecule based on sequence homology, followed by fixation of the crossover by primer extension in a PCR reaction. Sexual PCR mutagenesis is described, e.g., in Stemmer (1994) Proc. Natl. Acad. Sci. USA 91:10747-10751. Briefly, in such procedures a plurality of nucleic acids to be recombined are digested with DNase to generate fragments having an average size of 50-200 nucleotides. Fragments of the desired average size are purified and resuspended in a PCR mixture. PCR is conducted under conditions which facilitate recombination between the nucleic acid fragments. For example, PCR may be performed by resuspending the purified fragments at a concentration of 10-30 ng/µl in a solution of 0.2 mM of each dNTP, 2.2 mM MgCl2, 50 mM KCl, 10 mM Tris HCl, pH 9.0, and 0.1% Triton X-100. 2.5 units of Taq polymerase per 100 µl of reaction mixture is added and PCR is performed using the following regime: 94° C. for 60 seconds, 94° C. for 30 seconds, 50-55° C. for 30 seconds, 72° C. for 30 seconds (30-45 times) and 72° C. for 5 minutes. However, it will be appreciated that these parameters may be varied as appropriate. In some aspects, oligonucleotides may be included in the PCR reactions. In other aspects, the Klenow fragment of DNA polymerase I may be used in a first set of PCR reactions and Taq polymerase may be used in a subsequent set of PCR reactions. Recombinant sequences are isolated and the activities of the polypeptides they encode are assessed.
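The reaction composition given above for sexual PCR mutagenesis can be turned into pipetting volumes with ordinary dilution arithmetic (C1 x V1 = C2 x V2). The Python sketch below is offered only as an illustration and is not part of the disclosed method. The final concentrations are those quoted in the text; every stock concentration, the function name reaction_volumes and its parameters are assumptions chosen for the example and should be replaced with the values of the reagents actually on hand.

```python
"""Illustrative volume calculator for the fragment-reassembly PCR mixture."""

# Final concentrations quoted in the text (mM, except Triton X-100 in %).
FINAL = {
    "dNTP mix (each dNTP)": 0.2,
    "MgCl2": 2.2,
    "KCl": 50.0,
    "Tris-HCl pH 9.0": 10.0,
    "Triton X-100": 0.1,
}

# ASSUMED stock concentrations in the same units (not taken from the text).
STOCK = {
    "dNTP mix (each dNTP)": 10.0,
    "MgCl2": 25.0,
    "KCl": 1000.0,
    "Tris-HCl pH 9.0": 1000.0,
    "Triton X-100": 10.0,
}

TAQ_UNITS_PER_100_UL = 2.5    # from the text
TAQ_STOCK_UNITS_PER_UL = 5.0  # ASSUMED enzyme stock strength


def reaction_volumes(total_ul, fragment_ng_per_ul, fragment_stock_ng_per_ul):
    """Return a dict of component -> volume in microlitres.

    fragment_ng_per_ul is the desired final fragment concentration and must
    lie in the 10-30 ng/ul range given in the text.
    """
    if not 10.0 <= fragment_ng_per_ul <= 30.0:
        raise ValueError("fragment concentration outside the 10-30 ng/ul range")

    # C1 * V1 = C2 * V2, solved for the stock volume V1.
    volumes = {name: total_ul * final / STOCK[name] for name, final in FINAL.items()}
    volumes["DNA fragments"] = total_ul * fragment_ng_per_ul / fragment_stock_ng_per_ul
    volumes["Taq polymerase"] = (
        TAQ_UNITS_PER_100_UL * total_ul / 100.0 / TAQ_STOCK_UNITS_PER_UL
    )

    used = sum(volumes.values())
    if used > total_ul:
        raise ValueError("stocks are too dilute for the requested total volume")
    volumes["water"] = total_ul - used  # top up to the final volume
    return volumes


if __name__ == "__main__":
    # Example: 100 ul reaction, 20 ng/ul fragments from an assumed 200 ng/ul stock.
    for component, ul in reaction_volumes(100.0, 20.0, 200.0).items():
        print(f"{component:22s} {ul:6.2f} ul")
```

Scaling to a different reaction size only requires changing total_ul, since every component, including the 2.5 units of Taq polymerase per 100 µl, is computed in proportion to the final volume.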

[0540] Variants may also be created by in vivo mutagenesis. In some aspects, random mutations in a sequence of interest are generated by propagating the sequence of interest in a bacterial strain, such as an E. coli strain, which carries mutations in one or more of the DNA repair pathways. Such "mutator" strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA. Mutator strains suitable for use for in vivo mutagenesis are described in PCT Publication No. WO 91/16427, published Oct. 31, 1991, entitled "Methods for Phenotype Creation from Multiple Gene Populations".

[0541] Variants may also be generated using cassette mutagenesis. In cassette mutagenesis a small region of a double stranded DNA molecule is replaced with a synthetic oligonucleotide "cassette" that differs from the native sequence. The oligonucleotide often contains completely and/or partially randomized native sequence.

[0542] Recursive ensemble mutagenesis may also be used to generate variants. Recursive ensemble mutagenesis is an algorithm for protein engineering (protein mutagenesis) developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Recursive ensemble mutagenesis is described, e.g., in Arkin (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815.

[0543] In some aspects, variants are created using exponential ensemble mutagenesis. Exponential ensemble mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Exponential ensemble mutagenesis is described, e.g., in Delegrave (1993) Biotechnology Res. 11:1548-1552. Random and site-directed mutagenesis are described, e.g., in Arnold (1993) Current Opinion in Biotechnology 4:450-455.

[0544] In some aspects, the variants are created using shuffling procedures wherein portions of a plurality of nucleic acids which encode distinct polypeptides are fused together to create chimeric nucleic acid sequences which encode chimeric polypeptides as described in U.S. Pat. No. 5,965,408, filed Jul. 9, 1996, entitled, "Method of DNA Reassembly by Interrupting Synthesis" and U.S. Pat. No. 5,939,250, filed May 22, 1996, entitled, "Production of Enzymes Having Desired Activities by Mutagenesis".

[0545] The variants of the polypeptides of the invention may be variants in which one or more of the amino acid residues of the polypeptides of the sequences of the invention are substituted with a conserved or non-conserved amino acid residue (in one aspect a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code.

[0546] Conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics. Typically seen as conservative substitutions are the following replacements: replacements of an aliphatic amino acid such as Alanine, Valine, Leucine and Isoleucine with another aliphatic amino acid; replacement of a Serine with a Threonine or vice versa; replacement of an acidic residue such as Aspartic acid and Glutamic acid with another acidic residue; replacement of a residue bearing an amide group, such as Asparagine and Glutamine, with another residue bearing an amide group; exchange of a basic residue such as Lysine and Arginine with another basic residue; and replacement of an aromatic residue such as Phenylalanine, Tyrosine with another aromatic residue.

[0547] Other variants are those in which one or more of the amino acid residues of a polypeptide of the invention includes a substituent group.

[0548] Still other variants are those in which the polypeptide is associated with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol).

[0549] Additional variants are those in which additional amino acids are fused to the polypeptide, such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or stabilization of the polypeptide.

[0550] In some aspects, the fragments, derivatives and analogs retain the same biological function or activity as the polypeptides of the invention. In other aspects, the fragment, derivative, or analog includes a proprotein, such that the fragment, derivative, or analog can be activated by cleavage of the proprotein portion to produce an active polypeptide.

[0551] Optimizing Codons to Achieve High Levels of Protein Expression in Host Cells

[0552] The invention provides methods for modifying polypeptide-, enzyme-, protein-, e.g. structural or binding protein-encoding nucleic acids to modify codon usage. In one aspect, the invention provides methods for modifying codons in a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, to increase or decrease its expression in a host cell. The invention also provides nucleic acids encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, modified to increase its expression in a host cell, a polypeptide, enzyme, protein, e.g. structural or binding protein, so modified, and methods of making the modified polypeptide, enzyme, protein, e.g. structural or binding protein. The method comprises identifying a "non-preferred" or a "less preferred" codon in a polypeptide-, enzyme-, protein-, e.g. structural or binding protein-encoding nucleic acid and replacing one or more of these non-preferred or less preferred codons with a "preferred codon" encoding the same amino acid as the replaced codon and at least one non-preferred or less preferred codon in the nucleic acid has been replaced by a preferred codon encoding the same amino acid. A preferred codon is a codon over-represented in coding sequences in genes in the host cell and a non-preferred or less preferred codon is a codon under-represented in coding sequences in genes in the host cell.

[0553] Host cells for expressing the nucleic acids, expression cassettes and vectors of the invention include bacteria, yeast, fungi, plant cells, insect cells and mammalian cells. Thus, the invention provides methods for optimizing codon usage in all of these cells, codon-altered nucleic acids and polypeptides made by the codon-altered nucleic acids. Exemplary host cells include gram negative bacteria, such as Escherichia coli; gram positive bacteria, such as Streptomyces sp., Lactobacillus gasseri, Lactococcus lactis, Lactococcus cremoris, Bacillus subtilis, Bacillus cereus. Exemplary host cells also include eukaryotic organisms, e.g., various yeast, such as Saccharomyces sp., including Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, and Kluyveromyces lactis, Hansenula polymorpha, Aspergillus niger, and mammalian cells and cell lines and insect cells and cell lines. Thus, the invention also includes nucleic acids and polypeptides optimized for expression in these organisms and species.

[0554] For example, the codons of a nucleic acid encoding a polypeptide, enzyme, protein, e.g. structural or binding protein, isolated from a bacterial cell are modified such that the nucleic acid is optimally expressed in a bacterial cell different from the bacteria from which the polypeptide, enzyme, protein, e.g. structural or binding protein was derived, a yeast, a fungi, a plant cell, an insect cell or a mammalian cell. Methods for optimizing codons are well known in the art, see, e.g., U.S. Pat. No. 5,795,737; Baca (2000) Int. J. Parasitol. 30:113-118; Hale (1998) Protein Expr. Purif. 12:185-188; Narum (2001) Infect. Immun. 69:7250-7253. See also Narum (2001) Infect. Immun. 69:7250-7253, describing optimizing codons in mouse systems; Outchkourov (2002) Protein Expr. Purif. 24:18-24, describing optimizing codons in yeast; Feng (2000) Biochemistry 39:15399-15409, describing optimizing codons in E. coli; Humphreys (2000) Protein Expr. Purif. 20:252-264, describing optimizing codon usage that affects secretion in E. coli.

[0555] Transgenic Non-Human Animals

[0556] The invention provides transgenic non-human animals comprising a nucleic acid, a polypeptide (e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein), an expression cassette or vector or a transfected or transformed cell of the invention. The invention also provides methods of making and using these transgenic non-human animals.

[0557] The transgenic non-human animals can be, e.g., goats, rabbits, sheep, pigs (including all swine, hogs and related animals), cows, rats and mice, comprising the nucleic acids of the invention. These animals can be used, e.g., as in vivo models to study a polypeptide, enzyme, protein, e.g. structural or binding protein, activity, or, as models to screen for agents that change the polypeptide, enzyme, protein, e.g. structural or binding protein activity in vivo. The coding sequences for the polypeptides to be expressed in the transgenic non-human animals can be designed to be constitutive, or, under the control of tissue-specific, developmental-specific or inducible transcriptional regulatory factors. Transgenic non-human animals can be designed and generated using any method known in the art; see, e.g., U.S. Pat. Nos. 6,211,428; 6,187,992; 6,156,952; 6,118,044; 6,111,166; 6,107,541; 5,959,171; 5,922,854; 5,892,070; 5,880,327; 5,891,698; 5,639,940; 5,573,933; 5,387,742; 5,087,571, describing making and using transformed cells and eggs and transgenic mice, rats, rabbits, sheep, pigs and cows. See also, e.g., Pollock (1999) J. Immunol. Methods 231:147-157, describing the production of recombinant proteins in the milk of transgenic dairy animals; Baguisi (1999) Nat. Biotechnol. 17:456-461, demonstrating the production of transgenic goats. U.S. Pat. No. 6,211,428, describes making and using transgenic non-human mammals which express in their brains a nucleic acid construct comprising a DNA sequence.

"knockout animal," e.g., a "knockout mouse," engineered not to express an endogenous gene, which is replaced with a gene expressing a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention, or, a fusion protein comprising a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention.

[0559] Transgenic Plants and Seeds

[0560] The invention provides transgenic plants and seeds comprising a nucleic acid, a polypeptide (e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein), an expression cassette or vector or a transfected or transformed cell of the invention. The invention also provides plant products, e.g., oils, seeds, leaves, extracts and the like, comprising a nucleic acid and/or a polypeptide (e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein) of the invention. The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot). The invention also provides methods of making and using these transgenic plants and seeds. The transgenic plant or plant cell expressing a polypeptide of the present invention may be constructed in accordance with any method known in the art. See, for example, U.S. Pat. No. 6,309,872.

[0561] Nucleic acids and expression constructs of the invention can be introduced into a plant cell by any means. For example, nucleic acids or expression constructs can be introduced into the genome of a desired plant host, or, the nucleic acids or expression constructs can be episomes. Introduction into the genome of a desired plant can be such that the host's polypeptide, enzyme, protein, e.g. structural or binding protein, production is regulated by endogenous transcriptional or translational control elements. The invention also provides "knockout plants" where insertion of gene sequence by, e.g., homologous recombination, has disrupted the expression of the endogenous gene. Means to generate "knockout" plants are well-known in the art, see, e.g., Strepp (1998) Proc Natl. Acad. Sci. USA 95:4368-4373; Miao (1995) Plant J 7:359-365. See discussion on transgenic plants, below.

[0562] The nucleic acids of the invention can be used to confer desired traits on essentially any plant, e.g., on starch-producing plants, such as potato, wheat, rice, barley, and the like. Nucleic acids of the invention can be used to manipulate metabolic pathways of a plant in order to optimize or alter the host's expression of polypeptide, enzyme, protein, e.g. structural or binding protein. This can change a polypeptide, enzyme, protein, e.g. structural or binding protein, activity in a plant. Alternatively, a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention can be used in production of a transgenic plant to produce a compound not naturally produced by that plant. This can lower production costs or create a novel product.

[0563] In one aspect, the first step in production of a transgenic plant involves making an expression construct for expression in a plant cell. These techniques are well known in the art. They can include selecting and cloning a promoter, a
coding sequence for facilitating efficient binding of ribo U.S. Pat. No. 5,387.742, describes injecting cloned recombi Somes to mRNA and selecting the appropriate gene termina nant or synthetic DNA sequences into fertilized mouse eggs, tor sequences. One exemplary constitutive promoter is implanting the injected eggs in pseudo-pregnant females, and CaMV35S, from the cauliflower mosaic virus, which gener growing to term transgenic mice. U.S. Pat. No. 6,187,992, ally results in a high degree of expression in plants. Other describes making and using a transgenic mouse. promoters are more specific and respond to cues in the plant's 0558 “Knockout animals' can also be used to practice the internal or external environment. An exemplary light-induc methods of the invention. For example, in one aspect, the ible promoter is the promoter from the cab gene, encoding the transgenic or modified animals of the invention comprise a major chlorophyll afb binding protein. US 2012/0266329 A1 Oct. 18, 2012
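By way of a non-limiting illustration of the codon optimization discussed above, the following Python sketch re-codes a coding sequence by swapping each codon for a synonymous codon, so that the encoded amino acid sequence is unchanged. It is a minimal sketch under stated assumptions and is not taken from the cited references: the helper names (`preferred_codons`, `translate`, `recode`) are hypothetical, and the preference rule used here (pick the synonymous codon with the most G or C bases) is only a stand-in for a measured codon usage table of the chosen host cell.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def preferred_codons():
    """Hypothetical preference table: for each amino acid (and stop, '*'),
    choose the synonymous codon with the most G/C bases. Ties are broken by
    taking the lexicographically larger codon. A real host would need a table
    derived from measured codon usage instead of this placeholder rule."""
    best = {}
    for codon, aa in CODON_TO_AA.items():
        current = best.get(aa)
        if current is None or (gc_count(codon), codon) > (gc_count(current), current):
            best[aa] = codon
    return best


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna, preferred=None):
    """Replace every codon with the preferred synonymous codon.
    The encoded amino acid sequence is left unchanged."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    if preferred is None:
        preferred = preferred_codons()
    return "".join(preferred[CODON_TO_AA[dna[i:i + 3]]] for i in range(0, len(dna), 3))


original = "ATGAAATTTTAA"          # encodes M, K, F, stop
recoded = recode(original)          # -> "ATGAAGTTCTGA"
assert translate(recoded) == translate(original)
```

In practice the `preferred` argument would be filled from a codon usage table for the intended host, and further constraints (for example avoiding unwanted restriction sites or repeats) may also apply; none of that is modeled here.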

[0564] In one aspect, the nucleic acid is modified to achieve greater expression in a plant cell. For example, a sequence of the invention is likely to have a higher percentage of A-T nucleotide pairs compared to that seen in a plant, some of which prefer G-C nucleotide pairs. Therefore, A-T nucleotides in the coding sequence can be substituted with G-C nucleotides without significantly changing the amino acid sequence to enhance production of the gene product in plant cells.

[0565] Selectable marker genes can be added to the gene construct in order to identify plant cells or tissues that have successfully integrated the transgene. This may be necessary because achieving incorporation and expression of genes in plant cells is a rare event, occurring in just a few percent of the targeted tissues or cells. Selectable marker genes encode proteins that provide resistance to agents that are normally toxic to plants, such as antibiotics or herbicides. Only plant cells that have integrated the selectable marker gene will survive when grown on a medium containing the appropriate antibiotic or herbicide. As for other inserted genes, marker genes also require promoter and termination sequences for proper function.

[0566] In one aspect, making transgenic plants or seeds comprises incorporating sequences of the invention and, optionally, marker genes into a target expression construct (e.g., a plasmid), along with positioning of the promoter and the terminator sequences. This can involve transferring the modified gene into the plant through a suitable method. For example, a construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. For example, see, e.g., Christou (1997) Plant Mol. Biol. 35:197-203; Pawlowski (1996) Mol. Biotechnol. 6:17-30; Klein (1987) Nature 327:70-73; Takumi (1997) Genes Genet. Syst. 72:63-69, discussing use of particle bombardment to introduce transgenes into wheat; and Adam (1997) supra, for use of particle bombardment to introduce YACs into plant cells. For example, Rinehart (1997) supra, used particle bombardment to generate transgenic cotton plants. Apparatus for accelerating particles is described in U.S. Pat. No. 5,015,580; and, the commercially available BioRad (Biolistics) PDS-2000 particle acceleration instrument; see also, John, U.S. Pat. No. 5,608,148; and Ellis, U.S. Pat. No. 5,681,730, describing particle-mediated transformation of gymnosperms.

[0567] In one aspect, protoplasts can be immobilized and injected with a nucleic acid, e.g., an expression construct. Although plant regeneration from protoplasts is not easy with cereals, plant regeneration is possible in legumes using somatic embryogenesis from protoplast derived callus. Organized tissues can be transformed with naked DNA using gene gun technique, where DNA is coated on tungsten microprojectiles, shot 1/100th the size of cells, which carry the DNA deep into cells and organelles. Transformed tissue is then induced to regenerate, usually by somatic embryogenesis. This technique has been successful in several cereal species including maize and rice.

[0568] Nucleic acids, e.g., expression constructs, can also be introduced into plant cells using recombinant viruses. Plant cells can be transformed using viral vectors, such as, e.g., tobacco mosaic virus derived vectors (Rouwendal (1997) Plant Mol. Biol. 33:989-999), see Porta (1996) “Use of viral replicons for the expression of genes in plants,” Mol. Biotechnol. 5:209-221.

[0569] Alternatively, nucleic acids, e.g., an expression construct, can be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria. Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are well described in the scientific literature. See, e.g., Horsch (1984) Science 233:496-498; Fraley (1983) Proc. Natl. Acad. Sci. USA 80:4803 (1983); Gene Transfer to Plants, Potrykus, ed. (Springer-Verlag, Berlin 1995). The DNA in an A. tumefaciens cell is contained in the bacterial chromosome as well as in another structure known as a Ti (tumor-inducing) plasmid. The Ti plasmid contains a stretch of DNA termed T-DNA (~20 kb long) that is transferred to the plant cell in the infection process and a series of vir (virulence) genes that direct the infection process. A. tumefaciens can only infect a plant through wounds: when a plant root or stem is wounded it gives off certain chemical signals, in response to which, the vir genes of A. tumefaciens become activated and direct a series of events necessary for the transfer of the T-DNA from the Ti plasmid to the plant's chromosome. The T-DNA enters the plant cell through the wound. One speculation is that the T-DNA waits until the plant DNA is being replicated or transcribed, then inserts itself into the exposed plant DNA. In order to use A. tumefaciens as a transgene vector, the tumor-inducing section of T-DNA has to be removed, while retaining the T-DNA border regions and the vir genes. The transgene is then inserted between the T-DNA border regions, where it is transferred to the plant cell and becomes integrated into the plant's chromosomes.

[0570] The invention provides for the transformation of monocotyledonous plants using the nucleic acids of the invention, including important cereals, see Hiei (1997) Plant Mol. Biol. 35:205-218. See also, e.g., Horsch, Science (1984) 233:496; Fraley (1983) Proc. Natl. Acad. Sci USA 80:4803; Thykjaer (1997) supra; Park (1996) Plant Mol. Biol. 32:1135-1148, discussing T-DNA integration into genomic DNA. See also D'Halluin, U.S. Pat. No. 5,712,135, describing a process for the stable integration of a DNA comprising a gene that is functional in a cell of a cereal, or other monocotyledonous plant.

[0571] In one aspect, the third step can involve selection and regeneration of whole plants capable of transmitting the incorporated target gene to the next generation. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker that has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1985. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee (1987) Ann. Rev. of Plant Phys. 38:467-486. To obtain whole plants from transgenic tissues such as immature embryos, they can be grown under controlled environmental conditions in a series of media containing nutrients and hormones, a process known as tissue culture. Once whole plants are generated and produce seed, evaluation of the progeny begins.

[0572] After the expression cassette is stably incorporated in transgenic plants, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed. Since transgenic expression of the nucleic acids of the invention leads to phenotypic changes, plants comprising the recombinant nucleic acids of the invention can be sexually crossed with a second plant to obtain a final product. Thus, the seed of the invention can be derived from a cross between two transgenic plants of the invention, or a cross between a plant of the invention and another plant. The desired effects (e.g., expression of the polypeptides of the invention to produce a plant in which flowering behavior is altered) can be enhanced when both parental plants express the polypeptides (e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein) of the invention. The desired effects can be passed to future plant generations by standard propagation means.

[0573] The nucleic acids and polypeptides of the invention are expressed in or inserted in any plant or seed. Transgenic plants of the invention can be dicotyledonous or monocotyledonous. Examples of monocot transgenic plants of the invention are grasses, such as meadow grass (bluegrass, Poa), forage grass such as festuca, lolium, temperate grass, such as Agrostis, and cereals, e.g., wheat, oats, rye, barley, rice, sorghum, and maize (corn). Examples of dicot transgenic plants of the invention are tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and cruciferous plants (family Brassicaceae), such as cauliflower, rape seed, and the closely related model organism Arabidopsis thaliana. Thus, the transgenic plants and seeds of the invention include a broad range of plants, including, but not limited to, species from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panieum, Pannisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

[0574] In alternative embodiments, the nucleic acids of the invention are expressed in plants which contain fiber cells, including, e.g., cotton, silk cotton tree (Kapok, Ceiba pentandra), desert willow, creosote bush, winterfat, balsa, ramie, kenaf, hemp, roselle, jute, sisal abaca and flax. In alternative embodiments, the transgenic plants of the invention can be members of the genus Gossypium, including members of any Gossypium species, such as G. arboreum, G. herbaceum, G. barbadense, and G. hirsutum.

[0575] The invention also provides for transgenic plants to be used for producing large amounts of the polypeptides (e.g., a polypeptide, enzyme, protein, e.g. structural or binding protein, or antibody) of the invention. For example, see Palmgren (1997) Trends Genet. 13:348; Chong (1997) Transgenic Res. 6:289-296 (producing human milk protein beta-casein in transgenic potato plants using an auxin-inducible, bidirectional mannopine synthase (mas1',2') promoter with Agrobacterium tumefaciens-mediated leaf disc transformation methods).

[0576] Using known procedures, one of skill can screen for plants of the invention by detecting the increase or decrease of transgene mRNA or protein in transgenic plants. Means for detecting and quantitation of mRNAs or proteins are well known in the art.

[0577] Polypeptides and Peptides

[0578] In one aspect, the invention provides isolated or recombinant polypeptides having a sequence identity (e.g., at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity, or homology) to an exemplary sequence of the invention, e.g., proteins having a sequence as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides disclosed in the SEQ ID listing, which include all even numbered SEQ ID NO:s from SEQ ID NO:2 through SEQ ID NO:26,898. The percent sequence identity can be over the full length of the polypeptide, or, the identity can be over a region of at least about 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700 or more residues.

[0579] “Amino acid” or “amino acid sequence” as used herein refer to an oligopeptide, peptide, polypeptide, or protein sequence, or to a fragment, portion, or subunit of any of these and to naturally occurring or synthetic molecules. “Amino acid” or “amino acid sequence” include an oligopeptide, peptide, polypeptide, or protein sequence, or to a fragment, portion, or subunit of any of these, and to naturally occurring or synthetic molecules. The term “polypeptide” as used herein, refers to amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres and may contain modified amino acids other than the 20 gene-encoded amino acids. The polypeptides may be modified by either natural processes, such as post-translational processing, or by chemical modification techniques which are well known in the art. Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also a given polypeptide may have many types of modifications. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a phosphatidylinositol, cross-linking cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristolyation, oxidation, pegylation, glucan hydrolase processing, phosphorylation, prenylation, racemization, selenoylation, sulfation and transfer-RNA mediated addition of amino acids to protein such as arginylation. (See Creighton, T. E., Proteins: Structure and Molecular Properties 2nd Ed., W.H. Freeman and Company, New York (1993); Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York, pp. 1-12 (1983)). The peptides
and polypeptides of the invention also include all “mimetic” and “peptidomimetic” forms, as described in further detail, below.

[0580] As used herein, the term “isolated” means that the material is removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such polynucleotides or polypeptides could be part of a composition and still be isolated in that such vector or composition is not part of its natural environment. As used herein, the term “purified” does not require absolute purity; rather, it is intended as a relative definition. Individual nucleic acids obtained from a library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The purified nucleic acids of the invention have been purified from the remainder of the genomic DNA in the organism by at least 10-10' fold. However, the term “purified” also includes nucleic acids which have been purified from the remainder of the genomic DNA or from other sequences in a library or other environment by at least one order of magnitude, typically two or three orders and more typically four or five orders of magnitude.

[0581] “Recombinant” polypeptides or proteins refer to polypeptides or proteins produced by recombinant DNA techniques; i.e., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide or protein. “Synthetic” polypeptides or protein are those prepared by chemical synthesis. Solid-phase chemical peptide synthesis methods can also be used to synthesize the polypeptide or fragments of the invention. Such methods have been known in the art since the early 1960's (Merrifield, R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (See also Stewart, J. M. and Young, J. D., Solid Phase Peptide Synthesis, 2nd Ed., Pierce Chemical Co., Rockford, Ill., pp. 11-12)) and have recently been employed in commercially available laboratory peptide design and synthesis kits (Cambridge Research Biochemicals). Such commercially available laboratory kits have generally utilized the teachings of H. M. Geysen et al., Proc. Natl. Acad. Sci., USA, 81:3998 (1984) and provide for synthesizing peptides upon the tips of a multitude of “rods” or “pins” all of which are connected to a single plate.

[0582] Polypeptides of the invention can also be shorter than the full length of exemplary polypeptides. In alternative aspects, the invention provides polypeptides (peptides, fragments) ranging in size between about 5 and the full length of a polypeptide, e.g., an enzyme, such as a polypeptide, enzyme, protein, e.g. structural or binding protein; exemplary sizes being of about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, or more residues, e.g., contiguous residues of an exemplary polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention. Peptides of the invention (e.g., a subsequence of an exemplary polypeptide of the invention) can be useful as, e.g., labeling probes, antigens, toleragens, motifs, a polypeptide, enzyme, protein, e.g. structural or binding protein, active sites (e.g., “catalytic domains”), signal sequences and/or prepro domains.

[0583] In alternative aspects, polypeptides of the invention having enzyme, structural or binding activity are members of a genus of polypeptides sharing specific structural elements, e.g., amino acid residues, that correlate with enzyme, structural or binding activity. These shared structural elements can be used for the routine generation of polypeptide, enzyme, protein, e.g. structural or binding protein, variants. These shared structural elements of a polypeptide, enzyme, protein, e.g. structural or binding protein, of the invention can be used as guidance for the routine generation of a polypeptide, enzyme, protein, e.g. structural or binding protein, variants within the scope of the genus of polypeptides of the invention.

[0584] Polypeptides and peptides of the invention can be isolated from natural sources, be synthetic, or be recombinantly generated polypeptides. Peptides and proteins can be recombinantly expressed in vitro or in vivo. The peptides and polypeptides of the invention can be made and isolated using any method known in the art. Polypeptide and peptides of the invention can also be synthesized, whole or in part, using chemical methods well known in the art. See e.g., Caruthers (1980) Nucleic Acids Res. Symp. Ser. 215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232; Banga, A. K., Therapeutic Peptides and Proteins, Formulation, Processing and Delivery Systems (1995) Technomic Publishing Co., Lancaster, Pa. For example, peptide synthesis can be performed using various solid-phase techniques (see e.g., Roberge (1995) Science 269:202; Merrifield (1997) Methods Enzymol. 289:3-13) and automated synthesis may be achieved, e.g., using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer.

[0585] The peptides and polypeptides of the invention can also be glycosylated. The glycosylation can be added post-translationally either chemically or by cellular biosynthetic mechanisms, wherein the latter incorporates the use of known glycosylation motifs, which can be native to the sequence or can be added as a peptide or added in the nucleic acid coding sequence. The glycosylation can be O-linked or N-linked.

[0586] The peptides and polypeptides of the invention, as defined above, include all “mimetic” and “peptidomimetic” forms. The terms “mimetic” and “peptidomimetic” refer to a synthetic chemical compound which has substantially the same structural and/or functional characteristics of the polypeptides of the invention. The mimetic can be either entirely composed of synthetic, non-natural analogues of amino acids, or, is a chimeric molecule of partly natural peptide amino acids and partly non-natural analogs of amino acids. The mimetic can also incorporate any amount of natural amino acid conservative substitutions as long as such substitutions also do not substantially alter the mimetic's structure and/or activity. As with polypeptides of the invention which are conservative variants or members of a genus of polypeptides of the invention (e.g., having about 50% or more sequence identity to an exemplary sequence of the invention), routine experimentation will determine whether a mimetic is within the scope of the invention, i.e., that its structure and/or function is not substantially altered. Thus, in one aspect, a mimetic composition is within the scope of the invention if it has a polypeptide, enzyme, protein, e.g. structural or binding protein's activity.

[0587] Polypeptide mimetic compositions of the invention can contain any combination of non-natural structural components. In alternative aspect, mimetic compositions of the invention include one or all of the following three structural
groups: a) residue linkage groups other than the natural amide bond (“peptide bond”) linkages; b) non-natural residues in place of naturally occurring amino acid residues; or c) residues which induce secondary structural mimicry, i.e., to induce or stabilize a secondary structure, e.g., a beta turn, gamma turn, beta sheet, alpha helix conformation, and the like. For example, a polypeptide of the invention can be characterized as a mimetic when all or some of its residues are joined by chemical means other than natural peptide bonds. Individual peptidomimetic residues can be joined by peptide bonds, other chemical bonds or coupling means, such as, e.g., glutaraldehyde, N-hydroxysuccinimide esters, bifunctional maleimides, N,N'-dicyclohexylcarbodiimide (DCC) or N,N'-diisopropylcarbodiimide (DIC). Linking groups that can be an alternative to the traditional amide bond (“peptide bond”) linkages include, e.g., ketomethylene (e.g., -C(=O)-CH2- for -C(=O)-NH-), aminomethylene (CH2-NH), ethylene, olefin (CH=CH), ether (CH2-O), thioether (CH2-S), tetrazole (CN4-), thiazole, retroamide, thioamide, or ester (see, e.g., Spatola (1983) in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol. 7, pp 267-357, “Peptide Backbone Modifications,” Marcell Dekker, NY).

[0588] A polypeptide of the invention can also be characterized as a mimetic by containing all or some non-natural residues in place of naturally occurring amino acid residues. Non-natural residues are well described in the scientific and patent literature; a few exemplary non-natural compositions useful as mimetics of natural amino acid residues and guidelines are described below. Mimetics of aromatic amino acids can be generated by replacing by, e.g., D- or L-naphylalanine; D- or L-phenylglycine; D- or L-2 thieneylalanine; D- or L-1, -2, 3-, or 4-pyreneylalanine; D- or L-3 thieneylalanine; D- or L-(2-pyridinyl)-alanine; D- or L-(3-pyridinyl)-alanine; D- or L-(2-pyrazinyl)-alanine; D- or L-(4-isopropyl)-phenylglycine; D-(trifluoromethyl)-phenylglycine; D-(trifluoromethyl)-phenylalanine; D-p-fluoro-phenylalanine; D- or L-p-biphenylphenylalanine; D- or L-p-methoxy-biphenylphenylalanine; D- or L-2-indole(alkyl)alanines; and, D- or L-alkylainines, where alkyl can be substituted or unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl, iso-butyl, sec-isotyl, iso-pentyl, or a non-acidic amino acids. Aromatic rings of a non-natural amino acid include, e.g., thiazolyl, thiophenyl, pyrazolyl, benzimidazolyl, naphthyl, furanyl, pyrrolyl, and pyridyl aromatic rings.

[0589] Mimetics of acidic amino acids can be generated by substitution by, e.g., non-carboxylate amino acids while maintaining a negative charge; (phosphono)alanine; sulfated threonine. Carboxyl side groups (e.g., aspartyl or glutamyl) can also be selectively modified by reaction with carbodiimides (R'-N-C-N-R') such as, e.g., 1-cyclohexyl-3(2-morpholinyl-(4-ethyl)carbodiimide or 1-ethyl-3(4-azonia-4,4-dimetholpentyl)carbodiimide. Aspartyl or glutamyl can also be converted to asparaginyl and glutaminyl residues by reaction with ammonium ions. Mimetics of basic amino acids can be generated by substitution with, e.g., (in addition to lysine and arginine) the amino acids ornithine, , or (guanidino)-acetic acid, or (guanidino)alkyl-acetic acid, where alkyl is defined above. Nitrile derivative (e.g., containing the CN-moiety in place of COOH) can be substituted for asparagine or glutamine. Asparaginyl and glutaminyl residues can be deaminated to the corresponding aspartyl or glutamyl residues. Arginine residue mimetics can be generated by reacting arginyl with, e.g., one or more conventional reagents, including, e.g., phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, or ninhydrin, in one aspect under alkaline conditions. Tyrosine residue mimetics can be generated by reacting tyrosyl with, e.g., aromatic diazonium compounds or tetranitromethane. N-acetylimidizol and tetranitromethane can be used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively. Cysteine residue mimetics can be generated by reacting cysteinyl residues with, e.g., alpha-haloacetates such as 2-chloroacetic acid or chloroacetamide and corresponding amines, to give carboxymethyl or carboxyamidomethyl derivatives. Cysteine residue mimetics can also be generated by reacting cysteinyl residues with, e.g., bromo-trifluoroacetone, alpha-bromo-beta-(5-imidozoyl) propionic acid; chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide; methyl 2-pyridyl disulfide; p-chloromercuribenzoate; 2-chloromercuri-4 nitrophenol; or, chloro-7-nitrobenzo-oxa-1,3-diazole. Lysine mimetics can be generated (and amino terminal residues can be altered) by reacting lysinyl with, e.g., succinic or other carboxylic acid anhydrides. Lysine and other alpha-amino-containing residue mimetics can also be generated by reaction with imidoesters, such as methyl picolinimidate, pyridoxal phosphate, pyridoxal, chloroborohydride, trinitro-benzenesulfonic acid, O-methylisourea, 2,4-pentanedione, and transamidase-catalyzed reactions with glyoxylate. Mimetics of methionine can be generated by reaction with, e.g., methionine sulfoxide. Mimetics of proline include, e.g., pipecolic acid, thiazolidine carboxylic acid, 3- or 4-hydroxy proline, dehydroproline, 3- or 4-methylproline, or 3,3-dimethylproline. Histidine residue mimetics can be generated by reacting histidyl with, e.g., diethylprocarbonate or para-bromophenacyl bromide. Other mimetics include, e.g., those generated by hydroxylation of proline and lysine; phosphorylation of the hydroxyl groups of seryl or threonyl residues; methylation of the alpha-amino groups of lysine, arginine and histidine; acetylation of the N-terminal amine; methylation of main chain amide residues or substitution with N-methyl amino acids; or amidation of C-terminal carboxyl groups.

[0590] A residue, e.g., an amino acid, of a polypeptide of the invention can also be replaced by an amino acid (or peptidomimetic residue) of the opposite chirality. Thus, any amino acid naturally occurring in the L-configuration (which can also be referred to as the R or S, depending upon the structure of the chemical entity) can be replaced with the amino acid of the same chemical structural type or a peptidomimetic, but of the opposite chirality, referred to as the D-amino acid, but also can be referred to as the R- or S-form.

[0591] The invention also provides methods for modifying the polypeptides of the invention by either natural processes, such as post-translational processing (e.g., phosphorylation, acylation, etc), or by chemical modification techniques, and the resulting modified polypeptides. Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also a given polypeptide may have many types of modifications. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a phosphatidylinositol, cross-linking cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristolyation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, and transfer-RNA mediated addition of amino acids to protein such as arginylation. See, e.g., Creighton, T. E., Proteins: Structure and Molecular Properties 2nd Ed., W.H. Freeman and Company, New York (1993); Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York, pp. 1-12 (1983).

[0592] Solid-phase chemical peptide synthesis methods can also be used to synthesize the polypeptide or fragments of the invention. Such methods have been known in the art since the early 1960's (Merrifield, R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (See also Stewart, J. M. and Young, J. D., Solid Phase Peptide Synthesis, 2nd Ed., Pierce Chemical

[0595] Polypeptides of the invention can have an enzyme, structural or binding activity under various conditions, e.g., extremes in pH and/or temperature, oxidizing agents, and the like. The invention provides methods leading to alternative polypeptide, enzyme, protein, e.g. structural or binding protein, preparations with different catalytic efficiencies and stabilities, e.g., towards temperature, oxidizing agents and changing wash conditions. In one aspect, polypeptide, enzyme, protein, e.g. structural or binding protein, variants can be produced using techniques of site-directed mutagenesis and/or random mutagenesis. In one aspect, directed evolution can be used to produce a great variety of polypeptide, enzyme, protein, e.g. structural or binding protein, variants with alternative specificities and stability.

[0596] The proteins of the invention are also useful as research reagents to identify a polypeptide, enzyme, protein, e.g.
structural or binding protein, modulators, e.g., activators Co., Rockford, Ill., pp. 11-12)) and have recently been or inhibitors of a polypeptide, enzyme, protein, e.g. structural employed in commercially available laboratory peptide orbinding protein, activity. Briefly, test samples (compounds, design and synthesis kits (Cambridge Research Biochemi broths, extracts, and the like) are added to a polypeptide, cals). Such commercially available laboratory kits have gen enzyme, protein, e.g. structural or binding protein, assays to erally utilized the teachings of H. M. Geysen et al. Proc. Natl. determine their ability to inhibit substrate cleavage. Inhibitors Acad. Sci., USA, 81:3998 (1984) and provide for synthesiz identified in this way can be used in industry and research to ing peptides upon the tips of a multitude of “rods” or “pins' reduce or prevent undesired . As with a polypep all of which are connected to a single plate. When such a system is utilized, a plate of rods or pins is inverted and tide, enzyme, protein, e.g. structural or binding protein, inserted into a second plate of corresponding wells or reser inhibitors can be combined to increase the spectrum of activ Voirs, which contain solutions for attaching or anchoring an ity. appropriate amino acid to the pin's or rod's tips. By repeating 0597. The enzymes of the invention are also useful as Such a process step, i.e., inverting and inserting the rods and research reagents to digest proteins or in protein sequencing. pin's tips into appropriate solutions, amino acids are built into For example, the polypeptide, enzyme, protein, e.g. structural desired peptides. In addition, a number of available FMOC or binding proteins may be used to break polypeptides into peptide synthesis systems are available. For example, assem Smaller fragments for sequencing using, e.g. an automated bly of a polypeptide or fragment can be carried out on a Solid Sequencer. support using an Applied Biosystems, Inc. 
Model 431 ATM 0598. The invention also provides methods of discovering automated peptide synthesizer. Such equipment provides new a polypeptide, enzyme, protein, e.g. structural or binding ready access to the peptides of the invention, either by direct protein, using the nucleic acids, polypeptides and antibodies synthesis or by synthesis of a series of fragments that can be of the invention. In one aspect, phagemid libraries are coupled using other known techniques. screened for expression-based discovery of a polypeptide, 0593. The polypeptides of the invention include a enzyme, protein, e.g. structural or binding protein. In another polypeptide, enzyme, protein, e.g. structural or binding pro aspect, lambda phage libraries are screened for expression tein, in an active or inactive form. For example, the polypep based discovery of a polypeptide, enzyme, protein, e.g. struc tides of the invention include proproteins before “maturation tural or binding protein. Screening of the phage or phagemid or processing of prepro sequences, e.g., by a proprotein processing enzyme. Such as a proprotein convertase to gen libraries can allow the detection of toxic clones; improved erate an “active' mature protein. The polypeptides of the access to Substrate; reduced need for engineering a host, invention include a polypeptide, enzyme, protein, e.g. struc by-passing the potential for any bias resulting from mass tural orbinding protein, inactive for other reasons, e.g., before excision of the library; and, faster growth at low clone den “activation” by a post-translational processing event, e.g., an sities. Screening of phage or phagemid libraries can be in endo- or exo-peptidase or proteinase action, a phosphoryla liquid phase or in Solid phase. In one aspect, the invention tion event, an amidation, a glycosylation or a Sulfation, a provides screening in liquid phase. This gives a greater flex dimerization event, and the like. 
The polypeptides of the ibility in assay conditions; additional substrate flexibility: invention include all active forms, including active Subse higher sensitivity for weak clones; and ease of automation quences, e.g., catalytic domains or active sites, of the enzyme. over Solid phase screening. 0594. The invention includes immobilized polypeptides, 0599. The invention provides screening methods using the enzymes, proteins, e.g. structural or binding proteins, anti proteins and nucleic acids of the invention and robotic auto polypeptides, anti-enzymes, anti-proteins, e.g. anti-structural mation to enable the execution of many thousands of biocata or anti-binding proteins, antibodies and fragments thereof. lytic reactions and Screening assays in a short period of time, The invention provides methods for inhibiting a polypeptide, e.g., per day, as well as ensuring a high level of accuracy and enzyme, protein, e.g. structural or binding protein, activity, reproducibility (see discussion of arrays, below). As a result, e.g., using dominant negative mutants or anti-polypeptide, a library of derivative compounds can be produced in a matter anti-enzyme, anti-protein, e.g. anti-structural or anti-binding of weeks. For further teachings on modification of molecules, protein antibodies of the invention. The invention includes including small molecules, see PCT/US94/09174. heterocomplexes, e.g., fusion proteins, heterodimers, etc., 0600. In one aspect, polypeptides or fragments of the comprising the polypeptide, enzyme, protein, e.g. structural invention may be obtained through biochemical enrichment or binding proteins of the invention. or purification procedures. The sequence of potentially