<<

US008119385B2

(12) United States Patent (10) Patent No.: US 8,119,385 B2 Mathur et al. (45) Date of Patent: Feb. 21, 2012

(54) NUCLEICACIDS AND AND (52) U.S. Cl...... 435/212:530/350 METHODS FOR MAKING AND USING THEMI (58) Field of Classification Search ...... None (75) Inventors: Eric J. Mathur, San Diego, CA (US); See application file for complete search history. Cathy Chang, San Diego, CA (US) (56) References Cited (73) Assignee: BP Corporation North America Inc., Houston, TX (US) OTHER PUBLICATIONS c Mount, Bioinformatics, Cold Spring Harbor Press, Cold Spring Har (*) Notice: Subject to any disclaimer, the term of this bor New York, 2001, pp. 382-393.* patent is extended or adjusted under 35 Spencer et al., “Whole-Genome Sequence Variation among Multiple U.S.C. 154(b) by 689 days. Isolates of Pseudomonas aeruginosa” J. Bacteriol. (2003) 185: 1316 1325. (21) Appl. No.: 11/817,403 Database Sequence GenBank Accession No. BZ569932 Dec. 17. 1-1. 2002. (22) PCT Fled: Mar. 3, 2006 Omiecinski et al., “Epoxide -Polymorphism and role in (86). PCT No.: PCT/US2OO6/OOT642 toxicology” Toxicol. Lett. (2000) 1.12: 365-370. S371 (c)(1), * cited by examiner (2), (4) Date: May 7, 2008 Primary Examiner — James Martinell (87) PCT Pub. No.: WO2006/096527 (74) Attorney, Agent, or Firm — Kalim S. Fuzail PCT Pub. Date: Sep. 14, 2006 (57) ABSTRACT (65) Prior Publication Data The invention provides polypeptides, including , structural proteins and binding proteins, polynucleotides US 201O/OO11456A1 Jan. 14, 2010 encoding these polypeptides, and methods of making and using these polynucleotides and polypeptides. Polypeptides, Related U.S. Application Data including enzymes and antibodies, and nucleic acids of the (60) Provisional application No. 60/658,984, filed on Mar. invention can be used in industrial, experimental, food and 4, 2005. feed processing, nutritional and pharmaceutical applications, e.g., for food and feed Supplements, colorants, neutraceuti (51) Int. Cl. cals, cosmetic and pharmaceutical needs. CI2N 9/48 (2006.01) CI2N 9/50 (2006.01) 4 Claims, 4 Drawing Sheets U.S. Patent Feb. 21, 2012 Sheet 1 of 4 US 8,119,385 B2

COMPUTER SYSTEM

raw.she f(5 PROCESSOR / I

15

Y W.

r old hadhav P www.wawr

12D DAA RETREVING DSPAY DEWCE

GRE U.S. Patent Feb. 21, 2012 Sheet 2 of 4 US 8,119,385 B2

2O 2O NY STAR

22 strikt Ai SORE NEW SEQUENCE TO A MEMORY

t

OPEN AABASE OF SELENCES r 2O6 READ FIRST SEQUENCE IN DATABASE

2O PERFORM COMPARSON OF NEW SEQUENC STORE SENCE in

YES 214 NC displaySPAYSTORE stored sequenceSELENCE nameArarNAME TO SER

GO TONEX SECUENCEN AAEASE

ORE SEENCES IN ---.YES OAABASE

NC

END

GRE 2 U.S. Patent Feb. 21, 2012 Sheet 3 of 4 US 8,119,385 B2

250

254 SOREA FRST SEQUENCE TO A MEMORY 256 STOREASECOND SEQUENCE TO A MEMORY 260 READ FIRST CHARACTER OF FIRST SEQUENCE - L. 262 READ FIRST CHARACTER OF SECOND SEQUENCE 264 SAME?

YES 268 READ NEXT CHARACTER OF FIRS AND SECOND YES SEQUENCES

NO

276 DSPLAY HOMOOGY level BETWEEN THE FRS AND SECOND SECUENCES

GTR 3 U.S. Patent Feb. 21, 2012 Sheet 4 of 4 US 8,119,385 B2

Sea N 304 STORE AFRST SEQUENCE O MEMORY - 36

OPEN CAABASE OF SEQUENCE FEATURES 3. REA) FIRST FEATURE FROM OAABASE d

3. COMPARE FEATUREATTrauTES WHE FRS SECAENCE 36

wamwr FOUN

YES 38 www.aalth riottlur -- SPLAY FOUND FEATURE TO THE USER 326 NO

REA NEX FEATURE EN AAEASE

MORE FEATURESN Avr YES

FORE US 8,119,385 B2 1. 2 NUCLECACDS AND PROTEINS AND these polypeptides, having the activities described in Table 1, METHODS FOR MAKING AND USING THEMI Table 2 or Table 3, below. The enzymes and proteins of the invention have utility in a variety of applications. REFERENCE TO SEQUENCE LISTING SUBMITTED VIA EFS-WEB SUMMARY This application includes an electronically Submitted The invention provides isolated or recombinant nucleic sequence (SEQ ID) listing. The entire content of this acids comprising a nucleic acid sequence having at least sequence listing is herein incorporated by reference for all about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, purposes. The sequence listing is identified on the electroni 10 59%, 60%, 61%, 62%, 63%, 64%. 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, cally filed .txt file as follows: 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity to an FileName Date of Creation Size (bytes) 15 exemplary nucleic acid of the invention, e.g., including SEQ ID NO:1, SEQID NO:3, SEQID NO:5, SEQID NO:7, SEQ 2009-08-24-SequenceListing Aug. 24, 2009 38,547,757 bytes ID NO:9, SEQID NO:11, SEQID NO:13, SEQID NO:15, (D2170-1N).txt SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQID NO:25, and all nucleic acids disclosed in the SEQ ID listing, which include all odd numbered SEQ ID FIELD OF THE INVENTION NOs: from SEQID NO:1 through SEQID NO:26,897, over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, This invention relates to molecular and cellular biology 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, and biochemistry. In one aspect, the invention provides 700, 750, 800, 850, 900,950, 1000, 1050, 1100, 1150, 1200, polypeptides, including enzymes, structural proteins and 25 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, binding proteins (e.g., ligands, receptors), polynucleotides 1750, 1800, 1850, 1900, 1950, 2000, 2050,2100,2200,2250, encoding these polypeptides, and methods of making and 2300, 2350, 2400, 2450, 2500, or more residues, encodes at using these polynucleotides and polypeptides. In one aspect, least one polypeptide having an , structural or binding the invention is directed to polypeptides, e.g., enzymes, struc activity, and the sequence identities are determined by analy tural proteins and binding proteins, including thermostable 30 sis with a sequence comparison algorithm or by a visual and thermotolerant activity, and polynucleotides encoding inspection. In one aspect, the enzymes and proteins of the these enzymes, structural proteins and binding proteins and invention include, e.g. aldolases, alpha-, ami making and using these polynucleotides and polypeptides. dases, e.g. secondary amidases, , catalases, caro The polypeptides of the invention can be used in a variety of tenoid pathway enzymes, dehalogenases, endoglucanases, pharmaceutical, agricultural and industrial contexts, includ 35 epoxide , , hydrolases, , gly ing the manufacture of cosmetics and nutraceuticals. cosidases, inteins, , laccases, , monooxyge Additionally, the polypeptides of the invention can be used nases, nitroreductases, , P450 enzymes, pectate in food processing, brewing, bath additives, alcohol produc , , , , polymerases tion, synthesis, enantioselectivity, hide preparation in and Xylanases. In another aspect, the isolated and recombi the leather industry, waste management and animal degrada 40 nant polypeptides of the invention, including enzymes, struc tion, silver recovery in the photographic industry, medical tural proteins and binding proteins, and polynucleotides treatment, silk degumming, biofilm degradation, biomass encoding these polypeptides, of the invention have activity as conversion to ethanol, biodefense, antimicrobial agents and described in Table 1, Table 2 or Table 3, below. disinfectants, personal care and cosmetics, biotech reagents, In one aspect, the invention also provides isolated or in corn wet milling and pharmaceuticals such as digestive 45 recombinant nucleic acids with a common novelty in that they aids and anti-inflammatory (anti-phlogistic) agents. are all derived from a common Source, e.g., an environmental Source, mixed environmental sources or mixed cultures. The BACKGROUND invention provides isolated or recombinant nucleic acids iso lated from a common Source, e.g. an environmental source, The invention provides isolated and recombinant polypep 50 mixed environmental sources or mixed cultures comprising a tides, including enzymes, structural proteins and binding pro polynucleotide of the invention, e.g., an exemplary sequence teins, polynucleotides encoding these polypeptides, and of the invention, including SEQ ID NO:1, SEQ ID NO:3, methods of making and using these polynucleotides and SEQID NO:5, SEQID NO:7, SEQIDNO:9, SEQID NO:11, polypeptides. The polypeptides of the invention, and the SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID polynucleotides encoding the polypeptides of the invention, 55 NO:19, SEQID NO:21, SEQID NO:23, SEQID NO:25, and encompass many classes of enzymes, structural proteins and all nucleic acids disclosed in the SEQ ID listing, which binding proteins. In one aspect, the enzymes and proteins of include all odd numbered SEQID NOs: from SEQID NO:1 the invention include, e.g. aldolases, alpha-galactosidases, through SEQID NO:26,897, over a region of at least about amidases, e.g. secondary amidases, amylases, catalases, caro 10, 15, 20, 25, 30, 35, 40, 45,50, 75, 100, 150, 200, 250,300, tenoid pathway enzymes, dehalogenases, endoglucanases, 60 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, epoxide hydrolases, esterases, hydrolases, glucosidases, gly 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, cosidases, inteins, isomerases, laccases, lipases, monooxyge 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, nases, nitroreductases, nitrilases, P450 enzymes, pectate 1950, 2000, 2050,2100, 2200,2250,2300, 2350,2400, 2450, lyases, phosphatases, phospholipases, phytases, polymerases 2500, or more residues, encodes at least one polypeptide and Xylanases. The invention also provides isolated and 65 having an enzyme, structural or binding activity, and the recombinant polypeptides, including enzymes, structural sequence identities are determined by analysis with a proteins and binding proteins, polynucleotides encoding sequence comparison algorithm or by a visual inspection. In US 8,119,385 B2 3 4 one aspect, the enzymes and proteins of the invention include, initiation activity, a TATA-binding activity, a signal transduc e.g. aldolases, alpha-galactosidases, amidases, e.g. secondary tion activity, an energy activity, an ATPase activ amidases, amylases, catalases, carotenoid pathway enzymes, ity, an information storage and/or processing activity, and/or dehalogenases, endoglucanases, epoxide hydrolases, any of the polypeptides activities as set forth in Table 1, Table esterases, hydrolases, glucosidases, glycosidases, inteins, 2 or Table 3, below. isomerases, laccases, lipases, monooxygenases, nitroreduc In one aspect, the sequence comparison algorithm is a tases, nitrilases, P450 enzymes, pectate lyases, phosphatases, BLAST version 2.2.2 algorithm where a filtering setting is set phospholipases, phytases, polymerases and Xylanases. In to blastall-p blastp -d “nr pataa’-FF, and all other options another aspect, the isolated and recombinant polypeptides of are set to default. the invention, including enzymes, structural proteins and 10 Another aspect of the invention is an isolated or recombi binding proteins, and polynucleotides encoding these nant nucleic acid including at least 10, 15, 20, 25, 30, 35, 40, polypeptides, of the invention have activity as described in 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, Table 1, Table 2 or Table 3, below. 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, In alternative aspects, the isolated or recombinant nucleic 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, acid encodes a polypeptide comprising an exemplary 15 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050,2100, sequence of the invention, e.g., including sequences as set 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more consecu forth in SEQID NO:2, SEQID NO:4, SEQID NO:6, SEQID tive bases of a nucleic acid sequence of the invention, NO:8, SEQID NO:10, SEQID NO:12, SEQID NO:14, SEQ sequences substantially identical thereto, and the sequences ID NO:16, SEQID NO:18, SEQID NO:20, SEQID NO:22, complementary thereto. SEQID NO:24, and all polypeptides disclosed in the SEQID In one aspect, the isolated or recombinant nucleic acid listing, which include all even numbered SEQID NOs: from encodes a polypeptide having a enzyme, structural or binding SEQ ID NO:2 through SEQ ID NO: 26,898. In one aspect activity, that is thermostable. The polypeptide can retain these polypeptides have an enzyme, structural or binding activity under conditions comprising a temperature range of activity. In one aspect, the enzymes and proteins of the inven between about 37° C. to about 95°C.; between about 55° C. tion include, e.g. aldolases, alpha-galactosidases, amidases, 25 to about 85°C., between about 70° C. to about 95°C., or, e.g. secondary amidases, amylases, catalases, carotenoid between about 90° C. to about 95°C. pathway enzymes, dehalogenases, endoglucanases, epoxide In another aspect, the isolated or recombinant nucleic acid hydrolases, esterases, hydrolases, glucosidases, glycosi encodes a polypeptide having an enzyme, structural or bind dases, inteins, isomerases, laccases, lipases, monooxygena ing activity, which is thermotolerant. The polypeptide can ses, nitroreductases, nitrilases, P450 enzymes, pectate lyases, 30 retain activity after exposure to a temperature in the range phosphatases, phospholipases, phytases, polymerases and from greater than 37° C. to about 95°C. or anywhere in the Xylanases. In another aspect, the isolated and recombinant range from greater than 55° C. to about 85°C. The polypep polypeptides of the invention, including enzymes, structural tide can retain activity after exposure to a temperature in the proteins and binding proteins, and polynucleotides encoding range between about 1° C. to about 5°C., between about 5°C. these polypeptides, of the invention have activity as described 35 to about 15° C., between about 15° C. to about 25° C., in Table 1, Table 2 or Table 3, below. between about 25°C. to about 37°C., between about 37° C. In alternative aspects, the enzyme, structural or binding to about 95° C., between about 55° C. to about 85°C., activity comprises a recombinase activity, a helicase activity, between about 70° C. to about 75°C., or between about 90° C. a DNA replication activity, a DNA recombination activity, an to about 95°C., or more. In one aspect, the polypeptide retains , a trans-isomerase activity or topoisomerase activ 40 activity after exposure to a temperature in the range from ity, a methyltransferase activity, an aminotransferase activity, greater than 90° C. to about 95° C. at about pH 4.5. a -5-methyl activity, a cysteinyl tRNA syn The invention provides isolated or recombinant nucleic thetase activity, a hydrolase, an activity, a phospho acids comprising a sequence that hybridizes under stringent esterase activity, an acetylmuramyl pentapeptide phospho conditions to a nucleic acid comprising a sequence of the transferase activity, a activity, an 45 invention, e.g., an exemplary sequence of the invention, activity, an acetylglucosamine phosphate including SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQ transferase activity, a centromere binding activity, a telom ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, erase activity or a transcriptional regulatory activity, a heat SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID shock activity, a activity, a proteinase activ NO:21, SEQID NO:23, SEQID NO:25, and all nucleic acids ity, a peptidase activity, a activity, an endo 50 disclosed in the SEQID listing, which include all odd num activity, an activity, a RecB family exo bered SEQ ID NOs: from SEQ ID NO:1 through SEQ ID nuclease activity, a polymerase activity, a carbamoyl NO:26,897, or fragments or subsequences thereof. In one phosphate synthetase activity, a methyl-thioadenine Syn aspect, the nucleic acid encodes a polypeptide having a thetase activity, an activity, an Fe—S oxi enzyme, structural or binding activity. The nucleic acid can be doreductase activity, a flavodoxin reductase activity, a per 55 at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, mease activity, a thymidylate activity, a dehydrogenase 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, activity, a pyrophosphorylase activity, a coenzyme metabo 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200 or more lism activity, a dinucleotide-utilizing enzyme activity, a residues in length or the full length of the or transcript. molybdopterin or thiamine activity, a beta-lac In one aspect, the stringent conditions include a wash step tamase activity, a ligand binding activity, an transport 60 comprising a wash in 0.2xSSC at a temperature of about 65° activity, an ion metabolism activity, a tellurite resistance pro C. for about 15 minutes. tein activity, an inorganic ion transport activity, a The invention provides a nucleic acid probe for identifying transport activity, a nucleotide metabolism activity, an actin a nucleic acid encoding a polypeptidehaving a enzyme, struc or activity, a activity or a acyl hydrolase tural orbinding activity, wherein the probe comprises at least (LAH) activity, a envelop biogenesis activity, an outer 65 about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65,70, 75, 80, membrane synthesis activity, a ribosomal structure synthesis 85,90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500,550, activity, a translational processing activity, a transcriptional 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more, con US 8,119,385 B2 5 6 secutive bases of a sequence comprising a sequence of the capable of amplifying a nucleic acid sequence of the inven invention, or fragments or Subsequences thereof, wherein the tion, or fragments or Subsequences thereof. probe identifies the nucleic acid by binding or hybridization. The invention provides expression cassettes comprising a The probe can comprise an oligonucleotide comprising at nucleic acid of the invention or a Subsequence thereof. In one least about 10 to 50, about 20 to 60, about 30 to 70, about 40 aspect, the expression cassette can comprise the nucleic acid to 80, or about 60 to 100 consecutive bases of a sequence that is operably linked to a promoter. The promoter can be a comprising a sequence of the invention, or fragments or Sub viral, bacterial, mammalian or plant promoter. In one aspect, sequences thereof. the plant promoter can be a potato, rice, corn, wheat, tobacco The invention provides a nucleic acid probe for identifying or barley promoter. The promoter can be a constitutive pro 10 moter. The constitutive promoter can comprise CaMV35S. In a nucleic acid encoding a polypeptidehaving a enzyme, struc another aspect, the promoter can be an inducible promoter. In tural or binding activity, wherein the probe comprises a one aspect, the promoter can be a tissue-specific promoter or nucleic acid comprising a sequence at least about 10, 15, 20, an environmentally regulated or a developmentally regulated 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, promoter. Thus, the promoter can be, e.g., a seed-specific, a 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 15 leaf-specific, a root-specific, a stem-specific oran abscission or more residues having at least about 50%, 51%, 52%. 53%, induced promoter. In one aspect, the expression cassette can 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, further comprise a plant or plant virus expression vector. 64%. 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, The invention provides cloning vehicles comprising an 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, expression cassette (e.g., a vector) of the invention or a 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, nucleic acid of the invention. The cloning vehicle can be a 94%. 95%, 96%, 97%, 98%, 99%, or more, or complete viral vector, a plasmid, a phage, a phagemid, a cosmid, a (100%) sequence identity to a nucleic acid of the invention. In fosmid, a bacteriophage or an artificial chromosome. The one aspect, the sequence identities are determined by analysis viral vector can comprise an adenovirus vector, a retroviral with a sequence comparison algorithm or by visual inspec vector or an adeno-associated viral vector. The cloning tion. In alternative aspects, the probe can comprise an oligo 25 vehicle can comprise a bacterial artificial chromosome nucleotide comprising at least about 10 to 50, about 20 to 60, (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), about 30 to 70, about 40 to 80, or about 60 to 100 consecutive a yeast artificial chromosome (YAC), or a mammalian artifi bases of a nucleic acid sequence of the invention, or a Subse cial chromosome (MAC). quence thereof. The invention provides transformed cell comprising a The invention provides an amplification primer pair for 30 nucleic acid of the invention or an expression cassette (e.g., a amplifying a nucleic acid encoding a polypeptide having a vector) of the invention, or a cloning vehicle of the invention. enzyme, structural or binding activity, wherein the primer In one aspect, the transformed cell can be a bacterial cell, a pair is capable of amplifying a nucleic acid comprising a mammalian cell, a fungal cell, a yeast cell, an insect cell or a sequence of the invention, or fragments or Subsequences plant cell. In one aspect, the plant cell can be a cereal, a potato, thereof. One or each member of the amplification primer 35 wheat, rice, corn, tobacco or barley cell. sequence pair can comprise an oligonucleotide comprising at The invention provides transgenic non-human animals least about 10 to 50, or more, consecutive bases of the comprising a nucleic acid of the invention or an expression sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22. cassette (e.g., a vector) of the invention. In one aspect, the 23, 24, 25, 26, 27, 28, 29, 30 or more consecutive bases of the animal is a mouse, a rat, a pig, a goat or a sheep. Sequence. 40 The invention provides transgenic plants comprising a The invention provides amplification primer pairs, wherein nucleic acid of the invention or an expression cassette (e.g., a the primer pair comprises a first member having a sequence as vector) of the invention. The transgenic plant can be a cereal set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, plant, a corn plant, a potato plant, a tomato plant, a wheat 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, 34,35, 36 plant, an oilseed plant, a rapeseed plant, a Soybean plant, a or more residues of a nucleic acid of the invention, and a 45 rice plant, a barley plant or a tobacco plant. second member having a sequence as set forth by about the The invention provides transgenic seeds comprising a first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, nucleic acid of the invention or an expression cassette (e.g., a 25, 26, 27, 28, 29, 30, 31, 32,33, 34,35, 36 or more residues vector) of the invention. The transgenic seed can be a cereal of the complementary strand of the first member. plant, a corn seed, a wheat kernel, an oilseed, a rapeseed, a The invention provides polypeptide-, enzyme-, protein-, 50 Soybean seed, a palm kernel, a Sunflower seed, a sesame seed, e.g. structural or binding protein-encoding nucleic acids gen a peanut or a tobacco plant seed. erated by amplification, e.g., polymerase chain reaction The invention provides an antisense oligonucleotide com (PCR), using an amplification primer pair of the invention. prising a nucleic acid sequence complementary to or capable The invention provides polypeptide-, enzyme-, protein-, e.g. of hybridizing under stringent conditions to a nucleic acid of structural or binding protein-encoding nucleic acids gener 55 the invention. The invention provides methods of inhibiting ated by amplification, e.g., polymerase chain reaction (PCR). the translation of a polypeptide, enzyme, protein, e.g. struc using an amplification primer pair of the invention. The tural or binding protein message in a cell comprising admin invention provides methods of making a polypeptide, istering to the cell or expressing in the cell an antisense enzyme, protein, e.g. structural or binding protein, by ampli oligonucleotide comprising a nucleic acid sequence comple fication, e.g., polymerase chain reaction (PCR), using an 60 mentary to or capable of hybridizing under stringent condi amplification primer pair of the invention. In one aspect, the tions to a nucleic acid of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid from a antisense oligonucleotide is between about 10 to 50, about 20 library, e.g., a gene library, such as an environmental library. to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases The invention provides methods of amplifying a nucleic in length, e.g., 10, 15, 20, 25, 30,35, 40, 45,50, 55, 60, 65,70, acid encoding a polypeptide having an enzyme, structural or 65 75, 80, 85,90, 95, 100 or more bases in length. binding activity, comprising amplification of a template The invention provides methods of inhibiting the transla nucleic acid with an amplification primer sequence pair tion of a polypeptide, enzyme, protein, e.g. structural or bind US 8,119,385 B2 7 8 ing protein message in a cell comprising administering to the Exemplary polypeptide or peptide sequences of the inven cell or expressing in the cell an antisense oligonucleotide tion include SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, comprising a nucleic acid sequence complementary to or SEQ ID NO:8, SEQ ID NO:10, etc., and all polypeptides capable of hybridizing understringent conditions to a nucleic disclosed in the SEQID listing, which include all even num acid of the invention. The invention provides double-stranded bered SEQ ID NOs: from SEQ ID NO:2 through SEQ ID inhibitory RNA (RNAi, or RNA interference) molecules (in NO:26,898, and subsequences thereof and variants thereof. cluding small interfering RNA, or siRNAs, for inhibiting Exemplary polypeptides also include fragments of at least transcription, and microRNAs, or miRNAs, for inhibiting about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75,80, 85,90, 95, 100, translation) comprising a Subsequence of a sequence of the 150, 200, 250, 300, 350, 400, 450, 500, 550, 600 or more invention. In one aspect, the RNAi is about 15, 16, 17, 18, 19. 10 residues in length, or over the full length of an enzyme. 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, 34,35, 40, Exemplary polypeptide or peptide sequences of the invention 45, 50, 55, 60, 65,70, 75, 80, 85,90, 95, 100 or more duplex include sequence encoded by a nucleic acid of the invention. in length. The invention provides methods of Exemplary polypeptide or peptide sequences of the invention inhibiting the expression of a polypeptide, enzyme, protein, include polypeptides or specifically bound by an peptide, e.g. structural or binding protein in a cell comprising 15 antibody of the invention. administering to the cell or expressing in the cell a double In one aspect, the polypeptide, enzyme, protein, e.g. struc stranded inhibitory RNA (iRNA, including small interfering tural or binding protein, is thermostable. The polypeptide, RNA, or siRNAs, for inhibiting transcription, and microR enzyme, protein, e.g. structural or binding protein can retain NAS, or miRNAs, for inhibiting translation), wherein the activity under conditions comprising a temperature range of RNA comprises a Subsequence of a sequence of the invention. between about 1° C. to about 5°C., between about 5° C. to The invention provides isolated or recombinant polypep about 15° C., between about 15° C. to about 25°C., between tides encoded by a nucleic acid of the invention. In alterna about 25°C. to about 37°C., between about 37° C. to about tive-aspects, the polypeptide can have a sequence as set forth 95°C., between about 55° C. to about 85°C., between about in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID 70° C. to about 75° C., or between about 90° C. to about 95° NO:8, SEQID NO:10, etc., and all polypeptides disclosed in 25 C., or more. In another aspect, the polypeptide, enzyme, the SEQID listing, which include all even numbered SEQID protein, e.g. structural or binding protein can be thermotoler NOs: from SEQ ID NO:2 through SEQ ID NO:26,898 (the ant. The polypeptide, enzyme, protein, e.g. structural or bind exemplary sequences of the invention), or Subsequences ing protein can retain activity after exposure to a temperature thereof, including fragments having enzymatic and/or Sub in the range from greater than 37°C. to about 95°C., or in the strate binding activity. The polypeptide can have an enzyme, 30 range from greater than 55° C. to about 85°C. In one aspect, structural or binding activity. the polypeptide, enzyme, protein, e.g. structural or binding In alternative aspects, the enzyme, structural or binding protein can retain activity after exposure to a temperature in activity comprises a recombinase activity, a helicase activity, the range from greater than 90° C. to about 95°C. at pH 4.5. a DNA replication activity, a DNA recombination activity, an Another aspect of the invention provides an isolated or isomerase, a trans-isomerase activity or topoisomerase activ 35 recombinant polypeptide or peptide including at least 10, 15. ity, a methyltransferase activity, an aminotransferase activity, 20, 25, 30, 35, 40, 45, 50,55, 60, 65,70, 75, 80, 85,90, 95 or a uracil-5-methyl transferase activity, a cysteinyl tRNA syn 100 or more consecutive bases of a polypeptide or peptide thetase activity, a hydrolase, an esterase activity, a phospho sequence of the invention, sequences Substantially identical esterase activity, an acetylmuramyl pentapeptide phospho thereto, and the sequences complementary thereto. The pep transferase activity, a glycosyltransferase activity, an 40 tide can be, e.g., an immunogenic fragment, a motif (e.g., a acetyltransferase activity, an acetylglucosamine phosphate ), a signal sequence, a prepro sequence or an transferase activity, a centromere binding activity, a telom . erase activity or a transcriptional regulatory activity, a heat The invention provides isolated or recombinant nucleic shock protein activity, a protease activity, a proteinase activ acids comprising a sequence encoding a polypeptide, ity, a peptidase activity, a carboxypeptidase activity, an endo 45 enzyme, protein, e.g. structural or binding protein having any nuclease activity, an exonuclease activity, a RecB family exo of the activities as set forth in Tables 1, 2 or 3, and a signal nuclease activity, a polymerase activity, a carbamoyl sequence, wherein the nucleic acid comprises a sequence of phosphate synthetase activity, a methyl-thioadenine Syn the invention. In one aspect, the isolated or recombinant thetase activity, an oxidoreductase activity, an Fe—S oxi polypeptide can comprise the polypeptide of the invention doreductase activity, a flavodoxin reductase activity, a per 50 comprising a heterologous signal sequence or a heterologous mease activity, a thymidylate activity, a dehydrogenase preprosequence, Such as a heterologous enzyme or non-en activity, a pyrophosphorylase activity, a coenzyme metabo Zyme signal sequence. The invention provides isolated or lism activity, a dinucleotide-utilizing enzyme activity, a recombinant nucleic acids comprising a sequence encoding a molybdopterin or thiamine biosynthesis activity, a beta-lac polypeptide, enzyme, protein, e.g. structural or binding pro tamase activity, a ligand binding activity, an ion transport 55 tein having any of the activities as set forth in Tables 1, 2 or 3, activity, an ion metabolism activity, a tellurite resistance pro wherein the sequence does not contain a signal sequence and tein activity, an inorganic ion transport activity, a nucleotide the nucleic acid comprises a sequence of the invention. In one transport activity, a nucleotide metabolism activity, an actin aspect, the invention provides an isolated or recombinant or myosin activity, a lipase activity or a lipid acyl hydrolase polypeptide comprising a polypeptide of the invention lack (LAH) activity, a cell envelop biogenesis activity, an outer 60 ing all or part of a signal sequence. membrane synthesis activity, a ribosomal structure synthesis In one aspect, the invention provides chimeric proteins activity, a translational processing activity, a transcriptional comprising a first domain comprising a signal sequence of the initiation activity, a TATA-binding activity, a signal transduc invention and at least a second domain. The protein can be a tion activity, an energy metabolism activity, an ATPase activ fusion protein. The second domain can comprise an enzyme. ity, an information storage and/or processing activity, and/or 65 The enzyme can be a non-enzyme. any of the polypeptides activities as set forth in Table 1, Table The invention provides chimeric polypeptides comprising 2 or Table 3, below. at least a first domain comprising signal peptide (SP), a prepro US 8,119,385 B2 9 10 sequence and/or a catalytic domain (CD) of the invention and motolerance can comprise retention of specific activity at 37° at least a second domain comprising a heterologous polypep C. in the range from about 1 to about 500 units per milligram tide or peptide, wherein the heterologous polypeptide or pep of protein after being heated to the elevated temperature. tide is not naturally associated with the signal peptide (SP), The invention provides the isolated or recombinant prepro sequence and/or catalytic domain (CD). In one aspect, polypeptide of the invention, wherein the polypeptide com the heterologous polypeptide or peptide is not an enzyme. prises at least one glycosylation site. In one aspect, glycosy The heterologous polypeptide or peptide can be amino termi lation can be an N-linked glycosylation. In one aspect, the nal to, carboxy terminal to or on both ends of the signal polypeptide can be glycosylated after being expressed in a P peptide (SP), prepro sequence and/or catalytic domain (CD). pastoris or a S. pombe. The invention provides isolated or recombinant nucleic 10 In one aspect, the polypeptide, enzyme, protein, e.g. struc acids encoding a chimeric polypeptide, wherein the chimeric polypeptide comprises at least a first domain comprising sig tural or binding protein can retain activity under conditions nal peptide (SP), a prepro domain and/or a catalytic domain comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4. (CD) of the invention and at least a second domain compris In another aspect, the polypeptide, enzyme, protein, e.g. ing a heterologous polypeptide or peptide, wherein the het 15 structural or binding protein can retain activity under condi erologous polypeptide or peptide is not naturally associated tions comprising about pH 7, pH 7.5 pH 8.0, pH 8.5, pH 9, pH with the signal peptide (SP), prepro domain and/or catalytic 9.5, pH 10, pH 10.5 or pH 11. In one aspect, the polypeptide domain (CD). can retain an enzyme, structural or binding activity after The invention provides isolated or recombinant signal exposure to conditions comprising about pH 6.5, pH 6, pH sequences (e.g., signal peptides) consisting of or comprising 5.5, pH 5, pH 4.5 or pH 4. In another aspect, the polypeptide a sequence as set forth in residues 1 to 14, 1 to 15, 1 to 16, 1 can retain enzyme, structural or binding activity after expo to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, sure to conditions comprising about pH 7, pH 7.5 pH 8.0, pH 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11. 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38, 1 to 40, In one aspect, the polypeptide, enzyme, protein, e.g. struc 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46 or 1 to 47, of 25 tural or binding protein of the invention has activity at under a polypeptide of the invention, including the exemplary alkaline conditions, e.g., the alkaline conditions of the gut, polypeptides of the invention (including SEQID NO:2, SEQ e.g., the Small intestine. In one aspect, the polypeptide, ID NO:4, SEQID NO:6, SEQID NO:8, SEQID NO:10, etc., enzyme, protein, e.g. structural or binding protein can retain and all polypeptides disclosed in the SEQ ID listing, which activity after exposure to the acidic pH of the stomach. include all even numbered SEQID NOs: from SEQID NO:2 30 The invention provides protein preparations comprising a through SEQ ID NO:26,898). In one aspect, the invention polypeptide of the invention, wherein the protein preparation provides signal sequences comprising the first 14, 15, 16, 17. comprises a liquid, a solid or a gel. 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, 34, The invention provides heterodimers comprising a 35,36, 37,38, 39, 40, 41,42, 43,44, 45,46, 47, 48,49, 50, 51, polypeptide of the invention and a second protein or domain. 52,53,54, 55,56, 57,58, 59, 60, 61, 62,63, 64, 65, 66, 67,68, 35 The second member of the heterodimer can be a different 69, 70 or more amino terminal residues of a polypeptide of the enzyme, a different enzyme or another protein. In one aspect, invention. the second domain can be a polypeptide and the heterodimer In one aspect, the enzyme, structural or binding activity can be a fusion protein. In one aspect, the second domain can comprises a specific activity at about 37°C. in the range from be an epitope or a tag. In one aspect, the invention provides about 1 to about 1200 units per milligram of protein, or, about 40 homodimers comprising a polypeptide of the invention. 100 to about 1000 units per milligram of protein. In another The invention provides immobilized polypeptides having aspect, the polypeptide, enzyme, protein, e.g. structural or enzyme, structural or binding activity, wherein the polypep binding protein activity comprises a specific activity from tide comprises a polypeptide of the invention, a polypeptide about 100 to about 1000 units per milligram of protein, or, encoded by a nucleic acid of the invention, or a polypeptide from about 500 to about 750 units per milligram of protein. 45 comprising a polypeptide of the invention and a second Alternatively, the enzyme, structural or binding activity com domain. In one aspect, the polypeptide can be immobilized on prises a specific activity at 37°C. in the range from about 1 to a cell, a metal, a resin, a polymer, a ceramic, a glass, a micro about 750 units per milligram of protein, or, from about 500 electrode, a graphitic particle, a bead, a gel, a plate, an array to about 1200 units per milligram of protein. In one aspect, the or a capillary tube. enzyme, structural or binding activity comprises a specific 50 The invention provides arrays comprising an immobilized activity at 37°C. in the range from about 1 to about 500 units nucleic acid of the invention. The invention provides arrays per milligram of protein, or, from about 750 to about 1000 comprising an antibody of the invention. units per milligram of protein. In another aspect, the enzyme, The invention provides isolated or recombinant antibodies structural or binding activity comprises a specific activity at that specifically bind to a polypeptide of the invention or to a 37° C. in the range from about 1 to about 250 units per 55 polypeptide encoded by a nucleic acid of the invention. These milligram of protein. Alternatively, the enzyme, structural or antibodies of the invention can be a monoclonal or a poly binding activity comprises a specific activity at 37°C. in the clonal antibody. The invention provides hybridomas compris range from about 1 to about 100 units per milligram of pro ing an antibody of the invention, e.g., an antibody that spe tein. cifically binds to a polypeptide of the invention or to a In another aspect, thermotolerance comprises retention of 60 polypeptide encoded by a nucleic acid of the invention. The at least half of the specific activity of the enzyme, structural or invention provides nucleic acids encoding these antibodies. binding protein at 37° C. after being heated to the elevated The invention provides method of isolating or identifying a temperature. Alternatively, thermotolerance can comprise polypeptide having enzyme, structural or binding activity retention of specific activity at 37°C. in the range from about comprising the steps of: (a) providing an antibody of the 1 to about 1200 units per milligram of protein, or, from about 65 invention; (b) providing a sample comprising polypeptides; 500 to about 1000 units per milligram of protein, after being and (c) contacting the sample of step (b) with the antibody of heated to the elevated temperature. In another aspect, ther step (a) under conditions wherein the antibody can specifi US 8,119,385 B2 11 12 cally bind to the polypeptide, thereby isolating or identifying an activity of the polypeptide, enzyme, protein, e.g. structural a polypeptide having an enzyme, structural or binding activ orbinding protein, wherein a change in the enzyme, structural ity. or binding activity measured in the presence of the test com The invention provides methods of making an anti pound compared to the activity in the absence of the test polypeptide, anti-enzyme, or anti-protein, e.g. anti-structural compound provides a determination that the test compound oranti-binding protein, antibody comprising administering to modulates the enzyme, structural or binding activity. In one a non-human animal a nucleic acid of the invention or a aspect, the enzyme, structural or binding activity can be mea polypeptide of the invention or Subsequences thereof in an Sured by providing a polypeptide, enzyme, protein, e.g. struc amount Sufficient to generate a humoral immune response, tural orbinding protein, and detecting a decrease in thereby making an anti-polypeptide, anti-enzyme, or anti 10 the amount of the Substrate or an increase in the amount of a protein, e.g. anti-structural or anti-binding protein, antibody. reaction , or, an increase in the amount of the Substrate The invention provides methods of making an anti-polypep or a decrease in the amount of a reaction product. A decrease tide, anti-enzyme, or anti-protein, e.g. anti-structural or anti in the amount of the Substrate or an increase in the amount of binding protein, immune comprising administering to a non the reaction product with the test compound as compared to human animal a nucleic acid of the invention or a polypeptide 15 the amount of substrate or reaction product without the test of the invention or Subsequences thereof in an amount Suffi compound identifies the test compound as an activator of cient to generate an immune response. enzyme, structural or binding activity. An increase in the The invention provides methods of producing a recombi amount of the Substrate or a decrease in the amount of the nant polypeptide comprising the steps of: (a) providing a reaction product with the test compound as compared to the nucleic acid of the invention operably linked to a promoter; amount of substrate or reaction product without the test com and (b) expressing the nucleic acid of step (a) under condi pound identifies the test compound as an inhibitor of enzyme, tions that allow expression of the polypeptide, thereby pro structural or binding activity. ducing a recombinant polypeptide. In one aspect, the method The invention provides computer systems comprising a can further comprise transforming a host cell with the nucleic processor and a data storage device wherein said data storage acid of step (a) followed by expressing the nucleic acid of step 25 device has stored thereon a polypeptide sequence or a nucleic (a), thereby producing a recombinant polypeptide in a trans acid sequence of the invention (e.g., a polypeptide encoded by formed cell. a nucleic acid of the invention). In one aspect, the computer The invention provides methods for identifying a polypep system can further comprise a sequence comparison algo tide having enzyme, structural or binding activity comprising rithm and a data storage device having at least one reference the following steps: (a) providing a polypeptide of the inven 30 sequence stored thereon. In another aspect, the sequence tion; or a polypeptide encoded by a nucleic acid of the inven comparison algorithm comprises a computer program that tion; (b) providing an enzyme, structural or binding activity indicates polymorphisms. In one aspect, the computer system Substrate; and (c) contacting the polypeptide or a fragment or can further comprise an identifier that identifies one or more variant thereof of step (a) with the substrate of step (b) and features in said sequence. The invention provides computer detecting a decrease in the amount of Substrate or an increase 35 readable media having stored thereon a polypeptide sequence in the amount of a reaction product, wherein a decrease in the or a nucleic acid sequence of the invention. The invention amount of the Substrate or an increase in the amount of the provides methods for identifying a feature in a sequence reaction product detects a polypeptide having a enzyme, comprising the steps of: (a) reading the sequence using a structural or binding activity. computer program which identifies one or more features in a The invention provides methods for identifying a polypep 40 sequence, wherein the sequence comprises a polypeptide tide, enzyme, protein, e.g. structural or binding protein, Sub sequence or a nucleic acid sequence of the invention; and (b) strate comprising the following steps: (a) providing a identifying one or more features in the sequence with the polypeptide of the invention; or a polypeptide encoded by a computer program. The invention provides methods for com nucleic acid of the invention; (b) providing a test Substrate; paring a first sequence to a second sequence comprising the and (c) contacting the polypeptide of step (a) with the test 45 steps of: (a) reading the first sequence and the second Substrate of step (b) and detecting a decrease in the amount of sequence through use of a computer program which com Substrate or an increase in the amount of reaction product, pares sequences, wherein the first sequence comprises a wherein a decrease in the amount of the Substrate or an polypeptide sequence or a nucleic acid sequence of the inven increase in the amount of a reaction product identifies the test tion; and (b) determining differences between the first Substrate as a polypeptide, enzyme, protein, e.g. structural or 50 sequence and the second sequence with the computer pro binding protein, Substrate. gram. The step of determining differences between the first The invention provides methods of determining whether a sequence and the second sequence can further comprise the test compound specifically binds to a polypeptide comprising step of identifying polymorphisms. In one aspect, the method the following steps: (a) expressing a nucleic acid or a vector can further comprise an identifier that identifies one or more comprising the nucleic acid under conditions permissive for 55 features in a sequence. In another aspect, the method can translation of the nucleic acid to a polypeptide, wherein the comprise reading the first sequence using a computer pro nucleic acid comprises a nucleic acid of the invention, or, gram and identifying one or more features in the sequence. providing a polypeptide of the invention; (b) providing a test The invention provides methods for isolating or recovering compound; (c) contacting the polypeptide with the test com a nucleic acid encoding a polypeptide, enzyme, protein, e.g. pound; and (d) determining whether the test compound of 60 structural or binding protein, from an environmental sample step (b) specifically binds to the polypeptide. comprising the steps of: (a) providing an amplification primer The invention provides methods for identifying a modula sequence pair for amplifying a nucleic acid encoding a tor of a enzyme, structural or binding activity comprising the polypeptide, enzyme, protein, e.g. structural or binding pro following steps: (a) providing a polypeptide of the invention tein, wherein the primer pair is capable of amplifying a or a polypeptide encoded by a nucleic acid of the invention; 65 nucleic acid of the invention; (b) isolating a nucleic acid from (b) providing a test compound; (c) contacting the polypeptide the environmental sample or treating the environmental of step (a) with the test compound of step (b) and measuring sample Such that nucleic acid in the sample is accessible for US 8,119,385 B2 13 14 hybridization to the amplification primer pair; and, (c) com being exposed to an elevated temperature. In another aspect, bining the nucleic acid of step (b) with the amplification the variant the polypeptide, enzyme, protein, e.g. structural or primer pair of step (a) and amplifying nucleic acid from the binding protein has increased glycosylation as compared to environmental sample, thereby isolating or recovering a the polypeptide, enzyme, protein, e.g. structural or binding nucleic acid encoding a polypeptide, enzyme, protein, e.g. protein encoded by a template nucleic acid. Alternatively, the structural or binding protein from an environmental sample. variant the polypeptide, enzyme, protein, e.g. structural or One or each member of the amplification primer sequence binding protein has an enzyme, structural or binding activity pair can comprise an oligonucleotide comprising an amplifi under a high temperature, wherein the polypeptide, enzyme, cation primer sequence pair of the invention, e.g., having at protein, e.g. structural or binding protein encoded by the least about 10 to 50 consecutive bases of a sequence of the 10 template nucleic acid is not active under the high temperature. invention. In one aspect, the method can be iteratively repeated until a The invention provides methods for isolating or recovering polypeptide, enzyme, protein, e.g. structural or binding pro a nucleic acid encoding a polypeptide, enzyme, protein, e.g. tein coding sequence having an altered codon usage from that structural or binding protein from an environmental sample of the template nucleic acid is produced. In another aspect, the comprising the steps of: (a) providing a polynucleotide probe 15 method can be iteratively repeated until a polypeptide, comprising a nucleic acid of the invention or a Subsequence enzyme, protein, e.g. structural or binding protein gene hav thereof; (b) isolating a nucleic acid from the environmental ing higher or lower level of message expression or stability sample or treating the environmental sample Such that nucleic from that of the template nucleic acid is produced. acid in the sample is accessible for hybridization to a poly The invention provides methods for modifying codons in a nucleotide probe of step (a); (c) combining the isolated nucleic acid encoding a polypeptide having an enzyme, struc nucleic acid or the treated environmental sample of step (b) tural or binding activity to increase its expression in a host with the polynucleotide probe of step (a); and (d) isolating a cell, the method comprising the following steps: (a) providing nucleic acid that specifically hybridizes with the polynucle a nucleic acid of the invention encoding a polypeptide having otide probe of step (a), thereby isolating or recovering a an enzyme, structural or binding activity; and, (b) identifying nucleic acid encoding a polypeptide, enzyme, protein, e.g. 25 a non-preferred or a less preferred codon in the nucleic acid of structural or binding protein from an environmental sample. step (a) and replacing it with a preferred or neutrally used The environmental sample can comprise a water sample, a codon encoding the same as the replaced codon, liquid sample, a Soil sample, an air sample or a biological wherein a preferred codon is a codon over-represented in sample. In one aspect, the biological sample can be derived coding sequences in in the host cell and a non-preferred from a bacterial cell, a protozoan cell, an insect cell, a yeast 30 or less preferred codon is a codon under-represented in cod cell, a plant cell, a fungal cell or a mammalian cell. ing sequences in genes in the host cell, thereby modifying the The invention provides methods of generating a variant of nucleic acid to increase its expression in a host cell. a nucleic acid encoding a polypeptide having an enzyme, The invention provides methods for modifying codons in a structural or binding activity comprising the steps of: (a) nucleic acid encoding a polypeptide having an enzyme, struc providing a template nucleic acid comprising a nucleic acid of 35 tural orbinding activity; the method comprising the following the invention; and (b) modifying, deleting or adding one or steps: (a) providing a nucleic acid of the invention; and, (b) more nucleotides in the template sequence, or a combination identifying a codon in the nucleic acid of step (a) and replac thereof, to generate a variant of the template nucleic acid. In ing it with a different codon encoding the same amino acid as one aspect, the method can further comprise expressing the the replaced codon, thereby modifying codons in a nucleic variant nucleic acid to generate a variant the polypeptide, 40 acid encoding a polypeptide, enzyme, protein, e.g. structural enzyme, protein, e.g. structural or binding protein. The modi or binding protein. fications, additions or deletions can be introduced by a The invention provides methods for modifying codons in a method comprising error-prone PCR, shuffling, oligonucle nucleic acid encoding a polypeptide having an enzyme, struc otide-directed mutagenesis, assembly PCR, sexual PCR tural or binding activity to increase its expression in a host mutagenesis, in Vivo mutagenesis, cassette mutagenesis, 45 cell, the method comprising the following steps: (a) providing recursive ensemble mutagenesis, exponential ensemble a nucleic acid of the invention encoding a polypeptide, mutagenesis, site-specific mutagenesis, gene reassembly, enzyme, protein, e.g. structural or binding protein, polypep Gene Site Saturation Mutagenesis (GSSM), synthetic liga tide; and, (b) identifying a non-preferred or a less preferred tion reassembly (SLR) or a combination thereof. In another codon in the nucleic acid of step (a) and replacing it with a aspect, the modifications, additions or deletions are intro 50 preferred or neutrally used codon encoding the same amino duced by a method comprising recombination, recursive acid as the replaced codon, wherein a preferred codon is a sequence recombination, phosphothioate-modified DNA codon over-represented in coding sequences in genes in the mutagenesis, uracil-containing template mutagenesis, host cell and a non-preferred or less preferred codon is a gapped duplex mutagenesis, point mismatch repair mutagen codon under-represented in coding sequences in genes in the esis, repair-deficient host strain mutagenesis, chemical 55 host cell, thereby modifying the nucleic acid to increase its mutagenesis, radiogenic mutagenesis, deletion mutagenesis, expression in a host cell. restriction-selection mutagenesis, restriction-purification The invention provides methods for modifying a codon in mutagenesis, artificial gene synthesis, ensemble mutagen a nucleic acid encoding a polypeptide having an enzyme, esis, chimeric nucleic acid multimer creation and a combina structural or binding activity to decrease its expression in a tion thereof. 60 host cell, the method comprising the following steps: (a) In one aspect, the method can be iteratively repeated until providing a nucleic acid of the invention; and (b) identifying a polypeptide, enzyme, protein, e.g. structural or binding at least one preferred codon in the nucleic acid of step (a) and protein having an altered or different activity or an altered or replacing it with a non-preferred or less preferred codon different stability from that of a polypeptide encoded by the encoding the same amino acid as the replaced codon, wherein template nucleic acid is produced. In one aspect, the variant 65 a preferred codon is a codon over-represented in coding the polypeptide, enzyme, protein, e.g. structural or binding sequences in genes in a host cell and a non-preferred or less protein is thermotolerant, and retains some activity after preferred codon is a codon under-represented in coding US 8,119,385 B2 15 16 sequences in genes in the host cell, thereby modifying the erating a library of modified small molecules produced by at nucleic acid to decrease its expression in a host cell. In one least one enzymatic reaction catalyzed by the enzyme. In one aspect, the host cell can be a bacterial cell, a fungal cell, an aspect, the method further comprises a plurality of additional insect cell, a yeast cell, a plant cell or a mammalian cell. enzymes under conditions that facilitate a plurality of bio The invention provides methods for producing a library of catalytic reactions by the enzymes to form a library of modi nucleic acids encoding a plurality of modified polypeptides, fied small molecules produced by the plurality of enzymatic enzymes, proteins, e.g. structural or binding proteins, active reactions. In one aspect, the method further comprises the sites or substrate binding sites, wherein the modified active step of testing the library to determine ifa particular modified sites or substrate binding sites are derived from a first nucleic small molecule that exhibits a desired activity is present acid comprising a sequence encoding a first active site or a 10 within the library. The step of testing the library can further first substrate binding site the method comprising the follow comprises the steps of systematically eliminating all but one ing steps: (a) providing a first nucleic acid encoding a first of the biocatalytic reactions used to produce a portion of the active site or first substrate binding site, wherein the first plurality of the modified small molecules within the library nucleic acid sequence comprises a sequence that hybridizes by testing the portion of the modified small molecule for the under stringent conditions to a nucleic acid of the invention, 15 presence or absence of the particular modified Small molecule and the nucleic acid encodes a polypeptide, enzyme, protein, with a desired activity, and identifying at least one specific e.g. structural or binding protein, active site or a polypeptide, biocatalytic reaction that produces the particular modified enzyme, protein, e.g. structural or binding protein, Substrate small molecule of desired activity. binding site; (b) providing a set of mutagenic oligonucle The invention provides methods for determining a func otides that encode naturally-occurring amino acid variants at tional fragment of a polypeptide, enzyme, protein, e.g. struc a plurality of targeted codons in the first nucleic acid; and, (c) tural orbinding protein, comprising the steps of: (a) providing using the set of mutagenic oligonucleotides to generate a set a polypeptide, enzyme, protein, e.g. structural or binding of active site-encoding or Substrate binding site-encoding protein, wherein the enzyme comprises a polypeptide of the variant nucleic acids encoding a range of amino acid varia invention, or a polypeptide encoded by a nucleic acid of the tions at each amino acid codon that was mutagenized, thereby 25 invention, or a Subsequence thereof, and (b) deleting a plu producing a library of nucleic acids encoding a plurality of rality of amino acid residues from the sequence of step (a) and modified the polypeptide, enzyme, protein, e.g. structural or testing the remaining Subsequence for an enzyme, structural binding protein, active sites or Substrate binding sites. In one or binding activity, thereby determining a functional frag aspect, the method comprises mutagenizing the first nucleic ment of a polypeptide, enzyme, protein, e.g. structural or acid of step (a) by a method comprising an optimized directed 30 binding protein. In one aspect, the polypeptide, enzyme, pro evolution system, Gene Site Saturation Mutagenesis tein, e.g. structural or binding protein activity is measured by (GSSM), synthetic ligation reassembly (SLR), error-prone providing a polypeptide, enzyme, protein, e.g. structural or PCR, shuffling, oligonucleotide-directed mutagenesis, binding protein, Substrate and detecting a decrease in the assembly PCR, sexual PCR mutagenesis, in vivo mutagen amount of the Substrate or an increase in the amount of a esis, cassette mutagenesis, recursive ensemble mutagenesis, 35 reaction product. exponential ensemble mutagenesis, site-specific mutagen The invention provides methods for whole cell engineering esis, gene reassembly, and a combination thereof. In another of new or modified phenotypes by using real-time metabolic aspect, the method comprises mutagenizing the first nucleic flux analysis, the method comprising the following steps: (a) acid of step (a) or variants by a method comprising recombi making a modified cell by modifying the genetic composition nation, recursive sequence recombination, phosphothioate 40 of a cell, wherein the genetic composition is modified by modified DNA mutagenesis, uracil-containing template addition to the cell of a nucleic acid of the invention; (b) mutagenesis, gapped duplex mutagenesis, point mismatch culturing the modified cell to generate a plurality of modified repair mutagenesis, repair-deficient host strain mutagenesis, cells; (c) measuring at least one metabolic parameter of the chemical mutagenesis, radiogenic mutagenesis, deletion cell by monitoring the cell culture of step (b) in real time; and, mutagenesis, restriction-selection mutagenesis, restriction 45 (d) analyzing the data of step (c) to determine if the measured purification mutagenesis, artificial gene synthesis, ensemble parameter differs from a comparable measurement in an mutagenesis, chimeric nucleic acid multimer creation and a unmodified cell under similar conditions, thereby identifying combination thereof. an engineered phenotype in the cell using real-time metabolic The invention provides methods for making a small mol flux analysis. In one aspect, the genetic composition of the ecule comprising the steps of: (a) providing a plurality of 50 cell can be modified by a method comprising deletion of a biosynthetic enzymes capable of synthesizing or modifying a sequence or modification of a sequence in the cell, or, knock Small molecule, wherein one of the enzymes comprises an ing out the expression of agene. In one aspect, the method can enzyme encoded by a nucleic acid of the invention; (b) pro further comprise selecting a cell comprising a newly engi viding a substrate for at least one of the enzymes of step (a): neered phenotype. In another aspect, the method can com and, (c) reacting the Substrate of step (b) with the enzymes 55 prise culturing the selected cell, thereby generating a new cell under conditions that facilitate a plurality of biocatalytic reac strain comprising a newly engineered phenotype. tions to generate a small molecule by a series of biocatalytic The invention provides methods of increasing thermotol reactions. erance or thermostability of a polypeptide, enzyme, protein, The invention provides methods for modifying a small e.g. structural or binding protein, polypeptide, the method molecule comprising the steps: (a) providing a enzyme 60 comprising glycosylating a polypeptide, enzyme, protein, encoded by a nucleic acid of the invention; (b) providing a e.g. structural or binding protein, wherein the polypeptide, Small molecule; and, (c) reacting the enzyme of step (a) with enzyme, protein, e.g. structural or binding protein comprises the small molecule of step (b) under conditions that facilitate at least thirty contiguous amino acids of a polypeptide of the an enzymatic reaction catalyzed by the enzyme, thereby invention; or a polypeptide encoded by a nucleic acid modifying a small molecule by an enzymatic reaction. In one 65 sequence of the invention, thereby increasing thermotoler aspect, the method comprises providing a plurality of Small ance or thermostability of the polypeptide, enzyme, protein, molecule Substrates for the enzyme of step (a), thereby gen e.g. structural or binding protein. In one aspect, the polypep US 8,119,385 B2 17 18 tide, enzyme, protein, e.g. structural or binding protein spe invention provides methods for utilizing a polypeptide, cific activity can be thermostable or thermotolerant at a tem enzyme, protein, e.g. structural or binding protein, as a nutri perature in the range from greater than about 37° C. to about tional Supplement in an animal diet, the method comprising: 95o C. preparing a nutritional Supplement containing a polypeptide, The invention provides methods for overexpressing a enzyme, protein, e.g. structural or binding protein, compris recombinant polypeptide, enzyme, protein, e.g. structural or ing at least thirty contiguous amino acids of a polypeptide of binding protein, in a cell comprising expressing a vector the invention; and administering the nutritional Supplement to comprising a nucleic acid comprising a nucleic acid of the an animal. The animal can be a human, a ruminant or a invention or a nucleic acid sequence of the invention, wherein monogastric animal. The polypeptide, enzyme, protein, e.g. the sequence identities are determined by analysis with a 10 structural or binding protein can be prepared by expression of sequence comparison algorithm or by visual inspection, a polynucleotide encoding the polypeptide, enzyme, protein, wherein overexpression is effected by use of a high activity e.g. structural or binding protein in an selected from promoter, a dicistronic vector or by gene amplification of the the group consisting of a bacterium, a yeast, a plant, an insect, Vector. a fungus and an animal. The organism can be selected from The invention provides methods of making a transgenic 15 the group consisting of an S. pombe, S. cerevisiae, Pichia plant comprising the following steps: (a) introducing a heter pastoris, E. coli, Streptomyces sp., Bacillus sp. and Lactoba ologous nucleic acid sequence into the cell, wherein the het cillus sp. erologous nucleic sequence comprises a nucleic acid The invention provides edible enzyme delivery matrix sequence of the invention, thereby producing a transformed comprising thermostable recombinant polypeptide, enzyme, plant cell; and (b) producing a transgenic plant from the protein, e.g. structural orbinding protein of the invention. The transformed cell. In one aspect, the step (a) can further com invention provides methods for delivering a polypeptide, prise introducing the heterologous nucleic acid sequence by enzyme, protein, e.g. structural or binding protein, Supple electroporation or microinjection of plant cell protoplasts. In ment to an animal, the method comprising: preparing an another aspect, the step (a) can further comprise introducing edible enzyme delivery matrix in the form of pellets compris the heterologous nucleic acid sequence directly to plant tissue 25 ing a granulate edible carrier and thermostable recombinant by DNA particle bombardment. Alternatively, the step (a) can polypeptide, enzyme, protein, e.g. structural or binding pro further comprise introducing the heterologous nucleic acid tein, wherein the pellets readily disperse the polypeptide, sequence into the plant cell DNA using an Agrobacterium enzyme, protein, e.g. structural or binding protein contained tumefaciens host. In one aspect, the plant cell can be a potato, therein into aqueous media, and administering the edible corn, rice, wheat, tobacco, or barley cell. 30 enzyme delivery matrix to the animal. The recombinant The invention provides methods of expressing a heterolo polypeptide, enzyme, protein, e.g. structural or binding pro gous nucleic acid sequence in a plant cell comprising the tein can comprise a polypeptide of the invention. The following steps: (a) transforming the plant cell with a heter polypeptide, enzyme, protein, e.g. structural or binding pro ologous nucleic acid sequence operably linked to a promoter, tein can be glycosylated to provide thermostability at pellet wherein the heterologous nucleic sequence comprises a 35 izing conditions. The delivery matrix can be formed by pel nucleic acid of the invention; (b) growing the plant under letizing a mixture comprising a grain germ and a polypeptide, conditions wherein the heterologous nucleic acids sequence enzyme, protein, e.g. structural or binding protein. The pel is expressed in the plant cell. The invention provides methods letizing conditions can include application of steam. The of expressing a heterologous nucleic acid sequence in a plant pelletizing conditions can comprise application of a tempera cell comprising the following steps: (a) transforming the plant 40 ture in excess of about 80° C. for about 5 minutes and the cell with a heterologous nucleic acid sequence operably enzyme retains a specific activity of at least 350 to about 900 linked to a promoter, wherein the heterologous nucleic units per milligram of enzyme. sequence comprises a sequence of the invention; (b) growing In one aspect, invention provides a pharmaceutical com the plant under conditions wherein the heterologous nucleic position comprising a polypeptide, enzyme, protein, e.g. acids sequence is expressed in the plant cell. 45 structural or binding protein, of the invention, or a polypep The invention provides feeds or foods comprising a tide encoded by a nucleic acid of the invention. In one aspect, polypeptide of the invention, or a polypeptide encoded by a the pharmaceutical composition acts as a digestive aid. nucleic acid of the invention. In one aspect, the invention The details of one or more aspects of the invention are set provides a food, feed, a liquid, e.g., a beverage (such as a fruit forth in the accompanying drawings and the description juice or a beer), a bread or a dough or a bread product, or a 50 below. Other features, objects, and advantages of the inven beverage precursor (e.g., a wort), comprising a polypeptide of tion will be apparent from the description and drawings, and the invention. The invention provides food or nutritional from the claims. Supplements for an animal comprising a polypeptide of the All publications, patents, patent applications, GenBank invention, e.g., a polypeptide encoded by the nucleic acid of sequences and ATCC deposits, cited herein are hereby the invention. 55 expressly incorporated by reference for all purposes. In one aspect, the polypeptide in the food or nutritional Supplement can be glycosylated. The invention provides BRIEF DESCRIPTION OF DRAWINGS edible enzyme delivery matrices comprising a polypeptide of the invention, e.g., a polypeptide encoded by the nucleic acid The following drawings are illustrative of aspects of the of the invention. In one aspect, the delivery matrix comprises 60 invention and are not meant to limit the scope of the invention a pellet. In one aspect, the polypeptide can be glycosylated. In as encompassed by the claims. one aspect, the polypeptide, enzyme, protein, e.g. structural FIG. 1 is a block diagram of a computer system. or binding protein activity is thermotolerant. In another FIG. 2 is a flow diagram illustrating one aspect of a process aspect, the polypeptide, enzyme, protein, e.g. structural or for comparing a new nucleotide or protein sequence with a binding protein activity is thermostable. 65 database of sequences in order to determine the homology The invention provides a food, a feed or a nutritional levels between the new sequence and the sequences in the Supplement comprising a polypeptide of the invention. The database. US 8,119,385 B2 19 20 FIG.3 is a flow diagram illustrating one aspect of a process aldolase activity can comprise, with an oxidation step. Syn in a computer for determining whether two sequences are thesis of a 3R,5S-6-chloro-2,4,6-trideoxy-erythro-hexono homologous. lactone. FIG. 4 is a flow diagram illustrating one aspect of an Alpha-Galactosidases identifier process 300 for detecting the presence of a feature in In one aspect, the invention provides alpha-galactosidases, a Sequence. polynucleotides encoding them, and methods of making and Like reference symbols in the various drawings indicate using these polynucleotides and polypeptides. In one aspect, like elements. the invention is directed to polypeptides, e.g., enzymes, hav ing an alpha-galactosidase activity, including thermostable DETAILED DESCRIPTION 10 and thermotolerant alpha-galactosidase activity, and poly nucleotides encoding these enzymes, and making and using The invention provides isolated and recombinant polypep these polynucleotides and polypeptides. tides, including enzymes, structural proteins and binding pro An alpha galactosidase hydrolyses the non-reducing ter teins, polynucleotides encoding these polypeptides, and 15 minal alpha 1-3,4,6 linked from poly- and oligosac methods of making and using these polynucleotides and charides. These saccharides are commonly found in legumes polypeptides. The polypeptides of the invention, and the and are difficult to digest. As such, alpha-galactosidases can polynucleotides encoding the polypeptides of the invention, be used as a digestive aid to break down raffinose, stachyose, encompass many classes of enzymes, structural proteins and and Verbascose, found in Such foods as beans and other gassy binding proteins. In one aspect, the enzymes and proteins of foods. the invention comprise, e.g. aldolases, alpha-galactosidases, Amidases amidases, e.g. secondary amidases, amylases, catalases, caro In one aspect, the invention provides amidases, polynucle tenoid pathway enzymes, dehalogenases, endoglucanases, otides encoding them, and methods of making and using these epoxide hydrolases, esterases, hydrolases, glucosidases, gly polynucleotides and polypeptides. In one aspect, the inven cosidases, inteins, isomerases, laccases, lipases, monooxyge 25 tion is directed to polypeptides, e.g., enzymes, having an nases, nitroreductases, nitrilases, P450 enzymes, pectate amidase activity, including thermostable and thermotolerant lyases, phosphatases, phospholipases, phytases, polymerases amidase activity, and polynucleotides encoding these and Xylanases, which are more specifically described below. enzymes, and making and using these polynucleotides and The invention also provides isolated and recombinant polypeptides. In one aspect, the amidases of the invention are polypeptides, including enzymes, structural proteins and 30 used in the removal of arginine, or methionine binding proteins, polynucleotides encoding these polypep from the N-terminal end of peptides in peptide or peptidomi tides, having the activities described in Table 1, Table 2 or metic synthesis. In one aspect, the enzyme of the invention, Table 3, below. e.g. an amidase, is selective for the L, or “natural enantiomer Aldolases of the amino acid derivatives and is therefore useful for the 35 production of optically active compounds. These reactions In one aspect, the invention provides aldolases, polynucle can be performed in the presence of the chemically more otides encoding them, and methods of making and using these reactive functionality, a step which is very difficult to polynucleotides and polypeptides. In one aspect, the inven achieve with nonenzymatic methods. The enzyme is also able tion is directed to polypeptides, e.g., enzymes, having an to tolerate high temperatures (at least 70° C.), and high con aldolase activity, including thermostable and thermotolerant 40 centrations of organic (>40% DMSO), both of which aldolase activity, and polynucleotides encoding these cause a disruption of secondary structure in peptides, which enzymes, and making and using these polynucleotides and enables cleavage of otherwise resistant bonds. polypeptides. In one aspect, the aldolase activity comprises Secondary Amidases of the formation of a carbon-carbon bond. In one In one aspect, the invention provides secondary amidases, aspect, the aldolase activity comprises an aldol condensation. 45 polynucleotides encoding them, and methods of making and The aldol condensation can have an aldol donor Substrate using these polynucleotides and polypeptides. In one aspect, comprising an acetaldehyde and an aldol acceptor Substrate the invention is directed to polypeptides, e.g., enzymes, hav comprising an aldehyde. The aldol condensation can yield a ing a secondary amidase activity, including thermostable and product of a single chirality. In one aspect, the aldolase activ thermotolerant secondary amidase activity, and polynucle ity is enantioselective. The aldolase activity can comprise a 50 otides encoding these enzymes, and making and using these 2-deoxyribose-5-phosphate aldolase (DERA) activity. The polynucleotides and polypeptides. aldolase activity can comprise catalysis of the condensation Secondary amidases include a variety of useful enzymes of acetaldehyde as donor and a 2CR)-hydroxy-3-(hydroxy or including peptidases, , and hydantoinases. This mercapto)-propionaldehyde derivative to form a 2-deox class of enzymes can be used in a range of commercial appli ySugar. The aldolase activity can comprise catalysis of the 55 cations. For example, secondary amidases can be used to: 1) condensation of acetaldehyde as donor and a 2-substituted increase flavor in food, in particular cheese (known as acetaldehyde acceptor to form a 2,4,6-trideoxyhexose via a enzyme ripened cheese); 2) promote bacterial and fungal 4-substituted-3-hydroxybutanal intermediate. The aldolase killing; 3) modify and de-protect fine chemical intermediates activity can comprise catalysis of the generation of chiral 4) synthesize peptide bonds; 5) and carry out chiral resolu aldehydes using two acetaldehydes as Substrates. The aldo 60 tions. Particularly, there is a need in the art for an enzyme lase activity can comprises enantioselective assembling of capable of hydrolyzing Cephalosporin C. chiral B,8-dihydroxyheptanoic acid side chains. The aldolase Amylases activity can comprise enantioselective assembling of the core In one aspect, the invention provides amylases, polynucle of R-(R*,R)-2-(4-fluorophenyl)-b.d-dihydroxy-5-(1-me otides encoding them, and methods of making and using these thylethyl)-3-phenyl-4-(phenylamino)-carbonyl)-1H-pyr 65 polynucleotides and polypeptides. In one aspect, the inven role-1-heptanoic acid (Atorvastatin, or LIPITORTM), rosuv tion is directed to polypeptides, e.g., enzymes, having an astatin (CRESTORTM) and/or fluvastatin (LESCOLTM). The activity, including thermostable and thermotolerant US 8,119,385 B2 21 22 amylase activity, and polynucleotides encoding these biosynthetic pathway system comprise at least one novel enzymes, and making and using these polynucleotides and enzyme of the invention—where nucleic acids used in the polypeptides. system encode a novel enzyme of the invention. In one aspect, the polypeptides of the invention can be used Carotenoids are natural pigments which have as amylases, for example, alpha amylases or glucoamylases, and anti-carcinogenic activity. They are free radical scaven to catalyze the of into Sugars. In one aspect, gers, and as such, strong . Carotenoids have a the invention is directed to polypeptides having thermostable conjugated backbone structure and are very rigid molecules, amylase activity, such as alpha amylases or glucoamylase having a backbone consisting of 9 to 11 alternating single? activity, e.g., a 1,4-alpha-D-glucan glucohydrolase activity. double bonds and have very similar electro-optical properties In one aspect, the polypeptides of the invention can be used as 10 as polyacetylene. Astaxanthins are abundant naturally occur amylases, for example, alpha amylases or glucoamylases, to ring carotenoids. They contain an internal unit similar to catalyze the hydrolysis of starch into Sugars, such as . beta-caroteine but have two terminal carbonyl and hydroxyl The invention is also directed to nucleic acid constructs, functionalities. These compounds are useful for food and feed vectors, and host cells comprising the nucleic acid sequences Supplements, colorants, neutraceuticals, cosmetic and phar of the invention as well as recombinant methods for produc 15 maceutical needs. Isoprenoids are compounds biosynthe ing the polypeptides of the invention. The invention is also sized from or containing isoprene (unsaturated branched directed to the use of amylases of the invention in starch chain five-carbon hydrocarbon) units, including terpenes, conversion processes, including production of high carotenoids, fat soluble vitamins, ubiquinone, rubber, and corn Syrup (HFCS), ethanol, dextrose, and dextrose syrups. Some . Biosynthetic pathways for carotenoids, astax Commercially, glucoamylases are used to further hydro anthins and isoprenoids are known; most of these published lyze cornstarch, which has already been partially hydrolyzed pathways are derived from one organism or a combination of with an alpha-amylase. The glucose produced in this reaction genes from a few species. may then be converted to a mixture of glucose and fructose by Catalases a glucose isomerase enzyme. This mixture, or one enriched In one aspect, the invention provides catalases, polynucle with fructose, is the high fructose corn syrup commercialized 25 otides encoding them, and methods of making and using these throughout the world. In general, starch to fructose process polynucleotides and polypeptides. In one aspect, the inven ing consists of four steps: liquefaction of granular starch, tion is directed to polypeptides, e.g., enzymes, having a cata saccharification of the liquefied starch into dextrose, purifi lase activity, including thermostable and thermotolerant cata cation, and isomerization to fructose. The object of a starch lase activity, and polynucleotides encoding these enzymes, liquefaction process is to convert a concentrated Suspension 30 and making and using these polynucleotides and polypep of starch polymer granules into a solution of Soluble shorter tides. chain length dextrins of low viscosity. In processes where is a by-product, The amylases of the invention can be used in automatic catalases of the invention can be used to destroy or detect dish wash (ADW) products and laundry detergent. In ADW hydrogen peroxide, e.g., in production of glyoxylic acid and products, the amylase will function at pH 10-11 and at 45-60 35 in glucose sensors. Also, in processes where hydrogen per C. in the presence of calcium chelators and oxidative condi oxide is used as a bleaching orantibacterial agent, catalases of tions. For laundry, activity at pH 9-10 and 40° C. in the the invention can be used to destroy residual hydrogen per appropriate detergent matrix will be required. Amylases are oxide, e.g. in contact lens cleaning, in bleaching steps in pulp also useful in textile desizing, brewing processes, starch and paper production, and in the pasteurization of dairy prod modification in the paper and pulp industry and other pro 40 ucts. Further, Such catalases of the invention can be used as cesses described in the art. catalysts for oxidation reactions, e.g. epoxidation and Amylases can be used commercially in the initial stages hydroxylation. (liquefaction) of starch processing; in wet corn milling; in Dehalogenases alcohol production; as cleaning agents in detergent matrices; In one aspect, the invention provides dehalogenases, poly in the textile industry for starch desizing; in baking applica 45 nucleotides encoding them, and methods of making and using tions; in the beverage industry; in oilfields in drilling pro these polynucleotides and polypeptides. In one aspect, the cesses; in inking of recycled paper and in animal feed. Amy invention is directed to polypeptides, e.g., enzymes, having a lases are also useful in textile desizing, brewing processes, dehalogenase activity, including thermostable and thermotol starch modification in the paper and pulp industry and other erant dehalogenase activity, and polynucleotides encoding processes. 50 these enzymes, and making and using these polynucleotides Carotenoid Pathway Enzymes and polypeptides. The invention provides novel enzymes, and the polynucle Environmental pollutants consist of a large quantity and otides encoding them, involved in carotenoid (such as lyco variety of chemicals; many of these are toxic, environmental penes and luteins), astaxanthin and/or isoprenoid synthesis. hazards that were designated in 1979 as priority pollutants by The invention also provides novel genes in the carotenoid, 55 the U.S. Environmental Protection Agency. Microbial and astaxanthin and isoprenoid biosynthetic pathways compris enzymatic biodegradation is one method for the elimination ing at least one enzyme of the invention. For example, alter of these pollutants. Accordingly, methods have been designed native aspects, the invention provides one or more nucleic to treat commercial wastes and to bioremediate polluted envi acid coding sequences (CDSs, or ORFs) encoding all, or at ronments via microbial and related enzymatic processes. least one, enzyme(s) involved in a desired biosynthetic path 60 Unfortunately, many chemical pollutants are either resistant way for carotenoids, astaxanthins and/or isoprenoids. The to microbial degradation or are toxic to potential microbial nucleic acid coding sequence(s) can be expressed through an degraders when present in high concentrations and certain expression plasmid, vector, engineered virus or any episomal combinations. expression system, or, can be integrated into the genome of Dehalogenases, e.g. dehalogenases, of the the host cell. In one aspect, the enzyme(s) involved in the 65 invention can cleave carbon-halogen bonds in biosynthetic pathway system comprise a novel combination and halocarboxylic acids by hydrolysis, thus converting them of enzymes. In another aspect, the enzyme(s) involved in the to their corresponding alcohols. This reaction can be used for US 8,119,385 B2 23 24 detoxification involving haloalkanes, such as ethylchloride, with cereal diets, beta-glucan is a contributing factor to vis methylchloride, and 1,2-dichloroethane (e.g., detoxification cosity of gut contents and thereby adversely affects the of toxic composition, e.g., pesticides, poisons, chemical war digestibility of the feed and animal growth rate. For ruminant fare agents and the like comprising haloalkanes). animals, these beta-glucans represent Substantial components The present invention provides a number of dehalogenase offiber intake and more complete digestion of glucans would enzymes useful in bioremediation having improved enzy facilitate higher feed conversion efficiencies. It is desirable matic characteristics. The polynucleotides and polynucle for animal feed endoglucanases to be active in the animal otide products of the invention are useful in, for example, stomach. groundwater treatment involving transformed host cells con Endoglucanases of the invention can be used in the diges taining a polynucleotide or polypeptide of the invention (e.g., 10 tion of , a beta-1,4-linked glucan found in all plant the bacteria Xanthobacter autotrophicus) and the haloalkane material. Cellulose is the most abundant in 1,2-dichlorethane as well as removal of polychlorinated nature. Enzymes of the invention that digest cellulose have biphenyls (PCBs) from soil sediment. utility in the pulp and paper industry, in textile manufacture The haloalkane dehalogenase of the invention are useful in and in household and industrial cleaning agents. carbon-halide reduction efforts. The enzymes of the invention 15 Epoxide Hydrolases initiate the degradation of haloalkanes. Alternatively, host In one aspect, the invention provides epoxide hydrolases, cells containing a dehalogenase polynucleotide or polypep polynucleotides encoding them, and methods of making and tide of the invention can feed on the haloalkanes and produce using these polynucleotides and polypeptides. In one aspect, the detoxifying enzyme. the invention is directed to polypeptides, e.g., enzymes, hav Endoglucanases ing an epoxide hydrolase activity, including thermostable and In one aspect, the invention provides endoglucanases, thermotolerant epoxide hydrolase activity, and polynucle polynucleotides encoding them, and methods of making and otides encoding these enzymes, and making and using these using these polynucleotides and polypeptides. In one aspect, polynucleotides and polypeptides. The polypeptides of the the invention is directed to polypeptides, e.g., enzymes, hav invention can be used as epoxide hydrolases to catalyze the ing an endoglucanase activity, including thermostable and 25 hydrolysis of epoxides and arene oxides to their correspond thermotolerant endoglucanase activity, and polynucleotides ing diols. encoding these enzymes, and making and using these poly Epoxide hydrolases catalyze the hydrolysis of epoxides nucleotides and polypeptides. and arene oxides to their corresponding diols. Epoxide hydro In one aspect, the enzymes of the invention have a gluca lases from microbial Sources are highly versatile biocatalysts nase, e.g., an endoglucanase, activity, e.g., catalyzing 30 for the asymmetric hydrolysis of epoxides on a preparative hydrolysis of internal endo-B-1,4- and/or B-1,3-glucanase scale. Besides kinetic resolution, which furnishes the corre linkages. In one aspect, the endoglucanase activity (e.g., sponding vicinal diol and remaining non-hydrolyzed epoxide endo-1,4-beta-D-glucan 4-glucano hydrolase activity) com in nonracemic form, enantioconvergent processes are pos prises hydrolysis of 1,4- and/or B-1,3-beta-D-glycosidic link sible. These are highly attractive as they lead to the formation ages in cellulose, cellulose derivatives (e.g., carboxy methyl 35 of a single enantiomeric diol from a racemic oxirane. cellulose and hydroxy ethyl cellulose) lichenin, beta-1,4 Microsomal epoxide hydrolases are biotransformation bonds in mixed beta-1,3 glucans, such as cereal beta-D-glu enzymes that catalyze the conversion of a broad array of cans or Xyloglucans and other plant material containing cel xenobiotic epoxide substrates to more polar diol metabolites, lulosic parts. see, e.g., Omiecinski (2000) Toxicol. Lett. 112-113:365-370. Endoglucanases of the invention (e.g., endo-beta-1,4-glu 40 Microsomal epoxide hydrolases catalyze the addition of canases, EC 3.2.1.4, endo-beta-1,3(1)-glucanases, EC water to epoxides in a two-step reaction involving initial 3.2.1.6; endo-beta-1,3-glucanases, EC 3.2.1.39) can hydro attack of an active site carboxylate on the oxirane to give an lyze internal B-1,4- and/or B-1,3-glucosidic linkages in cel ester intermediate followed by hydrolysis of the ester. Soluble lulose and glucan to produce Smaller molecular weight glu epoxide hydrolase play a role in the biosynthesis of inflam cose and glucose oligomers. Glucans are 45 mation mediators. formed from 1,4-B- and/or 1.3--linked D-glucopy Epoxide hydrolases of the invention can be used in the ranose. Endoglucanases of the invention can be used in the detoxification of epoxides or in the biosynthesis of hormones. food industry, for baking and fruit and vegetable processing, Additionally, epoxide hydrolases of the invention can effi breakdown of agricultural waste, in the manufacture of ani ciently process several Substrates, leading to enantiomeri mal feed, in pulp and paper production, textile manufacture 50 cally enriched-epoxides (the unreacted enantiomer) and/or to and household and industrial cleaning agents. Endogluca the corresponding vicinal diols. nases are produced by fungi and bacteria. Esterases Beta-glucans are major non-starch polysaccharides of In one aspect, the invention provides esterases, polynucle cereals. The glucan content can vary significantly depending otides encoding them, and methods of making and using these on variety and growth conditions. The physicochemical prop 55 polynucleotides and polypeptides. In one aspect, the inven erties of this polysaccharide are such that it gives rise to tion is directed to polypeptides, e.g., enzymes, having an Viscous solutions or evengels under oxidative conditions. In esterase activity, including thermostable and thermotolerant addition glucans have high water-binding capacity. All of esterase activity, and polynucleotides encoding these these characteristics present problems for several industries enzymes, and making and using these polynucleotides and including brewing, baking, animal nutrition. In brewing 60 polypeptides. applications, the presence of glucan results in wort filterabil Many esterases are known and have been discovered in a ity and haze formation issues. In baking applications (espe broad variety of , including bacteria, yeast and cially for cookies and crackers), glucans can create sticky higher animals and plants. A principal example of esterases doughs that are difficult to machine and reduce biscuit size. In are the lipases, which are used in the hydrolysis of , addition, this carbohydrate is implicated in rapid rehydration 65 acidolysis (replacement of an esterified fatty acid with a free of the baked product resulting in loss of crispiness and fatty acid) reactions, transesterification (exchange of fatty reduced shelf-life. For monogastric animal feed applications acids between ) reactions, and in ester synthesis. US 8,119,385 B2 25 26 The major industrial applications for lipases include: the in a comprising a polypeptide having an esterase detergent industry, where they are employed to decompose activity, wherein the polypeptide comprises a polypeptide of fatty materials in laundry stains into easily removable hydro the invention, or, a polypeptide encoded by a nucleic acid of philic substances; the food and beverage industry where they the invention. In one aspect, the mammal comprises a human. are used in the manufacture of cheese, the ripening and fla The enzymes of the invention are used in the manufacture of Voring of cheese, as antistaling agents for bakery products, medicaments. and in the production of margarine and other spreads with The invention provides bakery products comprising a natural butter flavors; in waste systems; and in the pharma polypeptide of the invention. The invention provides antistal ceutical industry where they are used as digestive aids. ing agents for bakery products comprising a polypeptidehav Alternatively, esterases of the invention can be used in 10 detergent compositions. In one aspect, the esterase can be a ing an esterase activity, wherein the polypeptide comprises a nonsurface-active esterase. In another aspect, the esterase can polypeptide of the invention, or, a polypeptide encoded by a be a surface-active esterase. The esterase can beformulated in nucleic acid of the invention. a non-aqueous liquid composition, a cast solid, a granular The invention provides methods for hydrolyzing, breaking form, a particulate form, a compressed tablet, a gel form, a 15 up or disrupting a ester-comprising composition comprising paste or a slurry form. the following steps: (a) providing a polypeptide of the inven In another aspect, the invention provides fabrics or clothing tion having an esterase activity, or a polypeptide encoded by comprising an esterase of the invention. In another aspect, a nucleic acid of the invention; (b) providing a composition esterases of the invention are used to treat a lipid-containing comprising a protein; and (c) contacting the polypeptide of fabric. step (a) with the composition of step (b) under conditions In another aspect, the invention provides foods and drinks wherein the esterase hydrolyzes, breaks up or disrupts the comprising an esterase of the invention. The invention also ester-comprising composition. provides cheeses comprising an esterase of the invention. Alternatively, the invention provides methods for liquefy Additionally, the invention provides methods for the manu ing or removing ester-comprising compositions comprising facture of cheese comprising the following steps: (a) provid 25 the following steps: (a) providing a polypeptide of the inven ing a polypeptide having an esterase activity, wherein the tion having an esterase activity, or a polypeptide encoded by polypeptide comprises a polypeptide of the invention, or, a a nucleic acid of the invention; (b) providing a composition polypeptide encoded by a nucleic acid of the invention; (b) comprising a protein; and (c) contacting the polypeptide of providing a cheese precursor, and (c) contacting the polypep step (a) with the composition of step (b) under conditions tide of step (a) with the precursor of step (b) under condition 30 wherein esterase removes or liquefies the ester-comprising wherein the esterase can catalyze cheese manufacturing pro compositions. cesses. In one aspect, the method can comprise the process of Hydrolases ripening and flavoring of cheese. In one aspect, the invention provides hydrolases, poly In another aspect, the invention provides margarines and nucleotides encoding them, and methods of making and using spreads comprising an enzyme of the invention. The inven 35 these polynucleotides and polypeptides. In one aspect, the tion provides methods for production of margarine or other invention is directed to polypeptides, e.g., enzymes, having a spreads with natural butter flavors comprising the following hydrolase activity, e.g., an esterase, acylase, lipase, phospho steps: (a) providing a polypeptide having an esterase activity, lipase or protease activity, including thermostable and ther wherein the polypeptide comprises a polypeptide of the motolerant hydrolase activity, and polynucleotides encoding invention, or, a polypeptide encoded by a nucleic acid of the 40 these enzymes, and making and using these polynucleotides invention; (b) providing a margarine or a spread precursor; and polypeptides. The hydrolase activities of the polypep and (c) contacting the polypeptide of step (a) with the precur tides and peptides of the invention include esterase activity, sor of step (b) under condition wherein the esterase can cata lipase activity (hydrolysis of lipids), acidolysis reactions (to lyze processes involved in margarine or spread production. replace an esterified fatty acid with a free fatty acid), trans The invention provides methods for treating solid or liquid 45 esterification reactions (exchange offatty acids between trig waste products comprising the following steps: (a) providing lycerides), ester synthesis, ester interchange reactions, phos a polypeptide having an esterase activity, wherein the pholipase activity (e.g., A, B, C and Dactivity, polypeptide comprises a polypeptide of the invention, or, a patatin activity, lipid acylhydrolase (LAH) activity) and pro polypeptide encoded by a nucleic acid of the invention; (b) tease activity (hydrolysis of peptide bonds). The polypeptides providing a solid or a liquid waste; and (c) contacting the 50 of the invention can be used in a variety of pharmaceutical, polypeptide of step (a) and the waste of step (b) under con agricultural and industrial contexts, including the manufac ditions wherein the polypeptide can treat the waste. The ture of cosmetics and nutraceuticals. invention provides solid or liquid waste products comprising In one aspect, the polypeptides of the invention are used in a polypeptide of the invention. the biocatalytic synthesis of structured lipids (lipids that con The invention provides methods for aiding digestion in a 55 tain a defined set offatty acids distributed in a defined manner mammal comprising (a) providing a polypeptide having an on the glycerol backbone), including cocoa butter alternatives esterase activity, wherein the polypeptide comprises a (CBA), lipids containing poly-unsaturated fatty acids (PU polypeptide of the invention, or, a polypeptide encoded by a FAS), diacylglycerides, e.g., 1,3-diacyl glycerides (DAGs), nucleic acid of the invention; (b) providing a composition monoglycerides, e.g., 2-monoglycerides (MAGs) and tria comprising a Substrate for the polypeptide of step (a); (c) 60 cylglycerides (TAGs). In one aspect, the polypeptides of the feeding or administering to the mammal the polypeptide of invention are used to modify oils, such as fish, animal and step (a) with a feed or food comprising a substrate for the Vegetable oils, and lipids, such as poly-unsaturated fatty polypeptide of step (a), thereby helping digestion in the mam acids. The hydrolases of the invention having lipase activity mal. In one aspect, the mammal is a human. can modify oils by hydrolysis, alcoholysis, esterification, The invention provides pharmaceutical compositions com 65 transesterification and/or interesterification. The methods of prising a polypeptide and/or a nucleic acid of the invention, the invention can use lipases with defined regio-specificity or e.g., a pharmaceutical composition for use as a digestive aid defined chemoselectivity in biocatalytic synthetic reactions. US 8,119,385 B2 27 28 In another aspect, the polypeptides of the invention are used Intracellular PLA2 is found in almost every mammalian cell. to synthesize enantiomerically pure chiral products. (PLC) removes the phosphate moiety to Additionally, the polypeptides of the invention can be used produce 1.2 diacylglycerol and phospho base. Phospholipase in food processing, brewing, bath additives, alcohol produc D (PLD) produces 1,2-diacylglycerophosphate and base tion, peptide synthesis, enantioselectivity, hide preparation in group. PLC and PLD are important in cell function and sig the leather industry, waste management and animal degrada naling. Patatins are another type of phospholipase thought to tion, silver recovery in the photographic industry, medical work as a PLA. treatment, silk degumming, biofilm degradation, biomass In general, enzymes, including hydrolases such as conversion to ethanol, biodefense, antimicrobial agents and esterases, lipases and proteases, are active over a narrow disinfectants, personal care and cosmetics, biotech reagents, 10 range of environmental conditions (temperature, pH, etc.), in increasing starch yield from corn wet milling and pharma and many are highly specific for particular Substrates. The ceuticals such as digestive aids and anti-inflammatory (anti narrow range of activity for a given enzyme limits its appli phlogistic) agents. cability and creates a need for a selection of enzymes that (a) The major industrial applications for hydrolases, e.g., have similar activities but are active under different condi esterases, lipases, phospholipases and proteases, include the 15 tions or (b) have different substrates. For instance, an enzyme detergent industry, where they are employed to decompose capable of catalyzing a reaction at 50° C. may be so inefficient fatty materials in laundry stains into easily removable hydro at 35° C., that its use at the lower temperature will not be philic substances; the food and beverage industry where they feasible. For this reason, laundry detergents generally contain are used in the manufacture of cheese, the ripening and fla a selection of proteolytic enzymes (e.g., polypeptides of the Voring of cheese, as antistaling agents for bakery products, invention), allowing the detergent to be used over a broad and in the production of margarine and other spreads with range of wash temperature and pH. In view of the specificity natural butter flavors; in waste systems; and in the pharma of enzymes and the growing use of hydrolases in industry, ceutical industry where they are used as digestive aids. research, and medicine, there is an ongoing need in the art for Oils and fats an important renewable raw material for the new enzymes and new enzyme inhibitors. chemical industry. They are available in large quantities from 25 Glucosidases the processing of oilseeds from plants like rice bran oil, In one aspect, the invention provides glucosidases, poly rapeseed (canola), Sunflower, olive, palm or soy. Other nucleotides encoding them, and methods of making and using Sources of valuable oils and fats include fish, restaurant waste, these polynucleotides and polypeptides. In one aspect, the and rendered animal fats. These fats and oils are a mixture of invention is directed to polypeptides, e.g., enzymes, having a triglycerides or lipids, i.e. fatty acids (FAs) esterified on a 30 glucosidase activity, including thermostable and thermotol glycerol scaffold. Each oil or fat contains a wide variety of erant glucosidase activity, and polynucleotides encoding different lipid structures, defined by the FA content and their these enzymes, and making and using these polynucleotides regiochemical distribution on the glycerol backbone. These and polypeptides. properties of the individual lipids determine the physical Alpha-glucosidases of the invention can catalyze the properties of the pure . Hence, the triglyceride 35 hydrolysis of into Sugars. Alpha-glucosidases can content of a fator oil to a large extent determines the physical, hydrolyze terminal non-reducing 1.4 or 1.6 linked C-D-glu chemical and biological properties of the oil. The value of cose residues in starch, with release of C-D-glucose. lipids increases greatly as a function of their purity. High Alpha-glucosidases of the invention can be used commer purity can be achieved by fractional chromatography or dis cially in the stages liquefaction and Saccharification of starch tillation, separating the desired triglyceride from the mixed 40 processing; in wet corn milling; in alcohol production; as background of the fat or oil source. However, this is costly and cleaning agents in detergent matrices; in the textile industry yields are often limited by the low levels at which the triglyc for starch desizing; in baking applications; in the beverage eride occurs naturally. In addition, the purity of the product is industry; in oilfields in drilling processes; in inking of often compromised by the presence of many structurally and recycled paper and in animal feed. Alpha-glucosidases of the physically or chemically similar triglycerides in the oil. 45 invention are also useful in textile desizing, brewing pro An alternative to purifying triglycerides or other lipids cesses, starch modification in the paper and pulp industry and from a natural source is to synthesize the lipids. The products other processes. of Such processes are called structured lipids because they Glycosidases contain a defined set of fatty acids distributed in a defined In one aspect, the invention provides glycosidases, poly manner on the glycerol backbone. The value of lipids also 50 nucleotides encoding them, and methods of making and using increases greatly by controlling the fatty acid content and these polynucleotides and polypeptides. In one aspect, the distribution within the lipid. Lipases can be used to affect invention is directed to polypeptides, e.g., enzymes, having a Such control. glycosidase activity, including thermostable and thermotol Phospholipases are enzymes that hydrolyze the ester bonds erant glycosidase activity, and polynucleotides encoding of phospholipids. Corresponding to their importance in the 55 these enzymes, and making and using these polynucleotides metabolism of phospholipids, these enzymes are widespread and polypeptides. Glycosidase enzymes of the invention can among prokaryotes and eukaryotes. The phospholipases have more specific activity as glucosidases, C.-galactosidases, affect the metabolism, construction and reorganization of B-galactosidases, B-mannosidases, B-mannanases, endoglu biological membranes and are involved in signal cascades. canases, and . Several types of phospholipases are known which differ in 60 O-galactosidases of the invention can catalyze the hydroly their specificity according to the position of the bond attacked sis of galactose groups on a polysaccharide backbone or in the phospholipid molecule. (PLA1) hydrolyze the cleavage of di- or oligosaccharides comprising removes the 1-position fatty acid to produce free fatty acid galactose. B-mannanases of the invention can catalyze the and 1-lyso-2-acylphospholipid. (PLA2) hydrolysis of mannose groups internally on a polysaccharide removes the 2-position fatty acid to produce free fatty acid 65 backbone or hydrolyze the cleavage of di- or oligosaccharides and 1-acyl-2-lysophospholipid. PLA1 and PLA2 enzymes comprising mannose groups. B-mannosidases of the inven can be intra- or extra-cellular, membrane-bound or soluble. tion can hydrolyze non-reducing, terminal mannose residues US 8,119,385 B2 29 30 on a mannose-containing polysaccharide and the cleavage of In one aspect, the detectable moiety domain comprises a di- or oligosaccaharides comprising mannose groups. detectable peptide or polypeptide. The detectable peptide or a Guar gum is a branched galactomannan polysaccharide polypeptide can be a fluorescent peptide or polypeptide. The composed of B-1.4 linked mannose backbone with a-1,6 detectable peptide or a polypeptide can be a bioluminescent linked galactose sidechains. The enzymes required for the or a chemiluminescent peptide or polypeptide. In one aspect, degradation of guar are B-mannanase, B-mannosidase and the bioluminescent or chemiluminescent polypeptide com O-galactosidase. B-mannanase hydrolyses the mannose back prises a green fluorescent protein (GFP), an aequorin, an bone internally and B-mannosidase hydrolyses non-reducing, obelin, a mnemiopsin or a berovin. In one aspect, the detect terminal mannose residues. C-galactosidase hydrolyses able moiety domain comprises an enzyme that generates a O-linked galactose groups. 10 detectable signal. The enzyme that generates a detectable Galactomannan polysaccharides and the enzymes of the signal can comprise an alpha-galactosidase, an invention that degrade them have a variety of applications. (e.g., chloramphenicol acetyltransferase) or a kinase. The Guar is commonly used as a thickening agent in food and is detectable moiety domain can comprise a radioactive isotope. utilized in hydraulic fracturing in oil and gas recovery. Con In one aspect, the chimeric protein is a recombinant fusion sequently, galactomannanases are industrially relevant for the 15 protein. In one aspect, the intein domain splicing activity degradation and modification of guar. Furthermore, a need results in cleavage of the enzyme domain from the intein exists for thermostable galactomannases that are active in domain and detectable domain. The intein domain splicing extreme conditions associated with oil drilling and well activity can result in cleavage of the enzyme domain from the stimulation. intein domain and detectable domain and cleavage of the There are other applications for these enzymes in various detectable domain from the intein domain. In one aspect, the industries, such as in the beet sugar industry. 20-30% of the intein domain splicing activity results in cleavage of the domestic U.S. Sucrose consumption is Sucrose from Sugar detectable domain from the intein domain. In one aspect, the beets. Raw beet Sugar can contain a small amount of raffinose intein domain has only splicing activity. The intein domain when the Sugar beets are stored before processing and rotting can have only cleaving activity. begins to set in. Raffinose inhibits the crystallization of 25 In one aspect, at least one domain is separated from another Sucrose and also constitutes a hidden quantity of Sucrose. domain by a linker. The linker can be a flexible linker. The Thus, there is merit to eliminating raffinose from raw beet intein domain can be separated from the detectable moiety Sugar. C-Galactosidase has also been used as a digestive aid to domain and the enzyme domain by a linker. break down raffinose, stachyose, and Verbascose in Such Isomerases foods as beans and other gassy foods. 30 In one aspect, the invention provides isomerases, e.g. B-Galactosidases of the invention can be used for the pro Xylose isomerases, polynucleotides encoding them, and duction of lactose-free dietary milk products. Additionally, methods of making and using these polynucleotides and B-galactosidases of the invention can be used for the enzy polypeptides. In one aspect, the invention is directed to matic synthesis of oligosaccharides via transglycosylation polypeptides, e.g., enzymes, having an isomerase activity, reactions. 35 e.g. activity, including thermostable and is well known as a debranching enzyme of thermotolerant isomerase activity, e.g. Xylose isomerase pullulan and Starch. The enzyme of the invention can hydro activity, and polynucleotides encoding these enzymes, and lyze C-1,6-glucosidic linkages on these polymers. Starch making and using these polynucleotides and polypeptides. degradation for the production or Sweeteners (glucose or In one aspect, the invention provides Xylose isomerase maltose) is a very important industrial application of this 40 enzymes, polynucleotides encoding the enzymes, methods of enzyme. The degradation of starch is developed in two stages. making and using these polynucleotides and polypeptides. The first stage involves the liquefaction of the substrate with The polypeptides of the invention can be used in a variety of C.-amylase, and the second stage, or saccharification stage, is agricultural and industrial contexts. For example, the performed by B-amylase with pullalanase added as a polypeptides of the invention can be used for converting debranching enzyme, to obtain better yields. 45 glucose to fructose or for manufacturing high content fruc Endoglucanases of the invention can be used in a variety of tose syrups in large quantities. Other examples include use of industrial applications. For instance, the endoglucanases of the polypeptides of the invention in confectionary, brewing, the invention can hydrolyze the internal B-1,4-glycosidic alcohol and Soft drinks production, and in diabetic foods and bonds in cellulose, which may be used for the conversion of SWeetenerS. plant biomass into fuels and chemicals. Endoglucanases of 50 Laccases the invention also have applications in detergent formula In one aspect, the invention provides laccases, polynucle tions, the textile industry, in animal feed, in waste treatment, otides encoding them, and methods of making and using these oil drilling and well stimulation, and in the fruit juice and polynucleotides and polypeptides. In one aspect, the inven brewing industry for the clarification and extraction of juices. tion is directed to polypeptides, e.g., enzymes, having a lac Inteins 55 case activity, including thermostable and thermotolerant lac In one aspect, the invention provides inteins, polynucle case activity, and polynucleotides encoding these enzymes, otides encoding them, and methods of making and using these and making and using these polynucleotides and polypep polynucleotides and polypeptides. In another aspect, the tides. invention provides a chimeric protein comprising at least In one aspect, the invention provides methods of depoly three domains, wherein the first domain comprises at least 60 merizing lignin, e.g., in a pulp or paper manufacturing pro one enzyme domain or a binding , the second cess, using a polypeptide of the invention. In another aspect, domain comprises at least one intein domain and a third the invention provides methods for oxidizing products that domain comprising a detectable moiety domain, at least one can be mediators of laccase-catalyzed oxidation reactions, intein domain is positioned between at least one enzyme or C.9. 2,2-azinobis-(3-ethylbenzthiazoline-6-sulfonate) binding protein and at least one detectable moiety domain, 65 (ABTS), 1-hydroxybenzotriazole (HBT), 2.2.6,6-tetrameth and the intein domain has at least one cleavage or splicing ylpiperidin-1-yloxy (TEMPO), dimethoxyphenol, dihy activity. droxyfumaric acid (DHF) and the like. US 8,119,385 B2 31 32 Laccases are a subclass of the multicopper oxidase Super be varied according to: Stearate to palmitate ratio; selectivity family of enzymes, which includes ascorbate oxidases and of enzyme for palmitate versus Stearate; or enzyme enanti the mammalian protein, ceruloplasmin. Laccases are one of oselectivity (could alter levels of POS/SOP). One-pot synthe the oldest known enzymes and were first implicated in the sis of cocoa butter equivalents or other cocoa butter alterna oxidation of urushiol and laccol. In one aspect, reactions tives is possible using this aspect of the invention. catalyzed by laccases of the invention comprises the oxida In one aspect, lipases that exhibit regioselectivity and/or tion of phenolic Substrates. The major target application has chemoselectivity are used in the structure synthesis of lipids been in the delignification of wood fibers during the prepara or in the processing of lipids. Thus, the methods of the inven tion of pulp. tion use lipases with defined regio-specificity or defined Lipases 10 chemoselectivity (e.g., a fatty acid specificity) in a biocata In one aspect, the invention provides lipases, polynucle lytic synthetic reaction. For example, the methods of the otides encoding them, and methods of making and using these invention can use lipases with SN1, SN2 and/or SN3 regio polynucleotides and polypeptides. In one aspect, the inven specificity, or combinations thereof. In one aspect, the meth tion is directed to polypeptides, e.g., enzymes, having a lipase ods of the invention use lipases that exhibit regioselectivity activity, including thermostable and thermotolerant lipase 15 for the 2-position of a triacylglyceride (TAG). This SN2 regi activity, and polynucleotides encoding these enzymes, and oselectivity can be used in the synthesis of a variety of struc making and using these polynucleotides and polypeptides. tured lipids, e.g., triacylglycerides (TAGS), including 1,3- In one aspect, the lipases of the invention can be used in a DAGs and components of cocoa butter. variety of pharmaceutical, agricultural and industrial con The methods and compositions (lipases) of the invention texts, including the manufacture of cosmetics and nutraceu can be used in the biocatalytic synthesis of structured lipids, ticals. In one aspect, the lipases of the invention are used in the and the production of nutraceuticals (e.g., polyunsaturated biocatalytic synthesis of structured lipids (lipids that contain fatty acids and oils), various foods and food additives (e.g., a defined set of fatty acids distributed in a defined manner on emulsifiers, fat replacers, margarines and spreads), cosmetics the glycerol backbone), including cocoa butter alternatives (e.g., emulsifiers, creams), pharmaceuticals and drug delivery (CBA), lipids containing poly-unsaturated fatty acids (PU 25 agents (e.g., liposomes, tablets, formulations), and animal FAS), diacylglycerides, e.g., 1,3-diacyl glycerides (DAGs), feed additives (e.g., polyunsaturated fatty acids, such as monoglycerides, e.g., 2-monoglycerides (MAGs) and tria linoleic acids) comprising lipids made by the structured syn cylglycerides (TAGs). In one aspect, the polypeptides of the thesis methods of the invention or processed by the methods invention are used to modify oils, such as fish, animal and of the invention Vegetable oils, and lipids, Such as poly-unsaturated fatty 30 In one aspect, lipases of the invention can act on fluoro acids. The lipases of the invention can modify oils by hydroly genic fatty acid (FA) , e.g., umbelliferyl FA esters. In sis, alcoholysis, esterification, transesterification and/or one aspect, profiles of FA specificities of lipases made or interesterification. The methods of the invention use lipases modified by the methods of the invention can be obtained by with defined regio-specificity or defined chemoselectivity in measuring their relative activities on a series of umbelliferyl biocatalytic synthetic reactions. In another aspect, the 35 FA esters, such as palmitate, Stearate, oleate, laurate, PUFA, polypeptides of the invention are used to synthesize enantio butyrate. merically pure chiral products. The methods and compositions (lipases) of the invention The invention provides lipase enzymes, polynucleotides can be used to synthesize enantiomerically pure chiral prod encoding the enzymes, methods of making and using these ucts. In one aspect, the methods and compositions (lipases) of polynucleotides and polypeptides. The polypeptides of the 40 the invention can be used to prepare a D-amino acid and invention can be used in a variety of pharmaceutical, agricul corresponding esters from a racemic mix. For example, D-as tural and industrial contexts, including the manufacture of partic acid can be prepared from racemic aspartic acid. In one cosmetics and nutraceuticals. In one aspect, the polypeptides aspect, optically active D-homophenylalanine and/or its of the invention are used in the biocatalytic synthesis of esters are prepared. The enantioselectively synthesized D-ho structured lipids (lipids that contain a defined set offatty acids 45 mophenylalanine can be starting material for many drugs, distributed in a defined manner on the glycerol backbone), Such as Enalapril, Lisinopril, and Quinapril, used in the treat including cocoa butter alternatives, poly-unsaturated fatty ment of hypertension and congestive heart failure. The D-as acids (PUFAs), 1,3-diacyl glycerides (DAGs), 2-monoglyc partic acid and its derivatives made by the methods and com erides (MAGs) and triacylglycerides (TAGs), such as 1,3- positions of the invention can be used in pharmaceuticals, dipalmitoyl-2-oleoylglycerol (POP), 1,3-distearoyl-2-ole 50 e.g., for the inhibition of arginioSuccinate synthetase to pre oylglycerol (SOS), 1-palmitoyl-2-oleoyl-3-stearoylglycerol vent or treat sepsis or -induced systemic hypotension (POS) or 1-oleoyl-2,3-dimyristoylglycerol (OMM), long or as immunosuppressive agents. The D-aspartic acid and its chain polyunsaturated fatty acids such as arachidonic acid, derivatives made by the methods and compositions of the docosahexaenoic acid (DHA) and eicosapentaenoic acid invention can be used as taste modifying compositions for (EPA). 55 foods, e.g., as sweeteners (e.g., ALITAMETM). For example, In one aspect, the invention provides synthesis (using the methods and compositions (lipases) of the invention can lipases of the invention) of a triglyceride mixture composed be used to synthesizean optical isomer S(+) of 2-(6-methoxy of POS (Palmitic-Oleic-Stearic), POP (Palmitic-Oleic-Palm 2-naphthyl) propionic acid from a racemic (R.S) ester of itic) and SOS (Stearic-Oleic-Stearic) from glycerol. This syn 2-(6-methoxy-2-naphthyl) propionic acid. thesis uses free fatty acids versus fatty acid esters. In one 60 In one aspect, the methods and compositions (lipases) of aspect, this reaction can be performed in one pot with sequen the invention can be used to for stereoselectively hydrolyzing tial addition of fatty acids using crude glycerol and free fatty racemic mixtures of esters of 2-substituted acids, e.g., 2-ary acids and fatty acid esters. In one aspect, Stearate and palmi loxy Substituted acids, such as R-2-(4-hydroxyphenoxy)pro tate are mixed together to generate mixtures of DAGs. In one pionic acid, 2-arylpropionic acid, ketoprofen to synthesize aspect, the diacylglycerides are Subsequently acylated with 65 enantiomerically pure chiral products. oleate to give components of cocoa butter equivalents. In The methods and compositions (lipases) of the invention alternative aspects, the proportions of POS, POP and SOS can can be used to hydrolyze oils, such as fish, animal and Veg US 8,119,385 B2 33 34 etable oils, and lipids, such as poly-unsaturated fatty acids. In In one aspect, the monooxygenases of the invention have one aspect, the polypeptides of the invention are used process commercial utility as biocatalysts for use in the synthesis of fatty acids (such as poly-unsaturated fatty acids), e.g., fish oil aromatic and aliphatic esters and their derivatives, such as fatty acids, for use in or as a feed additive. Addition of poly acids and alcohols. In one aspect, the monooxygenases of the unsaturated fatty acids PUFAs to feed for dairy cattle has been 5 invention are used in the catalysis of Sulfoxidation reactions. demonstrated to result in improved fertility and milk yields. In one aspect, the invention provides Baeyer-Villiger Fish oil contains a high level of PUFAs and therefore is a monooxygenases, polynucleotides encoding the Baeyer-Vil potentially inexpensive source for PUFAS as a starting mate liger monooxygenases, and methods of using these Baeyer rial for the methods of the invention. The biocatalytic meth Villiger monooxygenases and polynucleotides. In one aspect, ods of the invention can process fish oil under mild condi 10 the invention provides methods of producing chiral synthetic tions, thus avoiding harsh conditions utilized in some intermediates using Baeyer-Villiger monooxygenases. processes. Harsh conditions may promote unwanted isomer In one aspect, the monooxygenase activity comprises ization, polymerization and oxidation of the PUFAs. In one catalysis of Sulfoxidation reactions. The monooxygenase aspect, the methods of the invention comprise lipase-cata activity can comprise an asymmetric Sulfoxidation reaction. lyzed total hydrolysis of fish-oil or selective hydrolysis of 15 The monooxygenase activity can be enantiospecific. In one PUFAs from fish oil to provide a mild alternative that would aspect, it can generate a Substantially chiral product. leave the high-value PUFAs intact. In one aspect, the methods In one aspect, the monooxygenase activity comprises gen further comprise hydrolysis of lipids by chemical or physical eration of an ester or a lactone having at least one of the splitting of the fat. following structures: In one aspect, the lipases and methods of the invention are used for the total hydrolysis of fish oil. Lipases can be screened for their ability to catalyze the total hydrolysis of R2, fish oil under different conditions using. In alternative R3 O R4 R aspects, a single or multiple lipases are used to catalyze the 25 total splitting of the fish oil. Several lipases of the invention S may need to be used, owing to the presence of the PUFAs. In R. R. 5 O one aspect, a PUFA-specific lipase of the invention is com bined with a general lipase to achieve the desired effect. wherein: R. R. R. and Rare each independently selected The methods and compositions (lipases) of the invention 30 from —H, substituted or unsubstituted alkyl, alkenyl, alky can be used to catalyze the partial or total hydrolysis of other nyl, aryl, heteroaryl, cycloalkyl, and heterocyclic; wherein oils, e.g. olive oils, that do not contain PUFAs. the substituted groups are substituted with one or more of The methods and compositions (lipases) of the invention lower alkyl, hydroxy, alkoxy, mercapto, cycloalkyl, hetero can be used to catalyze the hydrolysis of PUFA glycerol cyclic, aryl, heteroaryl, aryloxy, and halogen, or two or more esters. These methods can be used to make feed additives. In 35 of R. R. R. and R may together form cyclic moieties, and, one aspect, lipases of the invention catalyze the release of R" is selected from substituted or unsubstituted alkylene, alk PUFAs from simple esters and fish oil. Standard assays and enylene, alkynylene, arylene, heteroarylene, cycloalkylene, analytical methods can be utilized. and heterocyclic; wherein the substitutions are substituted The methods and compositions (lipases) of the invention with one or more of lower alkyl, hydroxy, alkoxy, mercapto, can be used to selectively hydrolyze saturated esters over 40 cycloalkyl, heterocyclic, aryl, heteroaryl, aryloxy, and halo unsaturated esters into acids or alcohols. The methods and gen. compositions (lipases) of the invention can be used to treat In one aspect, the monooxygenase activity comprises oxi latexes for a variety of purposes, e.g., to treat latexes used in dation of a cycloalkanone to produce a chiral lactone. The hair fixative compositions to remove unpleasant odors. The cycloalkanone can comprise a cyclobutanone, a cyclopen methods and compositions (lipases) of the invention can be 45 tanone, a cyclohexanone, a 2-methylcyclopentanone, a 2-me used in the treatment of a lipase deficiency in an animal, e.g., thylcyclohexanone, a cyclohex-2-ene-1-one, a 2-(cyclohex a mammal. Such as a human. The methods and compositions 1-enyl)cyclohexanone, a 12-cyclohexanedione, a 1.3- (lipases) of the invention can be used to prepare lubricants, cyclohexanedione or a 14-cyclohexanedione. Such as hydraulic oils. The methods and compositions (li In one aspect, the monooxygenase activity comprises a pases) of the invention can be used in making and using 50 chlorophenol 4-monooxygenase activity or a Xylene detergents. The methods and compositions (lipases) of the monooxygenase activity. invention can be used in processes for the chemical finishing The invention provides a pharmaceutical composition of fabrics, fibers or yarns. In one aspect, the methods and comprising a polypeptide of the invention. compositions (lipases) of the invention can be used for The invention provides a method for converting a ketone to obtaining flame retardancy in a fabric using, e.g., a halogen 55 its corresponding ester comprising contacting the ketone with substituted carboxylic acid or an ester thereof, i.e. a fluori a polypeptide of the invention under conditions wherein the nated, chlorinated or bromated carboxylic acid or an ester polypeptide catalyzes the conversion of the ketone to its cor thereof. responding ester. In one aspect, the polypeptide has an Monooxygenases monooxygenase activity that is enantiospecific to generate a In one aspect, the invention provides monooxygenases, 60 Substantially chiral product. In one aspect, the ester is an polynucleotides encoding them, and methods of making and aromatic or an aliphatic ester. using these polynucleotides and polypeptides. In one aspect, The invention provides a method for converting a the invention is directed to polypeptides, e.g., enzymes, hav cycloaliphatic ketone to its corresponding lactone comprising ing a monooxygenase activity, including thermostable and contacting the cycloaliphatic ketone with a polypeptide of the thermotolerant monooxygenase activity, and polynucleotides 65 invention under conditions wherein the polypeptide catalyzes encoding these enzymes, and making and using these poly the conversion of the cycloaliphatic ketone to its correspond nucleotides and polypeptides. ing lactone. In one aspect, the polypeptide has an monooxy US 8,119,385 B2 35 36 genase activity that is enantiospecific to generate a Substan a chiral C-hydroxy acid molecule, a chiral amino acid mol tially chiral product. In one aspect, the ester or lactone has at ecule, a chiral B-hydroxy acid molecule, or a chiral gamma least one of the following structures: hydroxy acid molecule. In one embodiment, the chiral mol ecule is an (R)-enantiomer. In another embodiment, the chiral molecule is an (S)-enantiomer. In one embodiment of the invention, one particular enzyme can have R-specificity for R2, one particular Substrate and the same enzyme can have R3 O R4 R S-specificity for a different particular substrate. In one aspect, nitrilases of the invention can be used for S R. R. 5 O 10 making a composition oran intermediate thereof, wherein the of the invention hydrolyzes a cyanohydrin or a ami nonitrile moiety. In one embodiment, the composition or wherein: R. R. R. and Rare each independently selected intermediate thereof comprises (S)-2-amino-4-phenyl from —H, substituted or unsubstituted alkyl, alkenyl, alky butanoic acid. In a further embodiment, the composition or nyl, aryl, heteroaryl, cycloalkyl, and heterocyclic; wherein 15 intermediate thereof comprises an L-amino acid. In a further the substituted groups are substituted with one or more of embodiment, the composition comprises a food additive or a lower alkyl, hydroxy, alkoxy, mercapto, cycloalkyl, hetero pharmaceutical drug. cyclic, aryl, heteroaryl, aryloxy, and halogen, or two or more In another aspect, nitrilases of the invention can be used for of R. R. R. and R may together form cyclic moieties, and, making an (R)-ethyl 4-cyano-3-hydroxybutyric acid, wherein R" is selected from substituted or unsubstituted alkylene, alk the nitrilase of the invention acts upon a hydroxyglutaryl enylene, alkynylene, arylene, heteroarylene, cycloalkylene, nitrile and selectively produces an (R)-enantiomer, so as to and heterocyclic; wherein the substitutions are substituted make (R)-ethyl 4-cyano-3-hydroxybutyric acid. In one with one or more of lower alkyl, hydroxy, alkoxy, mercapto, embodiment, the ee is at least 95% or at least 99%. In another cycloalkyl, heterocyclic, aryl, heteroaryl, aryloxy, and halo embodiment, the hydroxyglutaryl nitrile comprises 1,3-di gen. 25 cyano-2-hydroxy-propane or 3-hydroxyglutaronitrile. Nitroreductases In another aspect, nitrilases of the invention can be used for In one aspect, the invention provides nitroreductases, poly making an (S)-ethyl 4-cyano-3-hydroxybutyric acid, wherein nucleotides encoding them, and methods of making and using the nitrilase of the invention acts upon a hydroxyglutaryl these polynucleotides and polypeptides. In one aspect, the nitrile and selectively produces an (S)-enantiomer, so as to invention is directed to polypeptides, e.g., enzymes, having a 30 make (S)-ethyl 4-cyano-3-hydroxybutyric acid. nitroreductase activity, including thermostable and thermo In another aspect, the nitrilases of the invention can be used tolerant nitroreductase activity, and polynucleotides encod for making a (R)-mandelic acid, wherein the nitrilase of the ing these enzymes, and making and using these polynucle invention acts upon a to produce a (R)-man otides and polypeptides. delic acid. In one embodiment, the (R)-mandelic acid com Nitroreductases can catalyze the six-electron reduction of 35 prises (R)-2-chloromandelic acid. In another embodiment, nitro compounds to the corresponding amines. Amines have a the (R)-mandelic acid comprises an aromatic ring Substitu variety of applications as synthons and advanced pharmaceu tion in the ortho-, meta-, or para-positions; a 1-naphthyl tical intermediates. There are markets for both aromatic derivative of (R)-mandelic acid, a pyridyl derivative of (R)- amines and chiral aliphatic amines. mandelic acid or a thienyl derivative of (R)-mandelic acid or Nitroreductases of the invention fall into two main classes. 40 a combination thereof. These are the -sensitive and oxygen-insensitive In another aspect, the nitrilases of the invention can be used nitroreductases. The oxygen-sensitive enzyme can catalyze for making a (S)-mandelic acid, wherein the nitrilase of the nitroreduction only under anaerobic conditions. A nitro anion invention acts upon a mandelonitrile to produce a (S)-man radical is formed by a one-electron transfer and is immedi delic acid. In one embodiment, the (S)-mandelic acid com ately reoxidized in the presence of oxygen thus generating a 45 prises (S)-methyl and the mandelonitrile futile cycle whereby reducing equivalents are consumed comprises (S)-methoxy-benzyl cyanide. In one embodiment, without nitroreduction. On the other hand the oxygen-insen the (S)-mandelic acid comprises an aromatic ring Substitution sitive nitroreductases catalyze nitroreduction in a series of in the ortho-, meta-, or para-positions; a 1-naphthyl derivative two electron transfers, first via the nitroso and then the of (S)-mandelic acid, a pyridyl derivative of (S)-mandelic hydroxylamine intermediates before forming the amine. 50 acid or a thienyl derivative of (S)-mandelic acid or a combi Nitrilases nation thereof. In one aspect, the invention provides nitrilases, polynucle In yet another aspect, the nitrilases of the invention can be otides encoding them, and methods of making and using these used for making a (S)-phenyl lactic acid derivative or a (R)- polynucleotides and polypeptides. In one aspect, the inven phenylacetic acid derivative, wherein the nitrilase of the tion is directed to polypeptides, e.g., enzymes, having a nit 55 invention acts upon a phenylactonitrile and selectively pro rilase activity, including thermostable and thermotolerant nit duces an (S)-enantiomer oran (R)-enantiomer, thereby pro rilase activity, and polynucleotides encoding these enzymes, ducing an (S)-phenyl lactic acid derivative oran (R)-phenyl and making and using these polynucleotides and polypep lactic acid derivative. tides. P450 Enzymes Nitrilases of the invention can be used for hydrolyzing a 60 In one aspect, the invention provides P450 enzymes, poly nitrile to a carboxylic acid. In one embodiment, the condi nucleotides encoding them, and methods of making and using tions of the reaction comprise aqueous conditions. In another these polynucleotides and polypeptides. In one aspect, the embodiment, the conditions comprise a pH of about 8.0 and/ invention is directed to polypeptides, e.g., enzymes, having a or a temperature from about 37°C. to about 45° C. Nitrilases P450 enzymatic activity, including thermostable and thermo of the invention can also be used for hydrolyzing a cyanohy 65 tolerant P450 enzymatic activity, and polynucleotides encod drin moiety or an aminonitrile moiety of a molecule. Alter ing these enzymes, and making and using these polynucle natively, the nitrilases of the invention can be used for making otides and polypeptides. US 8,119,385 B2 37 38 P450s are oxidative enzymes that are widespread in nature nation (trans-elimination) or hydrolysis of galactan to galac and polypeptides of the invention having P450 activity can be tose orgalactooligomers. The pectate activity can com used in processes Such as detoxifying Xenobiotics, catabolism prise beta-elimination (trans-elimination) or hydrolysis of a of unusual carbon Sources and biosynthesis of secondary plant fiber. The plant fiber can comprise cotton fiber, hemp metabolites (e.g., detoxification of toxic composition, e.g., fiber or flax fiber. pesticides, poisons, chemical warfare agents and the like). The pectate lyases, e.g. pectinases, of the invention can be These activate molecular oxygen using an iron used for hydrolyzing, breaking up or disrupting a pectin- or heme center and utilize a electron shuttle to support the pectate (polygalacturonic acid)-comprising composition, for epoxidation reaction. liquefying or removing a pectin or pectate (polygalacturonic In one aspect, the P450 activity comprises a monooxygen 10 ation reaction. In one aspect, the P450 activity comprises acid) from a composition. Alternatively, the pectate lyases, catalysis of incorporation of oxygen into a Substrate. In one e.g. pectinases, of the invention can be used in detergent aspect, the P450 activity can further comprise hydroxylation compositions. In one aspect, the pectate lyase is a nonsurface of aliphatic or aromatic carbons. In another aspect, the P450 active pectate lyase or a Surface-active pectate lyase. The activity can comprise epoxidation. Alternatively, the P450 15 pectate lyase can be formulated in a non-aqueous liquid com activity can comprise N-O-, or S-dealkylation. In one aspect, position, a cast Solid, a granular form, a particulate form, a the P450 activity can comprise dehalogenation. In another compressed tablet, a gel form, a paste or a slurry form. aspect the P450 activity can comprise oxidative deamination. In one aspect, the pectate lyases, e.g. pectinases, of the Alternatively, the P450 activity can comprise N-oxidation or invention can be used for washing an object. In another N-hydroxylation. In one aspect, the P450 activity can com aspect, textiles or fabrics comprise a polypeptide of the inven prise Sulphoxide formation. tion, or a polypeptide encoded by a nucleic acid of the inven In one aspect, the epoxidase activity further comprises an tion, wherein the polypeptide has pectate lyase, e.g. pectinase alkene Substrate. The epoxidase activity can further comprise activity. Additionally, the pectate lyases, e.g. pectinases, of production of a chiral product. In one aspect, the epoxidase the invention can be used for fiber, thread, textile or fabric activity can be enantioselective. 25 scouring. In one aspect, the pectate lyase is an alkaline active Pectate Lyases and thermostable pectate lyase. The desizing and scouring In one aspect, the invention provides pectate lyases, e.g. treatments can be combined in a single bath. The method can pectinases, polynucleotides encoding them, and methods of further comprise addition of an alkaline and thermostable making and using these polynucleotides and polypeptides. In amylase. The desizing or scouring treatments can comprise one aspect, the invention is directed to polypeptides, e.g., 30 conditions of between about pH 8.5 to pH 10.0 and tempera enzymes, having a pectate lyase, e.g. a pectinase activity, tures of at about 40° C. The method can further comprise including thermostable and thermotolerant pectate lyase, e.g. addition of a bleaching step. The desizing, scouring and a pectinase activity, and polynucleotides encoding these bleaching treatments can be done simultaneously or sequen enzymes, and making and using these polynucleotides and tially in a single-bath container. The bleaching treatment can polypeptides. 35 comprise hydrogen peroxide or at least one peroxy compound The pectate lyases, e.g. pectinases, of the invention can be that can generate hydrogen peroxide when dissolved in water, used to catalyze the beta-elimination or hydrolysis of pectin or combinations thereof, and at leastonebleach activator. The and/or polygalacturonic acid, Such as 1.4-linked alpha-D- fiber, thread, textile or fabric can comprise a cellulosic mate galacturonic acid. They can be used in variety of industrial rial. The cellulosic material can comprise a crude fiber, a yarn, applications, e.g., to treat plant cell walls, such as those in 40 a woven or knit textile, a cotton, a linen, a flax, a ramie, a cotton or other natural fibers. In another exemplary industrial rayon, a hemp, a jute or a blend of natural or synthetic fibers. application, the polypeptides of the invention can be used in Alternatively, the pectate lyases, e.g. pectinases, of the textile Scouring. invention can be used in feeds or foods. For example, the In one aspect, pectate lyase activity comprises catalysis of pectate lyases, e.g. pectinases, of the invention can be used to beta-elimination (trans-elimination) or hydrolysis of pectin 45 improve the extraction of oil from an oil-rich plant material. or polygalacturonic acid (pectate). The pectate lyase activity In one aspect, the oil-lich plant material comprises an oil-rich can comprise the breakup or dissolution of plant cell walls. seed. The oil can be a soybean oil, an olive oil, a rapeseed The pectate lyase activity can comprise beta-elimination (canola) oil or a Sunflower oil. (trans-elimination) or hydrolysis of 1.4-linked alpha-D-ga In another aspect, the pectate lyases, e.g. pectinases, of the lacturonic acid. The pectate lyase activity can comprise 50 invention can be used for preparing a fruit or vegetable juice, catalysis of beta-elimination (trans-elimination) or hydroly syrup, puree or extract. In yet another aspect, the pectate sis of methyl-esterified galacturonic acid. The pectate lyase lyases, e.g. pectinases, of the invention can used for treating a activity can be exo-acting or endo-acting. In one aspect, the paper or a paper or wood pulp. Alternatively, the invention pectate lyase activity is endo-acting and acts at random sites provides papers or paper products or paper pulps comprising within a polymer chain to give a mixture of oligomers. In one 55 a pectate lyase of the invention, or a polypeptide encoded by aspect, the pectate lyase activity is exo-acting and acts from a nucleic acid of the invention. one end of a polymer chain and produces monomers or In yet another aspect, the invention provides pharmaceuti dimers. The pectate lyase activity can catalyze the random cal compositions comprising a polypeptide of the invention, cleavage of alpha-1,4-glycosidic linkages in pectic acid (po or a polypeptide encoded by a nucleic acid of the invention, lygalacturonic acid) by trans-elimination or hydrolysis. The 60 wherein the polypeptide has pectate lyase, e.g. pectinase pectate lyase activity can comprise activity the same or simi activity. The pharmaceutical composition can act as a diges lar to pectate lyase (EC 4.2.2.2), poly(1,4-alpha-D-galactur tive aid. onide) lyase, polygalacturonate lyase (EC 4.2.2.2), pectin Alternatively, the invention provides oral care products lyase (EC 4.2.2.10), polygalacturonase (EC 3.2.1.15), exo comprising a polypeptide of the invention, or a polypeptide polygalacturonase (EC 3.2.1.67), exo-polygalacturonate 65 encoded by a nucleic acid of the invention, wherein the lyase (EC 4.2.2.9) or exo-poly-alpha-galacturonosidase (EC polypeptide has pectate lyase, e.g. pectinase activity. The oral 3.2.1.82). The pectate lyase activity can comprise beta-elimi care product can comprise a toothpaste, a dental cream, a gel US 8,119,385 B2 39 40 or a tooth powder, an odontic, a mouth wash, a pre- or post tolerant phospholipase activity, and polynucleotides encod brushing rinse formulation, a chewing gum, a lozenge or a ing these enzymes, and making and using these polynucle candy. otides and polypeptides. Phosphatases Phospholipases are enzymes that hydrolyze the ester bonds In one aspect, the invention provides phosphatases, poly of phospholipids. Corresponding to their importance in the nucleotides encoding them, and methods of making and using metabolism of phospholipids, these enzymes are widespread these polynucleotides and polypeptides. In one aspect, the among prokaryotes and eukaryotes. The phospholipases invention is directed to polypeptides, e.g., enzymes, having a affect the metabolism, construction and reorganization of activity, including thermostable and thermotol biological membranes and are involved in signal cascades. erant phosphatase activity, and polynucleotides encoding 10 Several types of phospholipases are known which differ in their specificity according to the position of the bond attacked these enzymes, and making and using these polynucleotides in the phospholipid molecule. Phospholipase A1 (PLA1) and polypeptides. removes the 1-position fatty acid to produce free fatty acid Phosphatases are a group of enzymes that remove phos and 1-lyso-2-acylphospholipid. Phospholipase A2 (PLA2) phate groups from organophosphate ester compounds. There 15 removes the 2-position fatty acid to produce free fatty acid are numerous phosphatases, including alkaline phosphatases, and 1-acyl-2-lysophospholipid. PLA1 and PLA2 enzymes and phytases. can be intra- or extra-cellular, membrane-bound or soluble. Alkaline phosphatases are widely distributed enzymes and Intracellular PLA2 is found in almost every mammalian cell. are composed of a group of enzymes which hydrolyze organic Phospholipase C (PLC) removes the phosphate moiety to phosphate ester bonds at alkaline pH. produce 1.2 diacylglycerol and phospho base. Phospholipase Phosphodiesterases are capable of hydrolyzing nucleic D (PLD) produces 1,2-diacylglycerophosphate and base acids by hydrolyzing the phosphodiester bridges of DNA and group. PLC and PLD are important in cell function and sig RNA. The classification of phosphodiesterases depends upon naling. PLD had been the dominant phospholipase in bioca which side of the phosphodiester bridge is attacked. The 3' talysis. Patatins are another type of phospholipase, thought to enzymes specifically hydrolyze the ester linkage between the 25 work as a PLA. 3' carbon and the phosphoric group whereas the 5' enzymes The invention provides methods for cleaving a glycerol hydrolyze the ester linkage between the phosphoric group phosphate ester linkage comprising the following steps: (a) and the 5' carbon of the phosphodiester bridge. The best providing a polypeptide having a phospholipase activity, known of the class 3' enzymes is a from the wherein the polypeptide comprises an amino acid sequence venom of the rattlesnake or from a rustle's viper, which 30 of the invention, or the polypeptide is encoded by a nucleic hydrolyses all the 3' bonds in either RNA or DNA liberating acid of the invention; (b) providing a composition comprising nearly all the nucleotide units as nucleotide 5' phosphates. a glycerolphosphate ester linkage; and, (c) contacting the This enzyme requires a free 3' hydroxyl group on the terminal polypeptide of step (a) with the composition of step (b) under nucleotide residue and proceeds stepwise from that end of the conditions wherein the polypeptide cleaves the glycerolphos polynucleotide chain. This enzyme and all other 35 phate ester linkage. In one aspect, the conditions comprise which attack only at the ends of the polynucleotide chains are between about pH 5 to about 5.5, or, between about pH 4.5 to called . The 5' enzymes are represented by a about 5.0. In one aspect, the conditions comprise a tempera phosphodiesterase from bovine spleen, also an exonuclease, ture of between about 40°C. and about 70°C. In one aspect, which hydrolyses all the 5' linkages of both DNA and RNA the composition comprises a vegetable oil. In one aspect, the and thus liberates only nucleoside 3' phosphates. It begins its 40 composition comprises an oilseed phospholipid. In one attack at the end of the chain having a free 3' hydroxyl group. aspect, the cleavage reaction can generate a water extractable enzymes remove phosphate from phytic acid phosphorylated base and a . (inositol hexaphosphoric acid), a compound found in plants Phospholipases of the invention can be used in oil degum Such as corn, wheat and rice. The enzyme has commercial use ming, wherein the phospholipase is used under conditions for the treatment of animal feed, making the inositol of the 45 wherein the phospholipase can cleave ester linkages in an oil, phytic acid available for animal nutrition. Phytases are used to thereby degumming the oil. In one aspect, the oil is a veg improve the utilization of natural phosphorus in animal feed. etable oil. In another aspect, the vegetable oil comprises Use of phytase as a feed additive enables the animal to oilseed. The vegetable oil can comprise palm oil, rapeseed oil, metabolize a larger degree of its cereal feeds natural mineral corn oil, soybean oil, canola oil, Sesame oil, peanut oil or content thereby reducing or altogether eliminating the need 50 Sunflower oil. In one aspect, the method further comprises for synthetic phosphorus additives. More important than the addition of a phospholipase of the invention, another phos reduced need for phosphorus additives is the corresponding pholipase, another enzyme, or a combination thereof. reduction of phosphorus in pig and chicken waste. Many In another aspect of the invention, phospholipases of the European countries severely limit the amount of manure that invention can be used for converting a non-hydratable phos can be spread per acre due to concerns regarding phosphorus 55 pholipid to a hydratable form or for caustic refining of a contamination of ground water. phospholipid-containing composition. In the latter use, the Alkaline phosphatases hydrolyze monophosphate esters, polypeptide of the invention can be added before caustic releasing an organic phosphate and the cognate alcohol com refining and the composition comprising the phospholipid pound. It is non-specific with respect to the alcohol moiety can comprise a plant and the polypeptide can be expressed and it is this feature which accounts for the many uses of this 60 transgenically in the plant, the polypeptide having a phospho enzyme. lipase activity can be added during crushing of a seed or other Phospholipases plant part, or, the polypeptide having a phospholipase activity In one aspect, the invention provides phospholipases, poly is added following crushing or prior to refining. The polypep nucleotides encoding them, and methods of making and using tide can be added during caustic refining and varying levels of these polynucleotides and polypeptides. In one aspect, the 65 acid and caustic can be added depending on levels of phos invention is directed to polypeptides, e.g., enzymes, having a phorous and levels of free fatty acids. The polypeptide can be phospholipase activity, including thermostable and thermo added after caustic refining: in an intense mixer or retention US 8,119,385 B2 41 42 mixer prior to separation; following a heating step; in a cen Phytases of the invention may also be used in dietary aids trifuge; in a Soapstock; in a washwater; or, during bleaching or in pharmaceutical compositions, for reducing pollution or deodorizing steps. and increasing nutrient availability in an environment or envi In yet another aspect, the phospholipases of the invention ronmental sample by degrading environmental phytic acid, can be used for purification of a phytosterol or a triterpene. for liberating minerals from phytates in plant materials either The phytosterol or a triterpene can comprise a plant sterol. in vitro, i.e., in feed treatment processes, or in vivo, i.e., by The plant sterol can be derived from a vegetable oil. The administering the enzymes to animals. Vegetable oil can comprise a coconut oil, canola oil, cocoa Polymerases butter oil, corn oil, cottonseed oil, linseed oil, olive oil, palm In one aspect, the invention provides polymerases, poly oil, peanut oil, oil derived from a rice bran, safflower oil, 10 nucleotides encoding them, and methods of making and using sesame oil, soybean oil or a Sunflower oil. The method can these polynucleotides and polypeptides. In one aspect, the comprise use of nonpolar solvents to quantitatively extract invention is directed to polypeptides, e.g., enzymes, having a free phytosterols and phytosteryl fatty-acid esters. The phy polymerase activity, including thermostable and thermotol tosterol or a triterpene can comprise a B-sitosterol, a campes erant polymerase activity, and polynucleotides encoding terol, a stigmasterol, a Stigmastanol, a B-sitostanol, a sito 15 these enzymes, and making and using these polynucleotides stanol, a , a chalinasterol, a poriferasterol, a and polypeptides. clionasterol or a brassicasterol. The polymerase enzymes of the invention can have differ In one embodiment, the phospholipases of the invention ent polymerase activities at various high temperatures. In one can be used for refining a crude oil. The polypeptide can have aspect, the polymerase activity comprises addition of deoxy a phospholipase activity is in a water Solution that is added to nucleotides at the 3' hydroxyl end of a polynucleotide. The the composition. The water level can be between about 0.5 to invention also provides kits, e.g., diagnostic kits, and methods 5%. The process time can be less than about 2 hours, less than for performing various amplification reactions, e.g., poly about 60 minutes, less than about 30 minutes, less than 15 merase chain reactions, transcription amplifications, minutes, or less than 5 minutes. The hydrolysis conditions can chain reactions, self-sustained sequence replication or QBeta comprise a temperature of between about 25°C.-70° C. The 25 replicase amplifications. hydrolysis conditions can comprise use of caustics. The In one aspect, the polymerase activity comprises addition hydrolysis conditions can comprise a pH of between about of nucleotides at the 3' hydroxyl end of a nucleic acid. The pH 3 and pH 10, between about pH 4 and pH 9, or between polymerase activity can comprise a 5'-->3' polymerase activ about pH 5 and pH 8. The hydrolysis conditions can comprise ity, a 3'->5' exonuclease activity or a 5'-->3' exonuclease activ addition of emulsifiers and/or mixing after the contacting of 30 ity or all or a combination thereof. In one aspect, the poly step (c). The methods can comprise addition of an emulsion merase activity comprises only a 5'-->3' polymerase activity, breaker and/or heat to promote separation of an aqueous but not a 3'->5' exonuclease activity or a 5'-->3' exonuclease phase. The methods can comprise degumming before the activity. In another aspect, the polymerase activity can com contacting step to collect lecithin by centrifugation and then prise a 5'-->3' polymerase activity and a 3'->5' exonuclease adding a PLC, a PLC and/or a PLA to remove non-hydratable 35 activity, but not a 5'-->3' exonuclease activity. Alternatively, phospholipids. The methods can comprise water degumming the polymerase activity can comprise a 5'-->3' polymerase of crude oil to less than 10 ppm for edible oils and subsequent activity and a 5'-->3' exonuclease activity, but not a 3'->5' physical refining to less than about 50 ppm for biodiesel oils. exonuclease activity. The polymerase activity can comprise The methods can comprise addition of acid to promote hydra addition ofdUTP or dITP. The polymerase activity can com tion of non-hydratable phospholipids. 40 prise addition of a modified or a non-natural nucleotide to a Phytases polynucleotide. Such as an analog of , cytosine, thym In one aspect, the invention provides phytases, polynucle ine, adenine or uracil, e.g., a 2-aminopurine, an inosine or a otides encoding them, and methods of making and using these 5-methylcytosine. polynucleotides and polypeptides. In one aspect, the inven In one aspect, the polymerase activity can comprise strand tion is directed to polypeptides, e.g., enzymes, having a 45 displacement properties. In one aspect, the polymerase activ phytase activity, including thermostable and thermotolerant ity comprises reverse transcriptase activity. phytase activity, and polynucleotides encoding these Proteases enzymes, and making and using these polynucleotides and In one aspect, the invention provides proteases, polynucle polypeptides. otides encoding them, and methods of making and using these Conversion of phytate to inositol and inorganic phospho 50 polynucleotides and polypeptides. In one aspect, the inven rous can be catalyzed by phytase enzymes. Phytases such as tion is directed to polypeptides, e.g., enzymes, having a pro phytase #EC 3.1.3.8 are capable of catalyzing the hydrolysis tease activity, including thermostable and thermotolerant pro of myo-inositol hexaphosphate to D-myo-inositol 1,2,4,5,6- tease activity, and polynucleotides encoding these enzymes, pentaphosphate and orthophosphate. Other phytases hydro and making and using these polynucleotides and polypep lyze inositol pentaphosphate to tetra-, tri-, and lower phos 55 tides. phates. Acid phosphatases are enzymes that catalytically Proteases of the invention can be carbonyl hydrolases hydrolyze a wide variety of phosphate esters. For example, which act to cleave peptide bonds of proteins or peptides. #EC 3.1.3.2 enzymes catalyze the hydrolysis of orthophos Proteolytic enzymes are ubiquitous in occurrence, found in phoric monoesters to orthophosphate products. all living organisms, and are essential for cell growth and Phytases of the invention can be used in producing phytase 60 differentiation. The extracellular proteases are of commercial as a feed additive, e.g. for monogastric animals, fish, poultry, value and find multiple applications in various industrial sec ruminants and other non-ruminants. Phytases of the invention tors. Industrial applications of proteases include food pro can also be used for producing animal feed from certain cessing, brewing, alcohol production, peptide synthesis, industrial processes, e.g., wheat and corn waste products. In enantioselectivity, hide preparation in the leather industry, one aspect, the wet milling process of corn produces glutens 65 waste management and animal degradation, silver recovery in sold as animal feeds. The addition of phytase improves the the photographic industry, medical treatment, silk degum nutritional value of the feed product. ming, biofilm degradation, biomass conversion to ethanol, US 8,119,385 B2 43 44 biodefense, antimicrobial agents and disinfectants, personal role in protein foam stability. All of these characteristics care and cosmetics, biotech reagents and in increasing starch present problems for several industries including brewing, yield from corn wet milling. Additionally, proteases are baking, animal nutrition and paper manufacturing. In brew important components of laundry detergents and other prod ing applications, the presence of Xylan results in wort filter ucts. Within biological research, proteases are used in purifi ability and haze formation issues. In baking applications (es cation processes to degrade unwanted proteins. It is often pecially for cookies and crackers), these arabinoxylans create desirable to employ proteases of low specificity or mixtures Sticky doughs that are difficult to machine and reduce biscuit of more specific proteases to obtain the necessary degree of size. In addition, this carbohydrate is implicated in rapid degradation. rehydration of the baked product resulting in loss of crispi Proteases are classified according to their catalytic mecha 10 ness and reduced shelf-life. For monogastric animal feed nisms. The International Union of Biochemistry and Molecu applications with cereal diets, arabinoxylan is a major con lar Biology (IUBMB) recognizes four mechanistic classes: tributing factor to viscosity of gut contents and thereby (1) the serine proteases; (2) the proteases; (3) the adversely affects the digestibility of the feed and animal aspartic proteases; and (4) the metalloproteases. In addition, growth rate. For ruminant animals, these polysaccharides the IUBMB recognizes a class of (oligopep 15 represent Substantial components of fiber intake and more tidases) of unknown catalytic mechanism. The serine pro complete digestion of arabinoxylans would facilitate higher teases have alkaline pH optima, the metalloproteases are opti feed conversion efficiencies. mally active around neutrality, and the cysteine and aspartic Xylanases are currently used as additives (dough condi enzymes have acidic pH optima. Serine proteases class com tioners) in dough processing for the hydrolysis of water prises two distinct families: the chymotrypsin family, which soluble arabinoxylan. In baking applications (especially for includes the mammalian enzymes such as chymotrypsin, cookies and crackers), arabinoxylan creates sticky doughs trypsin, elastase, or kallikrein, and the Subtilisin family, that are difficult to machine and reduce biscuit size. In addi which include the bacterial enzymes such as subtilisin. Serine tion, this carbohydrate is implicated in rapid rehydration of proteases are used for a variety of industrial purposes, such as the baked product resulting in loss of crispiness and reduced laundry detergents to aid in the removal of proteinaceous 25 shelf-life. stains. In the food processing industry, serine proteases are The enhancement of Xylan digestion in animal feed may used to produce protein-rich concentrates from fish and live improve the availability and digestibility of valuable carbo stock, and in the preparation of dairy products. hydrate and protein feed nutrients. For monogastric animal The proteases of the invention can be used in a variety of feed applications with cereal diets, arabinoxylan is a major diagnostic, therapeutic, and industrial contexts. The proteases 30 contributing factor to viscosity of gut contents and thereby of the invention can be used as, e.g., an additive for a deter adversely affects the digestibility of the feed and animal gent, for processing foods and for chemical synthesis utiliz growth rate. For ruminant animals, these polysaccharides ing a reverse reaction. Additionally, the proteases of the represent Substantial components of fiber intake and more invention can be used in food processing, brewing, bath addi complete digestion would facilitate higher feed conversion tives, alcohol production, peptide synthesis, enantioselectiv 35 efficiencies. It is desirable for animal feed xylanases to be ity, hide preparation in the leather industry, waste manage active in the animal stomach. This requires a feed enzyme to ment and animal degradation, silver recovery in the have high activity at 37° C. and at low pH for monogastrics photographic industry, medical treatment, silk degumming, (pH 2-4) and near neutral pH for ruminants (pH 6.5-7). The biofilm degradation, biomass conversion to ethanol, biode enzyme should also possess resistance to animal gut Xyla fense, antimicrobial agents and disinfectants, personal care 40 nases and stability at the higher temperatures involved in feed and cosmetics, biotech reagents, in increasing starch yield pelleting. As such, there is a need in the art for Xylanase feed from corn wet milling and pharmaceuticals such as digestive additives for monogastric feed with high specific activity, aids and anti-inflammatory (anti-phlogistic) agents. activity at 35-40° C. and pH 2-4, half life greater than 30 Xylanases minutes in SGF and a half-life >5 minutes at 85°C. in for In one aspect, the invention provides Xylanases, polynucle 45 mulated State. For ruminant feed, there is a need for Xylanase otides encoding them, and methods of making and using these feed additives that have a high specific activity, activity at polynucleotides and polypeptides. In one aspect, the inven 35-40°C. and pH 6.5-7.0, halflife greater than 30 minutes in tion is directed to polypeptides, e.g., enzymes, having a Xyla SRF and stability as a concentrated dry powder. nase activity, including thermostable and thermotolerant In one aspect, the Xylanases of the invention are also used Xylanase activity, and polynucleotides encoding these 50 in improving the quality and quantity of milk protein produc enzymes, and making and using these polynucleotides and tion in lactating cows, increasing the amount of soluble sac polypeptides. charides in the stomach and Small intestine of pigs, improving Xylanases (e.g., endo-1,4-beta-Xylanase, EC 3.2.1.8) of late egg production efficiency and egg yields in hens. Addi the invention can hydrolyze internal B-1,4-Xylosidic linkages tionally, Xylanases of the inventions can be used in biobleach in Xylan to produce Smaller molecular weight xylose and 55 ing and treatment of chemical pulps, biobleaching and treat Xylo-oligomers. Xylans are polysaccharides formed from ment of wood or paper pulps, in reducing lignin in wood and 1,4-B-glycoside-linked D-Xylopyranoses. Xylanases of the modifying wood, as feed additives and/or Supplements or in invention are of considerable commercial value, being used in manufacturing cellulose solutions. Detergent compositions the food industry, for baking and fruit and vegetable process comprising Xylanases of the invention are used for fruit, veg ing, breakdown of agricultural waste, in the manufacture of 60 etables and/or mud and clay compounds. animal feed and in pulp and paper production. In another aspect, Xylanases of the invention can be used in Arabinoxylanase are major non-starch polysaccharides of compositions for the treatments and/or prophylaxis of coc cereals representing 2.5-7.1% w/w depending on variety and cidiosis. In yet another aspect, Xylanases of the invention can growth conditions. The physicochemical properties of this be used in the production of water soluble dietary fiber, in polysaccharide are Such that it gives rise to viscous solutions 65 improving the filterability, separation and production of or even gels under oxidative conditions. In addition, arabi starch, the beverage industry in improving filterability of wort noxylans have high water-binding capacity and may have a or beer, in reducing viscosity of plant material, or in increas US 8,119,385 B2 45 46 ing viscosity or gel strength of food products Such as jam, TABLE 1-continued marmalade, jelly, juice, paste, Soup, Salsa, etc. Xylanases of the invention may also be used in hydrolysis of EC (Enzyme Commission) Numbers with the corresponding mode of for which it is selective, particularly in the presence of cellu action or each enzyme class, Subclass and Sub-Subclass lose. In addition, Xylanases of the invention can also be used .2.1.— h NAD(+) or NADP(+) as acceptor. in the production of ethanol, in transformation of a microbe .2.2.— h a as acceptor. that produces ethanol, in production of oenological tannins .2.3.— in Oxygen as acceptor. .2.4.— h a as acceptor. and enzymatic composition, in stimulating the natural .2.7.— h an iron- protein as accep O. defenses of plants, in production of Sugars from hemicellu 2.99.— h other acceptors. lose Substrates, in the cleaning of fruit, vegetables, mud or 10 .3.—— ing on the CH-CH group of donors. .3.1.— h NAD(+) or NADP(+) as acceptor. clay containing soils, in cleaning beer filtration membranes, .3.2.— h a cytochrome as acceptor. and in killing or inhibiting microbial cells. .3.3.— in Oxygen as acceptor. Table 1, below, lists the various EC (Enzyme Commission) .3.5.— h a quinone or related compoun Numbers along with the corresponding mode of action for eptor. h an iron-sulfur protein as accep O. each enzyme class, Subclass and Sub-Subclass. Enzyme 15 h other acceptors. nomenclature is based upon the recommendations of the ing on the CH-NH(2) group of donors. Nomenclature Committee of the International Union of Bio h NAD(+) or NADP(+) as acceptor. chemistry and Molecular Biology (IUBMB). Table 2, below, h a cytochrome as acceptor. in Oxygen as acceptor. lists the various EC Numbers along with the corresponding h a disulfide as acceptor. name given to each enzyme class, Subclass and Sub-Subclass. h an iron-sulfur protein as accep O. Tables 1 and 2 list exemplary enzymatic activities of polypep h other acceptors. tides of the invention, as can be determined by sequence ing on the CH-NH group of donors. identity (e.g., homology); and in one embodiment a sequence h NAD(+) or NADP(+) as acceptor. in Oxygen as acceptor. of the invention comprises an enzyme having at least 50%, h a disulfide as acceptor. 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 25 h a quinone or similar compoun 61%. 62%, 63%, 64%. 65%, 66%, 67%, 68%, 69%, 70%, eptor. .5.8.— h a flavin as accep O. 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 5.99. h other acceptors. 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, .6.—— ing on NADH or NADPH. 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more .6.1.— h NAD(+) or NADP(+) as acceptor. sequence identity (homology) to an enzyme encoded by an 30 .6.2.— h a heme protein as acceptor. exemplary sequence of the invention, including all odd num .6.3.— ha oxygen as acceptor. .6.4.— h a disulfide as acceptor. bered SEQID NO:1 to SEQID NO:26,897, or an exemplary .6.5.— h a quinone or similar compound as polypeptide of the invention, including all even numbered eptor. SEQID NO:2 to SEQID NO:26,898, and with an exemplary h a nitrogenous group as acceptor. function as listed in Table 1 or Table 2. 35 h a flavin as acceptor. h other acceptors. Table 3, below, contains the exemplary SEQID NOs: of the ing on other nitrogenous compounds as invention, and the closest hit (BLAST) information for the donors. polynucleotides and polypeptides of the invention. This infor .7.1.— h NAD(+) or NADP(+) as acceptor. .7.2.- h a cytochrome as acceptor. mation includes the closest hit organism, accession number, .7.3.- h oxygen as acceptor. definition of the closest hit, EC number, percentage amino 40 .7.7.— h an iron-sulfur protein as acceptor. acid identity and the percent nucleotide identity, along with 7.99. h other acceptors. the Evalue for the closest hits. The information contained in .8.—— ing on a Sulfur group of donors. .8.1.— h NAD(+) or NADP(+) as acceptor. Table 3 identifies exemplary activities of polypeptides of the .8.2.— h a cytochrome as acceptor. invention, based on sequence identity (homology). In one .8.3.— h oxygen as acceptor. embodiment a sequence of the invention comprises an 45 .8.4.— h a disulfide as acceptor. enzyme with at least 50%, 51%, 52%. 53%, 54%, 55%, 56%, .8.5.— h a quinone or similar compound as eptor. 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, h an iron-sulfur protein as acceptor. 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 8.98.— h other, known, acceptors. 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, .8.99.— h other acceptors. 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 50 .9.- ...— ing on a heme group of donors. h oxygen as acceptor. 97%, 98%, 99%, or more sequence identity (homology) to an h a nitrogenous group as acceptor. enzyme as listed in Table 3. h other acceptors. Ac ing on diphenols and related TABLE 1. Sub stances as donors. 55 h NAD(+) or NADP(+) as acceptor. EC (Enzyme Commission) Numbers with the corresponding mode of h a cytochrome as acceptor. action for each enzyme class, Subclass and Sub-Subclass h oxygen as acceptor. h other acceptors. 1.——.— . ing on a peroxide as acceptor 1.1.—— Acting on the CH-OH group of donors. 1.1.1.— With NAD(+) or NADP(+) as acceptor. roxidases). 1.1.2.— With a cytochrome as acceptor. 60 ing on hydrogen as donor. 1.1.3.— With oxygen as acceptor. h NAD(+) or NADP(+) as acceptor. 1.1.4.— With a disulfide as acceptor. h a cytochrome as acceptor. 1.1.5.— With a quinone or similar compound as h a quinone or similar compound as acceptor. eptor. 1.1.99.— With other acceptors. h an iron-sulfur protein as acceptor. 1.2.—— Acting on the aldehyde or oxo group of 65 h other known acceptors. donors. h other acceptors. US 8,119,385 B2 47 48 TABLE 1-continued TABLE 1-continued EC (Enzyme Commission) Numbers with the corresponding mode of EC (Enzyme Commission) Numbers with the corresponding mode of action for each enzyme class, Subclass and Sub-Subclass action for each enzyme class, Subclass and Sub-Subclass Acting on single donors with 2.3.1.— Transferring groups other than amino incorporation of molecular oxygen. acyl groups. With incorporation of two atoms of oxygen. 2.3.2.— . With incorporation of one atom of oxygen. 2.3.3.— Acyl groups converted into alkyl on Acting on paired donors, with incorporation or transfer. reduction of molecular oxygen 2.4.—— . With 2-oxoglutarate as one donor, and 10 2.4.1.— Hexosyltransferases. incorporation of one atom each of oxygen into both 2.4.2.— Pentosyltransferases. donors 2.4.99.— Transferring other glycosyl groups. With NADH or NADPH as one donor, and 2.5.—— Transferring alkyl or aryl groups, other incorporation of two atoms of oxygen into one donor than methyl groups. With NADH or NADPH as one donor, and 2.6.—— Transferring nitrogenous groups. incorporation of one atom of oxygen 15 2.6.1.— Transaminases (aminotransferases). With reduced flavin or flavoprotein as one 2.6.3.— Oximinotransferases. donor, and incorporation of one atom of oxygen 2.6.99.— Transferring other nitrogenous groups. With a reduced iron-sulfur protein as one 2.7.—— Transferring phosphorous-containing groups. donor, and incorporation of one atom of oxygen 2.7.1.— Phosphotransferases with an alcohol group as With reduced pteridine as one donor, and acceptor. incorporation of one atom of oxygen 2.7.2.- Phosphotransferases with a carboxyl group as With reduced ascorbate as one donor, and acceptor. incorporation of one atom of oxygen 2.7.3.- Phosphotransferases with a nitrogenous group With another compound as one donor, and as acceptor. incorporation of one incorporation of one atom of 2.7.4.— Phosphotransferases with a phosphate group as OXygen acceptor. With oxidation of a pair of donors resulting 2.7.6.— Diphosphotransferases. in the reduction of molecular oxygen to two molecules 25 2.7.7.— . of water 2.7.8.— for other substituted phosphate With 2-oxoglutarate as one donor, and the groupS. other dehydrogenated. Phosphotransferases with paired acceptors. With NADH or NADPH as one donor, and Transferring Sulfur-containing groups. the other dehydrogenated. . Acting on Superoxide as acceptor. 30 . Oxidizing metal . CoA-transferases. With NAD(+) or NADP(+) as acceptor. Transferring alkylthio groups. With oxygen as acceptor. Transferring selenium-containing groups. With flavin as acceptor. Selenotransferases. Acting on CH or CH(2) groups. Hydrolases. With NAD(+) or NADP(+) as acceptor. Acting on ester bonds. With oxygen as acceptor. 35 Carboxylic ester hydrolases. With a disulfide as acceptor. hiolester hydrolases. With a quinone or similar compound as hosphoric monoester hydrolases. acceptor hosphoric diester hydrolases. With other acceptors. 5 : iphosphoric monoester hydrolases. Acting on iron-sulfur proteins as donors. ulfuricester hydrolases. With NAD(+) or NADP(+) as acceptor. 40 iphosphoric monoester hydrolases. With dinitrogen as acceptor. hosphoric triester hydrolases. With other, known, acceptors. eoxyribonucleases producing 5'- With H(+) as acceptor. OOO(SciS. Acting on reduced flavodoxin as donor. E. ibonucleases producing 5'- With dinitrogen as acceptor. OSOOO(SCS. Acting on phosphorus or arsenic in 45 E. ibonucleases producing 3'- donors. OSOOO(SCS. .20.1.— Acting on phosphorus or arsenic in E.xonucleases active with either ribo- or donors, with NAD(P)(+) as acceptor eoxyribonucleic acid and producing 5'- .20.4.— Acting on phosphorus or arsenic in OSOOO(SCS donors, with disulfide as acceptor 3.1.16.— Exonucleases active with either ribo- or .20.98.— Acting on phosphorus or arsenic in 50 deoxyribonucleic acid producing 3'-phosphomonoesters donors, with other, known acceptors 3.1.21.— producing 5'- .20.99.— Acting on phosphorus or arsenic in phosphomonoesters. donors, with other acceptors 3.1.22.— Endodeoxyribonucleases producing Acting on X-Handy-H to form an X-y other than 5'-phosphomonoesters. bond. 3.1.25.— Site-specific endodeoxyribonucleases .21.3.— With oxygen as acceptor. 55 specific for altered bases. .21.4.— With a disulfide as acceptor. 3.1.26.— producing 5'- .21.99.— With other acceptors. phosphomonoesters. 3.1.27.— Endoribonucleases producing other Other oxidoreductases. han 5'-phosphomonoesters. Transferases. 3.1.30.— Endoribonucleases active with either Transferring one-carbon groups. ribo- or deoxyribonucleic and producing 5'- 2.1.1.— Methyltransferases. 60 phosphomonoesters 2.1.2.— Hydroxymethyl-, formyl- and related Endoribonucleases active with either transferases. ribo- or deoxyribonucleic and producing 3'- 2.1.3.— Carboxyl- and carbamoyltransferases. phosphomonoesters 2.1.4.— Amidinotransferases. 3.2.—— Glycosylases. 2.2.—— Transferring aldehyde or ketone residues. 3.2.1.— Glycosidases, i.e. enzymes hydrolyzing 2.2.1.— Transketolases and transaldolases. 65 O- and S-glycosyl compounds 2.3.—— . 3.2.2.— Hydrolyzing N-glycosyl compounds. US 8,119,385 B2 49 50 TABLE 1-continued TABLE 1-continued EC (Enzyme Commission) Numbers with the corresponding mode of EC (Enzyme Commission) Numbers with the corresponding mode of action for each enzyme class, Subclass and Sub-Subclass action for each enzyme class, Subclass and Sub-Subclass 3.3.—— Acting on ether bonds. 5 5.3.2. interconverting keto- and enol-groups. 3.3.1.— Thioether and trialkylsulfonium 5.3.3.— Transposing C=C bonds. hydrolases. 5.3.4.— Transposing S-S bonds. 3.3.2.— Ether hydrolases. 5.3.99. Other intramolecular oxidoreductases. 3.4.—— Acting on peptide bonds (peptide 5.4.—— intramolecular transferases (). hydrolases). 5.4.1.— Transferring acyl groups. 3.4.11.— . 10 5.4.2.— Phosphotransferases (phosphomutases). 3.4.13.— . 5.4.3.— Transferring amino groups. 3.4.14.— Dipeptidyl-peptidases and tripeptidyl- 5.4.4.— Transferring hydroxy groups. peptidases. 5.4.99. Transferring other groups. 3.4.15.— Peptidyl-dipeptidases. 5.5.—— intramolecular lyases. 3.4.16.— Serine-type . 5.99.—— Other isomerases. 3.4.17.— Metallocarboxypeptidases. 15 6.- ...—.— . 3.4.18.— Cysteine-type carboxypeptidases. 6.1.—— Forming carbon-oxygen bonds. 3.4.19.— Omega peptidases. 6.1.1.— Ligases forming aminoacyl-tRNA and 3.4.21.— Serine endopeptidases. related compounds. 3.4.22.— Cysteine endopeptidases. 6.2.—— Forming carbon-sulfur bonds. 3.4.23.— Aspartic endopeptidases. 6.2.1.— Acid- ligases. 3.4.24.— . 6.3.—— Forming carbon- bonds. 3.4.25.— Threonine endopeptidases. 29 6.3.1. Acid-ammonia (or ) ligases (amide 3.4.99.— Endopeptidases of unknown catalytic synthases). mechanism. 6.3.2.— Acid--D-amino-acid ligases (peptide 3.5.—— Acting on carbon-nitrogen bonds, other synthases). than peptide bonds. 6.3.3.- Cyclo-ligases. 3.5.1.— In linear . 6.3.4.— Other carbon-nitrogen ligases. 3.5.2.— In cyclic amides. 25 3.5.3.— In linear amides. 6.3.5.— Carbon-nitrogen ligases with glutamine as 3.5.4.— In cyclicy amidines. amido-N-donor. 3.55. In nitriles. 6.4.—— Forming carbon-carbon bonds. 3.5.99. In other compounds. 6.5.—— Forming phosphoric ester bonds. 3.6.—— Acting on acid anhydrides. 6.6.—— Forming nitrogen-metal bonds. 3.6.1.— In phosphorous-containing anhydrides. 30 6.6.1- Forming nitrogen-metal bonds. 3.6.2.— In Sulfonyl-containing anhydrides. 3.6.3.— Acting on acid anhydrides; catalyzing transmembrane movement of Substances 3.6.4.— Acting on acid anhydrides; involved in TABLE 2 cellular and Subcellular movement 3.6.5.— Acting on GTP; involved in cellular and 35 EC Numbers with the corresponding name given to each enzyme Subcellular movement. class, Subclass and Sub-Subclass. 3.7.—— Acting on carbon-carbon bonds. 3.7.1.— In ketonic Substances. ENZYME: 1. . . . 3.8.—— Acting on halide bonds. 3.8.1.— In C-halide compounds. .1.1 . 3.9.—— Acting on phosphorus-nitrogen bonds. 40 .1.2 Alcohol dehydrogenase (NADP+). 3.10.—.— Acting on Sulfur-nitrogen bonds. .1.3 Homoserine dehydrogenase. 3.11.—.— Acting on carbon-phosphorus bonds. .1.4 (R,R)-butanediol dehydrogenase. 3.12.—.— Acting on Sulfur-Sulfur bonds. 1.5 . 3.13.—.— Acting on carbon-sulfur bonds. .1.6 Glycerol dehydrogenase. 4.——.— Lyases. 1.7 Propanediol-phosphate dehydrogenase. 4.1.—— Carbon-carbon lyases. .1.1.8 Glycerol-3-phosphate dehydrogenase 4.1.1.— Carboxy-lyases. 45 (NAD+). 4.1.2.— Aldehyde-lyases. .1.1.9 D-xylulose reductase. 4.1.3.— Oxo-acid-lyases. .1.1.10 L-xylulose reductase. 4.1.99.— Other carbon-carbon lyases. .1.1.11 D-arabinitol 4-dehydrogenase. 4.2.—— Carbon-oxygen lyases. .1.1.12 L-arabinitol 4-dehydrogenase. 4.2.1.— Hydro-lyases. .1.1.13 L-arabinitol 2-dehydrogenase. 4.2.2.— Acting on polysaccharides. 50 1.1.1.14 L-iditol 2-dehydrogenase. 4.2.3.— Acting on phosphates. 1.1.15 D-iditol 2-dehydrogenase. 4.2.99.— Other carbon-oxygen lyases. .1.1.16 Galactitol 2-dehydrogenase. 4.3.—— Carbon-nitrogen lyases. 1.1.17 Mannitol-1-phosphate 5 4.3.1.— Animonia-lyases. dehydrogenase. 4.3.2.— Lyases acting on amides, amidines, etc. .1.1.18 nositol 2-dehydrogenase. 4.3.3.— Amine-lyases. 55 1.1.1.19 L-glucuronate reductase. 4.3.99.— Other carbon-nitrogen-lyases. .1.1.20 Glucuronolactone reductase. 4.4.—— Carbon-sulfur lyases. .1.1.21 Aldehyde reductase. 4.5.—— Carbon-halide lyases. .1.1.22 UDP-glucose 6-dehydrogenase. 4.6.—— Phosphorus-oxygen lyases. .1.1.23 Histidinol dehydrogenase. 4.99.-- Other lyases. .1.1.24 Quinate dehydrogenase. 5.——.— Isomerases. 1.1.25 Shikimate dehydrogenase. 5.1.—— Racemases and epimerases. 60 .1.1.26 Glyoxylate reductase. 5.1.1.— Acting on amino acids and derivatives. 1.1.27 L-. 5.1.2.— Acting on hydroxy acids and derivatives. .1.1.28 D-lactate dehydrogenase. 5.1.3.— Acting on carbohydrates and derivatives. 1.1.29 Glycerate dehydrogenase. 5.1.99. Acting on other compounds. 1.1.30 3-hydroxybutyrate dehydrogenase. 5.2.—— Cis-trans-isomerases. .1.1.31 3-hydroxyisobutyrate dehydrogenase. 5.3.—— Intramolecular oxidoreductases. 65 1.1.1.32 Mevalidate reductase. 5.3.1.— Interconverting aldoses and ketoses. 11.33 Mevaldate reductase (NADPH). US 8,119,385 B2 51 52 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 34 Hydroxymethylglutaryl-CoA reductase .1.104 4-oxoproline reductase. (NADPH). 1105 Retinol dehydrogenase. 1.35 3-hydroxyacyl-CoA dehydrogenase. 1106 Pantoate 4-dehydrogenase. 1.36 Acetoacetyl-CoA reductase. 1.107 Pyridoxal 4-dehydrogenase. 1.37 . 1.108 3-dehydrogenase. 38 Malate dehydrogenase (oxaloacetate .1.110 Indolelactate dehydrogenase. decarboxylating). 10 .1.111 3-(imidazol-5-yl)lactate dehydrogenase. 39 Malate dehydrogenase .1.112 Indanol dehydrogenase. (decarboxylating). 1113 L-xylose 1-dehydrogenase. .40 Malate dehydrogenase (oxaloacetate .1.114 Apiose 1-reductase. decarboxylating) (NADP+). 1.115 1-dehydrogenase (NADP+). .1.41 (NAD+). .1.116 D-arabinose 1-dehydrogenase. .1.42 Isocitrate dehydrogenase (NADP+). 15 117 D-arabinose 1-dehydrogenase 1.43 Phosphogluconate 2-dehydrogenase. (NAD(P)+). .44 Phosphogluconate dehydrogenase 1118 Glucose 1-dehydrogenase (NAD+). (decarboxylating). 1119 Glucose 1-dehydrogenase (NADP+). 1.45 L-gulonate 3-dehydrogenase. 1.12O Galactose 1-dehydrogenase (NADP+). .1.46 L-arabinose 1-dehydrogenase. .1.121 Aldose 1-dehydrogenase. 1.47 Glucose 1-dehydrogenase. .1.122 D-threo-aldose 1-dehydrogenase. .1.48 Galactose 1-dehydrogenase. 1.123 Sorbose 5-dehydrogenase (NADP+). 1.49 Glucose-6-phosphate 1-dehydrogenase. .1.124 Fructose 5-dehydrogenase (NADP+). SO 3-alpha-hydroxysteroid dehydrogenase (B- 1.125 2-deoxy-D-gluconate 3-dehydrogenase. specific). 126 2-dehydro-3-deoxy-D-gluconate 6 1.51 3(or 17)beta-hydroxysteroid dehydrogenase. dehydrogenase. 1.52 3-alpha-hydroxycholanate dehydrogenase. 127 2-dehydro-3-deoxy-D-gluconate 5 53 3-alpha(or 20-beta)-hydroxysteroid 25 dehydrogenase. dehydrogenase. 1.128 L-idonate 2-dehydrogenase. 1.54 Allyl-alcohol dehydrogenase. 1129 L-threonate 3-dehydrogenase. 1.55 L-acetaldehyde reductase (NADPH). 1130 3-dehydro-L-gulonate 2-dehydrogenase. 1.56 Ribitol 2-dehydrogenase. 1.131 Mannuronate reductase. 1.57 Fructuronate reductase. 1.132 GDP-mannose 6-dehydrogenase. 1.58 Tagaturonate reductase. 30 1133 dTDP-4-dehydrorhamnose reductase. 1.59 3-hydroxypropionate dehydrogenase. 134 dTDP-6-deoxy-L-talose 4 160 2-hydroxy-3-oxopropionate reductase. dehydrogenase. 1.61 4-hydroxybutyrate dehydrogenase. 1.135 GDP-6-deoxy-D-talose 4-dehydrogenase. .1.62 Estradiol 17-beta-dehydrogenase. 136 UDP-N-acetylglucosamine 6 1.63 Testosterone 17-beta-dehydrogenase. dehydrogenase. .64 Testosterone 17-beta-dehydrogenase 35 .1.1.37 Ribitol-5-phosphate 2-dehydrogenase. (NADP+). 1.138 Mannitol 2-dehydrogenase (NADP+). 1.65 Pyridoxine 4-dehydrogenase. .1.140 Sorbitol-6-phosphate 2-dehydrogenase. 1.66 Omega-hydroxy decanoate dehydrogenase. .141 15-hydroxyprostaglandin dehydrogenase 1.67 Mannitol 2-dehydrogenase. (NAD+). 1.69 Gluconate 5-dehydrogenase. .1.142 D-pinitol dehydrogenase. 1.71 Alcohol dehydrogenase (NAD(P)+). .1.143 Sequoyitol dehydrogenase. 1.72 Glycerol dehydrogenase (NADP+). 40 .1.144 Perillyl-alcohol dehydrogenase. 1.73 Octanol dehydrogenase. 145 3-beta-hydroxy-delta(5)- 1.75 (R)-aminopropanol dehydrogenase. dehydrogenase. 1.76 (S,S)-butanediol dehydrogenase. .1.146 11-beta-hydroxysteroid dehydrogenase. 1.77 Lactaldehyde reductase. 1.147 16-alpha-hydroxysteroid dehydrogenase. 1.78 D-lactaldehyde dehydrogenase. .1.148 Estradiol 17-alpha-dehydrogenase. 1.79 Glyoxylate reductase (NADP+). 45 1.149 20-alpha-hydroxysteroid dehydrogenase. 18O Isopropanol dehydrogenase (NADP+). 21-hydroxysteroid dehydrogenase 1.81 Hydroxypyruvate reductase. (NAD+). 1.82 Malate dehydrogenase (NADP+). 152 3-alpha-hydroxy-5-beta-androstane-17 1.83 D-malate dehydrogenase (decarboxylating). one 3-alpha-dehydrogenase. 1.84 Dimethylmalate dehydrogenase. 1.153 Sepiapterin reductase. 1.85 3-isopropylmalate dehydrogenase. 50 1.154 Ureidoglycolate dehydrogenase. 186 Ketol-acid reductoisomerase. 1.15S Homoisocitrate dehydrogenase. 1.87 Homoisocitrate dehydrogenase. 1.156 Glycerol 2-dehydrogenase (NADP+). 1.88 Hydroxymethylglutaryl-CoA reductase. 1.157 3-hydroxybutyryl-CoA dehydrogenase. 1.90 Aryl-alcohol dehydrogenase. 1.158 UDP-N-acetylmuramate dehydrogenase. 1.91 Aryl-alcohol dehydrogenase (NADP+). 1159 7-alpha-hydroxysteroid dehydrogenase. 92 Oxaloglycolate reductase 55 1160 Dihydrobunolol dehydrogenase. (decarboxylating). .1.161 Cholestanetetraol 26-dehydrogenase. 1.93 Tartrate dehydrogenase. .1.162 Erythrulose reductase. .94 Glycerol-3-phosphate dehydrogenase 1.163 Cyclopentanol dehydrogenase. (NAD(P)+). .1.164 Hexadecanol dehydrogenase. 1.9S Phosphoglycerate dehydrogenase. 1165 2-alkyn-1-oldehydrogenase. 1.96 Diiodophenylpyruvate reductase. 166 Hydroxycyclohexanecarboxylate 1.97 3-hydroxybenzyl-alcohol dehydrogenase. 60 dehydrogenase. 1.98 (R)-2-hydroxy-fatty-acid dehydrogenase. 1.167 Hydroxymalonate dehydrogenase. 1.99 (S)-2-hydroxy-fatty-acid dehydrogenase. 168 2-dehydropantolactone reductase (A- 1OO 3-oxoacyl-acyl-carrier-protein specific). reductase. 1.169 2-dehydropantoate 2-reductase. .1.101 Acylglycerone-phosphate reductase. 170 Sterol-4-alpha-carboxylate 3 1.102 3-dehydrosphinganine reductase. 65 dehydrogenase (decarboxylating). 103 L-threonine 3-dehydrogenase. 172 2-oxoadipate reductase. US 8,119,385 B2 53 54 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 1.173 L-rhamnose 1-dehydrogenase. 231 15-hydroxyprostaglandin-I 1.174 Cyclohexane-1,2-diol dehydrogenase. dehydrogenase (NADP+). 1.175 D-xylose 1-dehydrogenase. 232 15-hydroxylicosatetraenoate 176 12-alpha-hydroxysteroid dehydrogenase. dehydrogenase. 233 N-acylmannosamine 1-dehydrogenase. 177 Glycerol-3-phosphate 1-dehydrogenase 1234 Flavanone 4-reductase. (NADP+). 10 1.23S 8-oxocoformycin reductase. 178 3-hydroxy-2-methylbutyryl-CoA 1.236 Tropinone reductase. dehydrogenase. 1.237 Hydroxyphenylpyruvate reductase. 1179 D-xylose 1-dehydrogenase (NADP+). 1.238 12-beta-hydroxysteroid dehydrogenase. 181 Cholest-5-ene-3-beta,7-alpha-diol 3 239 3-alpha-(17-beta)-hydroxysteroid beta-dehydrogenase. dehydrogenase (NAD+). 1.183 Geraniol dehydrogenase. 15 240 N-acetylhexosamine 1-dehydrogenase. .1.184 Carbonyl reductase (NADPH). .1.241 6-endo-hydroxycineole dehydrogenase. 1.185 L-glycol dehydrogenase. 1243 Carveol dehydrogenase. 1186 dTDP-galactose 6-dehydrogenase. .1.244 Methanol dehydrogenase. 1187 GDP-4-dehydro-D-rhamnose reductase. 1.245 Cyclohexanol dehydrogenase. 1.188 Prostaglandin-F synthase. .1.246 Pterocarpin synthase. 1.189 Prostaglandin-E(2) 9-reductase. 1.247 Codeinone reductase (NADPH). 190 indole-3-acetaldehyde reductase 1.248 Salutaridine reductase (NADPH). (NADH). 1.2SO D-arabinitol 2-dehydrogenase. 191 indole-3-acetaldehyde reductase 251 Galactitol-1-phosphate 5 (NADPH). dehydrogenase. 1.192 Long-chain-alcohol dehydrogenase. .252 Tetrahydroxynaphthalene reductase. 193 5-amino-6-(5- 1.254 (S)-carnitine 3-dehydrogenase. phosphoribosylamino)uracil reductase. 25 1.2SS Mannitol dehydrogenase. 1.194 Coniferyl-alcohol dehydrogenase. 1.256 Fluoren-9-oldehydrogenase. 1.195 Cinnamyl-alcohol dehydrogenase. 257 4-(hydroxymethyl)benzenesulfonate 196 15-hydroxyprostaglandin-D dehydrogenase. dehydrogenase (NADP+). 258 6-hydroxyhexanoate dehydrogenase. 197 15-hydroxyprostaglandin dehydrogenase 259 3-hydroxypimeloyl-CoA (NADP+). 30 dehydrogenase. 198 (+)-borneol dehydrogenase. 260 Sulcatone reductase. 1.199 (S)-usnate reductase. 261 Glycerol-1-phosphate dehydrogenase 2OO Aldose-6-phosphate reductase (NAD(P)+). (NADPH). 262 4-hydroxythreonine-4-phosphate 7-beta-hydroxysteroid dehydrogenase dehydrogenase. (NADP+). 35 263 1,5-anhydro-D-fructose reductase. 12O2 1,3-propanediol dehydrogenase. .1.264 L-idonate 5-dehydrogenase. 1.2O3 Uronate dehydrogenase. 1.265 3-methylbutanal reductase. 1.205 IMP dehydrogenase. 266 dTDP-4-dehydro-6-deoxyglucose 12O6 Tropine dehydrogenase. reductase. 1.2O7 (-)-menthol dehydrogenase. .267 1-deoxy-D-xylulose-5-phosphate 1208 (+)-neomenthol dehydrogenase. reductoisomerase. 209 3(or 17)-alpha-hydroxysteroid 40 268 2-(R)-hydroxypropyl-CoM dehydrogenase. dehydrogenase. 3-beta(or 20-alpha)-hydroxysteroid 269 2-(S)-hydroxypropyl-CoM dehydrogenase. dehydrogenase. Long-chain-3-hydroxyacyl-CoA 270 3-keto-. dehydrogenase. 1.271 GDP-L-fucose synthase. 3-oxoacyl-acyl-carrier-protein 45 1.272 (R)-2-hydroxyacid dehydrogenase. reductase (NADH). 1.273 VelloSimine dehydrogenase. 3-alpha-hydroxysteroid dehydrogenase 1.274 2,5-didehydrogluconate reductase. (A-specific). 1275 (+)-trans-carveol dehydrogenase. 2-dehydropantolactone reductase (B- 1.276 Serine 3-dehydrogenase. specific). 277 3-beta-hydroxy-5-beta-steroid Gluconate 2-dehydrogenase. 50 dehydrogenase. g dehydrogenase. 278 3-beta-hydroxy-5-alpha-steroid Benzyl-2-methyl-hydroxybutyrate dehydrogenase. dehydrogenase. 279 (R)-3-hydroxyacid-ester dehydrogenase. Morphine 6-dehydrogenase. 1.28O (S)-3-hydroxyacid-ester dehydrogenase. Dihydrokaempferol 4-reductase. 281 GDP-4-dehydro-6-deoxy-D-mannose 6-pyruvoyltetrahydropterin 2'-reductase. 55 reductase. Vomifoliol 4'-dehydrogenase. 282 Quinateishikimate dehydrogenase. (R)-4-hydroxyphenylactate .1.2.2 Mannitol dehydrogenase (cytochrome). dehydrogenase. .1.2.3 L-lactate dehydrogenase (cytochrome). 1.223 Isopiperitenol dehydrogenase. .1.2.4 D-lactate dehydrogenase (cytochrome). .1.224 Mannose-6-phosphate 6-reductase. 2.5 D-lactate dehydrogenase (cytochrome c 1.225 Chlordecome reductase. 553). 226 4-hydroxycyclohexanecarboxylate 60 1.3.3 Malate oxidase. dehydrogenase. .1.3.4 . 1.227 (-)-borneol dehydrogenase. 13.5 Hexose oxidase. 1228 (+)-Sabinol dehydrogenase. .1.3.6 oxidase. 229 Diethyl 2-methyl-3-oxoSuccinate 1.3.7 Aryl-. (CC8Se. 13.8 L-gulonolactone oxidase. 230 3-alpha-hydroxyglycyrrhetinate 65 13.9 Galactose oxidase. dehydrogenase. 3.10 Pyranose oxidase. US 8,119,385 B2 55 56 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. .1.3.11 L-Sorbose oxidase. 5 1.2.1.13 Glyceraldehyde-3-phosphate dehydrogenase .1.3.12 Pyridoxine 4-oxidase. (NADP(+)) (phosphorylating). 1.3.13 Alcohol oxidase. 2.1.15 Malonate-semialdehyde dehydrogenase. .1.3.14 Catechol oxidase (dimerizing). .2.1.16 Succinate-semialdehyde dehydrogenase 1.3.15 (S)-2-hydroxy-acid oxidase. (NAD(P)+). .1.3.16 Ecclysone oxidase. 2.1.17 Glyoxylate dehydrogenase (acylating). 1.3.17 oxidase. 10 1.2.1.18 Malonate-semialdehyde dehydrogenase 1.3.18 Secondary-alcohol oxidase. (acetylating). 1.3.19 4-hydroxymandelate oxidase. 2.1.19 Aminobutyraldehyde dehydrogenase. 1.3.20 Long-chain-alcohol oxidase. .2.1.20 Glutarate-semialdehyde dehydrogenase. 3.21 Glycerol-3-phosphate oxidase. .2.1.21 Glycolaldehyde dehydrogenase. 3.23 Thiamine oxidase. .2.1.22 Lactaldehyde dehydrogenase. .1.3.24 L-galactonolactone oxidase. 15 .2.1.23 2-oxoaldehyde dehydrogenase (NAD+). 1.3.25 Cellobiose oxidase. .2.1.24 Succinate-semialdehyde dehydrogenase. 3.27 Hydroxyphytanate oxidase. 2.1.25 2-oxoisovalerate dehydrogenase (acylating). 3.28 Nucleoside oxidase. .2.1.26 2,5-dioxovalerate dehydrogenase. 1.329 N-acylhexosamine oxidase. 2.1.27 Methylmalonate-semialdehyde 3.30 Polyvinyl-alcohol oxidase. dehydrogenase (acylating). 1.3.37 D-arabinono-1,4-lactone oxidase. .2.1.28 dehydrogenase (NAD+). 13.38 Vanillyl-alcohol oxidase. 20 1.2.1.29 Aryl-aldehyde dehydrogenase. 3.39 Nucleoside oxidase (H(2)O(2)-forming). 2.1.30 Aryl-aldehyde dehydrogenase (NADP+). .1.3.40 D-mannitol oxidase. .2.1.31 L-aminoadipate-semialdehyde 3.41 Xylitol oxidase. dehydrogenase. .1.4.1 Vitamin-K-epoxide reductase .2.1.32 Aminomuconate-semialdehyde (warfarin-sensitive). dehydrogenase. .1.4.2 Vitamin-K-epoxide reductase 25 1.2.1.33 (R)-dehydropantoate dehydrogenase. (warfarin-insensitive). 2.1.36 Retinal dehydrogenase. 15.2 Quinoprotein glucose dehydrogenase. 2.1.38 N-acetyl-gamma-glutamyl-phosphate 1991 . reductase. 1992 2-hydroxyglutarate dehydrogenase. 2.1.39 Phenylacetaldehyde dehydrogenase. .99.3 Gluconate 2-dehydrogenase .2.1.40 3-alpha,7-alpha,12-alpha (acceptor). 30 trihydroxycholestan-26-all 26-oxidoreductase. 1994 Dehydrogluconate dehydrogenase. .2.1.41 Glutamate-5-semialdehyde dehydrogenase. 1995 Glycerol-3-phosphate dehydrogenase. .2.1.42 Hexadecanal dehydrogenase (acylating). 99.6 D-2-hydroxy-acid dehydrogenase. .2.1.43 Formate dehydrogenase (NADP+). 1.99.7 Lactate-malate transhydrogenase. .2.1.44 Cinnamoyl-CoA reductase. 1998 Alcohol dehydrogenase (acceptor). 2.1.45 4-carboxy-2-hydroxymuconate-6- 1999 Pyridoxine 5-dehydrogenase. 35 semialdehyde dehydrogenase. 1.99.10 Glucose dehydrogenase (acceptor). .2.1.46 Formaldehyde dehydrogenase. 1.99.11 Fructose 5-dehydrogenase. 2.1.47 4-trimethylammoniobutyraldehyde 1.99.12 Sorbose dehydrogenase. dehydrogenase. 1.99.13 3-dehydrogenase. .2.1.48 Long-chain-aldehyde dehydrogenase. 19914 Glycolate dehydrogenase. .2.1.49 2-oxoaldehyde dehydrogenase 19916 Malate dehydrogenase (acceptor). (NADP+). 1.99.18 Cellobiose dehydrogenase (acceptor). 40 12.1.50 Long-chain-fatty-acyl-CoA reductase. 1.99.19 Uracil dehydrogenase. 2.1.51 Pyruvate dehydrogenase (NADP+). 1992O Alkan-1-oldehydrogenase (acceptor). 2.1.52 Oxoglutarate dehydrogenase 1.99.21 D- (acceptor). (NADP+). 19922 Glycerol dehydrogenase (acceptor). 2.1.53 4-hydroxyphenylacetaldehyde 99.23 Polyvinyl-alcohol dehydrogenase dehydrogenase. (acceptor). 45 1.2.1.54 Gamma-guanidinobutyraldehyde 99.24 Hydroxyacid-oxoacid dehydrogenase. transhydrogenase. 2.1.57 Butanal dehydrogenase. 99.25 Quinate dehydrogenase 2.1.58 Phenylglyoxylate dehydrogenase (pyrroloquinoline-quinone). (acylating). 99.26 3-hydroxycyclohexanone 2.1.59 Glyceraldehyde-3-phosphate dehydrogenase. 50 dehydrogenase (NAD(P)(+)) (phosphorylating). 99.27 (R)-pantolactone dehydrogenase .2.1.60 5-carboxymethyl-2-hydroxymuconic (flavin). semialdehyde dehydrogenase. 19928 Glucose-fructose oxidoreductase. .2.1.61 4-hydroxymuconic-semialdehyde 99.29 Pyranose dehydrogenase (acceptor). dehydrogenase. 19930 2-oxo-acid reductase. .2.1.62 4-formylbenzenesulfonate .2.1.1 Formaldehyde dehydrogenase 55 dehydrogenase. 2.1.2 sMisirense 2.1.63 6-oxohexanoate dehydrogenase. .2.1.3 Aldehyde dehydrogenase (NAD+). .2.1.64 staldehyde .2.1.4 Aldehyde dehydrogenase (NADP+). enyarogenase. 2.1.5 Aldehydey dehydrogenaseydrog (NAD(P)+). 2.1.65 Salicylaldehyde dehydrogenase. 2.17 Benzaldehyde dehydrogenase .2.1.66 Mycothiol-dependent formaldehyde (NADP+). 60 dehydrogenase. .2.1.8 Betaine-aldehyde dehydrogenase. 2.1.67 Vanillin dehydrogenase. .2.1.9 Glyceraldehyde-3-phosphate dehydrogenase 2.1.68 Coniferyl-aldehyde dehydrogenase. (NADP+). 2.1.69 Fluoroacetaldehyde dehydrogenase. .2.1.10 Acetaldehyde dehydrogenase (acetylating). .2.2.1 Formate dehydrogenase (cytochrome). .2.1.11 Aspartate-semialdehyde dehydrogenase. .2.2.2 Pyruvate dehydrogenase (cytochrome). .2.1.12 Glyceraldehyde-3-phosphate dehydrogenase 65 1.2.2.3 Formate dehydrogenase (cytochrome c (phosphorylating). 553). US 8,119,385 B2 57 58 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. .2.2.4 Carbon-monoxide dehydrogenase 3.1.31 2-enoate reductase. (cytochrome b-561). 3.1.32 . .2.3.1 Aldehyde oxidase. 3.1.33 Protochlorophyllide reductase. 2.3.3 Pyruvate oxidase. 3.1.34 2,4-dienoyl-CoA reductase (NADPH). .2.3.4 Oxalate oxidase. 3.1.35 Phosphatidylcholine desaturase. 2.3.5 Glyoxylate oxidase. 3.1.36 Geissoschizine dehydrogenase. 23.6 Pyruvate oxidase (CoA-acetylating). 10 3.1.37 Cis-2-enoyl-CoA reductase (NADPH). 2.3.7 indole-3-acetaldehyde oxidase. 3.1.38 Trans-2-enoyl-CoA reductase (NADPH). 2.38 Pyridoxal oxidase. 3.139 Enoyl-acyl-carrier-protein reductase 23.9 Aryl-aldehyde oxidase. (NADPH, A-specific). .2.3.11 Retinal oxidase. .3.1.40 2-hydroxy-6-oxo-6-phenylhexa-2,4- 2.3.13 4-hydroxyphenylpyruvate oxidase. dienoate reductase. .2.4.1 Pyruvate dehydrogenase (acetyl 15 3.141 Xanthommatin reductase. transferring). .3.1.42 12-oxophytodienoate reductase. .2.4.2 Oxoglutarate dehydrogenase (Succinyl 3.143 Cyclohexadienyl dehydrogenase. transferring). .3.1.44 Trans-2-enoyl-CoA reductase (NAD+). .2.4.4 3-methyl-2-oxobutanoate dehydrogenase (2– 3.145 2'-hydroxyisoflavone reductase. methylpropanoyl-transferring). .3.1.46 Biochanin-A reductase. 2.7.1 Pyruvate synthase. 3.147 Alpha-Santonin 12-reductase. 27.2 2-oxobutyrate synthase. 3.148 15-oxoprostaglandin 13-oxidase. 2.7.3 2-oxoglutarate synthase. 3.149 Cis-3,4-dihydrophenanthrene-3,4-diol 27.4 Carbon-monoxide dehydrogenase ehydrogenase. (ferredoxin). 3.151 2'-hydroxydaidzein reductase. 2.7.5 Aldehyde ferredoxin oxidoreductase. 3.1.52 2-methyl-branched-chain-enoyl-CoA 27.6 Glyceraldehyde-3-phosphate dehydrogenase reductase. (ferredoxin). 25 3.1.53 (3S4R)-3,4-dihydroxycyclohexa-1,5- 2.7.7 3-methyl-2-oxobutanoate dehydrogenase iene-1,4-dicarboxylate dehydrogenase. (ferredoxin). 3.154 Precorrin-6A reductase. 2.78 Indolepyruvate ferredoxin oxidoreductase. 3.156 Cis-2,3-dihydrobiphenyl-2,3-diol 2.7.9 2-oxoglutarate ferredoxin oxidoreductase. enydrogenase. 2.99.2 Carbon-monoxide dehydrogenase 3.1.57 reductase. (acceptor). 30 3.1.58 2,3-dihydroxy-2,3-dihydro-p-cumate 2.99.3 Aldehyde dehydrogenase (pyrroloquinoline enydrogenase. quinone). 3.1.59 ,6-dihydroxy-5-methylcyclohexa-2,4- 2.99.4 Formaldehyde dismutase. ienecarboxylate dehydrogenase. 2.99.5 Formylmethanofuran dehydrogenase. 3.1.60 Dibenzothiophene dihydrodiol 2.99.6 Carboxylate reductase. enydrogenase. 2.99.7 Aldehyde dehydrogenase (FAD 35 .3.1.61 Terephthalate 1,2-cis-dihydrodiol independent). enydrogenase. .3. .1 Dihydrouracil dehydrogenase (NAD+). 3.162 Pimeloyl-CoA dehydrogenase. Dihydropyrimidine dehydrogenase 3.1.63 2,4-dichlorobenzoyl-CoA reductase. (NADP+). .3.1.64 Phthalate 4.5-cis-dihydrodiol beta-reductase. enydrogenase. Cortisone alpha-reductase. 3.1.65 5,6-dihydroxy-3-methyl-2-oxo-1,2,5,6- Cucurbitacin delta(23)-reductase. 40 etrahydroquinoline dehydrogenase. Fumarate reductase (NADH). 3.1.66 Cis-dihydroethylcatechol Meso-tartrate dehydrogenase. enydrogenase. Acyl-CoA dehydrogenase (NADP+). 3.1.67 Cis-1,2-dihydroxy-4-methylcyclohexa Enoyl-acyl-carrier-protein reductase 3,5-diene-1-carboxylate dehydrogenase. (NADH). 3.1.68 2-dihydroxy-6-methylcyclohexa-3,5- .3. Enoyl-acyl-carrier-protein reductase 45 ienecarboxylate dehydrogenase. (NADPH, B-specific). 3.1.69 Zeatin reductase. .11 2-coumarate reductase. 3.1.70 Delta (14)-sterol reductase. .12 . 3.171 Delta(24(24(1)))-sterol reductase. 13 Prephenate dehydrogenase (NADP+). 3.1.72 Delta(24)-sterol reductase. .14 Orotate reductase (NADH). 3.1.73 2-dihydrovomilenine reductase. 1S Orotate reductase (NADPH). 50 3.174 2-alkenal reductase. 16 Beta-nitroacrylate reductase. 3.1.75 Divinyl a 8-vinyl 17 3-methyleneoxindole reductase. reductase. 18 Kynurenate-7,8-dihydrodiol dehydrogenase. 3.176 Precorrin-2 dehydrogenase. 19 Cis-1,2-dihydrobenzene-1,2-diol 32.3 Galactonolactone dehydrogenase. dehydrogenase. 3.31 Dihydroorotate oxidase. .3. Trans-1,2-dihydrobenzene-1,2-diol 55 3.3.2 Lathosterol oxidase. dehydrogenase. 3.3.3 Coproporphyrinogen oxidase. .21 7-dehydrocholesterol reductase. .3.3.4 Protoporphyrinogen oxidase. 22 Cholestenone 5-alpha-reductase. 33.5 . .23 Cholestenone 5-beta-reductase. 33.6 Acyl-CoA oxidase. .24 Billiverdin reductase. 3.3.7 Dihydrouracil oxidase. 25 1,6-dihydroxycyclohexa-2,4-diene-1- 3.38 Tetrahydroberberine oxidase. carboxylate dehydrogenase. 60 33.9 Secologanin synthase. .3. 26 Dihydrodipicolinate reductase. 3.3.10 alphabeta-oxidase. 27 2-hexadecenal reductase. 3.5.1 (ubiquinone). 28 2,3-dihydro-2,3-dihydroxybenzoate .3.7.1 6-hydroxynicotinate reductase. dehydrogenase. .3.7.2 15, 16-dihydrobiliverdin:ferredoxin Cis-1,2-dihydro-1,2-dihydroxynaphthalene oxidoreductase. dehydrogenase. 65 .3.7.3 Phycoerythrobilin:ferredoxin Progesterone 5-alpha-reductase. oxidoreductase. US 8,119,385 B2 59 60 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 37.4 Phytochromobilin:ferredoxin Saccharopine dehydrogenase (NAD+, L oxidoreductase. ysine-forming). 3.7.5 Phycocyanobilin:ferredoxin Saccharopine dehydrogenase (NADP+, L oxidoreductase. ysine-forming). 3.99.1 Succinate dehydrogenase. Saccharopine dehydrogenase (NAD+, L 3.99.2 Butyryl-CoA dehydrogenase. glutamate-forming). 3.99.3 Acyl-CoA dehydrogenase. 10 Saccharopine dehydrogenase (NADP+, 3.99.4 3-oxosteroid 1-dehydrogenase. L-glutamate-forming). 3.99.5 3-oxo-5-alpha-steroid 4-dehydrogenase. D-Octopine dehydrogenase. 3.99.6 3-oxo-5-beta-steroid 4-dehydrogenase. 5. -pyrroline-5-carboxylate dehydrogenase. 3.99.7 Glutaryl-CoA dehydrogenase. S. Methylenetetrahydrofolate dehydrogenase 3.99.8 2-furoyl-CoA dehydrogenase. (NAD+). 3.99.10 ISOValeryl-CoA dehydrogenase. 15 16 D-lysopine dehydrogenase. 3.99.11 Dihydroorotate dehydrogenase. 17 Alanopine dehydrogenase. 3.99.12 2-methylacyl-CoA dehydrogenase. 18 Ephedrine dehydrogenase. 3.99.13 Long-chain-acyl-CoA dehydrogenase. 19 D-nopaline dehydrogenase. 3.99.14 Cyclohexanone dehydrogenase. 2O Methylenetetrahydrofolate reductase 3.99.15 Benzoyl-CoA reductase. (NADPH). 3.99.16 Isoquinoline 1-oxidoreductase. S. .21 Delta(1)-piperideine-2-carboxylate 3.99.17 Quinoline 2-oxidoreductase. reductase. 3.99.18 Quinaldate 4-oxidoreductase. 22 Strombine dehydrogenase. 3.99.19 Quinoline-4-carboxylate 2 23 Tauropine dehydrogenase. oxidoreductase. .24 N(5)-(carboxyethyl)ornithine synthase. 3.99.2O 4-hydroxybenzoyl-CoA reductase. 25 Thiomorpholine-carboxylate 3.99.21 (R)-benzylsuccinyl-CoA dehydrogenase. dehydrogenase. 25 S. 26 Beta-alanopine dehydrogenase. Alanine dehydrogenase. S. 27 1,2-dehydroreticulinium reductase Glutamate dehydrogenase. (NADPH). Glutamate dehydrogenase (NAD(P)+). 28 Opine dehydrogenase. Glutamate dehydrogenase (NADP+). 29 FMN reductase. L-amino-acid dehydrogenase. 30 Flavin reductase. Serine 2-dehydrogenase. 30 31 Berberine reductase. Valine dehydrogenase (NADP+). 32 Vomillenine reductase. Leucine dehydrogenase. .33 Pteridine reductase. Glycine dehydrogenase. 6,7-dihydropteridine reductase. L-erythro-3,5-diaminohexanoate 5.3.1 Sarcosine oxidase. dehydrogenase. .5.3.2 N-methyl-L-amino-acid oxidase. 2,4-diaminopentanoate dehydrogenase. 35 5.34 N(6)-methyl-lysine oxidase. Glutamate synthase (NADPH). .5.3.5 (S)-6-hydroxynicotine oxidase. Glutamate synthase (NADH). 5.3.6 (R)-6-hydroxynicotine oxidase. Lysine dehydrogenase. 5.3.7 L-pipecolate oxidase. 1 6 Diaminopimelate dehydrogenase. .5.3.10 Dimethylglycine oxidase. N-methylalanine dehydrogenase. 5.3.11 Polyamine oxidase. Lysine 6-dehydrogenase. 5.3.12 Dihydrobenzophenanthridine oxidase. Tryptophan dehydrogenase. 40 .5.4.1 Pyrimidodiazepine synthase. Phenylalanine dehydrogenase. S.S.1 Electron-transferring-flavoprotein Glycine dehydrogenase (cytochrome). dehydrogenase. D-aspartate oxidase. 5.8.1 Dimethylamine dehydrogenase. L-amino-acid oxidase. 5.8.2 Trimethylamine dehydrogenase. D-amino-acid oxidase. 5.99.1 Sarcosine dehydrogenase. Amine oxidase (flavin-containing). 45 5.99.2 Dimethylglycine dehydrogenase. Pyridoxamine-phosphate oxidase. 5.99.3 L-pipecolate dehydrogenase. Amine oxidase (copper-containing). 5.99.4 Nicotine dehydrogenase. D-glutamate oxidase. 5.99.5 Methylglutamate dehydrogenase. Ethanolamine oxidase. 5.99.6 Spermidine dehydrogenase. Putrescine oxidase. 5.99.8 Proline dehydrogenase. L-glutamate oxidase. 50 5.99.9 Methylenetetrahydromethanopterin Cyclohexylamine oxidase. dehydrogenase. Protein-lysine 6-oxidase. 5.99.11 5,10-methylenetetrahydromethanopterin L-lysine oxidase. reductase. D-glutamate(D-aspartate) oxidase. 5.99.12 Cytokinin dehydrogenase. L-aspartate oxidase. .6.1.1 NAD(P)(+) transhydrogenase (B-specific). Glycine oxidase. .6.1.2 NAD(P)(+) transhydrogenase (AB Glycine dehydrogenase (decarboxylating). 55 specific). Glutamate synthase (ferredoxin). .6.2.2 Cytochrome-b5 reductase. D-amino-acid dehydrogenase. .6.2.4 NADPH--hemoprotein reductase. Taurine dehydrogenase. 6.25 NADPH--cytochrome-c2 reductase. Amine dehydrogenase. .6.2.6 Leghemoglobin reductase. Aralkylamine dehydrogenase. 6.3.1 NAD(P)H oxidase. Glycine dehydrogenase (cyanide 60 6.5.3 NADH dehydrogenase (ubiquinone). forming). 6.5.4 Monodehydroascorbate reductase (NADH). Pyrroline-2-carboxylate reductase. 6.S.S NADPH:duinone reductase. Pyrroline-5-carboxylate reductase. 6.5.6 p-benzoquinone reductase (NADPH). Dihydrofolate reductase. 6.5.7 2-hydroxy-1,4-benzoquinone reductase. Methylenetetrahydrofolate dehydrogenase 6.6.9 Trimethylamine-N-oxide reductase. (NADP+). 65 6.99.1 NADPH dehydrogenase. Formyltetrahydrofolate dehydrogenase. 6.99.2 NAD(P)H dehydrogenase (quinone). US 8,119,385 B2 61 62 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 6.99.3 NADH dehydrogenase. 8.99.1 Sulfite reductase. 6.99.5 NADH dehydrogenase (quinone). 8.99.2 Adenylyl-sulfate reductase. 6.99.6 NADPH dehydrogenase (quinone). 8.99.3 Hydrogensulfite reductase. .7.1.1 reductase (NADH). 9.3.1 Cytochrome-c oxidase. .7.1.2 Nitrate reductase (NAD(P)H). 96.1 Nitrate reductase (cytochrome). .7.1.3 Nitrate reductase (NADPH). 9.99.1 ron--cytochrome-c reductase. .7.1.4 reductase (NAD(P)H). 10 .10.1.1 Trans-acenaphthene-1,2-diol .7.1.5 Hyponitrite reductase. dehydrogenase. 71.6 Azobenzene reductase. .10.2.1 L-ascorbate-cytochrome-b5 reductase. .7.1.7 GMP reductase. .10.2.2 Ubiquinol-cytochrome-c reductase. .7.1.9 Nitroquinoline-N-oxide reductase. 103.1 Catechol oxidase. .7.1.10 Hydroxylamine reductase (NADH). 103.2 80C8SC. .7.1.11 4 15 10.3.3 L-ascorbate oxidase. (dimethylamino)phenylazoxybenzene reductase. .10.3.4 O-aminophenol oxidase. .7.1.12 N-hydroxy-2-acetamidofluorene 103.5 3-hydroxyanthranilate oxidase. reductase. 10.3.6 Rifamycin-B oxidase. 7.2.1 Nitrite reductase (NO-forming). 10.99.1 Plastoquinol-plastocyanin reductase. 7.2.2 Nitrite reductase (cytochrome; .11. NADH peroxidase. ammonia-forming). NADPH peroxidase. 7.2.3 Trimethylamine-N-oxide reductase Fatty-acid peroxidase. (cytochrome c). Cytocbrome-c peroxidase. 7.3.1 Nitroethane oxidase. Catalase. 7.3.2 Acetylindoxyl oxidase. Peroxidase. 7.3.3 Urate oxidase. odide peroxidase. 7.34 Hydroxylamine oxidase. Glutathione peroxidase. 7.3.5 3-aci-nitropropanoate oxidase. 25 Chloride peroxidase. 7.7.1 Ferredoxin-nitrite reductase. L-ascorbate peroxidase. 7.7.2 Ferredoxin-nitrate reductase. Phospholipid-hydroperoxide glutathione 7.99.1 Hydroxylamine reductase. peroxidase. 7.99.4 Nitrate reductase. Manganese peroxidase. 7.99.5 5,10-methylenetetrahydrofolate Diarylpropane peroxidase. reductase (FADH(2)). 30 Hydrogen dehydrogenase. 7.99.6 Nitrous-oxide reductase. Hydrogen dehydrogenase (NADP+). 7.99.7 Nitric-oxide reductase. Cytochrome-c3 hydrogenase. 7.99.8 Hydroxylamine oxidoreductase. Hydrogen:Quinone oxidoreductase. .8.1.2 Sulfite reductase (NADPH). 12.7.2 Ferredoxin hydrogenase. 8.13 Hypotaurine dehydrogenase. 12.98.1 Coenzyme F420 hydrogenase. .8.1.4 Dihydrolipoyl dehydrogenase. 35 1298.2 5,10-methenyltetrahydromethanopterin 8.15 2-oxopropyl-CoM reductase hydrogenase. (carboxylating). 1298.3 Methanosarcina-phenazine hydrogenase. .8.1.6 Cystine reductase. 12.99.6 Hydrogenase (acceptor). 8.17 Glutathione-disulfide reductase. 3 Catechol 1,2-dioxygenase. 8.1.8 Protein-disulfide reductase. Catechol 2,3-dioxygenase. .8.19 Thioredoxin-disulfide reductase. Protocatechuate 3,4-dioxygenase. 8.1.10 CoA-glutathione reductase. 40 Gentisate 1,2-dioxygenase. .8.1.11 Asparagusate reductase. Homogentisate 1,2-dioxygenase. .8.1.12 Trypanothione-disulfide reductase. 3-hydroxyanthranilate 3,4-dioxygenase. 8.1.13 Bis-gamma-glutamylcystine reductase. Protocatechuate 4,5-dioxygenase. .8.1.14 CoA-disulfide reductase. 2,5-dihydroxypyridine 5,6-dioxygenase. 8.1.15 Mycothione reductase. 7,8-dihydroxykynurenate 8,8a .8.2.1 Sulfite dehydrogenase. 45 dioxygenase. .8.2.2 Thiosulfate dehydrogenase. Tryptophan 2,3-dioxygenase. 8.31 Sulfite oxidase. Lipoxygenase. 8.32 Thiol oxidase. 1 3 Ascorbate 2,3-dioxygenase. 8.33 Glutathione oxidase. 2,3-dihydroxybenzoate 3,4-dioxygenase. .8.3.4 Methanethiol oxidase. 3,4-dihydroxyphenylacetate 2,3- 8.35 Prenylcysteine oxidase. 50 dioxygenase. .8.4.1 Glutathione-homocystine 16 3-carboxyethylcatechol 2,3-dioxygenase. transhydrogenase. 17 Indole 2,3-dioxygenase. .8.4.2 Protein-disulfide reductase 18 Sulfur dioxygenase. (glutathione). 19 Cysteamine dioxygenase. .8.4.3 Glutathione--CoA-glutathione 2O Cysteine dioxygenase. transhydrogenase. 55 22 Caffeate 3,4-dioxygenase. .8.4.4 Glutathione-cystine transhydrogenase. .23 2,3-dihydroxyindole 2,3-dioxygenase. 8.45 Methionine-S-oxide reductase. .24 Quercetin 2,3-dioxygenase. .8.4.6 Protein-methionine-S-oxide reductase. 25 3,4-dihydroxy-9,10-secoandrosta 8.4.7 Enzyme-thiol transhydrogenase 1,3,5(10)-triene-9,17-dione 4,5-dioxygenase. (glutathione-disulfide). 26 Peptide-tryptophan 2,3-dioxygenase. .8.4.8 Phosphoadenylyl-sulfate reductase 27 4-hydroxyphenylpyruvate dioxygenase. (thioredoxin). 60 28 2,3-dihydroxybenzoate 2,3-dioxygenase. 8.49 Adenylyl-sulfate reductase 29 Stizolobate synthase. (glutathione). 30 Stizolobinate synthase. .8.4.10 Adenylyl-sulfate reductase 31 Arachidonate 12-lipoxygenase. (thioredoxin). 32 2-nitropropane dioxygenase. 8.5.1 Glutathione dehydrogenase (ascorbate). 33. Arachidonate 15-lipoxygenase. 8.7.1 Sulfite reductase (ferredoxin). 65 34 Arachidonate 5-lipoxygenase. 8.98.1 CoB-CoM heterodisulfide reductase. 35 Pyrogallol 1,2-. US 8,119,385 B2 63 64 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 36 Chloridazon-catechol dioxygenase. 2.17 dioxygenase. 37 Hydroxyquinol 1,2-dioxygenase. 2.18 Biphenyl 2,3-dioxygenase. 38 1-hydroxy-2-naphthoate 1,2-dioxygenase. 3.1 Salicylate 1-monooxygenase. 39 Biphenyl-2,3-diol 1,2-dioxygenase. : 3.2 4-hydroxybenzoate 3 .40 Arachidonate 8-lipoxygenase. monooxygenase. .41 2,4-dihydroxyacetophenone dioxygenase. 4. 3.3 4-hydroxyphenylacetate 3 .42 Indoleamine-pyrrole 2,3-dioxygenase. 10 monooxygenase. 43 Lignostilbene alpha-beta-dioxygenase. 3.4 Melilotate 3-monooxygenase. .44 Linoleate diol synthase. 3.5 Imidazoleacetate 4 45 Linoleate 11-lipoxygenase. monooxygenase. .46 4-hydroxymandelate synthase. 3.6 Orcinol 2-monooxygenase. 47 3-hydroxy-4-oxoquinoline 2,4- 3.7 2-monooxygenase. dioxygenase. 15 3.8 Dimethylaniline monooxygenase 3 48 3-hydroxy-2-methylquinolin-4-one 2,4- (N-oxide-forming). dioxygenase. 6.4 Tryptophan 5-monooxygenase. 49 Chlorite O(2)-lyase. 6.5 Glyceryl-ether monooxygenase. SO Acetylacetone-cleaving enzyme. 6.6 Mandelate 4-monooxygenase. 2.1 Arginine 2-monooxygenase. 7.1 Dopamine beta-monooxygenase. 2.2 Lysine 2-monooxygenase. 7.3 Peptidylglycine monooxygenase. 2.3 Tryptophan 2-monooxygenase. 7.4 Aminocyclopropanecarboxylate oxidase. 2.4 Lactate 2-monooxygenase. 8.1 Monophenol monooxygenase. 2.5 Renilla-luciferin 2-monooxygenase. 8.2 CMP-N-acetylneuraminate 2.6 Cypridina-luciferin 2-monooxygenase. monooxygenase. 2.7 Photinus-luciferin 4-monooxygenase (ATP .14.19.1 Stearoyl-CoA 9-desaturase. hydrolyzing). .14.19.2 Acyl-acyl-carrier-protein desaturase. Watasenia-luciferin 2-monooxygenase. 25 14.19.3 Linoleoyl-CoA desaturase. Phenylalanine 2-monooxygenase. .14.20.1 Deacetoxycephalosporin-C synthase. i Methylphenyltetrahydropyridine N .14.21.1 (S)-stylopine synthase. monooxygenase. .14.21.2 (S)-cheilanthifoline synthase. Apo-beta-carotenoid-14.13'-dioxygenase. .14.21.3 . Oplophorus-luciferin 2-monooxygenase. .14.21.4 Salutaridine synthase. Inositol oxygenase. 30 14:21.5 (S)-canadine synthase. Tryptophan 2'-dioxygenase. 14.99.1 Prostaglandin-endoperoxide synthase. Gamma-butyrobetaine dioxygenase. 14.99.2 Kynurenine 7,8-hydroxylase. Procollagen-proline dioxygenase. 14.99.3 Heme oxygenase (decyclizing). Pyrimidine-deoxynucleoside 2'- 14.99.4 Progesterone monooxygenase. dioxygenase. 14.99.7 . Procollagen-lysine 5-dioxygenase. 35 14.99.9 Steroid 17-alpha-monooxygenase. dioxygenase. 14.99.10 Steroid 21-monooxygenase. Procollagen-proline 3-dioxygenase. 14.99.11 Estradiol 6-beta-monooxygenase. Trimethyllysine dioxygenase, 14.99.12 Androst-4-ene-3,17-dione Naringenin 3-dioxygenase. monooxygenase. Pyrimidine-deoxynucleoside 1'- 14.99.14 Progesterone 11-alpha-monooxygenase. dioxygenase. 14.99.15 4-methoxybenzoate monooxygenase (O- Hyoscyamine (6S)-dioxygenase. 40 demethylating). Gibberellin-44 dioxygenase. 14.99.19 Plasmanylethanolamine desaturase. Gibberellin 2-beta-dioxygenase. 14.99.20 Phylloquinone monooxygenase (2,3- 6-beta-hydroxyhyoscyamine epoxidizing). epoxidase. 14.99.21 Latia-luciferin monooxygenase Gibberellin 3-beta-dioxygenase. (demethylating). Peptide-aspartate beta 45 14.99.22 Ecdysone 20-monooxygenase. dioxygenase. 14.99.23 3-hydroxybenzoate 2-monooxygenase. Taurine dioxygenase. 14.99.24 Steroid 9-alpha-monooxygenase. Phytanoyl-CoA dioxygenase. 14.99.26 2-hydroxypyridine 5-monooxygenase. Leucocyanidin oxygenase. 14.99.27 uglone 3-monooxygenase. Desacetoxyvindoline 4 14.99.28 Linalool 8-monooxygenase. hydroxylase. 50 14.99.29 Deoxyhypusine monooxygenase. Clavaminate synthase. 14.99.30 Caroteine 7,8-desaturase. Anthranilate 1,2-dioxygenase 14.99.31 Myristoyl-CoA 11-(E) desaturase. (deaminating, decarboxylating). 14.99.32 Myristoyl-CoA 11-(Z) desaturase. 2.3 1,2-dioxygenase. 14.99.33 Delta (12)-fatty acid 2.4 3-hydroxy-2- dehydrogenase. methylpyridinecarboxylate dioxygenase. 55 14.99.34 Monoprenyl isoflavone epoxidase. 2.5 5-pyridoxate dioxygenase. 14.99.35 Thiophene-2-carbonyl-CoA 2.7 Phthalate 4,5-dioxygenase. monooxygenase. 2.8 4-Sulfobenzoate 3,4-dioxygenase. 14.99.36 2.9 4-chlorophenylacetate 3,4- Beta-caroteine 15,15'- : monooxygenase. dioxygenase. 2.10 Benzoate 1,2-dioxygenase. 14.99.37 Taxadiene 5-alpha-hydroxylase. 2.11 Toluene dioxygenase. 60 15.1.1 Superoxide dismutase. 2.12 Naphthalene 1,2-dioxygenase. 151.2 Superoxide reductase. 2.13 2-chlorobenzoate 1,2-dioxygenase. .16.1.1 Mercury(II) reductase. 2.14 2-aminobenzenesulfonate 2,3- .16.1.2 Diferric-transferrin reductase. dioxygenase. .16.1.3 Aquacobalamin reductase. 2.15 Terephthalate 1,2-dioxygenase. .16.1.4 Cob(II)alamin reductase. 2.16 2-hydroxyquinoline 5,6- 65 16.15 Aquacobalamin reductase dioxygenase. (NADPH). US 8,119,385 B2 65 66 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. .16.1.6 Cyanocobalamin reductase .14 5-methyltetrahydropteroyltriglutamate-- (cyanide-eliminating). S-methyltransferase. 16.17 Ferric-chelate reductase. 2. 1.15 Fatty-acid O-methyltransferase. .16.1.8 Methionine synthase reductase. 16 Methylene-fatty-acyl-phospholipid .16.3.1 Ferroxidase. synthase. .16.8.1 Cob(II)yrinic acid a,c-diamide 17 Phosphatidylethanolamine N reductase. 10 methyltransferase. 17.1.1 CDP-4-dehydro-6-deoxyglucose 18 Polysaccharide O reductase. methyltransferase. 17.1.2 4-hydroxy-3-methylbut-2-enyl 19 Trimethylsulfonium-- diphosphate reductase. tetrahydrofolate N-methyltransferase. 17.13 Leucoanthocyanidin reductase. 1.2O Glycine N-methyltransferase. .17.1.4 dehydrogenase. 15 3. .21 Methylamine-glutamate N 17.1.5 Nicotinate dehydrogenase. methyltransferase. 17.3.1 Pteridine oxidase. .1.22 Carnosine N-methyltransferase. 17.3.2 . 1.25 Phenol O-methyltransferase. 17.3.3 6-hydroxynicotinate .1.26 odophenol O-methyltransferase. dehydrogenase. 1.27 Tyramine N-methyltransferase. .17.4.1 Ribonucleoside-diphosphate 28 Phenylethanolamine N reductase. melnyltransferase. 1742 Ribonucleoside-triphosphate 29 RNA (cytosine-5-)- reductase. melnyltransferase. 17.4.3 4-hydroxy-3-methylbut-2-en-1-yl 31 RNA (guanine-N(1)-)- diphosphate synthase. melnyltransferase. 17.5. Phenylacetyl-CoA dehydrogenase. 32 RNA (guanine-N(2)-)- 17.99.1 4-cresol dehydrogenase 25 melnyltransferase. (hydroxylating). .33 RNA (guanine-N(7)-)- 17.99.2 Ethylbenzene hydroxylase. melnyltransferase. .18.1. Rubredoxin--NAD(+) reductase. 34 RNA (-2'-O-)- .18.1.2 Ferredoxin-NADP(+) reductase. melnyltransferase. 18.13 Ferredoxin-NAD(+) reductase. 1.35 RNA (uracil-5-)-methyltransferase. .18.1.4 Rubredoxin--NAD(P)(+) reductase. 30 36 RNA (adenine-N(1)-)- .18.6. Nitrogenase. melnyltransferase. 196. Nitrogenase (flavodoxin). 37 DNA (cytosine-5-)- .20.1. Phosphonate dehydrogenase. melnyltransferase. .20.4. Arsenate reductase (glutaredoxin). 2. 38 O-demethylpuromycin O .20.4.2 Methylarsonate reductase. melnyltransferase. 20.98.1 Arsenate reductase (aZurin). 35 1.39 nositol 3-methyltransferase. 20.99.1 Arsenate reductase (donor). .1.40 nositol 1-methyltransferase. .21.3. Sopenicillin-N Synthase. .1.41 Sterol 24-C-methyltransferase. .21.3.2 Columbamine oxidase. .1.42 Luteolin O-methyltransferase. 21.3.3 Reticuline oxidase. 1.43 Histone-lysine N-methyltransferase. .21.3.4 Sulochrin oxidase ((+)-bisdechlorogeodin .1.44 Dimethylhistidine N orming). methyltransferase. 21.3.5 Sulochrin oxidase ((-)-bisdechlorogeodin 40 2. 1.45 Thymidylate synthase. orming). 2. .1.46 soflavone 4'-O-methyltransferase. 21.3.6 Aureusidin synthase. 1.47 (ndolepyruvate C .21.4.1 D-proline reductase (dithiol). methyltransferase. .21.4.2 Glycine reductase. .1.48 rRNA (adenine-N(6)-)- .21.4.3 Sarcosine reductase. methyltransferase. .21.4.4 Betaine reductase. 45 1.49 Amine N-methyltransferase. 21.99.1 Beta-cyclopiazonate dehydrogenase. 3. 1...SO Loganate O-methyltransferase. 97.1.1 Chlorate reductase. S1 rRNA (guanine-N(1)-)- 97.1.2 Pyrogallol hydroxytransferase. methyltransferase. 97.1.3 Sulfur reductase. 2. 52 rRNA (guanine-N(2)-)- 97.14 Formate acetyltransferase activating methyltransferase. enzyme. 50 1.53 Putrescine N-methyltransferase. 97.1.8 Tetrachloroethene reductive dehalogenase. 1.54 Deoxycytidylate C-methyltransferase. 97.1.9 Selenate reductase. 1.55 tRNA (adenine-N(6)-)-methyltransferase. 97.1.10 Thyroxine 5'-deiodinase. 1.56 mRNA (guanine-N(7)-)-methyltransferase. 97.1.11 Thyroxine 5-deiodinase. 57 mRNA (nucleoside-2'-O-)- ENZYME: 2. . . . methyltransferase. 55 59 Cytochrome c-lysine N 2.1.1.1 Nicotinamide N-methyltransferase. methyltransferase. 2.1.1.2 Guanidinoacetate N-methyltransferase. 2. 1.60 Calmodulin-lysine N-methyltransferase. 2.1.1.3 Thetin-homocysteine S-methyltransferase. 61 tRNA (5-methylaminomethyl-2- 2.1.1.4 Acetylserotonin O-methyltransferase. thiouridylate)-methyltransferase. 2.1.1.5 Betaine-homocysteine S-methyltransferase. 62 mRNA (2'-O-methyladenosine-N(6)-)- 2.1.1.6 Catechol O-methyltransferase. methyltransferase. 2.1.1.7 Nicotinate N-methyltransferase. 60 63 Methylated-DNA-protein-cysteine S 2.1.1.8 Histamine N-methyltransferase. methyltransferase. 2.1.1.9 Thiol S-methyltransferase. .64 3-demethylubiquinone-9 3-O- 2.1.1.10 Homocysteine S-methyltransferase. methyltransferase. 2.1.1.11 protoporphyrin IX 1.65 Licodione 2'-O-methyltransferase. methyltransferase. 66 rRNA (adenosine-2'-O-)-methyltransferase. 2.1.1.12 Methionine S-methyltransferase. 65 1.67 Thiopurine S-methyltransferase. 2.1.1.13 Methionine synthase. 68 Caffeate O-methyltransferase. US 8,119,385 B2 67 68 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 69 5-hydroxyfuranocoumarin 5-O- 5 2.1.1.116 3'-hydroxy-N-methyl-(S)-coclaurine 4'-O- methyltransferase. melnyltransferase. 70 8-hydroxyfuranocoumarin 8-O- 2.1.1.117 (S)-scoulerine 9-O-methyltransferase. methyltransferase. 2.1.1.118 Columbamine O-methyltransferase. 71 Phosphatidyl-N-methylethanolamine N 2.1.1.119 10-hydroxydihydrosanguinarine 10-O- methyltransferase. melnyltransferase. 72 Site-specific DNA-methyltransferase 10 2.1.1.120 12-hydroxydihydrochelirubine 12-O- (adenine-specific). melnyltransferase. .74 Methylenetetrahydrofolate--tRNA-(uracil 2.1.1.121 6-O-methylnorlaudanosoline 5'-O- 5-)-methyltransferase (FADH(2)-oxidizing). melnyltransferase. 1.75 Apigenin 4'-O-methyltransferase. 2.1.1.122 (S)-tetrahydroprotoberberine N 3. 1.76 Quercetin 3-O-methyltransferase. melnyltransferase. 77 Protein-L-isoaspartate(D-aspartate) O is 2.1.1.123 Cytochrome-c-methionine S methyltransferase. melnyltransferase. 2. 1.78 Isoorientin 3'-O-methyltransferase. 2.1.1.124 Cytochrome-c-arginine N .79 Cyclopropane-fatty-acyl-phospholipid melnyltransferase. synthase. 2.1.1.125 Histone-arginine N-methyltransferase. 18O Protein-glutamate O-methyltransferase. 2.1.1.126 Myelin basic protein-arginine N 3. 1.82 3-methylquercitin 7-O-methyltransferase. melnyltransferase. .83 3,7-dimethylguercitin 4'-O- 20 2.1.1.127 Ribulose-bisphosphate carboxylase melnyltransferase. ysine N-methyltransferase. 84 Methylguercetagetin 6-O- 2.1.1.128 (RS)-norcoclaurine 6-O- melnyltransferase. methyltransferase. 2. 1.85 Protein-histidine N-methyltransferase. 2.1.1.129 nositol 4-methyltransferase. 86 Tetrahydromethanopterin S 2.1.1.130 Precorrin-2 C(20)-methyltransferase. melnyltransferase. 25 2.1.1.131 Precorrin-3B C(17)-methyltransferase. 2. 1.87 Pyridine N-methyltransferase. 2.1.1.132 Precorrin-6Y C(5,15)-methyltransferase 88 8-hydroxyquercitin 8-O- (decarboxylating). melnyltransferase. 2.1.1133 Precorrin-4 C(11)-methyltransferase. 89 Tetrahydrocolumbamine 2-O- 2.1.1.136 Chlorophenol O-methyltransferase. melnyltransferase. 2.1.1.1.37 Arsenite methyltransferase. 90 Methanol-5- 30 2.1.1.139 3'-demethylstaurosporine O hydroxybenzimidazolylcobamide Co methyltransferase. metnyi transIerase. 2.1.1.140 (S)-coclaurine-N-methyltransferase. .91 sobutyraldoxime O 2.1.1.141 Jasmonate O-methyltransferase. melnyltransferase. 2.1.1.142 Cycloartenol 24-C-methyltransferase. 1.92 Bergaptol O-methyltransferase. 2.1.1.143 24-methylenesterol C-methyltransferase. 3. 1.93 Xanthotoxol O-methyltransferase. as 2.1.1.144 Trans-aconitate 2-methyltransferase. .94 1-O-demethyl-17-O- 2.1.1145 Trans-aconitate 3-methyltransferase. deacetylvindoline O-methyltransferase. 2.1.1.146 (ISO)eugenol O-methyltransferase. 1.9S Tocopherol O-methyltransferase. 2.1.1.147 Corydaline synthase. 3. 1.96 Thioether S-methyltransferase. 2.1.1.148 Thymidylate synthase (FAD). 97 3-hydroxyanthranilate 4-C- 2.1.1.149 Myricetin O-methyltransferase. melnyltransferase. 2.1.1.150 soflavone 7-O-methyltransferase. 2. 1.98 Diphthine synthase. 40 2.1.1.151 Cobalt-factor II C(20)-methyltransferase. .99 6-methoxy-2,3-dihydro-3- 2.1.1.152 Precorrin-6A synthase (deacetylating). hydroxytabersonine N-methyltransferase. 2.1.2. Glycine hydroxymethyltransferase. Protein-S-isoprenylcysteine O 2.1.2.2 Phosphoribosylglycinamide melnyltransferase. ormyltransferase. .1.101 Macrocin O-methyltransferase. 2.1.2.3 Phosphoribosylaminoimidazolecarboxamide 102 Demethylmacrocin O 45 ormyltransferase. melnyltransferase. 2.1.2.4 Glycine formimidoyltransferase. 103 Phosphoethanolamine N 2.1.2.5 Glutamate formimidoyltransferase. melnyltransferase. 2.1.2.7 D-alanine 2 104 Caffeoyl-CoA O hydroxymethyltransferase. melnyltransferase. 2.1.2.8 Deoxycytidylate 5 105 N-benzoyl-4-hydroxyanthranilate 50 hydroxymethyltransferase. 4-O-methyltransferase. 2.1.2.9 Methionyl-tRNA formyltransferase. 1106 Tryptophan 2-C-methyltransferase. 2.1.2.10 Aminomethyltransferase. 107 Uroporphyrin-III C 2.1.2.11 3-methyl-2-oxobutanoate methyltransferase. hydroxymethyltransferase. 108 6-hydroxymellein O 2.1.3.1 Methylmalonyl-CoA methyltransferase. 55 carboxytransferase. 109 Demethylsterigmatocystin 6-O- 2.1.3.2 Aspartate carbamoyltransferase. methyltransferase. 2.1.3.3 Ornithine carbamoyltransferase. 110 Sterigmatocystin 7-O- 2.1.3.5 Oxamate carbamoyltransferase. methyltransferase. 2.1.3.6 Putrescine carbamoyltransferase. .1.111 Anthranilate N-methyltransferase. 2.1.3.7 3-hydroxymethylcephem 112 Glucuronoxylan 4-O- 60 carbamoyltransferase. methyltransferase. 2.1.3.8 Lysine carbamoyltransferase. 113 Site-specific DNA 2.1.4.1 Glycine amidinotransferase. methyltransferase (cytosine-N(4)-specific). 2.1.4.2 Scyllo-inosamine-4-phosphate .114 Hexaprenyldihydroxybenzoate amidinotransferase. methyltransferase. 2.2.1.1 Transketolase. 115 (RS)-1-benzyl-1,2,3,4- 65 2.2.1.2 Transaldolase. tetrahydroisoquinoline N-methyltransferase. 2.2.1.3 Formaldehyde transketolase. US 8,119,385 B2 69 70 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.2.1. Acetoin-ribose-5-phosphate 2.3. 59 Gentamicin 2'-N-acetyltransferase. transaldolase. 2.3. 60 Gentamicin 3'-N-acetyltransferase. 2.2.1. 2-hydroxy-3-oxoadipate synthase. 2.3. 61 Dihydrolipoyllysine-residue 2.2.1. Acetolactate synthase. Succinyltransferase. 2.2.1. 1-deoxy-D-xylulose-5-phosphate 2.3. 62 2-acylglycerophosphocholine O synthase. . 2.2. Fluorothreonine transaldolase. 10 2.3. 63 1-alkylglycerophosphocholine O 23. Amino-acid N-acetyltransferase. acyltransferase. 23. Imidazole N-acetyltransferase. 2.3. .64 Agmatine N(4)- 23. Glucosamine N-acetyltransferase. coumaroyltransferase. 23. Glucosamine 6-phosphate N 2.3. .65 Glycine N-choloyltransferase. acetyltransferase. 2.3. 66 Leucine N-acetyltransferase. 23. Arylamine N-acetyltransferase. 15 2.3. .67 1-alkylglycerophosphocholine O 23. Choline O-acetyltransferase. acetyltransferase. 23. Carnitine O-acetyltransferase. 2.3. 68 Glutamine N-acyltransferase. 23. Phosphate acetyltransferase. 2.3. 69 Monoterpenol O-acetyltransferase. 2.3.1. Acetyl-CoA C-acetyltransferase. 2.3. 70 CDP-acylglycerol O 2.3.1. Hydrogen-sulfide S-acetyltransferase. arachidonoyltransferase. 2.3.1. Thioethanolamine S-acetyltransferase. 2.3. .71 Glycine N-benzoyltransferase. 2.3.1. Dihydrolipoyllysine-residue 2.3. 72 Indoleacetylglucose--inositol O acetyltransferase. acyltransferase. 23. 13 Glycine N-acyltransferase. 2.3. 73 Diacylglycerol--sterol O 23. .14 Glutamine N-phenylacetyltransferase. acyltransferase. 23. 1S Glycerol-3-phosphate O-acyltransferase. 2.3. .74 Naringenin-. 23. 16 Acetyl-CoA C-acyltransferase. 2.3. .75 Long-chain-alcohol O-fatty 23. 17 Aspartate N-acetyltransferase. 25 acyltransferase. 23. 18 Galactoside O-acetyltransferase. 2.3. .76 Retinol O-fatty-acyltransferase. 23. 19 Phosphate butyryltransferase. 2.3. 77 Triacylglycerol--sterol O 23. 2O Diacylglycerol O-acyltransferase. acyltransferase. 23. .21 Carnitine O-palmitoyltransferase. 2.3. Heparan-alpha-glucosaminide N 23. 22 2-acylglycerol O-acyltransferase. acetyltransferase. 23. .23 1-acylglycerophosphocholine O 30 2.3. Maltose O-acetyltransferase. acyltransferase. 2.3. Cysteine-S-conjugate N 23. .24 Sphingosine N-acyltransferase. acetyltransferase. 23. 25 . 2.3. 81 Aminoglycoside N(3)- 23. 26 Sterol O-acyltransferase. acetyltransferase. 23. 27 O-acetyltransferase. 2.3. 82 Aminoglycoside N(6')- 23. 28 Chloramphenicol O-acetyltransferase. 35 acetyltransferase. 23. 29 Glycine C-acetyltransferase. 2.3. .83 Phosphatidylcholine--dolichol O 23. 30 Serine O-acetyltransferase. acyltransferase. 23. 31 Homoserine O-acetyltransferase. 2.3. 84 Alcohol O-acetyltransferase. 23. 32 Lysine N-acetyltransferase. 2.3. .85 Fatty-acid synthase. 23. .33 Histidine N-acetyltransferase. 2.3. 86 Fatty-acyl-CoA synthase. 23. 34 D-tryptophan N-acetyltransferase. 2.3. 87 Aralkylamine N-acetyltransferase. 23. 35 Glutamate N-acetyltransferase. 40 2.3. 88 Peptide alpha-N-acetyltransferase. 23. 36 D-amino-acid N-acetyltransferase. 2.3. 89 Tetrahydrodipicolinate N-acetyltransferase. 23. 37 5-aminolevulinate synthase. 2.3. 90 Beta-glucogallin O-galloyltransferase. 23. 38 Acyl-carrier-protein S-acetyltransferase. 2.3. 91 Sinapoylglucose-choline O 23. 39 Acyl-carrier-protein S Sinapoyltransferase. malonyltransferase. 2.3. 92 Sinapoylglucose-malate O 23. .40 Acyl-acyl-carrier-protein-phospholipid 45 Sinapoyltransferase. O-acyltransferase. 2.3. .93 13-hydroxylupinine O-tigloyltransferase. 23. .41 3-oxoacyl-acyl-carrier-protein synthase. 2.3. .94 . 23. .42 Glycerone-phosphate O-acyltransferase. 2.3. 95 Trihydroxystilbene synthase. 23. 43 Phosphatidylcholine-sterol O 2.3. .96 Glycoprotein N-palmitoyltransferase. acyltransferase. 2.3. 97 Glycylpeptide N-tetradecanoyltransferase. 2.3.1. N-acetylneuraminate 4-O- 50 2.3. .98 Chlorogenate--glucarate O acetyltransferase. hydroxycinnamoyltransferase. 23. 45 N-acetylneuraminate 7-O(or 9-O)- 2.3. .99 Quinate O-hydroxycinnamoyltransferase. acetyltransferase. 2.3.1. OO Myelin-proteolipid O 23. .46 Homoserine O-Succinyltransferase. palmitoyltransferase. 23. 47 8-amino-7-Oxononanoate synthase. 2.3.1. Formylmethanofuran 23. 48 Histone acetyltransferase. 55 etrahydromethanopterin N-formyltransferase. 23. 49 Deacetyl-citrate-(pro-3S)-lyase S 2.3.1. N(6)-hydroxylysine O-acetyltransferase. acetyltransferase. 2.3.1. Sinapoylglucose--Sinapoylglucose O 23. SO Serine C-palmitoyltransferase. Sinapoyltransferase. 23. S1 1-acylglycerol-3-phosphate O 2.3.1. -alkenylglycerophosphocholine O acyltransferase. acyltransferase. 23. 52 2-acylglycerol-3-phosphate O 2.3.1. 05 Alkylglycerophosphate 2-O- acyltransferase. 60 acetyltransferase. 23. 53 Phenylalanine N-acetyltransferase. 2.3.1. Tartronate O 23. 54 Formate C-acetyltransferase. hydroxycinnamoyltransferase. 23. S6 Aromatic-hydroxylamine O 2.3.1. 7-O-deacetylvindoline O acetyltransferase. acetyltransferase. 23. 57 Diamine N-acetyltransferase. 2.3.1. O8 N-acetyltransferase. 23. S8 2,3-diaminopropionate N 65 2.3.1. 09 Arginine N-Succinyltransferase. oxalyltransferase. 2.3.1. 10 Tyramine N-feruloyltransferase. US 8,119,385 B2 71 72 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.3.1. 11 Mycocerosate synthase. 2.3.1.166 2-alpha-hydroxytaxane 2-O- 2.3.1. 12 D-tryptophan N-malonyltransferase. benzoyltransferase. 2.3.1. 13 Anthranilate N-malonyltransferase. 2.3.1.167 0-deacetylbaccatin III 10-O- 2.3.1. 14 3,4-dichloroaniline N-malonyltransferase. acetyltransferase. 2.3.1. 15 soflavone-7-O-beta-glucoside 6"-O- 2.3.1.168 Dihydrolipoyllysine-residue (2- malonyltransferase. methylpropanoyl)transferase. 2.3.1. 16 Flavonol-3-O-beta-glucoside O 10 2.3.1.169 CO-methylating acetyl-CoA synthase. malonyltransferase. 2.3.2.1 D-glutamyltransferase. 2.3.1. 17 2,3,4,5-tetrahydropyridine-2,6- 2.3.2.2 Gamma-glutamyltransferase. icarboxylate N-Succinyltransferase. 2.32.3 . 2.3.1. 18 N-hydroxyarylamine O-acetyltransferase. 2.3.2.4 Gamma-glutamylcyclotransferase. 2.3.1. 19 cosanoyl-CoA synthase. 2.3.2.5 Glutaminyl-peptide cyclotransferase. 2.3.1. 21 -alkenylglycerophosphoethanolamine O 15 2.3.2.6 Leucyltransferase. acyltransferase. 2.3.2.7 Aspartyltransferase. 2.3.1. 22 Trehalose O-mycolyltransferase. 2.3.2.8 . 2.3.1. 23 Dolichol O-acyltransferase. 2.3.2.9 Agaritine gamma-glutamyltransferase. 2.3.1. 25 -alkyl-2-acetylglycerol O 2.3.2.10 UDP-N-acetylmuramoylpentapeptide-lysine acyltransferase. N(6)-alanyltransferase. 2.3.1. 26 socitrate O 2.3.2.11 Alanylphosphatidylglycerol synthase. dihydroxycinnamoyltransferase. 2.3.2.12 Peptidyltransferase. 2.3.1. 27 Ornithine N-benzoyltransferase. 2.3.2.13 Protein-glutamine gamma 2.3.1. 28 Ribosomal-protein-alanine N glutamyltransferase. acetyltransferase. 2.3.2.14 D-alanine gamma-glutamyltransferase. 2.3.1. 29 Acyl-[acyl-carrier-protein--UDP-N- 2.3.2.15 Glutathione gamma acetylglucosamine O-acyltransferase. glutamylcysteinyltransferase. 2.3.1. 30 Galactarate O 25 2.3.3.1 Citrate (Si)-synthase. hydroxycinnamoyltransferase. 2.3.3.2 Decylcitrate synthase. 2.3.1. 31 Glucarate O 2.3.3.3 Citrate (Re)-synthase. hydroxycinnamoyltransferase. 2.3.3.4 Decylhomocitrate synthase. 2.3.1. 32 Glucarolactone O 2.3.3.5 2-methylcitrate synthase. hydroxycinnamoyltransferase. 2.3.3.6 2-ethylmalate synthase. 2.3.1. 33 Shikimate O 30 2.3.3.7 3-ethylmalate synthase. hydroxycinnamoyltransferase. 2.3.3.8 ATP . 2.3.1. 34 Galactolipid O-acyltransferase. 2.3.3.9 . 2.3.1. 35 Phosphatidylcholine--retinol O 2.3.3.10 Hydroxymethylglutaryl-CoA synthase. acyltransferase. 2.3.3.11 2-hydroxyglutarate synthase. 2.3.1. 36 Polysialic-acid O-acetyltransferase. 2.3.3.12 3-propylmalate synthase. 2.3.1. 37 Carnitine O-octanoyltransferase. 35 2.3.3.13 2-isopropylmalate synthase. 2.3.1. 38 Putrescine N 2.3.3.14 . hydroxycinnamoyltransferase. 2.3.3.15 Sulfoacetaldehyde 2.3.1. 39 Ecclysone O-acyltransferase. acetyltransferase. 2.3.1. 40 Rosmarinate synthase. 2.4.1.1 . 2.3.1. 41 Galactosylacylglycerol O 2.4.1.2 Dextrin dextranase. acyltransferase. 2.4.1.4 Amylosucrase. 2.3.1. 42 Glycoprotein O-fatty 40 2.4.15 Dextranslucrase. acyltransferase. 2.41.7 Sucrose phosphorylase. 2.3.1. 43 Beta-glucogallin 2.4.1.8 Maltose phosphorylase. etrakisgalloylglucose O-galloyltransferase. 2.4.1.9 Inulosucrase. 2.3.1. 44 Anthranilate N-benzoyltransferase. 2.4.1.10 LevanSucrase. 2.3.1. 45 Piperidine N-piperoyltransferase. 2.4.1.11 (starch) synthase. 2.3.1. 46 . 45 2.4.1.12 Cellulose synthase (UDP-forming). 2.3.1. 47 Glycerophospholipid arachidonoyl 2.4.1.13 Sucrose synthase. transferase (CoA-independent). 2.4.1.14 Sucrose-phosphate synthase. 2.3.1. 48 Glycerophospholipid acyltransferase 2.41.15 Alpha,alpha-trehalose-phosphate (CoA-dependent). synthase (UDP-forming). 2.3.1. 49 Platelet-activating factor 2.4.1.16 Chitin synthase. acetyltransferase. 50 2.41.17 . 2.3.1. 50 Salutaridinol 7-O-acetyltransferase. 2.4.1.18 1,4-alpha-glucan branching 2.3.1. 51 . enzyme. 2.3.1. 52 Alcohol O-cinnamoyltransferase. 2.4.1.19 Cyclomaltodextrin 2.3.1. 53 Anthocyanin 5-aromatic glucanotransferase. acyltransferase. 2.4.1.20 . 2.3.1. Propionyl-CoA C(2)- 55 2.4.1.21 Starch synthase. trimethyltridecanoyltransferase. 2.4.1.22 . 2.4.1.23 Sphingosine beta 2.3.1. 55 Acetyl-CoA C-myristoyltransferase. . 2.3.1. 56 Phloroisovalerophenone synthase. 2.4.1.24 1,4-alpha-glucan 6-alpha 2.3.1. 57 Glucosamine-1-phosphate N . acetyltransferase. 2.41.25 4-alpha-glucanotransferase. 2.3.1. 58 Phospholipid:diacylglycerol acyltransferase. 60 2.4.1.26 DNA alpha-glucosyltransferase. 2.3.1. 59 . 2.41.27 DNA beta-glucosyltransferase. 2.3.1. 60 . 2.4.1.28 Glucosyl-DNA beta 2.3.1. 61 Lovastatin nonaketide synthase. glucosyltransferase. 2.3.1. 62 Taxadien-5-alpha-ol O-acetyltransferase. 2.4.1.29 Cellulose synthase (GDP-forming). 2.3.1. 63 O-hydroxytaxane O-acetyltransferase. 2.41.30 1,3-beta-oligoglucan 2.3.1. 64 sopenicillin-NN-acyltransferase. 65 phosphorylase. 2.3.1. 65 6-methylsalicylic acid synthase. 2.4.1.31 Laminaribiose phosphorylase. US 8,119,385 B2 73 74 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.4. 32 Glucomannan 4-beta 2.4. 88 Globoside alpha-N- . acetylgalactosaminyltransferase. 2.4. .33 Alginate synthase. 2.4. 90 N-acetylactosamine synthase. 2.4. 34 1,3-beta-glucan synthase. 2.4. 91 Flavonol 3-O-glucosyltransferase. 2.4. 35 Phenol beta-glucosyltransferase. 2.4. 92 (N-acetylneuraminyl)- 2.4. 36 Alpha,alpha-trehalose-phosphate galactosylglucosylceramide N synthase (GDP-forming). 10 acetylgalactosaminyltransferase. 2.4. 37 Fucosylgalactoside 3-alpha 2.4. .94 Protein N-acetylglucosaminyltransferase. galactosyltransferase. 2.4. 95 Bilirubin-glucuronoside 2.4. 38 Beta-N- glucuronosyltransferase. acetylglucosaminylglycopeptide beta-1,4- 2.4. .96 Sn-glycerol-3-phosphate 1 galactosyltransferase. galactosyltransferase. 2.4. 39 Steroid N 15 2.4. 97 ,3-beta-D-glucan phosphorylase. acetylglucosaminyltransferase. 2.4. .99 Sucrose:Sucrose fructosyltransferase. 2.4. .40 Glycoprotein-fucosylgalactoside 2.4.1. OO 2,1-fructan:2,1-fructan 1 alpha-N-acetylgalactosaminyltransferase. ructosyltransferase. 2.4. Polypeptide N 2.4.1. Alpha-1,3-mannosyl-glycoprotein 2-beta acetylgalactosaminyltransferase. N-acetylglucosaminyltransferase. 2.4. 43 Polygalacturonate 4-alpha 2.4.1. O2 Beta-1,3-galactosyl-O-glycosyl galacturonosyltransferase. glycoprotein beta-1,6-N- 2.4.1. Lipopolysaccharide 3-alpha acetylglucosaminyltransferase. galactosyltransferase. 2.4.1. Alizarin 2-beta-glucosyltransferase. 2.4. 45 2-hydroxyacylsphingosine 1-beta 2.4.1. O-dihydroxycoumarin 7-O- galactosyltransferase. glucosyltransferase. 2.4. .46 2-diacylglycerol 3-beta 2.4.1. 05 Vitexin beta-glucosyltransferase. galactosyltransferase. 25 2.4.1. Isovitexin beta-glucosyltransferase. 2.4. 47 N-acylsphingosine 2.4.1. Dolichyl-phosphate-mannose--protein galactosyltransferase. mannosyltransferase. 2.4. 48 Heteroglycan alpha 2.4.1. tRNA-queuosine beta mannosyltransferase. mannosyltransferase. 2.4. 49 Cellodextrin phosphorylase. 2.4.1. 1 Coniferyl-alcohol glucosyltransferase. 2.4. SO Procollagen galactosyltransferase. 30 2.4.1. Alpha-1,4-glucan-protein synthase 2.4. 52 Poly(glycerol-phosphate) alpha (UDP-forming). glucosyltransferase. 2.4.1. Alpha-1,4-glucan-protein synthase 2.4. 53 Poly(ribitol-phosphate) beta (ADP-forming). glucosyltransferase. 2.4.1. 2-coumarate O-beta 2.4. 54 Undecaprenyl-phosphate glucosyltransferase. mannosyltransferase. 35 2.4.1. Anthocyanidin 3-O- 2.4. S6 Lipopolysaccharide N glucosyltransferase. acetylglucosaminyltransferase. 2.4.1. -3-rhamnosylglucoside 5-O- 2.4. 57 Phosphatidylinositol alpha glucosyltransferase. mannosyltransferase. 2.4.1. Dolichyl-phosphate beta 2.4. S8 Lipopolysaccharide glucosyltransferase. glucosyltransferase I. 2.4.1. Cytokinin 7-beta-glucosyltransferase. 2.4. 60 Abequosyltransferase. 40 2.4.1. Dolichyl-diphosphooligosaccharide-- 2.4. 62 Ganglioside galactosyltransferase. protein glycotransferase. 2.4. 63 Linamarin synthase. 2.4.1. 2O Sinapate 1-glucosyltransferase. 2.4. .64 Alpha,alpha-trehalose phosphorylase. 2.4.1. 21 Indole-3-acetate beta 2.4. 6S 3-galactosyl-N-acetylglucosaminide glucosyltransferase. 4-alpha-L-. 2.4.1. 22 Glycoprotein-N-acetylgalactosamine 2.4. 66 Procollagen glucosyltransferase. 45 3-beta-galactosyltransferase. 2.4. .67 Galactinol--raffinose 2.4.1. 23 Inositol 3-alpha-galactosyltransferase. galactosyltransferase. 2.4.1. 25 Sucrose-1,6-alpha-glucan 3(6)-alpha 2.4. 68 Glycoprotein 6-alpha-L- glucosyltransferase. lucosyltransferase. 2.4.1. 26 Hydroxycinnamate 4-beta 2.4. 69 Galactoside 2-alpha-L- glucosyltransferase. lucosyltransferase. 50 2.4.1. 27 Monoterpenol beta 2.4. 70 Poly(ribitol-phosphate) N glucosyltransferase. acetylglucosaminyltransferase. 2.4.1. 28 Scopoletin glucosyltransferase. 2.4. 71 Arylamine glucosyltransferase. 2.4.1. 29 Peptidoglycan glycosyltransferase. 2.4. 73 Lipopolysaccharide glucosyltransferase II. 2.4.1. 30 Dolichyl-phosphate-mannose-- 2.4. .74 Glycosaminoglycan galactosyltransferase. glycolipid alpha-mannosyltransferase. 2.4. 75 UDP-galacturonosyltransferase. 55 2.4.1. 31 Glycolipid 2-alpha 2.4. .78 Phosphopolyprenol glucosyltransferase. mannosyltransferase. 2.4. .79 Galactosylgalactosylglucosylceramide 2.4.1. 32 Glycolipid 3-alpha beta-D-acetylgalactosaminyltransferase. mannosyltransferase. 2.4.1. 33 Xylosylprotein 4-beta 2.4. Ceramide glucosyltransferase. galactosyltransferase. 2.4. Flavone 7-O-beta-glucosyltransferase. 2.4.1. 34 Galactosylxylosylprotein 3-beta 2.4. Galactinol--sucrose galactosyltransferase. 60 galactosyltransferase. 2.4. Dolichyl-phosphate beta-D- 2.4.1. 35 Galactosylgalactosylxylosylprotein 3 mannosyltransferase. beta-glucuronosyltransferase. 2.4. .85 Cyanohydrin beta-glucosyltransferase. 2.4.1. 36 Gallate 1-beta-glucosyltransferase. 2.4. 86 Glucosaminylgalactosylglucosylceramide 2.4.1. 37 Sn-glycerol-3-phosphate 2-alpha beta-galactosyltransferase. galactosyltransferase. 2.4. .87 N-acetylactosaminide 3-alpha 65 2.4.1. 38 Mannotetraose 2-alpha-N- galactosyltransferase. acetylglucosaminyltransferase. US 8,119,385 B2 75 76 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.4.1. 39 Maltose synthase. 2.4.1. 81 Hydroxyanthraquinone 2.4.1. 40 Alternanslucrase. glucosyltransferase. 2.4.1. 41 N-acetylglucosaminyldiphosphodolichol 2.4.1. 82 Lipid-A-disaccharide synthase. N-acetylglucosaminyltransferase. 2.4.1. 83 Alpha-1,3-glucan synthase. 2.4.1. 42 Chitobiosyldiphosphodolichol beta 2.4.1. 84 Galactolipid galactosyltransferase. mannosyltransferase. 2.4.1. 85 Flavanone 7-O-beta-glucosyltransferase. 2.4.1. 43 Alpha-1,6-mannosyl-glycoprotein 2 10 2.4.1. 86 Glycogenin glucosyltransferase. beta-N-acetylglucosaminyltransferase. 2.4.1. 87 N 2.4.1. 44 Beta-1,4-mannosyl-glycoprotein 4-beta acetylglucosaminyldiphosphoundecaprenol N-acetyl N-acetylglucosaminyltransferase. beta-D-mannosaminyltransferase. 2.4.1. 45 Alpha-1,3-mannosyl-glycoprotein 4 2.4.1. 88 N beta-N-acetylglucosaminyltransferase. acetylglucosaminyldiphosphoundecaprenol 2.4.1. 46 Beta-1,3-galactosyl-O-glycosyl 15 glucosyltransferase. glycoprotein beta-1,3-N- 2.4.1. 89 Luteolin 7-O-glucuronosyltransferase. acetylglucosaminyltransferase. 2.4.1. 90 Luteolin-7-O-glucuronide 7-O- 2.4.1. 47 Acetylgalactosaminyl-O-glycosyl glucuronosyltransferase. glycoprotein beta-1,3-N- 2.4.1. 91 Luteolin-7-O-diglucuronide 4'-O- acetylglucosaminyltransferase. glucuronosyltransferase. 2.4.1. 48 Acetylgalactosaminyl-O-glycosyl 2.4.1. 92 Nuatigenin 3-beta glycoprotein beta-1,6-N- glucosyltransferase. acetylglucosaminyltransferase. 2.4.1. 93 Sarsapogenin 3-beta 2.4.1. 49 N-acetylactosaminide beta-1,3-N- glucosyltransferase. acetylglucosaminyltransferase. 2.4.1. 94 4-hydroxybenzoate 4-O-beta-D- 2.4.1. 50 N-acetylactosaminide beta-1,6-N- glucosyltransferase. acetylglucosaminyl-transferase. 2.4.1. 95 Thiohydroximate beta-D- 2.4.1. 52 4-galactosyl-N-acetylglucosaminide 3 25 glucosyltransferase. alpha-L-fucosyltransferase. 2.4.1. 96 Nicotinate glucosyltransferase. 2.4.1. 53 Dolichyl-phosphate alpha-N- 2.4.1. 97 High-mannose-oligosaccharide beta acetylglucosaminyltransferase. 14-N-acetylglucosaminyltransferase. 2.4.1. Globotriosylceramide beta-1,6-N- 2.4.1. 98 Phosphatidylinositol N acetylgalactosaminyl-transferase. acetylglucosaminyltransferase. 2.4.1. 55 Alpha-1,6-mannosyl-glycoprotein 6 30 2.4.1. 99 Beta-mannosylphosphodecaprenol beta-N-acetylglucosaminyltransferase. mannooligosaccharide 6-mannosyltransferase. 2.4.1. 56 Indolylacetyl-myo-inositol 2.4. 2O1 Alpha-1,6-mannosyl-glycoprotein 4 galactosyltransferase. beta-N-acetylglucosaminyltransferase. 2.4.1. 57 1,2-diacylglycerol 3-glucosyltransferase. 2.4. 2O2 2,4-dihydroxy-7-methoxy-2H-1,4- 2.4.1. 58 13-hydroxydocosanoate 13-beta benzoxazin-3 (4H)-one 2-D-glucosyltransferase. glucosyltransferase. 35 2.4. 2O3 Trans-zeatin O-beta-D- 2.4.1. 59 Flavonol-3-O-glucoside L glucosyltransferase. rhamnosyltransferase. 2.4. 2OS Galactogen 6-beta 2.4.1. 60 Pyridoxine 5'-O-beta-D- galactosyltransferase. glucosyltransferase. 2.4. 2O6 Lactosylceramide 1,3-N-acetyl-beta 2.4.1. 61 Oligosaccharide 4-alpha-D- D-glucosaminyltransferase. glucosyltransferase. 2.4. 2O7 Xyloglucan:Xyloglucosyltransferase. 2.4.1. 62 Aldose beta-D-fructosyltransferase. 40 2.4. 2O8 Diglucosyl diacylglycerol synthase. 2.4.1. 63 Beta-galactosyl-N- 2.4. 209 Cis-p-coumarate glucosyltransferase. acetylglucosaminylgalactosylglucosyl-ceramide beta 2.4. 210 Limonoid glucosyltransferase. 1,3-acetylglucosaminyltransferase. 2.4. 211 ,3-beta-galactosyl-N- 2.4.1. 64 Galactosyl-N- acetylhexosamine phosphorylase. acetylglucosaminylgalactosylglucosyl-ceramide beta 2.4. 212 . 1,6-N-acetylglucosaminyltransferase. 45 2.4. 213 Glucosylglycerol-phosphate synthase. 2.4.1. 65 N 2.4. .214 Glycoprotein 3-alpha-L- acetylneuraminylgalactosylglucosylceramide beta-1,4- fucosyltransferase. N-acetylgalactosaminyltransferase. 2.4. 215 Cis-zeatin O-beta-D- 2.4.1. 66 Raffinose-raffinose alpha glucosyltransferase. galactosyltransferase. 2.4. 216 Trehalose 6-phosphate 2.4.1. 67 Sucrose 6 (F)-alpha-galactosyltransferase. 50 phosphorylase. 2.4.1. 68 Xyloglucan 4-glucosyltransferase. 2.4. 217 Mannosyl-3-phosphoglycerate 2.4.1. 70 Isoflavone 7-O-glucosyltransferase. synthase. 2.4.1. 71 Methyl-ONN-azoxymethanol beta-D- 2.4. 218 glucosyltransferase. glucosyltransferase. 2.4. 219 Vomilenine glucosyltransferase. 2.4.1. 72 Salicyl-alcohol beta-D- 2.4. 220 (ndoxyl-UDPG glucosyltransferase. glucosyltransferase. 55 2.4. 221 Peptide-O-fucosyltransferase. 2.4.1. 73 Sterol 3-beta-glucosyltransferase. 2.4. 222 O-fucosylpeptide 3-beta-N- 2.4.1. 74 Glucuronylgalactosylproteoglycan 4-beta acetylglucosaminyltransferase. N-acetylgalactosaminyltransferase. 2.4. 223 Glucuronyl-galactosyl-proteoglycan 2.4.1. 75 Glucuronosyl-N-acetylgalactosaminyl 4-alpha-N-acetylglucosaminyltransferase. proteoglycan 4-beta-N- 2.4. .224 Glucuronosyl-N- acetylgalactosaminyltransferase. acetylglucosaminyl-proteoglycan 4-alpha-N- 2.4.1. 76 Gibberellin beta-D-glucosyltransferase. 60 acetylglucosaminyltransferase. 2.4.1. 77 Cinnamate beta-D-glucosyltransferase. 2.4. 225 N-acetylglucosaminyl-proteoglycan 2.4.1. 78 Hydroxymandelonitrile 4-beta-glucuronosyltransferase. glucosyltransferase. 2.4. 226 N-acetylgalactosaminyl 2.4.1. 79 Lactosylceramide beta-1,3- proteoglycan 3-beta-glucuronosyltransferase. galactosyltransferase. 2.4. 227 Undecaprenyldiphospho 2.4.1. Lipopolysaccharide N 65 muramoylpentapeptide beta-N- acetylmannosaminouronosyltransferase. acetylglucosaminyltransferase. US 8,119,385 B2 77 78 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.41.228 Lactosylceramide4-alpha 2.499.10 Neolactotetraosylceramide alpha-2,3- galactosyltransferase. sialyltransferase. 24.1229 Skp1-protein-hydroxyproline N 2.499.11 Lactosylceramide alpha-2,6-N- acetylglucosaminyltransferase. sialyltransferase. 24.1230 Koibiose phosphorylase. 2.5.1.1 Dimethylallyltranstransferase. 2.4.1.231 Alpha,alpha-trehalose phosphorylase 2.5.1.2 Thiamine pyridinylase. (configuration-retaining). 10 2.5.1.3 Thiamine-phosphate diphosphorylase. 24.1.232 Initiation-specific alpha-1,6- 2.5.1.4 Adenosylmethionine cyclotransferase. mannosyltransferase. 2.5.1.5 Galactose-6-sulfurylase. 2.4.2.1 Purine-nucleoside phosphorylase. 2.5.1.6 Methionine adenosyltransferase. 2.4.2.2 Pyrimidine-nucleoside phosphorylase. 2.5.1.7 UDP-N-acetylglucosamine 1 2.4.2.3 Uridine phosphorylase. carboxyvinyltransferase. 2.4.2.4 . 15 2.5.1.8 tRNA isopentenyltransferase. 24.25 Nucleoside ribosyltransferase. 2.5.19 synthase. 2.4.2.6 Nucleoside deoxyribosyltransferase. 2.5.1.10 . 24.2.7 Adenine phosphoribosyltransferase. 2.5.1.11 Trans-octaprenyltranstransferase. 2.4.2.8 2.5.1.15 . phosphoribosyltransferase. 2.5.1.16 . 2.4.2.9 Uracil phosphoribosyltransferase. 2.5.1.17 Cob(I)yrinic acid a,c-diamide 2.4.2.10 Orotate phosphoribosyltransferase. adenosyltransferase. 2.4.2.11 Nicotinate 2.5.1.18 Glutathione transferase. phosphoribosyltransferase. 2.5.1.19 3-phosphoshikimate 1 2.4.2.12 Nicotinamide carboxyvinyltransferase. phosphoribosyltransferase. 2.5.1.20 Rubber cis-polyprenylcistransferase. 2.4.2.14 Amidophosphoribosyltransferase. 2.5.1.21 Farnesyl-diphosphate 24.2.15 Guanosine phosphorylase. 25 . 2.4.2.16 Urate-ribonucleotide phosphorylase. 2.5.1.22 . 24.2.17 ATP phosphoribosyltransferase. 2.5.1.23 Sym-norspermidine synthase. 2.4.2.18 2.5.1.24 Discadenine synthase. phosphoribosyltransferase. 2.5.1.25 tRNA-uridine 24.2.19 Nicotinate-nucleotide diphosphorylase aminocarboxypropyltransferase. (carboxylating). 30 2.5.1.26 Alkylglycerone-phosphate synthase. 2.4.2.20 Dioxotetrahydropyrimidine 2.5.1.27 Adenylate dimethylallyltransferase. phosphoribosyltransferase. 2.5.1.28 Dimethylallylcistransferase. 2.4.2.21 Nicotinate-nucleotide-- 2.5.1.29 . dimethylbenzimidazole phosphoribosyltransferase. 2.5.1.30 Trans-hexaprenyltranstransferase. 2.4.2.22 Xanthine phosphoribosyltransferase. 2.5.1.31 Di-transpoly-cis 2.4.2.23 Deoxyuridine phosphorylase. 35 decaprenylcistransferase. 2.4.2.24 4-beta-D-xylan synthase. 2.5.1.32 Geranylgeranyl-diphosphate 24.2.25 Flavone apiosyltransferase. geranylgeranyltransferase. 2.4.2.26 Protein . 2.5.1.33 Trans-pentaprenyltranstransferase. 24.2.27 dTDP-dihydrostreptose--streptidine-6- 2.5.134 Tryptophan dimethylallyltransferase. phosphate dihydrostreptosyltransferase. 2.5.1.35 Aspulvinone dimethylallyltransferase. 2.4.2.28 S-methyl-5-thioadenosine phosphorylase. 2.5.1.36 Trihydroxypterocarpan 24.2.29 Queuine tRNA-ribosyltransferase. 40 dimethylallyltransferase. 24.2.30 NAD(+) ADP-ribosyltransferase. 2.5.1.38 Isonocardicin synthase. 2.4.2.31 NAD(P)(+)--arginine ADP 2.5.1.39 4-hydroxybenzoate ribosyltransferase. nonaprenyltransferase. 2.4.2.32 Dolichyl-phosphate D-xylosyltransferase. 2.5.141 Phosphoglycerol 242.33 Dolichyl-xylosyl-phosphate-protein geranylgeranyltransferase. xylosyltransferase. 45 2.5.1.42 Geranylgeranylglycerol-phosphate 2.4.2.34 Indolylacetylinositol . geranylgeranyltransferase. 24.2.35 Flavonol-3-O-glycoside xylosyltransferase. 2.5.1.43 . 24.2.36 NAD(+)--diphthamide ADP 2.5.1.44 . ribosyltransferase. 2.5.1.45 Homospermidine synthase (spermidine 24.2.37 NAD(+)--dinitrogen-reductase ADP-D- specific). ribosyltransferase. 50 2.5.146 . 24.238 Glycoprotein 2-beta-D-xylosyltransferase. 2.S.147 . 24.2.39 Xyloglucan 6-Xylosyltransferase. 2.5.148 CyStathionine gamma-synthase. 2.4.2.40 Zeatin O-beta-D-xylosyltransferase. 2.5.1.49 O-acetylhomoserine 24.99.1 Beta-galactoside alpha-2,6-Sialyltransferase. aminocarboxypropyltransferase. 24.99.2 MonoSialoganglioside sialyltransferase. 2.5.1. SO Zeatin 9-aminocarboxyethyltransferase. 24.99.3 Alpha-N-acetylgalactosaminide alpha-2,6- 55 2.5.1.51 Beta-pyrazolylalanine synthase. sialyltransferase. 2.5.1.52 L-mimosine synthase. 24.99.4 Beta-galactoside alpha-2,3-Sialyltransferase. 2.5.1.53 Uracilylalanine synthase. 2.5.1.54 3-deoxy-7-phosphoheptulonate synthase. 24.99.5 Galactosyldiacylglycerol alpha-2,3- 2.5.1.55 3-deoxy-8-phosphooctulonate synthase. sialyltransferase. 2.5.1.56 N-acetylneuraminate synthase. 24.99.6 N-acetylactosaminide alpha-2,3- 2.5.1.57 N-acylneuraminate-9-phosphate synthase. sialyltransferase. 60 2.5.1.58 Protein farnesyltransferase. 24.99.7 (Alpha-N-acetylneuraminyl-2,3-beta 2.5.1.59 Protein geranylgeranyltransferase type I. galactosyl-1,3)-N-acetyl-galactosaminide 6-alpha 2.5.1.60 Protein geranylgeranyltransferase type II. sialyltransferase. 2.5.1.61 Hydroxymethylbilane synthase. 24.99.8 Alpha-N-acetylneuraminate alpha-2,8- 2.5.1.62 synthase. sialyltransferase. 2.5.1.63 Adenosyl-fluoride synthase. 24.99.9 Lactosylceramide alpha-2,3- 65 2.5.1.64 2-succinyl-6-hydroxy-2,4-cyclohexadiene sialyltransferase. 1-carboxylate synthase. US 8,119,385 B2 79 80 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.6. Aspartate transaminase. 2.6. .67 2-aminohexanoate transaminase. 2.6. Alanine transaminase. 2.6. 68 Ornithine(lysine) transaminase. 2.6. Cysteine transaminase. 2.6. 70 Aspartate--phenylpyruvate transaminase. 2.6. Glycine transaminase. 2.6. .71 Lysine-pyruvate 6-transaminase. 2.6. transaminase. 2.6. 72 D-4-hydroxyphenylglycine transaminase. 2.6. Leucine transaminase. 2.6. 73 Methionine-glyoxylate transaminase. 2.6. Kynurenine-oxoglutarate transaminase. 10 2.6. .74 Cephalosporin-C transaminase. 2.6. 2,5-diaminovalerate transaminase. 2.6. .75 Cysteine-conjugate transaminase. 2.6.1. Histidinol-phosphate transaminase. 2.6. .76 Diaminobutyrate-2-oxoglutarate 2.6.1. Acetylornithine transaminase. transaminase. 2.6.1. Alanine-oxo-acid transaminase. 2.6. 77 Taurine-pyruvate aminotransferase. 2.6.1. Ornithine-oxo-acid transaminase. 2.6. 3.1 Oximinotransferase. 2.6.1. Asparagine-oxo-acid transaminase. 15 2.6. 99.1 dATP(dGTP)--DNA purinetransferase. 2.6.1. Glutamine-pyruvate transaminase. 2.7. Hexokinase. 2.6.1. Glutamine-fructose-6-phosphate 2.7. Glucokinase. transaminase (isomerizing). 2.7. Ketohexokinase. 2.6. 17 Succinyldiaminopimelate transaminase. 2.7. Fructokinase. 2.6. 18 Beta-alanine-pyruvate transaminase. 2.7. Rhamnulokinase. 2.6. 19 4-aminobutyrate transaminase. 2.7. Galactokinase. 2.6. .21 D-alanine transaminase. 2.7. Mannokinase. 2.6. 22 (S)-3-amino-2-methylpropionate 2.7. Glucosamine kinase. transaminase. 2.7. 1O Phosphoglucokinase. 2.6. .23 4-hydroxyglutamate transaminase. 2.7. .11 6-phosphofructokinase. 2.6. .24 Diiodotyrosine transaminase. 2.7. .12 Gluconokinase. 2.6. 26 Thyroid-hormone transaminase. 2.7. 13 Dehydrogluconokinase. 2.6. 27 Tryptophan transaminase. 25 2.7. .14 Sedoheptulokinase. 2.6. 28 Tryptophan-phenylpyruvate 2.7. 1S Ribokinase. transaminase. 2.7. 16 Ribulokinase. 2.6. 29 Diamine transaminase. 2.7. 17 Xylulokinase. 2.6. 30 Pyridoxamine-pyruvate transaminase. 2.7. 18 Phosphoribokinase. 2.6. 31 Pyridoxamine--Oxaloacetate 2.7. 19 Phosphoribulokinase. transaminase. 30 2.7. 2O Adenosine kinase. 2.6. 32 Valine-3-methyl-2-oxovalerate 2.7. .21 Thymidine kinase. transaminase. 2.7. 22 Ribosylnicotinamide kinase. 2.6. .33 dTDP-4-amino-4,6-dideoxy-D-glucose 2.7. 23 NAD(+) kinase. transaminase. 2.7. .24 Dephospho-CoA kinase. 2.6. 34 UDP-2-acetamido-4-amino-2,4,6- 2.7. 25 Adenylyl-sulfate kinase. trideoxyglucose transaminase. 35 2.7. 26 Riboflavin kinase. 2.6. 35 Glycine-oxaloacetate transaminase. 2.7. 27 Erythritol kinase. 2.6. 36 L-lysine 6-transaminase. 2.7. 28 Triokinase. 2.6. 37 2-aminoethylphosphonate-pyruvate 2.7. 29 Glycerone kinase. transaminase. 2.7. 30 Glycerol kinase. 2.6. 38 Histidine transaminase. 2.7. 31 Glycerate kinase. 2.6. 39 2-aminoadipate transaminase. 2.7. 32 Choline kinase. 2.6. .40 (R)-3-amino-2-methylpropionate-- 40 2.7. .33 Pantothenate kinase. pyruvate transaminase. 2.7. 34 Pantetheline kinase. 2.6. .41 D-methionine-pyruvate transaminase. 2.7. 35 Pyridoxal kinase. 2.6. .42 Branched-chain-amino-acid 2.7. 36 . transaminase. 2.7. 37 Protein kinase. 2.6. 43 Aminolevulinate transaminase. 2.7. 38 Phosphorylase kinase. 2.6.1. Alanine-glyoxylate transaminase. 45 2.7. 39 Homoserine kinase. 2.6. 45 Serine-glyoxylate transaminase. 2.7. 40 Pyruvate kinase. 2.6. .46 Diaminobutyrate--pyruvate 2.7. 41 Glucose-1-phosphate transaminase. phosphodismutase. 2.6. 47 Alanine-oxomalonate transaminase. 2.7. 42 Riboflavin phosphotransferase. 2.6. 48 5-aminovalerate transaminase. 2.7. 43 Glucuronokinase. 2.6. 49 Dihydroxyphenylalanine transaminase. 50 2.7. .44 Galacturonokinase. 2.6. SO Glutamine-Scyllo-inositol 2.7. 45 2-dehydro-3-deoxygluconokinase. transaminase. 2.7. 46 L-arabinokinase. 2.6. S1 Serine-pyruvate transaminase. 2.7. 47 D-ribulokinase. 2.6. 52 Phosphoserine transaminase. 2.7. 48 Uridine kinase. 2.6. 54 Pyridoxamine-phosphate transaminase. 2.7. 49 Hydroxymethylpyrimidine kinase. 2.6. 55 Taurine-2-oxoglutarate transaminase. 55 2.7. SO Hydroxyethylthiazole kinase. 2.6. S6 D-1-guanidino-3-amino-1,3-dideoxy 2.7. S1 L-fuculokinase. Scyllo-inositol transaminase. 2.7. 52 Fucokinase. 2.6. 57 Aromatic-amino-acid transaminase. 2.7. 53 L-xylulokinase. 2.6. S8 Phenylalanine(histidine) transaminase. 2.7. 54 D-arabinokinase. 2.6. 59 dTDP-4-amino-4,6-dideoxygalactose transaminase. 2.7. 55 Allose kinase. 2.6. 60 Aromatic-amino-acid-glyoxylate 60 2.7. S6 1-phosphofructokinase. transaminase. 2.7. S8 2-dehydro-3-deoxygalactonokinase. 2.6. 62 Adenosylmethionine-8-amino-7- 2.7. 59 N-acetylglucosamine kinase. Oxononanoate transaminase. 2.7. 60 N-acylmannosamine kinase. 2.6. 63 Kynurenine-glyoxylate transaminase. 2.7. 61 Acyl-phosphate--hexose 2.6. .64 Glutamine-phenylpyruvate transaminase. phosphotransferase. 2.6. N(6)-acetyl-beta-lysine transaminase. 65 2.7. 62 Phosphoramidate--hexose 2.6. Valine-pyruvate transaminase. phosphotransferase. US 8,119,385 B2 81 82 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.7. 63 Polyphosphate--glucose 2.7.1.1.37 Phosphatidylinosito 3-kinase. phosphotransferase. 2.7.1.138 Ceramide kinase. 2.7. nositol 3-kinase. 2.7.1.140 nositol-tetrakisphosphate 5-kinase. 2.7. Scyllo-inosamine 4-kinase. 2.7.1.141 RNA-polymerase-subunit kinase. 2.7. Undecaprenol kinase. 2.7.1.142 Glycerol-3-phospha e--glucose 2.7. -phosphatidylinositol 4-kinase. phosphotransferase. 2.7. -phosphatidylinositol-4-phosphate 10 2.7.1.143 Diphosphate-purine nucleoside kinase. 5-kinase. 2.7.1.144 Tagatose-6-phospha e kinase. 2.7. 69 Protein-N(pi)-phosphohistidine--sugar 2.7.1.145 Deoxynucleoside kinase. phosphotransferase. 2.7.1.146 ADP-specific phosp hofructokinase. 2.7. 71 Shikimate kinase. 2.7.1.147 ADP-specific glucokinase. 2.7. 72 Streptomycin 6-kinase. 2.7.1.148 4-(cytidine 5'-diphospho)-2-C-methyl 2.7. 73 nosine kinase. 15 D-erythritol kinase. 2.7. .74 Deoxycytidine kinase. 2.7.1.149 -phosphatidylinosi ol-5-phosphate 4 2.7. .76 Deoxyadenosine kinase. kinase. 2.7. 77 Nucleoside phosphotransferase. 2.7.1.1SO -phosphatidylinosi ol-3-phosphate 5 2.7. .78 Polynucleotide 5'-hydroxy-kinase. kinase. 2.7. .79 Diphosphate-glycerol phosphotransferase. 2.7.1.151 nositol-polyphosphate multikinase. 2.7. 8O Diphosphate--serine phosphotransferase. 2.7.1.153 Phosphatidylinosito -4,5-bisphosphate 3 2.7. 81 Hydroxylysine kinase. kinase. 2.7. 82 Ethanolamine kinase. 2.7.1.154 Phosphatidylinosito -4-phosphate 3 2.7. .83 Pseudouridine kinase. kinase. 2.7. 84 Alkylglycerone kinase. 2.7.1.15S Diphosphoinositol-pentakisphosphate 2.7. .85 Beta-glucoside kinase. kinase. 2.7. 86 NADH kinase. 2.7.1.156 Adenosylcobinamide kinase. 2.7. .87 Streptomycin 3"-kinase. 25 2.72. 2.7. 88 Dihydrostreptomycin-6-phosphate 3'-alpha 27.2.2 &l&SC. 2.7.2.3 Phosphoglycerate kinase. 2.7. 89 Thiamine kinase. 27.2.4 Aspartate kinase. 2.7. 90 Diphosphate--fructose-6-phosphate 1 2.72.6 Formate kinase. phosphotransferase. 2.72.7 Butyrate kinase. 2.7. .91 Sphinganine kinase. 30 2.72.8 Acetylglutamate kinase. 2.7. 92 5-dehydro-2-deoxygluconokinase. 27.2.10 Phosphoglycerate kinase (GTP). 2.7. .93 Alkylglycerol kinase. 27.2.11 Glutamate 5-kinase. 2.7. .94 Acylglycerol kinase. 2.7212 Acetate kinase (diphosphate). 2.7. 95 Kanamycin kinase. 2.7.2.13 Glutamate 1-kinase. 2.7. .99 Pyruvate dehydrogenase (lipoamide) kinase. 27.2.14 Branched-chain-fatty-acid kinase. 2.7.1. S-methyl-5-thioribose kinase. 35 2.7.3.1 Guanidinoacetate kinase. 2.7.1. Tagatose kinase. 2.7.3.2 Creatine kinase. 2.7.1. Hamamelose kinase. 2.7.3.3 Arginine kinase. 2.7.1. Viomycin kinase. 2.73.4 Taurocyamine kinase. 2.7.1. Diphosphate--protein phosphotransferase. 2.73.5 Lombricine kinase. 2.7.1. 6-phosphofructo-2-kinase. 2.7.3.6 Hypotaurocyamine kinase. 2.7.1. Glucose-1,6-bisphosphate synthase. 2.73.7 Opheline kinase. 2.7.1. . 40 2.7.3.8 Ammonia kinase. 2.7.1. Dolichol kinase. 2.7.3.9 Phosphoenolpyruvate--protein 2.7.1. Hydroxymethylglutaryl-CoA reductase phosphotransferase. (NADPH) kinase. 2.7.3.10 Agmatine kinase. 2.7.1. 10 Dephospho-reductase kinase kinase. 2.73.11 Protein-histidine pros-kinase. 2.7.1. 12 Protein-tyrosine kinase. 2.7.3.12 Protein-histidine tele-kinase. 2.7.1. 13 Deoxyguanosine kinase. 45 2.74.1 Polyphosphate kinase. 2.7.1. 14 AMP--thymidine kinase. 2.74.2 . 2.7.1. 15 3-methyl-2-oxobutanoate dehydrogenase 27.4.3 Adenylate kinase. (lipoamide) kinase. 2.7.4.4 Nucleoside-phosphate kinase. 2.7.1. 16 Isocitrate dehydrogenase (NADP+) kinase. 2.74.6 Nucleoside-diphosphate kinase. 2.7.1. 17 Myosin light-chain kinase. 2.74.7 Phosphomethylpyrimidine kinase. 2.7.1. 18 ADP--thymidine kinase. 50 2.74.8 Guanylate kinase. 2.7.1. 19 Hygromycin-B kinase. 2.74.9 dTMP kinase. 2.7.1. 2O Caldesmon kinase. 2.74.10 Nucleoside-triphosphate--adenylate 2.7.1. 21 Phosphoenolpyruvate-glycerone kinase. phosphotransferase. 27.4.11 (Deoxy)adenylate kinase. 2.7.1. 22 Xylitol kinase. 2.7412 T(2)-induced deoxynucleotide kinase. 2.7.1. 23 Calcium calmodulin-dependent protein 55 2.74.13 (Deoxy) nucleoside-phosphate kinase. kinase. 2.7.4.14 Cytidylate kinase. 2.7.1. 24 Tyrosine 3-monooxygenase kinase. 2.74.15 Thiamine-diphosphate kinase. 2.7.1. 25 Rhodopsin kinase. 2.74.16 2.7.1. 26 Beta-adrenergic-receptor kinase. Thiamine-phosphate kinase. 2.74.17 3-phosphoglyceroyl-phosphate-- 2.7.1. 27 nositol-trisphosphate 3-kinase. 2.7.1. 28 Acetyl-CoA carboxylase kinase. polyphosphate phosphotransferase. 2.7.1. 29 Myosin heavy-chain kinase. 60 2.74.18 Farnesyl-diphosphate kinase. 2.7.1. 30 Tetraacyldisaccharide 4'-kinase. 27.4.19 5-methyldeoxycytidine-5'-phosphate 2.7.1. 31 Low-density lipoprotein receptor kinase. kinase. 2.74.2O Dolichyl-diphosphate-polyphosphate 2.7.1. 32 Tropomyosin kinase. phosphotransferase. 2.7.1. 34 nositol-tetrakisphosphate 1-kinase. 2.74.21 Inositol-hexakisphosphate kinase. 2.7.1. 35 Tau protein kinase. 65 2.76.1 Ribose-phosphate diphosphokinase. 2.7.1. 36 Macrollide 2'-kinase. 2.76.2 Thiamine diphosphokinase. US 8,119,385 B2 83 84 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.76.3 2-amino-4-hydroxy-6- 2.7.7.58 (2,3-dihydroxybenzoyl)adenylate hydroxymethyldihydropteridine diphosphokinase. synthase. 27.6.4 Nucleotide diphosphokinase. 2.7.7.59 Protein-PII uridylyltransferase. 2.76.5 GTP diphosphokinase. 2.7.7.60 2-C-methyl-D-erythritol 4-phosphate 2.77.1 Nicotinamide-nucleotide cytidylyltransferase. adenylyltransferase. 2.7.7.61 Holo-ACP synthase. 2.77.2 FMN adenylyltransferase. 10 2.7.7.62 Adenosylcobinamide-phosphate 2.7.7.3 Pantetheine-phosphate adenylyltransferase. guanylyltransferase. 2.7.7.4 Sulfate adenylyltransferase. 2.78.1 Ethanolaminephosphotransferase. 2.77.5 Sulfate adenylyltransferase (ADP). 2.78.2 Diacylglycerol 2.7.7.6 DNA-directed RNA polymerase. cholinephosphotransferase. 2.77.7 DNA-directed DNA polymerase. 2.78.3 Ceramide cholinephosphotransferase. 2.77.8 Polyribonucleotide . 15 2.78.4 Serine-phosphoethanolamine synthase. 2.7.7.9 UTP-glucose-1-phosphate 2.78.5 CDP-diacylglycerol-glycerol-3-phosphate uridylyltransferase. 3-phosphatidyltransferase. 2.77.10 UTP-hexose-1-phosphate 2.78.6 Undecaprenyl-phosphate galactose uridylyltransferase. phosphotransferase. 2.7.7.11 UTP--xylose-1-phosphate 2.78.7 Holo-acyl-carrier-protein synthase. uridylyltransferase. 2.78.8 CDP-diacylglycerol--serine O 2.77.1.2 UDP-glucose-hexose-1-phosphate phosphatidyltransferase. uridylyltransferase. 2.78.9 Phosphomannan 2.77.13 Mannose-1-phosphate guanylyltransferase. mannosephosphotransferase. 2.77.14 Ethanolamine-phosphate 2.78.10 Sphingosine cholinephosphotransferase. cytidylyltransferase. 2.78.11 CDP-diacylglycerol--inositol 3 2.7.7.15 Choline-phosphate cytidylyltransferase. phosphatidyltransferase. 2.77.18 Nicotinate-nucleotide adenylyltransferase. 25 2.78.12 CDP-glycerol glycerophosphotransferase. 2.7.7.19 Polynucleotide adenylyltransferase. 2.7.8.13 Phospho-N-acetylmuramoyl-pentapeptide 2.7.7.21 RNA cytidylyltransferase. transferase. 2.7.7.22 Mannose-1-phosphate guanylyltransferase 2.78.14 CDP-ribitol ribitolphosphotransferase. (GDP). 2.78.15 UDP-N-acetylglucosamine--dolichyl 2.7.7.23 UDP-N-acetylglucosamine phosphate N-acetylglucosaminephosphotransferase. diphosphorylase. 30 2.7.8.17 UDP-N-acetylglucosamine--lysosomal 2.7.7.24 Glucose-1-phosphate enzyme N-acetylglucosaminephosphotransferase. hymidylyltransferase. 2.7.8.18 UDP-galactose-UDP-N-acetylglucosamine 2.77.25 RNA adenylyltransferase. galactose phosphotransferase. 2.77.27 Glucose-1-phosphate adenylyltransferase. 2.7.8.19 UDP-glucose--glycoprotein glucose 2.7.7.28 Nucleoside-triphosphate-aldose 1 phosphotransferase. phosphate nucleotidyltransferase. 35 2.7.8.20 Phosphatidylglycerol--membrane 2.7.7.30 Fucose-1-phosphate guanylyltransferase. oligosaccharide glycerophosphotransferase. 2.7.7.31 DNA nucleotidylexotransferase. 2.78.21 Membrane-oligosaccharide 2.7.7.32 Galactose-1-phosphate glycerophosphotransferase. hymidylyltransferase. 2.78.22 -alkenyl-2-acylglycerol choline 2.7.7.33 Glucose-1-phosphate cytidylyltransferase. phosphotransferase. 2.7.7.34 Glucose-1-phosphate 2.78.23 Carboxyvinyl-carboxyphosphonate guanylyltransferase. 40 phosphorylmutase. 2.77.35 Ribose-5-phosphate adenylyltransferase. 2.78.24 Phosphatidylcholine synthase. 2.7.7.36 Aldose-1-phosphate adenylyltransferase. 2.78.25 Triphosphoribosyl-dephospho-CoA 2.7.737 Aldose-1-phosphate synthase. nucleotidyltransferase. 2.78.26 Adenosylcobinamide-GDP 2.77.38 3-deoxy-manno-octulosonate ribazoletransferase. cytidylyltransferase. 45 2.79.1 Pyruvate, phosphate dikinase. 2.7.7.39 Glycerol-3-phosphate 2.79.2 Pyruvate, water dikinase. cytidylyltransferase. 2.79.3 Selenide, water dikinase. 2.7.7.40 D-ribitol-5-phosphate 2.79.4 Alpha-glucan, water dikinase. cytidylyltransferase. 2.8.1.1 Thiosulfate sulfur-transferase. 2.77.41 Phosphatidate cytidylyltransferase. 2.8.1.2 3-mercaptopyruvate Sulfur-transferase. 2.7.7.42 Glutamate-ammonia-ligase 50 2.8.1.3 Thiosulfate--thiol sulfur-transferase. adenylyltransferase. 2.8.1.4 tRNA sulfur-transferase. 2.77.43 N-acylneuraminate cytidylyltransferase. 2.8.1.5 Thiosulfate--dithiol sulfur-transferase. 2.77.44 Glucuronate-1-phosphate 2.8.1.6 . uridylyltransferase. 2.8.1.7 . 2.7.7:45 Guanosine-triphosphate 2.8.2.1 Aryl . guanylyltransferase. 55 2.8.2.2 . 2.77.46 Gentamicin 2"-nucleotidyltransferase. 2.8.2.3 . 2.7.747 Streptomycin 3'-adenylyltransferase. 2.8.2.4 . 2.7.7.48 RNA-directed RNA polymerase. 2.8.2.5 Chondroitin 4-sulfotransferase. 2.7.7.49 RNA-directed DNA polymerase. 2.8.2.6 . 2.77.50 mRNA guanylyltransferase. 2.8.2.7 UDP-N-acetylgalactosamine-4-sulfate 2.7.7.51 Adenylylsulfate--ammonia Sulfotransferase. adenylyltransferase. 60 2.8.2.8 Heparan sulfate-glucosamine N 2.7.7.52 RNA uridylyltransferase. Sulfotransferase. 2.7.7.53 ATP adenylyltransferase. 2.8.2.9 Tyrosine-ester sulfotransferase. 2.77.54 Phenylalanine adenylyltransferase. 2.8.2.10 Renilla-luciferin sulfotransferase. 2.77.55 Anthranilate adenylyltransferase. 2.8.2.11 Galactosylceramide Sulfotransferase. 2.77.56 tRNA nucleotidyltransferase. 2.8.2.13 Psychosine sulfotransferase. 2.7.7.57 N-methylphosphoethanolamine 65 2.8.2.14 Bile- sulfotransferase. cytidylyltransferase. 2.8.2.15 . US 8,119,385 B2 85 86 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 2.8.2. 16 Thiol sulfotransferase. 39 Actinomycin lactonase. 2.8.2. 17 Chondroitin 6-sulfotransferase. .1.40 Orsellinate-depside hydrolase. 2.8.2. 18 Cortisol sulfotransferase. .1.41 Cephalosporin-C deacetylase. 2.8.2. 19 Triglucosylalkylacylglycerol .1.42 Chlorogenate hydrolase. Sulfotransferase. 1.43 Alpha-amino-acid esterase. 2.8.2. 2O Protein-tyrosine Sulfotransferase. .1.44 4-methyloxaloacetate esterase. 2.8.2. 21 . 10 1.45 Carboxymethylenebutenolidase. 2.8.2. 22 Arylsulfate sulfotransferase. .1.46 Deoxylimonate A-ring-lactonase. 2.8.2. 23 Heparan sulfate-glucosamine 3 1.47 1-alkyl-2-acetylglycerophosphocholine Sulfotransferase 1. esterase. 2.8.2. 24 Desulfoglucosinolate Sulfotransferase. .1.48 Fusarinine-Cornithinesterase. 2.8.2. 25 Flavonol 3-sulfotransferase. 1.49 Sinapine esterase. 2.8.2. 26 Quercetin-3-sulfate 3'-sulfotransferase. 15 SO Wax-ester hydrolase. 2.8.2. 27 Quercetin-3-sulfate 4'-sulfotransferase. 1.51 Phorbol-diester hydrolase. 2.8.2. 28 Quercetin-3,3'-bisSulfate 7 1.52 Phosphatidylinositol deacylase. Sulfotransferase. 1.53 Sialate O-acetylesterase. 2.8.2. 29 Heparan sulfate-glucosamine 3 1.54 Acetoxybutynylbithiophene deacetylase. Sulfotransferase 2. 1.55 Acetylsalicylate deacetylase. 2.8.2. 30 Heparan sulfate-glucosamine 3 1.56 Methylumbelliferyl-acetate deacetylase. Sulfotransferase 3. 1.57 2-pyrone-4,6-dicarboxylate lactonase. 2.83. 1 Propionate CoA-transferase. N-acetylgalactosaminoglycan 2.83. 2 Oxalate CoA-transferase. deacetylase. 2.83. 3 Malonate CoA-transferase. 59 Juvenile-hormone esterase. 2.83. 5 3-oxoacid CoA-transferase. 1.60 Bis(2-ethylhexyl)phthalate esterase. 2.83. 6 3-oxoadipate CoA-transferase. 1.61 Protein-glutamate methylesterase. 2.83. 7 Succinate-citramalate CoA-transferase. 25 1.63 11-cis-retinyl-palmitate hydrolase. 2.83. 8 Acetate CoA-transferase. .1.64 All-trans-retinyl-palmitate hydrolase. 2.83. 9 Butyrate--acetoacetate CoA-transferase. 1.65 L-rhamnono-1,4-lactonase. 2.83. O Citrate CoA-transferase. 66 5-(3,4-diacetoxybut-1-ynyl)-2,2'- 2.83. 1 Citramalate CoA-transferase. bithiophene deacetylase. 2.83. 2 Glutaconate CoA-transferase. Fatty-acyl-ethyl-ester synthase. 2.83. 3 Succinate--hydroxymethylglutarate CoA 30 1.68 Xylono-1,4-lactonase. transferase. 1.70 Cetraxate benzylesterase. 2.83. 5-hydroxypentanoate CoA-transferase. 1.71 Acetylalkylglycerol acetylhydrolase. 2.83. Succinyl-CoA:(R)-benzylsuccinate CoA 1.72 Acetylxylan esterase. transferase. 1.73 Feruloyl esterase. 2.83. 6 Formyl-CoA transferase. 1.74 . 2.83. Cinnamoyl-CoA:phenylactate CoA 35 1.75 Poly(3-hydroxybutyrate) depolymerase. transferase. .76 Poly(3-hydroxyoctanoate) 2.8.4. Coenzyme-B sulfoethylthiotransferase. depolymerase. 2.9.1. L-seryl-tRNA(Sec) selenium transferase. 77 Acyloxyacyl hydrolase. ENZYME: 3. . . . 1.78 Polyneuridine-aldehyde esterase. 1.79 Hormone-sensitive lipase. Carboxylesterase. .2.1 Acetyl-CoA hydrolase. Arylesterase. 40 .2.2 Palmitoyl-CoA hydrolase. Triacylglycerol lipase. 2.3 Succinyl-CoA hydrolase. Phospholipase A(2). .2.4 3-hydroxyisobutyryl-CoA hydrolase. Lysophospholipase. 2.5 Hydroxymethylglutaryl-CoA hydrolase. Acetylesterase. .2.6 Hydroxyacylglutathione hydrolase. . 2.7 Glutathione thiolesterase. . 45 2.10 Formyl-CoA hydrolase. 1O Tropinesterase. .2.11 Acetoacetyl-CoA hydrolase. . .2.12 S-formylglutathione hydrolase. 1.13 Sterol esterase. 2.13 S-Succinylglutathione hydrolase. .1.14 Chlorophyllase. .2.14 Oleoyl-acyl-carrier-protein hydrolase. 1.15 L-arabinonolactonase. 2.15 thiolesterase. 1.17 Gluconolactonase. 50 .2.16 Citrate-(pro-3S)-lyase thiolesterase. 1.19 Uronolactonase. 2.17 (S)-methylmalonyl-CoA hydrolase. 120 Tannase. 2.18 ADP-dependent short-chain-acyl-CoA .1.21 Retinyl-palmitate esterase. hydrolase. .1.22 Hydroxybutyrate-dimer hydrolase. 2.19 ADP-dependent medium-chain-acyl-CoA 1.23 Acylglycerol lipase. hydrolase. 3-oxoadipate enol-lactonase. .1.24 55 2.2O Acyl-CoA hydrolase. 1.25 1,4-lactonase. .2.21 Dodecanoyl-acyl-carrier protein .1.26 Galactolipase. hydrolase. 1.27 4-pyridoxolactonase. .2.22 Palmitoyl-protein hydrolase. 1.28 Acylcarnitine hydrolase. 2.23 4-hydroxybenzoyl-CoA . 1.29 Aminoacyl-tRNA hydrolase. .2.24 2-(2-hydroxyphenyl)benzenesulfinate 1.30 D-arabinonolactonase. hydrolase. 1.31 6-phosphogluconolactonase. 60 2.25 Phenylacetyl-CoA hydrolase. 1.32 Phospholipase A(1). .3.1 . 1.33 6-acetylglucose deacetylase. 3.2 . .1.34 . 3.3 Phosphoserine phosphatase. 1.35 Dihydrocoumarin hydrolase. 3.4 Phosphatidate phosphatase. 1.36 Limonin-D-ring-lactonase. 3.5 5'-. 1.37 Steroid-lactonase. 65 3.6 3'-nucleotidase. 38 Triacetate-lactonase. 3.7 3'(2),5'-bisphosphate nucleotidase. US 8,119,385 B2 87 88 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.8 3-phytase. .4.2 Glycerophosphocholine phosphodiesterase. 3.9 Glucose-6-phosphatase. .4.3 Phospholipase C. 3.10 Glucose-1-phosphatase. .4.4 . 3.11 Fructose-bisphosphatase. .4.11 Phosphoinositide phospholipase C. 3.12 Trehalose-phosphatase. .4.12 Sphingomyelin phosphodiesterase. 3.13 Bisphosphoglycerate phosphatase. .4.13 Serine-ethanolaminephosphate 3.14 Methylphosphothioglycerate phosphatase. 10 phosphodiesterase. 3.15 Histidinol-phosphatase. .4.14 Acyl-carrier-protein phosphodiesterase. 3.16 Phosphoprotein phosphatase. 3. 4.15 Adenylyl-glutamate-ammonia ligase 3.17 Phosphorylase phosphatase. hydrolase. 3.18 Phosphoglycolate phosphatase. .4.16 2',3'-cyclic-nucleotide 2'- 3.19 Glycerol-2-phosphatase. phosphodiesterase. 3.2O Phosphoglycerate phosphatase. 15 4.17 3',5'-cyclic-nucleotide phosphodiesterase. 3.21 Glycerol-1-phosphatase. 4.35 3',5'-cyclic-GMP phosphodiesterase. 3.22 Mannitol-1-phosphatase. 4.37 2',3'-cyclic-nucleotide 3'- 3.23 Sugar-phosphatase. phosphodiesterase. 3.24 Sucrose-phosphatase. 4.38 ycerophosphocholine 3.25 Inositol-1 (or 4)-monophosphatase. holinephosphodiesterase. 3.26 4-phytase. 4.39 kylglycerophosphoethanolamine 3.27 Phosphatidylglycerophosphatase. hosphodiesterase. 3.28 ADP-phosphoglycerate phosphatase. .4.40 MP-N-acylneuraminate 3.29 N-acylneuraminate-9-phosphatase. hosphodiesterase. 3.31 Nucleotidase. .4.41 phingomyelin phosphodiesterase D. 3.32 Polynucleotide 3'-phosphatase. .4.42 ycerol-1,2-cyclic-phosphate 2 3.33 Polynucleotide 5'-phosphatase. phosphodiesterase. 3.34 Deoxynucleotide 3'-phosphatase. 25 4.43 G ycerophosphoinositol 3.35 Thymidylate 5'-phosphatase. nositolphosphodiesterase. 3.36 Phosphoinositide 5-phosphatase. .4.44 ycerophosphoinositol 3.37 Sedoheptulose-bisphosphatase. ycerophosphodiesterase. 3.38 3-phosphoglycerate phosphatase. 445 -acetylglucosamine-1-phosphodiester 3.39 Streptomycin-6-phosphatase. pha-N-acetylglucosaminidase. 3.40 Guanidinodeoxy-Scyllo-inositol-4- 30 .4.46 ycerophosphodiester phosphodiesterase. phosphatase. 4.48 olichylphosphate-glucose 3.41 4-nitrophenylphosphatase. phosphodiesterase. 3.42 Glycogen-synthase-D] phosphatase. .4.49 Dolichylphosphate-mannose 3.43 Pyruvate dehydrogenase (lipoamide)- phosphodiesterase. phosphatase. 4.50 Glycosylphosphatidylinositol 3.44 Acetyl-CoA carboxylase 35 phospholipase D. hosphatase. 4.51 Glucose-1-phospho-D- 3.45 -deoxy-manno-octulosonate-8- mannosylglycoprotein phosphodiesterase. hosphatase. 5.1 dGTPase. 3.46 ructose-2,6-bisphosphate 2 .6.1 . phosphatase. .6.2 Steryl-. 3.47 Hydroxymethylglutaryl-CoA 6.3 Glycosulfatase. reductase (NADPH)-phosphatase. 40 .6.4 N-acetylgalactosamine-6-sulfatase. 3.48 Protein-tyrosine-phosphatase. .6.6 Choline-sulfatase. 3.49 Pyruvate kinase-phosphatase. 6.7 Cellulose-polysulfatase. 3.SO Sorbitol-6-phosphatase. 6.8 Cerebroside-sulfatase. 3.51 Dolichyl-phosphatase. 6.9 Chondro-4-sulfatase. 3.52 3-methyl-2-oxobutanoate 6.10 Chondro-6-sulfatase. dehydrogenase (lipoamide)-phosphatase. 45 .6.11 Disulfoglucosamine-6-sulfatase. 3.53 Myosin light-chain-phosphatase. .6.12 N-acetylgalactosamine-4-Sulfatase. 3.54 Fructose-2,6-bisphosphate 6 6.13 Iduronate-2-sulfatase. phosphatase. .6.14 N-acetylglucosamine-6-sulfatase. 3.55 Caldesmon-phosphatase. 6.15 N-sulfoglucosamine-3-sulfatase. 3.56 nositol-polyphosphate 5-phosphatase. 6.16 Monomethyl-sulfatase. s 3.57 nositol-1,4-bisphosphate 1 50 6.17 D-lactate-2-sulfatase. phosphatase. 6.18 Glucuronate-2-sulfatase. 3.58 S l 9. 8 terminal-phosphatase. 7.1 Prenyl-diphosphatase. 3.59 Al ky cetylglycerophosphatase. 7.2 Guanosine-3',5'-bis(diphosphate) 3'- 3.60 Phosphoenolpyruvate phosphatase. diphosphatase. 3.62 Multiple inositol-polyphosphate 7.3 Monoterpenyl-diphosphatase. phosphatase. 55 .8.1 Aryldialkylphosphatase. 3.63 2-carboxy-D-arabinitol-1-phosphatase. 8.2 Diisopropyl-fluorophosphatase. s 3.64 Phosphatidylinositol-3-phosphatase. .11.1 I. 3.66 Phosphatidylinositol-3,4-bisphosphate .11.2 Exodeoxyribonuclease III. 4-phosphatase. 11.3 Exodeoxyribonuclease (lambda 3.67 Phosphatidylinositol-3,4,5- induced). Erisphosphate 3-phosphatase. 3. .11.4 Exodeoxyribonuclease (phage Sp3 3.68 2-deoxyglucose-6-phosphatase. 60 induced). 3.69 Glucosylglycerol 3-phosphatase. 11.5 Exodeoxyribonuclease V. 3.70 Mannosyl-3-phosphoglycerate .11.6 Exodeoxyribonuclease VII. phosphatase. 13.1 II. 3.71 2-phosphosulfolactate phosphatase. 13.2 Exoribonuclease H. 3.72 5-phytase. 13.3 Oligonucleotidase. 3.73 Alpha-ribazole phosphatase. 65 .13.4 Poly(A)-specific . .4.1 Phosphodiesterase I. .14.1 Yeast ribonuclease. US 8,119,385 B2 89 90 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 15.1 Venom exonuclease. 3.2. 49 pha-N-acetylgalactosaminidase. .16.1 Spleen exonuclease. 3.2. SO pha-N-acetylglucosaminidase. .21.1 I. 3.2. S1 pha-L-. .21.2 Deoxyribonuclease IV (phage-T(4)- 3.2. 52 eta-N-acetylhexosaminidase. induced). 3.2. 53 eta-N-acetylgalactosaminidase. 21.3 Type I site-specific deoxyribonuclease. 3.2. 54 yclomaltodextrinase. .21.4 Type II site-specific deoxyribonuclease. 10 3.2. 55 pha-N-arabinofuranosidase. 21.5 Type III site-specific deoxyribonuclease. 3.2. S6 lucuronosyl-disulfoglucosamine .21.6 CC-preferring . lucuronidase. 21.7 Deoxyribonuclease V. 3.2. 57 opullulanase. .22.1 Deoxyribonuclease II. 3.2. S8 Glucan 1,3-beta-glucosidase. .22.2 Aspergillus deoxyribonuclease K(1). 3.2. 59 Glucan endo-1,3-alpha-glucosidase. .22.4 Crossover junction . 15 3.2. 60 G lucan 1,4-alpha 22.5 Deoxyribonuclease X. altotetraohydrolase. 25.1 Deoxyribonuclease (pyrimidine dimer). 3.2. 61 Mycodextranase. .26.1 Physarum polycephalum ribonuclease. 3.2. 62 Glycosylceramidase. 26.2 ibonuclease alpha. 3.2. 63 2-alpha-L-fucosidase. 26.3 bonuclease III. 3.2. .64 2,6-beta-fructan 6-levanbiohydrolase. .26.4 bonuclease H. 3.2. .65 LeW88Se. 26.5 bonuclease P. 3.2. 66 Quercitrinase. 26.6 bonuclease IV. 3.2. .67 Galacturan 1,4-alpha-galacturonidase. 26.7 bonuclease P4. 3.2. 68 soamylase. 26.8 bonuclease M5. 3.2. 70 Glucan 1,6-alpha-glucosidase. 26.9 bonuclease (poly-(U)-specific). 3.2. .71 Glucan endo-1,2-beta-glucosidase. 26.10 bonuclease IX. 3.2. 72 Xylan 1,3-beta-xylosidase. .26.11 bonuclease Z. 25 3.2. 73 Licheninase. 27.1 bonuclease T(2). 3.2. .74 Glucan 1,4-beta-glucosidase. 27.2 cillus Subtilis ribonuclease. 3.2. .75 Glucan endo-1,6-beta-glucosidase. 27.3 bonuclease T(1). 3.2. .76 L-. 27.4 bonuclease U(2). 3.2. 77 Mannan 1,2-(1,3)-alpha-mannosidase. 27.5 ancreatic ribonuclease. 3.2. .78 Mannan endo-1,4-beta-mannosidase. 27.6 terobacter ribonuclease. 30 3.2. 8O Fructan beta-fructosidase. 27.7 bonuclease F. 3.2. 81 Agarase. 27.8 bonuclease V. 3.2. 82 Exo-poly-alpha-galacturonosidase. 27.9 RNA-intron . 3.2. .83 Kappa-carrageenase. 27.10 rRNA endonuclease. 3.2. 84 Glucan 1,3-alpha-glucosidase. 30.1 Aspergillus nuclease S(1). 3.2. .85 6-phospho-beta-galactosidase. 30.2 Serratia marcescens nuclease. 35 3.2. 86 6-phospho-beta-glucosidase. 31.1 . 3.2. 87 Capsular-polysaccharide endo-1,3-alpha Alpha-amylase. galactosidase. Beta-amylase. 3.2. 88 Beta-L-arabinosidase. Glucan 1,4-alpha-glucosidase. 3.2. 89 Arabinogalactan endo-1,4-beta . galactosidase. Endo-1,3(4)-beta-glucanase. 3.2. 91 Cellulose 1,4-beta-cellobiosidase. nulinase. 40 3.2. 92 Peptidoglycan beta-N-acetylmuramidase. Endo-1,4-beta-xylanase. 3.2. .93 Alpha,alpha-phosphotrehalase. 1O Oligo-1,6-glucosidase. 3.2. .94 Glucan 1,6-alpha-isomaltosidase. Dextranase. 3.2. 95 Dextran 1,6-alpha-isomaltotriosidase. .14 . 3.2. .96 Mannosyl-glycoprotein endo-beta-N- 1S Polygalacturonase. acetylglucosaminidase. 17 . 45 3.2. 97 Glycopeptide alpha-N- 18 Exo-alpha-Sialidase. acetylgalactosaminidase. 2O Alpha-glucosidase. 3.2. .98 Glucan 1,4-alpha-maltohexaosidase. .21 Beta-glucosidase. 3.2. .99 Arabinan endo-1,5-alpha-L-arabinosidase. 22 Alpha-galactosidase. 3.2.1. Mannan 1,4-mannobiosidase. .23 Beta-galactosidase. 3.2.1. Mannan endo-1,6-alpha-mannosidase. .24 Alpha-mannosidase. 50 3.2.1. Blood-group-substance endo-1,4-beta 25 Beta-mannosidase. galactosidase. 26 Beta-fructofuranosidase. 3.2.1. Keratan-sulfate endo-1,4-beta 28 Alpha,alpha-. galactosidase. 31 Beta-glucuronidase. 3.2.1. Steryl-beta-glucosidase. 32 Xylan endo-1,3-beta-xylosidase. 3.2.1. Strictosidine beta-glucosidase. .33 Amylo-alpha-1,6-glucosidase. 55 3.2.1. Mannosyl-oligosaccharide glucosidase. 35 Hyaluronoglucosaminidase. 3.2.1. Protein-glucosylgalactosylhydroxylysine 36 Hyaluronoglucuronidase. glucosidase. 37 Xylan 1,4-beta-xylosidase. 3.2.1. O8 . 38 Beta-D-fucosidase. 3.2.1. 09 Endogalactosaminidase. 39 Glucan endo-1,3-beta-D-glucosidase. 3.2.1. 10 Mucinaminylserine mucinaminidase. .40 Alpha-L-rhamnosidase. 3.2.1. 11 1,3-alpha-L-fucosidase. .41 Pullulanase. 60 3.2.1. 12 2-deoxyglucosidase. .42 GDP-glucosidase. 3.2.1. 13 Mannosyl-oligosaccharide 1,2-alpha 43 Beta-L-rhamnosidase. mannosidase. Fucoidanase. 3.2.1. 14 Mannosyl-oligosaccharide 1,3-1,6-alpha 45 . mannosidase. .46 . 3.2.1. 15 Branched-dextran exo-1,2-alpha 47 Galactosylgalactosylglucosylceramidase. 65 glucosidase. 48 Sucrose alpha-glucosidase. 3.2.1. 16 Glucan 1,4-alpha-maltotriohydrolase. US 8,119,385 B2 91 92 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.2.1.117 beta-glucosidase. 5 3.4.11.2 Membrane alanyl . 3.2.1.118 Prunasin beta-glucosidase. 3.4.11.3 Cystinyl aminopeptidase. 3.2.1.119 Vicianin beta-glucosidase. 3.4.11.4 Tripeptide aminopeptidase. 3.2.1.12O Oligoxyloglucan beta-glycosidase. 3.4.11.5 Prolylaminopeptidase. 3.2.1.121 Polymannuronate hydrolase. 3.4.11.6 Aminopeptidase B. 3.2.1122 Maltose-6-phosphate glucosidase. 3.4.11.7 Glutamylaminopeptidase. 3.2.1.123 Endoglycosylceramidase. 10 3.4.11.9 Xaa-Pro aminopeptidase. 3.2.1.124 3-deoxy-2-octulosonidase. 3.4.11.10 Bacterial . 3.2.1.125 Raucaffricine beta-glucosidase. 3.4.11.13 Clostridial aminopeptidase. 3.2.1.126 Coniferin beta-glucosidase. 3.4.11.14 Cytosol alanyl aminopeptidase. 3.2.1.127 1,6-alpha-L-fucosidase. 3.4.11.15 Aminopeptidase Y. 3.2.1.128 Glycyrrhizinate beta-glucuronidase. 3.4.11.16 Xaa-Trp aminopeptidase. 3.2.1.129 Endo-alpha-Sialidase. 15 3.4.11:17 Tryptophanyl aminopeptidase. 3.2.1.130 Glycoprotein endo-alpha-1,2- 3.4.11.18 Methionyl aminopeptidase. mannosidase. 3.4.11:19 D-stereospecific aminopeptidase. 3.2.1.131 Xylan alpha-1,2-glucuronosidase. 3.4.11.2O Aminopeptidase Ey. 3.2.1.132 Chitosanase. 3.4.11.21 Aspartyl aminopeptidase. 3.2.1.133 Glucan 1,4-alpha-maltohydrolase. 3.4.11.22 Aminopeptidase I. 3.2.1.134 Difructose-anhydride synthase. 3.4.11.23 PepBaminopeptidase. 3.2.1.135 Neopululanase. 20 3.4.13.3 Xaa-His . 3.2.1.136 Glucuronoarabinoxylan endo-1,4- 3.4.13.4 Xaa-Arg dipeptidase. beta-xylanase. 3.4.13.5 Xaa-methyl-His dipeptidase. 3.2.1.1.37 Mannan exo-1,2-1,6-alpha- 3.4.13.7 Glu-Glu dipeptidase. mannosidase. 3.4.13.9 Xaa-Pro dipeptidase. 3.2.1.139 Alpha-glucuronidase. 3.4.13:12 Met-Xaa dipeptidase. 3.2.1.140 Lacto-N-biosidase. 25 3.4.13.17 Non-stereospecific dipeptidase. 3.2.1.141 4-alpha-D-(1->4)-alpha-D- 3.4.13.18 Cytosol nonspecific dipeptidase. glucano trehalose trehalohydrolase. 3.4.13.19 Membrane dipeptidase. 3.2.1.142 Limit dextrinase. 3.4.13.2O Beta-Ala-His dipeptidase. 3.2.1.143 Poly(ADP-ribose) glycohydrolase. 3.4.13.21 Dipeptidase E. 3.2.1.144 3-deoxyoctulosonase. 3.4.14.1 Dipeptidyl-peptidase I. 3.2.1.145 Galactan 1,3-beta-galactosidase. 30 3.4.14.2 Dipeptidyl-peptidase II. 3.2.1.146 Beta-galactofuranosidase. 3.4.14.4 Dipeptidyl-peptidase III. 3.2.1.147 Thioglucosidase. 3.4.14.5 Dipeptidyl-peptidase IV. 3.2.1.148 Ribosylhomocysteinase. 3.4.14.6 Dipeptidyl-dipeptidase. 3.2.1.149 Beta-primeverosidase. 3.4.14.9 Tripeptidyl-peptidase I. 3.2.1.150 Oligoxyloglucan reducing-end- 3.4.14.10 Tripeptidyl-peptidase II. specific cellobiohydrolase. is 3.4.14.11 Xaa-Pro dipeptidyl-peptidase. 3.2.1.151 Xyloglucan-specific endo-beta-1,4- 3.4.15.1 Peptidyl-dipeptidase A. glucanase. 3.4.15.4 Peptidyl-dipeptidase B. 3.2.2. Purine nucleosidase. 3.4.15.5 Peptidyl-dipeptidase Dcp. 3.2.2.2 Inosine nucleosidase. 3.4.16.2 Lysosomal Pro-X carboxypeptidase. 3.22.3 Uridine nucleosidase. 3.4.16.4 Serine-type D-Ala-D-Ala 3.2.2.4 AMP nucleosidase. carboxypeptidase. 3.2.2.5 NAD(+) nucleosidase. 40 3.416.5 Carboxypeptidase C. 3.22.6 NAD(P)(+) nucleosidase. 3.416.6 Carboxypeptidase D. 3.2.2.7 Adenosine nucleosidase. 3.4.17.1 . 3.22.8 Ribosylpyrimidine nucleosidase. 3.4.17.2 . 3.22.9 Adenosylhomocysteine nucleosidase. 3.4.17.3 Lysine carboxypeptidase. 3.22.10 Pyrimidine-5'-nucleotide 3.4.17.4 Gly-X carboxypeptidase. nucleosidase. 45 3.4.17.6 Alanine carboxypeptidase. 3.22.11 Beta-aspartyl-N-acetylglucosaminidase. 3.4.17.8 Muramoylpentapeptide 3.2.2.12 nosinate nucleosidase. carboxypeptidase. 3.22.13 -methyladenosine nucleosidase. 3.4.17.10 . 3.2.2.14 NMN nucleosidase. 3.4.17.11 Glutamate carboxypeptidase. 3.22.15 DNA-deoxyinosine glycosylase. 3.4.17.12 Carboxypeptidase M. 3.22.16 Methylthioadenosine nucleosidase. 50 3.4.17.13 Muramoyltetrapeptide 3.2.2.17 Deoxyribodipyrimidine endonucleosidase. carboxypeptidase. 3.22.19 Protein ADP-ribosylarginine hydrolase. 3.4.17.14 D-Ala-D-Ala carboxypeptidase. 3.22.2O DNA-3-methyladenine glycosylase I. 3.4.17.15 . 3.22.21 DNA-3-methyladenine glycosylase II. 3.4.17.16 Membrane Pro-X carboxypeptidase. 3.22.22 rRNA N-glycosylase. 3.4.17.17 Tubulinyl-Tyr carboxypeptidase. 3.22.23 DNA-formamidopyrimidine glycosylase. 55 3.4.17.18 Carboxypeptidase T. 3.2.2.24 ADP-ribosyl-dinitrogen reductase 3.4.17.19 Carboxypeptidase Taq. hydrolase. 3.4.17.2O Carboxypeptidase U. 3.3.1.1 Adenosylhomocysteinase. 3.4.17.21 Glutamate carboxypeptidase II. 3.31.2 Adenosylmethionine hydrolase. 3.4.17.22 Metallocarboxypeptidase D. 3.3.2.1 Sochorismatase. 3.4.18.1 Cathepsin X. 3.3.2.2 Alkenylglycerophosphocholine hydrolase. 3.4.19.1 Acylaminoacyl-peptidase. 3.32.3 Epoxide hydrolase. 60 3.4.19.2 Peptidyl-glycinamidase. 3.32.4 Trans-epoxysuccinate hydrolase. 3.4.19.3 Pyroglutamyl-peptidase I. 3.3.2.5 Alkenylglycerophosphoethanolamine 3.4.19.5 Beta-aspartyl-peptidase. hydrolase. 3.4.19.6 Pyroglutamyl-peptidase II. 3.32.6 Leukotriene-A(4) hydrolase. 3.4.19.7 N-formylmethionyl-peptidase. 3.32.7 Hepoxilin-epoxide hydrolase. 3.4.19.9 Gamma-glutamyl hydrolase. 3.32.8 Limonene-1,2-epoxide hydrolase. 65 3.4.19.11 Gamma-D-glutamyl-meso 3.4.11.1 Leucyl aminopeptidase. diaminopimelate peptidase. US 8,119,385 B2 93 94 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.4.1912 Ubiquitinyl hydrolase 1. 5 3.4.21.96 Lactocepin. 3.4.2 Chymotrypsin. 3.421.97 Assemblin. 3.4.2 Chymotrypsin C. 3.421.98 Hepacivirin. 3.4.2 Metridin. 3.421.99 Spermosin. 3.4.2 Trypsin. 3.4.21.100 Pseudomonalisin. 3.4.2 Thrombin. 3.4.21.101 Xanthomonalisin. 3.4.2 Coagulation factor Xa. 10 3.4.21.102 C-terminal processing peptidase. 3.4.2 Plasmin. 3.4.21.103 Physarolisin. 3.4.2 Enteropeptidase. 3.4.22.1 Cathepsin B. 3.4.2 1O Acrosin. 3.4.22.2 3.4.2 .12 Alpha-lytic . 3.4.223 3.4.2 19 Glutamyl endopeptidase. 3.4.22.6 3.4.2 2O Cathepsin G. 15 3.4.22.7 3.4.2 .21 Coagulation factor VIIa. 3.4.22.8 Clostripain. 3.4.2 22 Coagulation factor IXa. 3.4.22.10 Streptopain. 3.4.2 25 Cucumisin. 3.4.22.14 Actinidain. 3.4.2 26 Prolyl oligopeptidase. 3.4.22.15 Cathepsin L. 3.4.2 27 Coagulation factor XIa. 3.4.22.16 Cathepsin H. 3.4.2 32 Brachyurin. 3.4.22.24 Cathepsin T. 3.4.2 34 Plasma kallikrein. 20 3.4.22.25 Glycyl endopeptidase. 3.4.2 35 Tissue kallikrein. 3.4.22.26 Cancer procoagulant. 3.4.2 36 Pancreatic elastase. 3.4.2227 Cathepsin S. 3.4.2 37 Leukocyte elastase. 3.4.22.28 Picornain 3C. 3.4.2 38 Coagulation factor XIIa. 3.4.22.29 Picornain 2A. 3.4.2 39 Chymase. 3.4.22.30 Caricain. 3.4.2 41 Complement Subcomponent C1r. 25 3.4.22.31 Ananain. 3.4.2 42 Complement Subcomponent C1s. 3.4.22.32 Stem bromelain. 3.4.2 43 Classical-complement-pathway C3/C5 3.4.22.33 Fruit bromelain. convertase. 3.4.22.34 Legumain. 3.4.2 45 Complement factor I. 3.4.22.35 Histolysain. 3.4.2 46 Complement factor D. 3.4.22.36 Caspase-1. 3.4.2 47 Alternative-complement-pathway C3/C5 30 3.4.22.37 Gingipain R. convertase. 3.4.22.38 Cathepsin K. 3.4.2 48 Cerevisin. 3.4.2239 Adenain. 3.4.2 49 Hypodermin C. 3.4.22.40 Bleomycin hydrolase. 3.4.2 SO Lysyl endopeptidase. 3.4.22.41 Cathepsin F. 3.4.2 53 Endopeptidase La. 3.4.22.42 Cathepsin O. Cathepsin V. 3.4.2 54 Gamma-renin. 35 3.4.22.43 3.4.2 55 Venombin AB. 3.4.22.44 Nuclear-inclusion-a endopeptidase. 3.4.2 57 Leucyl endopeptidase. 3.4.22.45 Helper-component proteinase. 3.4.2 59 Tryptase. 3.4.22.46 L-peptidase. 3.4.2 60 Scutelarin. 3.4.22.47 Gingipain K. 3.4.2 61 Kexin. 3.4.22.48 Staphopain. 3.4.2 62 Subtilisin. 3.4.22.49 Separase. 3.4.2 63 Oryzin. 40 3.4.22.50 V-cath endopeptidase. 3.4.2 .64 Endopeptidase K. 3.4.22.51 Cruzipain. 3.4.2 .65 Thermomycolin. 3.4.22.52 Calpain-1. 3.4.2 66 Thermitase. 3.4.22.53 Calpain-2. 3.4.2 .67 Endopeptidase So. 3.4.23.1 Pepsin A. 3.4.2 68 T-plasminogen activator. 3.4.23.2 Pepsin B. 3.4.2 69 Protein C (activated). 45 3.4.23.3 Gastricsin. 3.4.2 70 Pancreatic endopeptidase E. 3.4.23.4 Chymosin. 3.4.2 .71 Pancreatic elastase II. 3.4.23.5 Cathepsin D. 3.4.2 72 gA-specific serine endopeptidase. 3.4.23.12 Nepenthesin. 3.4.2 73 U-plasminogen activator. 3.4.23.15 Renin. 3.4.2 .74 Venombin A. 3.4.23.16 HIV-1 retropepsin. 3.4.2 .75 Furin. 50 3.4.23.17 Pro-opiomelanocortin converting enzyme. 3.4.2 .76 Myeloblastin. 3.4.23.18 Aspergillopepsin I. 3.4.2 77 Semenogelase. 3.4.23.19 Aspergillopepsin II. 3.4.2 .78 Granzyme A. 3.4.23.2O Penicillopepsin. 3.4.2 .79 Granzyme B. 3.4.23.21 Rhizopuspepsin. 3.4.2 8O Streptogrisin A. 3.4.23.22 Endothiapepsin. 3.4.2 81 Streptogrisin B. 55 3.4.23.23 Mucorpepsin. 3.4.2 82 Glutamyl endopeptidase II. 3.4.23.24 Candidapepsin. 3.4.2 .83 Oligopeptidase B. 3.4.23.25 Saccharopepsin. 3.4.2 84 Limulus clotting factor C. 3.42326 Rhodotorulapepsin. 3.4.2 .85 Limulus clotting factor B. 3.4.23.28 Acrocylindropepsin. 3.4.2 86 Limulus clotting enzyme. 3.4.23.29 Polyporopepsin. 3.4.2 87 Omptin. 3.4.23.30 Pycnoporopepsin. 3.4.2 88 Repressor lexA. 60 3.4.23.31 Scytalidopepsin A. 3.4.2 89 Signal peptidase I. 3.4.23.32 Scytalidopepsin B. 3.4.2 90 Togavirin. 3.4.23.34 Cathepsin E. 3.4.2 91 Flavivirin. 3.4.23.35 Barrierpepsin. 3.4.2 92 Endopeptidase Clp. 3.4.23.36 Signal peptidase II. 3.4.2 .93 Proprotein convertase 1. 3.4.23.38 Plasmepsin I. 3.4.2 .94 Proprotein convertase 2. 65 3.4.23.39 Plasmepsin II. 3.4.2 95 Snake venom factor V activator. 3.4.23.40 Phytepsin. US 8,119,385 B2 95 96 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.4.23.41 Yapsin 1. 3.424.75 . 3.4.23.42 Thermopsin. 3.424.76 Flavastacin. 3.4.23.43 Prepilin peptidase. 3.424.77 Snapalysin. 3.4.23.44 Nodavirus endopeptidase. 3.424.78 GPR endopeptidase. 3.4.23.45 Memapsin 1. 3.424.79 Pappalysin-1. 3.4.23.46 Memapsin 2. 3.424.8O Membrane-type matrix 3.4.23.47 HIV-2 retropepsin. 10 -1. 3.4.23.48 Plasminogen activator Pla. 3.424.81 ADAM10 endopeptidase. 3.4.24.1 Atrolysin A. 3.424.82 ADAMTS-4 endopeptidase. 3.424.3 Microbial . 3.424.83 Anthrax lethal factor endopeptidase. 3.4.24.6 Leucolysin. 3.4.24.84 Ste24 endopeptidase. 3.424.7 Interstitial collagenase. 3.424.85 S2P endopeptidase. 3.4.24.11 . 15 3.424.86 ADAM17 endopeptidase. 3.4.24.12 Envelysin. 3.425.1 Proteasome endopeptidase complex. 3.424.13 IgA-specific . 3.5.1.1 . 3.4.24.14 Procollagen N-endopeptidase. 3.5.1.2 . 3.424.15 Thimet oligopeptidase. 3.5.1.3 Omega-amidase. 3.4.24.16 Neurolysin. 3.5.1.4 Amidase. 3.424.17 Stromelysin 1. 3.5.1.5 . 3.424.18 Meprin A. 3.5.1.6 Beta-ureidopropioinase. 3.424.19 Procollagen C-endopeptidase. 3.5.1.7 Ureidosuccinase. 3.424.2O Peptidyl-Lys metalloendopeptidase. 3.5.1.8 Formylaspartate deformylase. 3.4.24.21 Astacin. 3.5.1.9 Arylformamidase. 3.4.24.22 Stromelysin 2. 3.5.1.10 Formyltetrahydrofolate deformylase. 3.424.23 Matrilysin. 3.5.1.11 Penicillin amidase. 3.4.24.24 A. 25 3.5.1.12 . 3.424.25 Vibriolysin. 3.5.1.13 Aryl-acylamidase. 3.4.24.26 Pseudolysin. 3.5.1.14 Aminoacylase. 3.424.27 . 3.5.1.15 . 3.424.28 Bacillolysin. 3.5.1.16 Acetylornithine deacetylase. 3.424.29 Aureolysin. 3.5.1.17 Acyl-lysine deacylase. 3.424.30 Coccolysin. 30 3.5.1.18 Succinyl-diaminopimelate desuccinylase. 3.424.31 Mycolysin. 3.5.1.19 Nicotinamidase. 3.424.32 Beta-lytic metalloendopeptidase. 3.5.1.20 Citrullinase. 3.424.33 Peptidyl-Asp metalloendopeptidase. 3.5.1.21 N-acetyl-beta-alanine deacetylase. 3.4.24.34 Neutrophil collagenase. 3.5.1.22 Pantothenase. 3.424.35 Gelatinase B. 3.5.1.23 . 3.424.36 Leishmanolysin. 35 3.5.1.24 Choloylglycine hydrolase. 3.424.37 Saccharolysin. 3.5.1.25 N-acetylglucosamine-6-phosphate 3.424.38 Gametolysin. deacetylase. 3.424.39 Deuterolysin. 3.5.1.26 N(4)-(beta-N-acetylglucosaminyl)-L- 3.4.24.40 Serralysin. asparaginase. 3.4.24.41 Atrolysin B. 3.5.1.27 N-formylmethionylaminoacyl-tRNA 3.4.24.42 Atrolysin C. deformylase. 3.4.24.43 AtroXase. 40 3.5.1.28 N-acetylmuramoyl-L-alanine amidase. 3.4.24.44 Atrolysin E. 3.5.1.29 2-(acetamidomethylene)Succinate 3.424.45 Atrolysin F. hydrolase. 3.4.24.46 Adamalysin. 3.5.1.30 5-aminopentanamidase. 3.424.47 Horrilysin. 3.5.1.31 Formylmethionine deformylase. 3.4.24.48 Ruberlysin. 3.5.1.32 Hippurate hydrolase. 3.424.49 Bothropasin. 45 3.5.1.33 N-acetylglucosamine deacetylase. 3.424.50 Bothrolysin. 3.5.1.35 D-glutaminase. 3.424.51 Ophiolysin. 3.5.1.36 N-methyl-2-oxoglutaramate hydrolase. 3.424.52 Trimerelysin I. 3.5.1.38 Glutamin-(asparagin-)ase. 3.424.53 Trimerelysin II. 3.5.1.39 Alkylamidase. 3.424.54 Mucrolysin. 3.5.140 Acylagmatine amidase. 3.424.55 Pitrilysin. 50 3.5.141 Chitin deacetylase. 3.4.2456 Insulysin. 3.5.1.42 Nicotinamide-nucleotide amidase. 3.424.57 O-Sialoglycoprotein endopeptidase. 3.5.1.43 Peptidyl-glutaminase. 3.424.58 Russellysin. 3.5.1.44 Protein-glutamine glutaminase. 3.424.59 Mitochondrial intermediate 3.5.146 6-aminohexanoate-dimer hydrolase. peptidase. 3.S.147 N-acetyldiaminopimelate deacetylase. 3.424.60 Dactylysin. 55 3.5.148 Acetylspermidine deacetylase. 3.4.24.61 Nardilysin. 3.5.1.49 Formamidase. 3.4.24.62 Magnolysin. 3.5.1. SO Pentanamidase. 3.424.63 Meprin B. 3.5.1.51 4-acetamidobutyryl-CoA deacetylase. 3.4.24.64 Mitochondrial processing peptidase. 3.5.1.52 Peptide-N(4)-(N-acetyl-beta 3.424.65 Macrophage elastase. glucosaminyl)asparagine amidase. 3.424.66 Choriolysin L. 3.5.1.53 N-carbamoylputrescine amidase. 3.424.67 Choriolysin H. 60 3.5.1.54 Allophanate hydrolase. 3.424.68 Tentoxilysin. 3.5.1.55 Long-chain-fatty-acyl-glutamate deacylase. 3.424.69 Bontoxilysin. 3.5.1.56 N,N-dimethylformamidase. 3.424.70 Oligopeptidase A. 3.5.1.57 Tryptophanamidase. 3.424.71 Endothelin-converting enzyme 1. 3.5.1.58 N-benzyloxycarbonylglycine hydrolase. 3.424.72 Fibrolase. 3.5.1.59 N-carbamoylsarcosine amidase. 3.424.73 Jararhagin. 65 3.5.1.60 N-(long-chain-acyl)ethanolamine deacylase. 3.424.74 Fragilysin. 3.5.1.61 Mimosinase. US 8,119,385 B2 97 98 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.5.1.62 Acetylputrescine deacetylase. 3.5.44 . 3.5.1.63 4-acetamidobutyrate deacetylase. 3.S.45 . 3.5.1.64 N(alpha)-benzyloxycarbonyl leucine 3.54.6 AMP deaminase. hydrolase. 3.S.4.7 ADP deaminase. 3.5.1.65 Theanine hydrolase. 3.54.8 Aminoimidazolase. 3.5.1.66 2-(hydroxymethyl)-3- 3.54.9 Methenyltetrahydrofolate cyclohydrolase. (acetamidomethylene)Succinate hydrolase. 10 3.54.10 IMP cyclohydrolase. 3.5.1.67 4-methyleneglutaminase. 3.54.11 Pterin deaminase. 3.5.1.68 N-formylglutamate deformylase. 3.54.12 dCMP deaminase. 3.5.1.69 Glycosphingolipid deacylase. 35.4.13 dCTP deaminase. 3.5.1.70 Aculeacin-A deacylase. 3.54.14 Deoxycytidine deaminase. 3.5.1.71 N-feruloylglycine deacylase. 3.S.4.15 Guanosine deaminase. 3.5.1.72 D-benzoylarginine-4-nitroanilide 15 3.54.16 GTP cyclohydrolase I. amidase. 3.S.4.17 Adenosine-phosphate deaminase. 3.5.1.73 Carnitinamidase. 35.4.18 ATP deaminase. 3.S. 1.74 Chenodeoxycholoyltaurine hydrolase. 35.4.19 Phosphoribosyl-AMP cyclohydrolase. 3.5.1.75 Urethanase. 3.54.20 Pyrithiamine deaminase. 3.5.1.76 Arylalkyl acylamidase. 3.54.21 Creatinine deaminase. 3.5.177 N-carbamoyl-D-amino acid hydrolase. 35.4.22 1-pyrroline-4-hydroxy-2-carboxylate 3.S. 1.78 Glutathionylspermidine amidase. deaminase. 3.S. 1.79 Phthalyl amidase. 3.54.23 Blasticidin-S deaminase. 3.5.1.81 N-acyl-D-amino-acid deacylase. 3.54.24 Sepiapterin deaminase. 3.5.1.82 N-acyl-D-glutamate deacylase. 3.S.4.25 GTP cyclohydrolase II. 3.5.1.83 N-acyl-D-aspartate deacylase. 35.426 Diaminohydroxyphosphoribosylaminopyrimidine 3.5.1.84 Biuret . deaminase. 3.5.1.85 25 3.S.4.27 Methenyltetrahydromethanopterin hydrolase. cyclohydrolase. 3.5.1.86 Mandelamide amidase. 3.54.28 S-adenosylhomocysteine deaminase. 3.5.1.87 N-carbamoyl-L-amino-acid hydrolase. 3.54.29 GTP cyclohydrolase IIa. 3.5.1.88 Peptide deformylase. 3.54.30 dCTP deaminase (dUMP-forming). 3.5.1.89 3.S.S.1 Nitrilase. acetylglucosaminylphosphatidylinositol 30 3.S.S.2 Ricinine nitrilase. deacetylase. 3.S.S.4 Cyanoalanine nitrilase. 3.5.1.90 Adenosylcobinamide hydrolase. 3.5.5.5 Arylacetonitrilase. 3.5.2.1 . 3.S.S.6 Bromoxynil nitrilase. 3.5.2.2 . 3.5.5.7 Aliphatic nitrilase. 3.5.2.3 . 3.S.S.8 Thiocyanate hydrolase. 3.5.2.4 Carboxymethylhydantoinase. 35 3.S.99.1 Riboflavinase. 3.5.2.5 Allantoinase. 3.S.99.2 . 3.5.2.6 Beta-lactamase. 3.S.99.3 Hydroxydechloroatrazine 3.5.2.7 midazolonepropionase. ethylaminohydrolase. 3.5.2.9 5-oxoprolinase (ATP-hydrolyzing). 3.5.99.4 N-isopropylammelide 3.5.2.10 Creatininase. isopropylaminohydrolase. 3.5.2.11 L-lysine-lactamase. 3.S.99.5 2-aminomuconate deaminase. 3.5.2.12 6-aminohexanoate-cyclic-dimer 40 3.S.99.6 Glucosamine-6-phosphate deaminase. hydrolase. 3.S.99.7 1-aminocyclopropane-1-carboxylate 3.5.2.13 2,5-dioxopiperazine hydrolase. deaminase. 3.5.2.14 N-methylhydantoinase (ATP 3.6. Inorganic diphosphatase. hydrolyzing). 3.6. Trimetaphosphatase. 3.5.2.15 Cyanuric acid amidohydrolase. 3.6. Adenosinetiriphosphatase. 3.5.2.16 Maleimide hydrolase. 45 3.6. . 3.5.2.17 Hydroxyisourate hydrolase. 3.6. Nucleoside-diphosphatase. 3.5.3.1 . 3.6. Acylphosphatase. 3.5.3.2 Guanidinoacetase. 3.6. ATP diphosphatase. 3.5.3.3 Creatinase. 3.6. Nucleotide diphosphatase. 3.5.3.4 Allantoicase. 3.6.1. Endopolyphosphatase. 3.5.3.5 Formimidoylaspartate deiminase. 50 3.6.1. Exopolyphosphatase. 3.5.3.6 Arginine deiminase. 3.6.1. dCTP diphosphatase. 3.5.3.7 Guanidinobutyrase. 3.6.1. ADP-ribose diphosphatase. 3.5.3.8 Formimidoylglutamase. 3.6.1. Adenosine-tetraphosphatase. 3.5.3.9 Allantoate deiminase. 3.6.1. Nucleoside-triphosphatase. 3.5.3.10 D-arginase. 3.6.1. CDP-glycerol diphosphatase. 3.5.3.11 . 55 3.6.1. Bis(5'-nucleosyl)-tetraphosphatase 3.5.3.12 Agmatine deiminase. (asymmetrical). 3.5.3.13 Formimidoylglutamate deiminase. 3.6. 18 FAD diphosphatase. 3.5.3.14 Amidinoaspartase. 3.6. 19 Nucleoside-triphosphate diphosphatase. 3.5.3.15 Protein-arginine deiminase. 3.6. 2O 5'-acylphosphoadenosine hydrolase. 3.5.3.16 Methylguanidinase. 3.6. .21 ADP-Sugar diphosphatase. 3.5.3.17 Guanidinopropionase. 3.6. 22 NAD+ diphosphatase. 3.5.3.18 Dimethylargininase. 60 3.6. 23 dUTP diphosphatase. 3.5.3.19 Ureidoglycolate hydrolase. 3.6. .24 Nucleoside phosphoacylhydrolase. 3.5.3.20 Diguanidinobutanase. 3.6. 25 Triphosphatase. 3.5.3.21 Methylenediurea deaminase. 3.6. 26 CDP-diacylglycerol diphosphatase. 3.5.3.22 Proclavaminate amidinohydrolase. 3.6. 27 Undecaprenyl-diphosphatase. 3.54.1 Cytosine deaminase. 3.6. 28 Thiamine-triphosphatase. 3.54.2 Adenine deaminase. 65 3.6. 29 Bis(5'-adenosyl)-triphosphatase. 3.543 . 3.6. 30 M(7)G(5')pppN diphosphatase. US 8,119,385 B2 99 100 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 3.6.1.31 Phosphoribosyl-ATP diphosphatase. 5 3.6.3.52 Chloroplast protein-transporting 3.6.139 Thymidine-triphosphatase. ATPase. 3.6.140 Guanosine-5'-triphosphate,3'- 3.6.353 Ag(+)-exporting ATPase. diphosphate diphosphatase. 3.6.4.1 Myosin ATPase. 3.6.1.41 Bis(5'-nucleosyl)-tetraphosphatase 3.6.4.2 ATPase. (symmetrical). 36.4.3 Microtubule-severing ATPase. 3.6.1.42 Guanosine-diphosphatase. 10 3.6.4.4 Plus-end-directed ATPase. 3.6.1.43 Dolichyldiphosphatase. 36.45 Minus-end-directed kinesin ATPase. 3.6.1.44 Oligosaccharide-diphosphodolichol 3.6.46 Vesicle-fusing ATPase. diphosphatase. 36.4.7 Peroxisome-assembly ATPase. 3.6.1.45 UDP-Sugar diphosphatase. 36.4.8 Proteasome ATPase. 3.6.1.52 Diphosphoinositol-polyphosphate 36.4.9 Chaperonin ATPase. diphosphatase. 15 36.4.10 Non-chaperonin molecular chaperone 3.6.2.1 Adenylylsulfatase. ATPase. 3.6.2.2 Phosphoadenylylsulfatase. 3.6.4.11 Nucleoplasmin ATPase. 3.6.3.1 Phospholipid-translocating ATPase. 36.5.1 Heterotrimeric G-protein GTPase. 3.6.32 Magnesium-importing ATPase. 36.5.2 Small monomeric GTPase. 3.6.3.3 Cadmium-exporting ATPase. 3.6.5.3 Protein-synthesizing GTPase. 3.6.3.4 Copper-exporting ATPase. 36.5.4 Signal-recognition-particle GTPase. 3.6.3.5 Zinc-exporting ATPase. 3.6.S.S GTPase. 36.36 Proton-exporting ATPase. 36.5.6 Tubulin GTPase. 3.6.3.7 Sodium-exporting ATPase. 3.7. Oxaloacetase. 36.38 Calcium-transporting ATPase. 3.7. Fumarylacetoacetase. 3.6.3.9 Sodium/potassium-exchanging 3.7. Kynureninase. ATPase. 3.7. Phloretin hydrolase. 3.6.3.10 Hydrogenipotassium-exchanging 25 3.7. Acylpyruvate hydrolase. ATPase. 3.7. Acetylpyruvate hydrolase. 3.6.3.11 Chloride-transporting ATPase. 3.7. Beta-diketone hydrolase. 36.312 Potassium-transporting ATPase. 3.7. 2,6-dioxo-6-phenylhexa-3-enoate 3.6.3.14 H(+)-transporting two-sector ATPase. hydrolase. 3.6.3.15 Sodium-transporting two-sector 3.7. 9 2-hydroxymuconate-semialdehyde ATPase. 30 hydrolase. 36.316 Arsenite-transporting ATPase. 3.7. O Cyclohexane-1,3-dione hydrolase. 3.6.3.17 Monosaccharide-transporting ATPase. 3.8. Alkylhalidase. 3.6.3.18 Oligosaccharide-transporting ATPase. 3.8. (S)-2-haloacid dehalogenase. 3.6.3.19 Maltose-transporting ATPase. 3.8. Haloacetate dehalogenase. 3.6.32O Glycerol-3-phosphate-transporting 3.8. Haloalkane dehalogenase. ATPase. 35 3.8. 4-chlorobenzoate dehalogenase. 3.6.321 Polar-amino-acid-transporting 3.8. 4-chlorobenzoyl-CoA dehalogenase. ATPase. 3.8. Atrazine chlorohydrolase. 3.6.322 Nonpolar-amino-acid-transporting 3.8. (R)-2-haloacid dehalogenase. ATPase. 3.8. O 2-haloacid dehalogenase (configuration 3.6.323 Oligopeptide-transporting ATPase. inverting). 3.6.324 Nickel-transporting ATPase. 3. 8. 1 1 2-haloacid dehalogenase (configuration 3.6.3.25 Sulfate-transporting ATPase. 40 retaining). 3.6.326 Nitrate-transporting ATPase. Phosphoamidase. 3.6.3.27 Phosphate-transporting ATPase. 0.1.1 N-sulfoglucosamine Sulfohydrolase. 3.6.328 Phosphonate-transporting ATPase. O.1.2 Cyclamate Sulfohydrolase. 3.6.329 Molybdate-transporting ATPase. 1.1.1 Phosphonoacetaldehyde hydrolase. 3.6.3.30 Fe(3+)-transporting ATPase. 1.1.2 Phosphonoacetate hydrolase. 3.6.3.31 Polyamine-transporting ATPase. 45 2.1.1 Trithionate hydrolase. 3.6.3.32 Quaternary-amine-transporting 3.11 UDP-Sulfoquinovose synthase. ATPase. ENZYME: 4 - 3.6.3.33 Vitamin B12-transporting ATPase. 3.6.3.34 Iron-chelate-transporting ATPase. . 3.6.3.35 Manganese-transporting ATPase. Oxalate decarboxylase. 3.6.3.36 Taurine-transporting ATPase. 50 Oxaloacetate decarboxylase. 3.6.3.37 Guanine-transporting ATPase. Acetoacetate decarboxylase. 3.6.3.38 Capsular-polysaccharide-transporting Acetolactate decarboxylase. ATPase. Aconitate decarboxylase. 3.6.3.39 Lipopolysaccharide-transporting Benzoylformate decarboxylase. ATPase. Oxalyl-CoA decarboxylase. 3.6.3.40 Teichoic-acid-transporting ATPase. Malonyl-CoA decarboxylase. 55 Aspartate 1-decarboxylase. 3.6.3.41 Heme-transporting ATPase. Aspartate 4-decarboxylase. 3.6.3.42 Beta-glucan-transporting ATPase. Valine decarboxylase. 3.6.3.43 Peptide-transporting ATPase. . 3.6.3.44 Xenobiotic-transporting ATPase. Hydroxyglutamate decarboxylase. 3.6.3.45 Steroid-transporting ATPase. . 3.6.3.46 Cadmium-transporting ATPase. 60 . 3.6.3.47 Fatty-acyl-CoA-transporting ATPase. . 3.6.3.48 Alpha-factor-transporting ATPase. Diaminopimelate decarboxylase. 3.6.3.49 Channel-conductance-controlling 2 1 Phosphoribosylaminoimidazole ATPase. carboxylase. 3.6.3.50 Protein-secreting ATPase. 4. 22 . 3.6.3.51 Mitochondrial protein-transporting 65 4. 1.23 Orotidine-5'-phosphate decarboxylase. ATPase. 4. .24 Aminobenzoate decarboxylase. US 8,119,385 B2 101 102 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 25 Tyrosine decarboxylase. 2.18 2-dehydro-3-deoxy-L-pentonate aldolase. 1.28 Aromatic-L-amino-acid decarboxylase. 2.19 Rhamnulose-1-phosphate aldolase. 1.29 Sulfinoalanine decarboxylase. 2.2O 2-dehydro-3-deoxyglucarate aldolase. 1.30 Pantothenoylcysteine decarboxylase. .2.21 2-dehydro-3-deoxy-6- 1.31 Phosphoenolpyruvate carboxylase. phosphogalactonate aldolase. 32 Phosphoenolpyruvate carboxykinase .2.22 Fructose-6-phosphate phosphoketolase. (GTP). 10 2.23 3-deoxy-D-manno-octuloSonate aldolase. .33 Diphosphomevalonate decarboxylase. .2.24 Dimethylaniline-N-oxide aldolase. .1.34 Dehydro-L-gulonate decarboxylase. 2.25 Dihydroneopterin aldolase. 1.35 UDP-glucuronate decarboxylase. 2.26 Phenylserine aldolase. : 36 Phosphopantothenoylcysteine 2.27 Sphinganine-1-phosphate aldolase. decarboxylase. 2.28 2-dehydro-3-deoxy-D-pentonate 37 Uroporphyrinogen decarboxylase. 15 aldolase. 38 Phosphoenolpyruvate carboxykinase 4. 2.29 5-dehydro-2-deoxyphosphogluconate (diphosphate). aldolase. 39 Ribulose-bisphosphate carboxylase. 2.30 17-alpha-hydroxyprogesterone aldolase. .1.40 Hydroxypyruvate decarboxylase. 2.32 Trimethylamine-oxide aldolase. .1.41 Methylmalonyl-CoA decarboxylase. 2.33 Fucosterol-epoxide lyase. .1.42 Carnitine decarboxylase. .2.34 4-(2-carboxyphenyl)-2-oxobut-3-enoate 1.43 Phenylpyruvate decarboxylase. aldolase. .44 4-carboxymuconolactone 2.35 Propioin synthase. decarboxylase. 2.36 Lactate aldolase. 4. 45 Aminocarboxymuconate-semialdehyde 2.37 Acetone-cyanohydrin lyase. decarboxylase. 2.38 Benzoin aldolase. .46 O-pyrocatechuate decarboxylase. 2.39 Hydroxynitrilase. 1.47 Tartronate-semialdehyde synthase. 25 .2.40 Tagatose-bisphosphate aldolase. .1.48 Indole-3-glycerol-phosphate synthase. .2.41 Vanillin synthase. : 49 Phosphoenolpyruvate carboxykinase .3.1 . (ATP). 3.3 N-acetylneuraminate lyase. SO Adenosylmethionine decarboxylase. 3.4 Hydroxymethylglutaryl-CoA lyase. 3-hydroxy-2-methylpyridine-4,5- 3.6 Citrate (pro-3S)-lyase. dicarboxylate 4-decarboxylase. 30 3.13 Oxalomalate lyase. 52 6-methylsalicylate decarboxylase. 3.14 3-hydroxyaspartate aldolase. 1.53 Phenylalanine decarboxylase. 3.16 4-hydroxy-2-oxoglutarate aldolase. 1.54 Dihydroxyfumarate decarboxylase. 3.17 4-hydroxy-4-methyl-2-oxoglutarate 1.55 4,5-dihydroxyphthalate decarboxylase. aldolase. 1.56 3-oxolaurate decarboxylase. 3.22 Citramalate lyase. 1.57 Methionine decarboxylase. 35 3.24 Malyl-CoA lyase. 1.58 Orsellinate decarboxylase. 3.25 Citramalyl-CoA lyase. 1.59 Gallate decarboxylase. 3.26 3-hydroxy-3-isohexenylglutaryl-CoA 160 Stipitatonate decarboxylase. lyase. 1.61 4-hydroxybenzoate decarboxylase. 3.27 Anthranilate synthase. .1.62 Gentisate decarboxylase. 3.30 Methylisocitrate lyase. 1.63 Protocatechuate decarboxylase. 3.32 2,3-dimethylmalate lyase. .64 2,2-dialkylglycine decarboxylase 40 3.34 Citryl-CoA lyase. (pyruvate). 3.35 (1-hydroxycyclohexan-1-yl)acetyl-CoA Phosphatidylserine decarboxylase. lyase. 1.66 Uracil-5-carboxylate decarboxylase. 3.36 Naphthoate synthase. 1.67 UDP-galacturonate decarboxylase. 3.38 Aminodeoxychorismate lyase. : 68 5-oxopent-3-ene-1,2,5-tricarboxylate 99.1 . decarboxylase. 45 99.2 Tyrosinephenol-lyase. 69 3,4-dihydroxyphthalate decarboxylase. .99.3 Deoxyribodipyrimidine photo-lyase. 1.70 Glutaconyl-CoA decarboxylase. 99.5 Octadecanal decarbonylase. 1.71 2-oxoglutarate decarboxylase. 99.11 Benzylsuccinate synthase. : 72 Branched-chain-2-oxoacid Carbonate . decarboxylase. Fumarate hydratase. 73 Tartrate decarboxylase. 50 Aconitate hydratase. 1.74 Indolepyruvate decarboxylase. Citrate dehydratase. 75 5-guanidino-2-oxopentanoate Arabinonate dehydratase. decarboxylase. Galactonate dehydratase. .76 Arylmalonate decarboxylase. Altronate dehydratase. 1.77 4-oxalocrotonate decarboxylase. Mannonate dehydratase. 1.78 Acetylenedicarboxylate decarboxylase. 55 Dihydroxy-acid dehydratase. 1.79 Sulfopyruvate decarboxylase. 3-dehydroquinate dehydratase. 18O 4-hydroxyphenylpyruvate decarboxylase. Phosphopyruvate hydratase. 1.81 Threonine-phosphate decarboxylase. Phosphogluconate dehydratase. .2.2 Ketotetrose-phosphate aldolase. Enoyl-CoA hydratase. .2.4 Deoxyribose-phosphate aldolase. Methylglutaconyl-CoA hydratase. 2.5 . Imidazoleglycerol-phosphate dehydratase. 2.9 Phosphoketolase. 60 . 2.10 . CyStathionine beta-synthase. .2.11 Hydroxymandelonitrile lyase. Porphobilinogen synthase. .2.12 2-dehydropantoate aldolase. L-arabinonate dehydratase. 2.13 Fructose-bisphosphate aldolase. Acetylenecarboxylate hydratase. .2.14 2-dehydro-3-deoxy-phosphogluconate Propanediol dehydratase. aldolase. 65 Glycerol dehydratase. 4. 2.17 L-fuculose-phosphate aldolase. Maleate hydratase. US 8,119,385 B2 103 104 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 4.2. 32 L(+)-tartrate dehydratase. 4.2.1.103 Cyclohexyl-isocyanide hydratase. 4.2. .33 3-isopropylmalate dehydratase. 4.2.1.104 Cyanate hydratase. 4.2. 34 (S)-2-methylmalate dehydratase. 4.2.2.1 . 4.2. 35 (R)-2-methylmalate dehydratase. 4.2.2.2 Pectate lyase. 4.2. 36 Homoaconitate hydratase. 4.2.2.3 Poly(beta-D-mannuronate) lyase. 4.2. 39 Gluconate dehydratase. 4.2.2.4 Chondroitin ABC lyase. 4.2. .40 Glucarate dehydratase. 10 4.2.2.5 Chondroitin AC lyase. 4.2. .41 5-dehydro-4-deoxyglucarate 4.2.2.6 Oligogalacturonide lyase. dehydratase. 4.2.2.7 Heparin lyase. 4.2. .42 Galactarate dehydratase. 4.2.2.8 Heparin-sulfate lyase. 4.2. 43 2-dehydro-3-deoxy-L-arabinonate 4.2.2.9 Pectate disaccharide-lyase. dehydratase. 4.2.2.10 Pectin lyase. 4.2. .44 Myo-inosose-2 dehydratase. 15 4.2.2.11 Poly(alpha-L-guluronate) lyase. 4.2. 45 CDP-glucose 4,6-dehydratase. 4.2.2.12 Xanthan lyase. 4.2. .46 dTDP-glucose 4,6-dehydratase. 4.2.2.13 EXO-(1->4)-alpha-D-glucan lyase. 4.2. 47 GDP-mannose 4,6-dehydratase. 4.2.2.14 Glucuronan lyase. 4.2. 48 D-glutamate cyclase. 4.2.2.15 Anhydrosialidase. 4.2. 49 Urocanate hydratase. 4.2.2.16 Levan fructotransferase (DFA-IV 4.2. SO Pyrazolylalanine synthase. orming). 4.2. S1 Prephenate dehydratase. 4.2.2.17 (nulin fructotransferase (DFA-I-forming). 4.2. 52 Dihydrodipicolinate synthase. 4.2.2.18 (nulin fructotransferase (DFA-III-forming). 4.2. 53 Oleate hydratase. 4.2.3.1 . 4.2. 54 Lactoyl-CoA dehydratase. 4.2.3.2 Ethanolamine-phosphate phospho-lyase. 4.2. 55 3-hydroxybutyryl-CoA dehydratase. 4.2.3.3 Methylglyoxal synthase. 4.2. S6 taconyl-CoA hydratase. 4.2.3.4 3-dehydroquinate synthase. 4.2. 57 Sohexenylglutaconyl-CoA 25 4.2.35 Chorismate synthase. hydratase. 4.2.3.6 Trichodiene synthase. 4.2. Crotonoyl-acyl-carrier-protein 4.2.3.7 Pentalenene synthase. hydratase. 4.2.38 Casbene synthase. 4.2. 59 3-hydroxyoctanoyl-acyl-carrier 4.23.9 Aristolochene synthase. protein dehydratase. 4.2.3.10 (-)-endo-fenchol synthase. 4.2. 60 3-hydroxy decanoyl-acyl-carrier 30 4.2.3.11 Sabinene-hydrate synthase. protein dehydratase. 4.2.3.12 6-pyruvoyltetrahydropterin 4.2. 61 3-hydroxypalmitoyl-acyl-carrier Synthase. protein dehydratase. 4.2.3.13 (+)-delta-cadinene synthase. 4.2. 62 5-alpha-hydroxysteroid dehydratase. 4.2.3.14 Pinene synthase. 4.2. 6S 3-cyanoalanine hydratase. 4.2.3.15 Myrcene synthase. 4.2. 66 Cyanide hydratase. 35 4.2.3.16 (4S)-limonene synthase. 4.2. .67 D-fuconate dehydratase. 4.2.3.17 Taxadiene synthase. 4.2. 68 L-fuconate dehydratase. 4.2.3.18 Abietadiene synthase. 4.2. 69 Cyanamide hydratase. 4.2.3.19 Ent-kaurene synthase. 4.2. 70 Pseudouridylate synthase. 42.32O (+)-limonene synthase. 4.2. 73 Protoaphin-aglucone dehydratase 4.2.3.21 Vetispiradiene synthase. (cyclizing). 4.2.99.12 Carboxymethyloxysuccinate lyase. 4.2. .74 Long-chain-enoyl-CoA hydratase. 40 4.2.99.18 DNA-(apurinic or apyrimidinicsite) 4.2. 75 Uroporphyrinogen-III synthase. yase. 4.2. .76 UDP-glucose 4,6-dehydratase. 4.2.99.19 2-hydroxypropyl-CoM lyase. 4.2. 77 Trans-L-3-hydroxyproline 4.3.1.1 Aspartate ammonia-lyase. dehydratase. 4.3. Methylaspartate ammonia-lyase. 4.2. .78 (S)-norcoclaurine synthase. 4.3. Histidine ammonia-lyase. 4.2. .79 2-methylcitrate dehydratase. 45 4.3. i Formimidoyltetrahydrofolate 4.2. 8O 2-oxopent-4-enoate hydratase. cyclodeaminase. 4.2. 81 D(-)-tartrate dehydratase. 4.3. Phenylalanine ammonia-lyase. 4.2. 82 Xylonate dehydratase. 4.3. Beta-alanyl-CoA ammonia-lyase. 4.2. .83 4-oxalmesaconate hydratase. 4.3. Ethanolamine ammonia-lyase. 4.2. 84 . 4.3. Glucosaminate ammonia-lyase. 4.2. .85 Dimethylmaleate hydratase. 50 4.3.1.10 Serine-Sulfate ammonia-lyase. 4.2. 86 6-dehydroprogesterone hydratase. 4.3.1.11 Dihydroxyphenylalanine ammonia 4.2. .87 Octopamine dehydratase. yase. 4.2. 88 Synephrine dehydratase. 4.3.1.12 Ornithine cyclodeaminase. 4.2. 89 Carnitine dehydratase. 4.3.1.13 Carbamoyl-serine ammonia-lyase. 4.2. 90 L-rhamnonate dehydratase. 4.3.1.14 3-aminobutyryl-CoA ammonia 4.2. .91 Carboxycyclohexadienyl dehydratase. 55 yase. 4.2. 92 Hydroperoxide dehydratase. 4.3.1.15 Diaminopropionate ammonia-lyase. 4.2. .93 ATP-dependent NAD(P)H-hydrate 4.3.1. 6 Threo-3-hydroxyaspartate dehydratase. ammonia-lyase. 4.2. .94 Scytalone dehydratase. 4.3.1. L-serine ammonia-lyase. 4.2. 95 Kievitone hydratase. 4.3.1. D-serine ammonia-lyase. 4.2. .96 4a-hydroxytetrahydrobiopterin 4.3.1. Threonine ammonia-lyase. dehydratase. 60 4.3.1.2 : Erythro-3-hydroxyaspartate 4.2. 97 Phaseollidin hydratase. ammonia-lyase. 4.2. .98 6-alpha-hydroxyprogesterone 4.3.2. Argininosuccinate lyase. dehydratase. 4.3.2.2 lyase. 4.2. 2-methylisocitrate dehydratase. 4.3.2.3 Ureidoglycolate lyase. 4.2. Cyclohexa-1,5-dienecarbonyl-CoA 4.3.2.4 Purine imidazole-ring cyclase. hydratase. 65 4.3.2.5 Peptidylamidoglycolate lyase. 4.2. 101 Trans-feruloyl-CoA hydratase. 4.3.3.1 3-ketovalidoxylamine C-N-lyase. US 8,119,385 B2 105 106 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 4.3. 3.2 Strictosidine synthase. 3.9 N-acylglucosamine-6-phosphate 2 4.3. 3.3 Deacetylisoipecoside synthase. epimerase. 4.3. 3.4 Deacetylipecoside synthase. 3.10 CDP-abequose epimerase. 4.4. CyStathionine gamma-lyase. 3.11 Cellobiose epimerase. 4.4. Homocysteine desulfhydrase. 3.12 UDP-glucuronate 5'-epimerase. 4.4. Dimethylpropiothetin dethiomethylase. 3.13 dTDP-4-dehydrorhamnose 3,5- 4.4. Alliin lyase. 10 epimerase. 4.4. Lactoylglutathione lyase. 3.14 UDP-N-acetylglucosamine 2 4.4. S-alkylcysteine lyase. epimerase. 4.4. CyStathionine beta-lyase. 3.15 Glucose-6-phosphate 1-epimerase. 4.4.1. L-3-cyanoalanine synthase. 3.16 UDP-glucosamine 4-epimerase. 4.4.1. Cysteine lyase. 3.17 Heparosan-N-Sulfate-glucuronate 5 4.4. .11 Methionine gamma-lyase. 15 epimerase. 4.4. 13 Cysteine-S-conjugate beta-lyase. 3.18 GDP-mannose 3,5-epimerase. 4.4. .14 -aminocyclopropane-1-carboxylate 3.19 Chondroitin-glucuronate 5 synthase. epimerase. 4.4. 1S D-cysteine desulfhydrase. 3.2O ADP-glyceroman no-heptose 6 4.4. 16 Selenocysteine lyase. epimerase. 4.4. 17 Holocytochrome-c synthase. 3.21 Maltose epimerase. 4.4. 19 Phosphosulfolactate synthase. 99.1 Methylmalonyl-CoA epimerase. 4.4. 2O Leukotriene-C(4) synthase. 99.2 16-hydroxysteroid epimerase. 4.5. DDT-dehydrochlorinase. .99.3 Allantoin racemase. 4.5. 3-chloro-D-alanine dehydrochlorinase. 99.4 Alpha-methylacyl-CoA racemase. 4.5. Dichloromethane dehalogenase. Maleate isomerase. 4.5. L-2-amino-4-chloropent-4-enoate Maleylacetoacetate isomerase. dehydrochlorinase. 25 Retinal isomerase. 4.5. S-carboxymethylcysteine synthase. Maleylpyruvate isomerase. 4.6. Adenylate cyclase. Linoleate isomerase. 4.6. Guanylate cyclase. Furylfuramide isomerase. 4.6.1. Cytidylate cyclase. Retinol isomerase. 4.6. 2-C-methyl-D-erythritol 2,4- Peptidylprolyl isomerase. cyclodiphosphate synthase. 30 Farnesol 2-isomerase. 4.6. 13 Phosphatidylinositol diacylglycerol-lyase. O 2-chloro-4-carboxymethylenebut-2-en-1,4- 4.6. .14 Glycosylphosphatidylinositol olide isomerase. diacylglycerol-lyase. 5.2.1. 1 1 4-hydroxyphenylacetaldehyde-oxime 4.6 15 FAD-AMP lyase (cyclizing). isomerase. 4.99.1.1 Ferrochelatase. 5.3. Triose-phosphate isomerase. 4.99.1.2 Alkylmercury lyase. 35 5.3. Arabinose isomerase. 4.99.13 Sirohydrochlorin cobaltochelatase. 5.3. L-arabinose isomerase. 4.99.14 Sirohydrochlorin ferrochelatase. 5.3. Xylose isomerase. 4.99.15 Aliphatic aldoxime dehydratase. 5.3. Ribose-5-phosphate isomerase. 4.99.1.6 Indoleacetaldoxime dehydratase. 5.3. Mannose isomerase. ENZYME: 5. . . . 5.3. Mannose-6-phosphate isomerase. 5.3. Glucose-6-phosphate isomerase. Alanine racemase. 40 5.3.1. Glucuronate isomerase. Methionine racemase. 5.3. 13 Arabinose-5-phosphate isomerase. Glutamate racemase. 5.3. .14 L-rhamnose isomerase. Proline racemase 5.3. 1S D-lyxose ketol-isomerase. Lysine racemase. 5.3. 16 -(5-phosphoribosyl)-5-((5- Threonine racemase. phosphoribosylamino)methylideneamino)imidazole-4- Diaminopimelate epimerase. 45 carboxamide isomerase. 4-hydroxyproline epimerase. 5.3. 17 4-deoxy-L-threo-5-hexoSulose-uronate Arginine racemase. ketol-isomerase. Amino-acid racemase. 5.3. 2O Ribose isomerase. Phenylalanine racemase (ATP 5.3. .21 side-chain-isomerase. hydrolyzing). 5.3. 22 Hydroxypyruvate isomerase. Ornithine racemase. 50 5.3. 23 S-methyl-5-thioribose-1-phosphate 1.13 Aspartate racemase. isomerase. .1.14 Nocardicin-A epimerase. 5.3. .24 Phosphoribosylanthranilate isomerase. 1.15 2-aminohexano-6-lactam racemase. 5.3. 25 L-fucose isomerase. .1.16 Protein-serine epimerase. 5.3. 26 Galactose-6-phosphate isomerase. 1.17 Isopenicillin-N epimerase. 5.3. 2.1 Phenylpyruvate tautomerase. .2.1 Lactate racemase. 55 5.3. 2.2 Oxaloacetate tautomerase. .2.2 Mandelate racemase. 5.3. 3.1 Steroid delta-isomerase. 2.3 3-hydroxybutyryl-CoA epimerase. 5.3. 3.2 sopentenyl-diphosphate delta-isomerase. .2.4 Acetoin racemase. 5.3. 3.3 Vinylacetyl-CoA delta-isomerase. 2.5 Tartrate epimerase. 5.3. 3.4 Muconolactone delta-isomerase .2.6 Isocitrate epimerase. 5.3. 3.5 Cholestenol delta-isomerase .3.1 Ribulose-phosphate 3-epimerase. 5.3. 3.6 Methylitaconate delta-isomerase. 3.2 UDP-glucose 4-epimerase. 60 5.3. 3.7 Aconitate delta-isomerase. 3.3 Aldose 1-epimerase. 5.3. 3.8 Dodecenoyl-CoA delta-isomerase. 3.4 L-ribulose-phosphate 4-epimerase. 5.3. 3.9 Prostaglandin-A(1) delta-isomerase. 3.5 UDP-arabinose 4-epimerase. 5.3. 3.10 5-carboxymethyl-2-hydroxymuconate 3.6 UDP-glucuronate 4-epimerase. delta-isomerase. 3.7 UDP-N-acetylglucosamine 4 5.3. 3.11 Isopiperitenone delta-isomerase. epimerase. 65 5.3. 3.12 Dopachrome isomerase. 3.8 N-acylglucosamine 2-epimerase. 5.3. 3.13 Polyenoic fatty acid isomerase. US 8,119,385 B2 107 108 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 53.4.1 Protein disulfide-isomerase. 6.1.1.11 Serine--tRNA ligase. 5.3.99.2 Prostaglandin-D synthase. 6.1.1.12 Aspartate--tRNA ligase. 5.3.99.3 Prostaglandin-E synthase. 6.1.1.13 D-alanine-poly(phosphoribitol) ligase. 5.3.99.4 Prostaglandin-I synthase. 6.1.1.14 Glycine-tRNA ligase. 5.3.99.5 Thromboxane-A Synthase. 6.1.1.15 Proline--tRNA ligase. 5.3.99.6 Allene-oxide cyclase. 6.1.1.16 Cysteine--tRNA ligase. 5.3.99.7 Styrene-oxide isomerase. 10 6.1.1.17 Glutamate--tRNA ligase. 5.4.1.1 Lysolecithin acylmutase. 6.1.1.18 Glutamine--tRNA ligase. 5.4.1.2 Precorrin-8X methylmutase. 6.1.1.19 Arginine--tRNA ligase. 54.2.1 Phosphoglycerate . 6.1.1.2O Phenylalanine--tRNA ligase. 54.2.2 . 6.1.1.21 Histidine--tRNA ligase. 54.23 Phosphoacetylglucosamine mutase. 6.1.1.22 Asparagine--tRNA ligase. 5.4.2.4 Bisphosphoglycerate mutase. 15 6.1.1.23 Aspartate--tRNA(ASn) ligase. 54.25 Phosphoglucomutase (glucose 6.1.1.24 Glutamate--tRNA(Gln) ligase. ofactor). 6.1.1.25 Lysine--tRNA(Pyl) ligase. 54.2.6 eta-phosphoglucomutase. 6.2.1.1 Acetate-CoA ligase. S.4.2.7 hosphopentomutase. 6.2.1.2 Butyrate--CoA ligase. 542.8 hosphomannomutase. 6.2.1.3 Long-chain-fatty-acid-CoA ligase. 5.42.9 hosphoenolpyruvate mutase. 6.2.1.4 Succinate--CoA ligase (GDP-forming). 54.2.10 hosphoglucosamine mutase. 6.2.1.5 Succinate--CoA ligase (ADP 543.2 Lysine 2,3-aminomutase. forming). 5.4.3.3 Beta-lysine 5,6-aminomutase. 6.2.1.6 Glutarate--CoA ligase. 5.43.4 D-lysine S,6-aminomutase. 6.2.1.7 Cholate--CoA ligase. 5.43.5 D-ornithine 4,5-aminomutase. 6.2.1.8 Oxalate--CoA ligase. 5.43.6 Tyrosine 2,3-aminomutase. 6.2.19 Malate--CoA ligase. 5.43.7 Leucine 2,3-aminomutase. 25 6.2.1.10 Acid--CoA ligase (GDP-forming). 5.43.8 Glutamate-1-semialdehyde 2,1- 6.2.1.11 Biotin-CoA ligase. aminomutase. 6.2.1.12 4-coumarate--CoA ligase. 5.4.4.1 (Hydroxyamino)benzene mutase. 6.2.1.13 Acetate-CoA ligase (ADP-forming). 5.4.4.2 Sochorismate synthase. 6.2.1.14 6-carboxylhexanoate--CoA ligase. 5.44.3 3-(hydroxyamino)phenol mutase. 6.2.1.15 Arachidonate--CoA ligase. 5.499.1 Methylaspartate mutase. 30 6.2.1.16 Acetoacetate--CoA ligase. 5.4.99.2 Methylmalonyl-CoA mutase. 6.2.1.17 Propionate-CoA ligase. 5.499.3 2-acetolactate mutase. 6.2.1.18 Citrate--CoA ligase. 5.499.4 2-methyleneglutarate mutase. 6.2.1.19 Long-chain-fatty-acid-luciferin 5.499.5 Chorismate mutase. component ligase. 5.4.99.7 synthase. 6.2.1.2O Long-chain-fatty-acid-acyl-carrier 5.4.99.8 Cycloartenol synthase. 35 protern ligase. 5.4.99.9 UDP-galactopyranose mutase. 6.2.1.22 Citrate (pro-3S)-lyase ligase. 5.499.11 Isomalitulose synthase. 6.2.1.23 Dicarboxylate-CoA ligase. 5.499.12 tRNA-pseudouridine synthase I. 6.2.1.24 Phytanate--CoA ligase. 5.499.13 Isobutyryl-CoA mutase. 6.2.1.25 Benzoate--CoA ligase. 5.499.14 4-carboxymethyl-4- 6.2.1.26 O-Succinylbenzoate--CoA ligase. methylbutenolide mutase. 6.2.1.27 4-hydroxybenzoate--CoA ligase. 5.499.15 (1->4)-alpha-D-glucan 1-alpha-D- 40 6.2.1.28 3-alpha,7-alpha-dihydroxy-5-beta glucosylmutase. cholestanate-CoA ligase. 5.499.16 Maltose alpha-D- 6.2.1.29 3-alpha,7-alpha,12-alpha-trihydroxy glucosyltransferase. 5-beta-cholestanate--CoA ligase. 5.499.17 Squalene-hopene cyclase. 6.2.1.30 Phenylacetate--CoA ligase. S.S.1.1 Muconate cycloisomerase. 6.2.1.31 2-furoate--CoA ligase. S.S. 1.2 3-carboxy-cis,cis-muconate 45 6.2.1.32 Anthranilate--CoA ligase. cycloisomerase. 6.2.1.33 4-chlorobenzoate-CoA ligase. S.S. 1.3 Tetrahydroxypteridine cycloisomerase. 6.2.1.34 Trans-feruloyl-CoA synthase. 5.5.1.4 Inositol-3-phosphate synthase. 6.3.1.1 Aspartate--ammonia ligase. 5.5.1.5 Carboxy-cis,cis-muconate cyclase. 6.3.1.2 Glutamate--ammonia ligase. S.S. 1.6 Chalcone isomerase. 6.3.1.4 Aspartate--ammonia ligase (ADP 5.5.1.7 Chloromuconate cycloisomerase. 50 forming). S.S.1.8 Geranyl-diphosphate cyclase. 6.3.1.5 NAD(+) synthase. S.S. 1.9 Cycloeucalenol cycloisomerase. 6.3.1.6 Glutamate--ethylamine ligase. S.S.1.10 Alpha-pinene-oxide decyclase. 6.3.1.7 4-methyleneglutamate--ammonia S.S.1.11 Dichloromuconate cycloisomerase. ligase. S.S. 1.12 Copalyl diphosphate synthase. 6.3.1.8 Glutathionylspermidine synthase. S.S. 1.13 Ent-copalyl diphosphate synthase. 55 6.3.1.9 Trypanothione synthase. 5.99.1.1 Thiocyanate isomerase. 6.3.1.10 Adenosylcobinamide-phosphate 5.99.1.2 DNA topoisomerase. synthase. 5.99.13 DNA topoisomerase (ATP-hydrolyzing). 6.3.2.1 ENZYME: 6. . . . Pantoate--beta-alanine ligase. 6.3.2.2 Glutamate-cysteine ligase. 6.1.1.1 Tyrosine--tRNA ligase. 6.3.2.3 Glutathione synthase. 6.1.1.2 Tryptophan-tRNA ligase. 60 6.3.2.4 D-alanine--D-alanine ligase. 6.1.1.3 Threonine--tRNA ligase. 6.3.2.5 Phosphopantothenate-cysteine ligase. 6.1.1.4 Leucine-tRNA ligase. 6.3.2.6 PhosphoribosylaminoimidazoleSuccinocarboxamide 6.1.1.5 Isoleucine--tRNA ligase. synthase. 6.1.1.6 Lysine--tRNA ligase. 6.3.2.7 UDP-N-acetylmuramoyl-L-alanyl-D- 6.11.7 Alanine--tRNA ligase. glutamate--L-lysine ligase. 6.1.1.9 Valine--tRNA ligase. 65 6.3.2.8 UDP-N-acetylmuramate--L-alanine 6.1.1.10 Methionine--tRNA ligase. ligase. US 8,119,385 B2 109 110 TABLE 2-continued TABLE 2-continued EC Numbers with the corresponding name given to each enzyme EC Numbers with the corresponding name given to each enzyme class, Subclass and Sub-Subclass. class, Subclass and Sub-Subclass. 6.3.2.9 UDP-N-acetylmuramoylalanine--D- 6.4.1.3 Propionyl-CoA carboxylase. glutamate ligase. 6.4.1.4 Methylcrotonoyl-CoA carboxylase. 6.3.2.10 UDP-N-acetylmuramoyl-tripeptide-- 6.4.15 Geranoyl-CoA carboxylase. D-alanyl-D-alanine ligase. 6.4.1.6 Acetone carboxylase. 6.3.2.11 Carnosine synthase. 6.5.1.1 DNA ligase (ATP). 6.3.2.12 Dihydrofolate synthase. 6.5.1.2 DNA ligase (NAD+). 6.3.2.13 UDP-N-acetylmuramoylalanyl-D- 10 6.5.13 RNA ligase (ATP). glutamate-2,6-diaminopimelate ligase. 6.5.1.4 RNA-3'-phosphate cyclase. 6.3.2.14 2,3-dihydroxybenzoate-serine 6.6.1.1 Magnesium chelatase. igase. 6.6.1.2 Cobaltochelatase. 6.3.2.16 D-alanine-alanyl 6.3.4.17 Formate-dihydrofolate ligase. poly(glycerolphosphate) ligase. 6.35.1 NAD(+) synthase (glutamine 6.3.2.17 Tetrahydrofolylpolyglutamate 15 hydrolyzing). synthase. 6.35.2 GMP synthase (glutamine-hydrolyzing). 6.3.2.18 Gamma-glutamylhistamine 6.3.5.3 Phosphoribosylformylglycinamidine synthase. synthase. 6.3.2.19 Ubiquitin-protein ligase. 6.35.4 Asparagine synthase (glutamine 6.3.2.2O indoleacetate-lysine synthetase. hydrolyzing). 6.3.2.21 Ubiquitin--calmodulin ligase. 6.3.S.S Carbamoyl-phosphate synthase 6.3.2.22 Diphthine--ammonia ligase. (glutamine-hydrolyzing). 6.3.2.23 Homoglutathione synthase. 6.35.6 Asparaginyl-tRNA synthase (glutamine 6.3.2.24 Tyrosine-arginine ligase. hydrolyzing). 6.3.2.25 Tubulin--tyrosine ligase. 6.3.5.7 Glutaminyl-tRNA synthase (glutamine 6.3.2.26 N-(5-amino-5-carboxypentanoyl)-L- hydrolyzing). cysteinyl-D-valine synthase. 6.35.8 Aminodeoxychorismate synthase. 6.3.2.27 Aerobactin synthase. 25 6.35.9 Hydrogenobyrinic acid a,c-diamide 6.3.3.1 Phosphoribosylformylglycinamidine synthase (glutamine-hydrolyzing). cyclo-ligase. 6.35.10 Adenosylcobyric acid synthase (glutamine 6.3.3.2 5-formyltetrahydrofolate cyclo hydrolyzing). ligase. 6.4.1.1 Pyruvate carboxylase. 6.3.3.3 Dethiobiotin synthase. 6.4.1.2 Acetyl-CoA carboxylase. 6.3.3.4 (Carboxyethyl)arginine beta-lactam 30 6.4.1.3 Propionyl-CoA carboxylase. synthase. 6.4.1.4 Methylcrotonoyl-CoA carboxylase. 6.3.4.1 GMP synthase. 6.4.15 Geranoyl-CoA carboxylase. 6.3.4.2 CTP synthase. 6.4.1.6 Acetone carboxylase. 6.3.4.3 Formate--tetrahydrofolate ligase. 6.5.1.1 DNA ligase (ATP). 6.3.4.4 AdenyloSuccinate synthase. 6.5.1.2 DNA ligase (NAD+). 6.3.4.5 Argininosuccinate synthase. 35 6.5.13 RNA ligase (ATP). 6.3.4.6 Urea carboxylase. 6.5.1.4 RNA-3'-phosphate cyclase. 6.3.4.7 Ribose-5-phosphate-ammonia 6.6.1.1 Magnesium chelatase. igase. 6.6.1.2 Cobaltochelatase. 6.3.4.8 midazoleacetate-- phosphoribosyldipliosphate ligase. 6.3.4.9 Biotin-methylmalonyl-CoA Table 3 Summarizes exemplary functions of exemplary carboxytransferase ligase. 40 6.3.4.10 Biotin-propionyl-CoA enzymes of the invention; these enzyme functions were deter carboxylase (ATP-hydrolyzing) ligase. mined using sequence identity comparison analysis using 6.3.4.11 Biotin-methylcrotonoyl-CoA closest BLAST hits to the exemplary polypeptides and poly carboxylase ligase. nucleotides of the invention. 6.3.4.12 Glutamate-methylamine ligase. 6.3.4.13 -glycine 45 The invention also provides isolated and recombinant igase. nucleic acids encoding polypeptides, e.g., SEQ ID NO:1, 6.3.4.14 Biotin carboxylase. 6.3.4.15 Biotin-acetyl-CoA-carboxylase ligase. SEQID NO:3, SEQID NO:5, SEQID NO:7, SEQID NO:9, 6.3.4.16 Carbamoyl-phosphate synthase etc., and all additional nucleic acids disclosed in the SEQID (ammonia). listing, which include all odd numbered SEQID NOs: from 6.3.4.17 Formate-dihydrofolate ligase. 50 SEQ ID NO:1 through SEQ ID NO:26,897 (the exemplary 6.35.1 NAD(+) synthase (glutamine hydrolyzing). polynucleotides of the invention). The invention also pro 6.35.2 GMP synthase (glutamine-hydrolyzing). vides isolated and recombinant polypeptides, SEQID NO:2. 6.3.5.3 Phosphoribosylformylglycinamidine SEQID NO:4, SEQID NO:6, SEQIDNO:8, SEQID NO:10, synthase. etc., and all polypeptides disclosed in the SEQ ID listing, 6.35.4 Asparagine synthase (glutamine hydrolyzing). 55 which include all even numbered SEQID NOs: from SEQID 6.3.S.S Carbamoyl-phosphate synthase NO:2 through SEQID NO:26,898 (the exemplary polypep (glutamine-hydrolyzing). tides of the invention). 6.35.6 Asparaginyl-tRNA synthase (glutamine In another embodiment, the polypeptides of the invention hydrolyzing). 6.3.5.7 Glutaminyl-tRNA synthase (glutamine can be expressed in any expression system, in vitro or in vivo, hydrolyzing). 60 e.g., any microorganism or other cell system (e.g., eukaryotic, 6.35.8 Aminodeoxychorismate synthase. Such as yeast or mammaliancells) using procedures known in 6.35.9 Hydrogenobyrinic acid a,c-diamide the art. In other aspects, the polypeptides of the invention can synthase (glutamine-hydrolyzing). 6.35.10 Adenosylcobyric acid synthase be immobilized on a solid support prior to use in the methods (glutamine-hydrolyzing). of the invention. Methods for immobilizing enzymes on solid 6.4.1.1 Pyruvate carboxylase. 65 Supports are commonly known in the art, for example J. Mol. 6.4.1.2 Acetyl-CoA carboxylase. Cat. B: Enzymatic 6 (1999)29-39; Chivataetal. Biocatalysis: Immobilized cells and enzymes, J. Mol. Cat. 37 (1986) 1-24: US 8,119,385 B2 111 112 Sharma et al., Immobilized Biomaterials Techniques and structures as the naturally occurring protein are also included. Applications, Angew. Chem. Int. Ed. Engl. 21 (1982) 837-54: An example of this, is a "pro-form molecule, such as a low Laskin (Ed.). Enzymes and mobilized Cells in Biotechnol activity proprotein that can be modified by cleavage to pro Ogy. duce a mature enzyme with significantly higher activity. The term “variant” refers to polynucleotides or polypep DEFINITIONS tides of the invention modified at one or more base pairs, codons, introns, exons, or amino acid residues (respectively) A “coding sequence of ora'sequence encodes' aparticu yet still retain the biological activity of a polypeptide, lar polypeptide or protein, is a nucleic acid sequence which is enzyme, protein, e.g. structural or binding protein, of the transcribed and translated into a polypeptide or protein when 10 invention. Variants can be produced by any number of means placed under the control of appropriate regulatory sequences. included methods such as, for example, error-prone PCR, A promoter sequence is "operably linked to a coding shuffling, oligonucleotide-directed mutagenesis, assembly sequence when RNA polymerase which initiates transcrip PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette tion at the promoter will transcribe the coding sequence into mutagenesis, recursive ensemble mutagenesis, exponential mRNA. 15 ensemble mutagenesis, site-specific mutagenesis, gene reas The phrase “substantially identical in the context of two sembly, GSSM and any combination thereof. nucleic acids or polypeptides, refers to two or more sequences The term “saturation mutagenesis”. Gene Site Saturation that have, e.g., at least about 50%, 51%, 52%, 53%, 54%, Mutagenesis, or “GSSM includes a method that uses degen 55%, 56%, 57%, 58%, 59%, 60%, 61%. 62%, 63%, 64%, erate oligonucleotide primers to introduce point mutations 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, into a polynucleotide, as described in detail, below. 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, The term “optimized directed evolution system” or “opti 85%. 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, mized directed evolution' includes a method for reassem 95%, 96%, 97%, 98%, 99%, or more nucleotide oramino acid bling fragments of related nucleic acid sequences, e.g., residue (sequence) identity, when compared and aligned for related genes, and explained in detail, below. maximum correspondence, as measured using one of the 25 The term “synthetic ligation reassembly' or “SLR' known sequence comparison algorithms or by visual inspec includes a method of ligating oligonucleotide fragments in a tion. In alternative aspects, the Substantial identity exists over non-stochastic fashion, and explained in detail, below. a region of at least about 100 or more residues and most Nucleic Acids commonly the sequences are Substantially identical over at The invention provides nucleic acids (e.g., the exemplary least about 150 to 200 or more residues. In some aspects, the 30 SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQID NO:7, sequences are Substantially identical over the entire length of SEQ ID NO:9, SEQ ID NO:11, etc., including all nucleic the coding regions. acids disclosed in the SEQID listing, which include all odd Additionally a “substantially identical amino acid numbered SEQID NOs: from SEQID NO:1 through SEQID sequence is a sequence that differs from a reference sequence NO:26,897), including expression cassettes such as expres by one or more conservative or non-conservative amino acid 35 sion vectors, encoding polypeptides (e.g., enzymes) of the Substitutions, deletions, or insertions. In one aspect, the Sub invention. The invention also includes methods for discover stitution occurs at a site that is not the active site of the ing new polypeptide (e.g., enzyme) sequences using the molecule, or, alternatively the Substitution occurs at a site that nucleic acids of the invention. The invention also includes is the active site of the molecule, provided that the polypep methods for inhibiting the expression of enzymes, genes, tide essentially retains its functional (enzymatic) properties. 40 transcripts and polypeptides using the nucleic acids of the A conservative amino acid Substitution, for example, Substi invention. Also provided are methods for modifying the tutes one amino acid for another of the same class (e.g., nucleic acids of the invention by, e.g., synthetic ligation reas Substitution of one hydrophobic amino acid, such as isoleu sembly, optimized directed evolution system and/or satura cine, Valine, leucine, or methionine, for another, or Substitu tion mutagenesis. tion of one polar amino acid for another, such as Substitution 45 The nucleic acids of the invention can be made, isolated of arginine for lysine, for aspartic acid or and/or manipulated by, e.g., cloning and expression of cDNA glutamine for asparagine). One or more amino acids can be libraries, amplification of message or genomic DNA by PCR, deleted, for example, from a polypeptide, resulting in modi and the like. For example, exemplary sequences of the inven fication of the structure of the polypeptide, without signifi tion were initially derived from environmental sources. cantly altering its biological activity. For example, amino- or 50 In one aspect, the invention provides nucleic acids, and the carboxyl-terminal amino acids that are not required for a polypeptides encoded by them, with a common novelty in polypeptide, enzyme, protein, e.g. structural or binding pro that they are derived from a common source, e.g., an environ tein, biological activity can be removed. Modified polypep mental or a bacterial source. tide sequences of the invention can be assayed for enzyme, In practicing the methods of the invention, homologous structural or binding activity by any number of methods, 55 genes can be modified by manipulating a template nucleic including contacting the modified polypeptide sequence with acid, as described herein. The invention can be practiced in a Substrate and determining whether the modified polypep conjunction with any method or protocol or device known in tide decreases the amount of specific Substrate in the assay or the art, which are well described in the scientific and patent increases the bioproducts of the reaction of a functional literature. polypeptide, enzyme, protein, e.g. structural or binding pro 60 The phrases “nucleic acid' or “nucleic acid sequence' as tein, with the substrate. Assays for enzyme activity are well used herein refer to an oligonucleotide, nucleotide, poly known in the art. nucleotide, or to a fragment of any of these, to DNA or RNA "Fragments' as used herein are a portion of a naturally of genomic or synthetic origin which may be single-stranded occurring protein which can exist in at least two different or double-stranded and may represent a sense or antisense conformations. Fragments can have the same or Substantially 65 (complementary) Strand, to peptide nucleic acid (PNA), or to the same amino acid sequence as the naturally occurring any DNA-like or RNA-like material, natural or synthetic in protein. Fragments which have different three dimensional origin. The phrases “nucleic acid' or “nucleic acid sequence' US 8,119,385 B2 113 114 includes oligonucleotide, nucleotide, polynucleotide, or to a Scription by inducible promoters include anaerobic condi fragment of any of these, to DNA or RNA (e.g., mRNA, tions, elevated temperature, drought, or the presence of light. rRNA, tRNA, iRNA) of genomic or synthetic origin which “Plasmids’ can be commercially available, publicly avail may be single-stranded or double-stranded and may represent able on an unrestricted basis, or can be constructed from a sense or antisense Strand, to peptide nucleic acid (PNA), or available plasmids in accord with published procedures. to any DNA-like or RNA-like material, natural or synthetic in Equivalent plasmids to those described herein are known in origin, including, e.g., iRNA, ribonucleoproteins (e.g., e.g., the art and will be apparent to the ordinarily skilled artisan. double stranded iRNAs, e.g., iRNPs). The term encompasses In one aspect, the term “recombinant’ means that the nucleic acids, i.e., oligonucleotides, containing known ana nucleic acid is adjacent to a “backbone' nucleic acid to which 10 it is not adjacentin its natural environment. Additionally, to be logues of natural nucleotides. The term also encompasses “enriched the nucleic acids will represent 5% or more of the nucleic-acid-like structures with synthetic backbones, see number of nucleic acid inserts in a population of nucleic acid e.g., Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197: backbone molecules. Backbone molecules according to the Strauss-Soukup (1997) Biochemistry 36:8692-8698; Sam invention include nucleic acids such as expression vectors, stag (1996) Antisense Nucleic Acid Drug Dev 6:153-156. 15 self-replicating nucleic acids, viruses, integrating nucleic “Oligonucleotide' includes either a single stranded acids and other vectors or nucleic acids used to maintain or poly deoxynucleotide or two complementary polydeoxy manipulate a nucleic acid insert of interest. Typically, the nucleotide Strands which may be chemically synthesized. enriched nucleic acids represent 15% or more of the number Such synthetic oligonucleotides have no 5' phosphate and of nucleic acid inserts in the population of recombinant back thus will not ligate to another oligonucleotide without adding bone molecules. More typically, the enriched nucleic acids a phosphate with an ATP in the presence of a kinase. A represent 50% or more of the number of nucleic acid inserts in synthetic oligonucleotide can ligate to a fragment that has not the population of recombinant backbone molecules. In a one been dephosphorylated. aspect, the enriched nucleic acids represent 90% or more of A “coding sequence of or a "nucleotide sequence encod the number of nucleic acid inserts in the population of recom ing a particular polypeptide or protein, is a nucleic acid 25 binant backbone molecules. sequence which is transcribed and translated into a polypep One aspect of the invention is an isolated nucleic acid tide or protein when placed under the control of appropriate comprising one of the sequences of the invention, or a frag regulatory sequences. The term "gene' means the segment of ment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, DNA involved in producing a polypeptide chain; it includes 100, 150, 200, 300, 400, or 500 or more consecutive bases of regions preceding and following the coding region (leader 30 a nucleic acid of the invention. The isolated, nucleic acids and trailer) as well as, where applicable, intervening may comprise DNA, including cDNA, genomic DNA and sequences (introns) between individual coding segments (ex synthetic DNA. The DNA may be double-stranded or single ons). “Operably linked as used herein refers to a functional Stranded and if single stranded may be the coding Strand or relationship between two or more nucleic acid (e.g., DNA) non-coding (anti-sense) strand. Alternatively, the isolated segments. Typically, it refers to the functional relationship of 35 nucleic acids may comprise RNA. transcriptional regulatory sequence to a transcribed The isolated nucleic acids of the invention may be used to sequence. For example, a promoter is operably linked to a prepare one of the polypeptides of the invention, or fragments coding sequence, such as a nucleic acid of the invention, if it comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, stimulates or modulates the transcription of the coding or 150 or more consecutive amino acids of one of the polypep sequence in an appropriate host cell or other expression sys 40 tides of the invention. Accordingly, another aspect of the tem. Generally, promoter transcriptional regulatory invention is an isolated nucleic acid which encodes one of the sequences that are operably linked to a transcribed sequence polypeptides of the invention, or fragments comprising at are physically contiguous to the transcribed sequence, i.e., least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more they are cis-acting. However, Some transcriptional regulatory consecutive amino acids of one of the polypeptides of the sequences. Such as enhancers, need not be physically contigu 45 invention. The coding sequences of these nucleic acids may ous or located in close proximity to the coding sequences be identical to one of the coding sequences of one of the whose transcription they enhance. nucleic acids of the invention or may be different coding As used herein, the term “promoter includes all sequences sequences which encode one of the of the invention having at capable of driving transcription of a coding sequence in a cell, least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more e.g., a plant cell. Thus, promoters used in the constructs of the 50 consecutive amino acids of one of the polypeptides of the invention include cis-acting transcriptional control elements invention, as a result of the redundancy or degeneracy of the and regulatory sequences that are involved in regulating or genetic code. The genetic code is well known to those of skill modulating the timing and/or rate of transcription of a gene. in the art and can be obtained, e.g., on page 214 of B. Lewin, For example, a promoter can be a cis-acting transcriptional Genes VI, Oxford University Press, 1997. control element, including an enhancer, a promoter, a tran 55 The isolated nucleic acid which encodes one of the Scription terminator, an origin of replication, a chromosomal polypeptides of the invention, but is not limited to: only the integration sequence, 5' and 3' untranslated regions, or an coding sequence of a nucleic acid of the invention and addi intronic sequence, which are involved in transcriptional regu tional coding sequences, such as leader sequences or propro lation. These cis-acting sequences typically interact with pro tein sequences and non-coding sequences, such as introns or teins or other biomolecules to carry out (turn on/off, regulate, 60 non-coding sequences 5' and/or 3' of the coding sequence. modulate, etc.) transcription. “Constitutive' promoters are Thus, as used herein, the term “polynucleotide encoding a those that drive expression continuously under most environ polypeptide' encompasses a polynucleotide which includes mental conditions and states of development or cell differen only the coding sequence for the polypeptide as well as a tiation. “Inducible' or “regulatable' promoters direct expres polynucleotide which includes additional coding and/or non sion of the nucleic acid of the invention under the influence of 65 coding sequence. environmental conditions or developmental conditions. Alternatively, the nucleic acid sequences of the invention, Examples of environmental conditions that may affect tran may be mutagenized using conventional techniques, such as US 8,119,385 B2 115 116 site directed mutagenesis, or other techniques familiar to somes, see, e.g., Woon (1998) Genomics 50:306-316; P1-de those skilled in the art, to introduce silent changes into the rived vectors (PACs), see, e.g., Kern (1997) Biotechniques polynucleotides of the invention. As used herein, “silent 23:120-124; cosmids, recombinant viruses, phages or plas changes include, for example, changes which do not alter the mids. amino acid sequence encoded by the polynucleotide. Such In one aspect, a nucleic acid encoding a polypeptide of the changes may be desirable in order to increase the level of the invention is assembled in appropriate phase with a leader polypeptide produced by host cells containing a vector sequence capable of directing secretion of the translated encoding the polypeptide by introducing codons or codon polypeptide or fragment thereof. pairs which occur frequently in the host organism. The invention provides fusion proteins and nucleic acids The invention also relates to polynucleotides which have 10 encoding them. A polypeptide of the invention can be fused to nucleotide changes which result in amino acid Substitutions, a heterologous peptide or polypeptide. Such as N-terminal additions, deletions, fusions and truncations in the polypep identification peptides which impart desired characteristics, tides of the invention. Such nucleotide changes may be intro such as increased stability or simplified purification. Peptides duced using techniques such as site directed mutagenesis, and polypeptides of the invention can also be synthesized and random chemical mutagenesis, exonuclease III deletion and 15 expressed as fusion proteins with one or more additional other recombinant DNA techniques. Alternatively, such domains linked thereto for, e.g., producing a more immuno nucleotide changes may be naturally occurring allelic Vari genic peptide, to more readily isolate a recombinantly Syn ants which are isolated by identifying nucleic acids which thesized peptide, to identify and isolate antibodies and anti specifically hybridize to probes comprising at least 10, 15, 20, body-expressing B cells, and the like. Detection and 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 con purification facilitating domains include, e.g., metal chelating secutive bases of one of the sequences of the invention (or the peptides such as polyhistidine tracts and histidine-tryptophan sequences complementary thereto) under conditions of high, modules that allow purification on immobilized metals, pro moderate, or low Stringency as provided herein. tein A domains that allow purification on immobilized immu General Techniques noglobulin, and the domain utilized in the FLAGS extension/ The nucleic acids used to practice this invention, whether 25 affinity purification system (Immunex Corp, Seattle Wash.). RNA, iRNA, antisense nucleic acid, cDNA, genomic DNA, The inclusion of a cleavable linker sequences such as Factor vectors, viruses or hybrids thereof, may be isolated from a Xa or enterokinase (Invitrogen, San Diego Calif.) between a variety of Sources, genetically engineered, amplified, and/or purification domain and the motif-comprising peptide or expressed/generated recombinantly. Recombinant polypep polypeptide to facilitate purification. For example, an expres tides generated from these nucleic acids can be individually 30 sion vector can include an epitope-encoding nucleic acid isolated or cloned and tested for a desired activity. Any sequence linked to six histidine residues followed by a thiore recombinant expression system can be used, including bac doxin and an enterokinase cleavage site (see e.g., Williams terial, mammalian, yeast, insect or plant cell expression sys (1995) Biochemistry 34:1787-1797; Dobeli (1998) Protein temS. Expr. Purif. 12:404-414). The histidine residues facilitate Alternatively, these nucleic acids can be synthesized in 35 detection and purification while the enterokinase cleavage vitro by well-known chemical synthesis techniques, as site provides a means for purifying the epitope from the described in, e.g., Adams (1983).J. Am. Chem. Soc. 105:661; remainder of the fusion protein. Technology pertaining to Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel vectors encoding fusion proteins and application of fusion (1995) Free Radic. Biol. Med. 19:373-380: Blommers (1994) proteins are well described in the scientific and patent litera Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 40 ture, see e.g., Kroll (1993) DNA Cell. Biol. 12:441-53. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage Transcriptional and Translational Control Sequences (1981) Tetra. Lett. 22:1859; U.S. Pat. No. 4,458,066. The invention provides nucleic acid (e.g., DNA) sequences Techniques for the manipulation of nucleic acids, such as, of the invention operatively linked to expression (e.g., tran e.g., Subcloning, labeling probes (e.g., random-primer label Scriptional or translational) control sequence(s), e.g., promot ing using Klenow polymerase, nick translation, amplifica 45 ers or enhancers, to director modulate RNA synthesis/expres tion), sequencing, hybridization and the like are well Sion. The expression control sequence can be in an expression described in the Scientific and patent literature, see, e.g., vector. Exemplary bacterial promoters include lac, lacz, T3, Sambrook, ed., MOLECULAR CLONING: A LABORA T7, gpt, lambda PR, PL and trp. Exemplary eukaryotic pro TORY MANUAL (2NDED.), Vols. 1-3, Cold Spring Harbor moters include CMV immediate early, HSV thymidine Laboratory, (1989); CURRENT PROTOCOLS IN 50 kinase, early and late SV40, LTRs from retrovirus, and mouse MOLECULARBIOLOGY, Ausubel, ed. John Wiley & Sons, metallothionein I. Inc., New York (1997); LABORATORY TECHNIQUES IN Promoters Suitable for expressing a polypeptide in bacteria BIOCHEMISTRY AND MOLECULAR BIOLOGY: include the E. colilac or trp promoters, the lacI promoter, the HYBRIDIZATION WITH NUCLEIC ACID PROBES, Part lacz promoter, the T3 promoter, the T7 promoter, the gpt I. Theory and Nucleic Acid Preparation, Tijssen, ed. Elsevier, 55 promoter, the lambda PR promoter, the lambda PL promoter, N.Y. (1993). promoters from operons encoding glycolytic enzymes Such Another useful means of obtaining and manipulating as 3-phosphoglycerate kinase (PGK), and the acid phos nucleic acids used to practice the methods of the invention is phatase promoter. Eukaryotic promoters include the CMV to clone from genomic samples, and, if desired, screen and immediate early promoter, the HSV thymidine kinase pro re-clone inserts isolated or amplified from, e.g., genomic 60 moter, heat shock promoters, the early and late SV40 pro clones or cDNA clones. Sources of nucleic acid used in the moter, LTRs from retroviruses, and the mouse metallothion methods of the invention include genomic or cDNA libraries ein-I promoter. Other promoters known to control expression contained in, e.g., mammalian artificial chromosomes of genes in prokaryotic or eukaryotic cells or their viruses (MACs), see, e.g., U.S. Pat. Nos. 5,721,118; 6,025,155: may also be used. Promoters suitable for expressing the human artificial chromosomes, see, e.g., Rosenfeld (1997) 65 polypeptide or fragment thereof in bacteria include the E. coli Nat. Genet. 15:333-335:yeast artificial chromosomes (YAC); lac or trp promoters, the lac promoter, the lacZ promoter, the bacterial artificial chromosomes (BAC); P1 artificial chromo T3 promoter, the T7 promoter, the gpt promoter, the lambda US 8,119,385 B2 117 118 P promoter, the lambda P, promoter, promoters from oper plants of a variety of ploidy levels, including polyploid, dip ons encoding glycolytic enzymes such as 3-phosphoglycer loid, haploid and hemizygous states. As used herein, the term ate kinase (PGK) and the acid phosphatase promoter. Fungal “transgenic plant includes plants or plant cells into which a promoters include the C-factor promoter. Eukaryotic promot heterologous nucleic acid sequence has been inserted, e.g., ers include the CMV immediate early promoter, the HSV the nucleic acids and various recombinant constructs (e.g., thymidine kinase promoter, heat shock promoters, the early expression cassettes) of the invention. and late SV40 promoter, LTRs from retroviruses and the In one aspect, a constitutive promoter such as the CaMV mouse metallothionein-I promoter. Other promoters known 35S promoter can be used for expression in specific parts of to control expression of genes in prokaryotic or eukaryotic the plant or seed or throughout the plant. For example, for cells or their viruses may also be used. 10 overexpression, a plant promoter fragment can be employed Tissue-Specific Promoters which will direct expression of a nucleic acid in some or all The invention provides expression cassettes that can be tissues of a plant, e.g., a regenerated plant. Such promoters are expressed in a tissue-specific manner, e.g., that can express a referred to herein as “constitutive' promoters and are active polypeptide, enzyme, protein, e.g. structural or binding pro under most environmental conditions and states of develop tein, of the invention in a tissue-specific manner. The inven 15 ment or cell differentiation. Examples of constitutive promot tion also provides plants or seeds that express a polypeptide, ers include the cauliflower mosaic virus (CaMV) 35S tran enzyme, protein, e.g. structural or binding protein, of the scription initiation region, the 1'- or 2'-promoter derived from invention in a tissue-specific manner. The tissue-specificity T-DNA of Agrobacterium tumefaciens, and other transcrip can be seed specific, stem specific, leaf specific, root specific, tion initiation regions from various plant genes known to fruit specific and the like. those of skill. Such genes include, e.g., ACT11 from Arabi The term “expression cassette' as used herein refers to a dopsis (Huang (1996) Plant Mol. Biol. 33:125-139); Cat3 nucleotide sequence which is capable of affecting expression from Arabidopsis (GenBank No. U43147, Zhong (1996) of a structural gene (i.e., a protein coding sequence, Such as a Mol. Gen. Genet. 251:196-203); the gene encoding stearoyl polypeptide, enzyme, protein, e.g. structural or binding pro acyl carrier protein desaturase from Brassica napus (Gen tein, of the invention) in a host compatible with such 25 bank No. X74782, Solocombe (1994) Plant Physiol. 104: sequences. Expression cassettes include at least a promoter 1167-1176); GPc1 from maize (GenBank No. X15596: operably linked with the polypeptide coding sequence; and, Martinez (1989).J. Mol. Biol. 208:551-565); the Gpc2 from optionally, with other sequences, e.g., transcription termina maize (GenBank No. U45855, Manjunath (1997) Plant Mol. tion signals. Additional factors necessary or helpful in effect Biol. 33:97-112); plant promoters described in U.S. Pat. Nos. ing expression may also be used, e.g., enhancers, alpha-fac 30 4,962,028; 5,633,440. tors. Thus, expression cassettes also include plasmids, The invention uses tissue-specific or constitutive promot expression vectors, recombinant viruses, any form of recom ers derived from viruses which can include, e.g., the tobam binant “naked DNA vector, and the like. A “vector” com ovirus subgenomic promoter (Kumagai (1995) Proc. Natl. prises a nucleic acid which can infect, transfect, transiently or Acad. Sci. USA 92:1679-1683; the rice tungro bacilliform permanently transduce a cell. It will be recognized that a 35 virus (RTBV), which replicates only in phloem cells in vector can be a naked nucleic acid, or a nucleic acid com infected rice plants, with its promoter which drives strong plexed with protein or lipid. The vector optionally comprises phloem-specific reporter ; the cassaya vein viral or bacterial nucleic acids and/or proteins, and/or mem mosaic virus (CVMV) promoter, with highest activity invas branes (e.g., a cell membrane, a viral lipid envelope, etc.). cular elements, in leaf mesophyll cells, and in root tips (Ver Vectors include, but are not limited to replicons (e.g., RNA 40 daguer (1996) Plant Mol. Biol. 31:1129-1139). replicons, bacteriophages) to which fragments of DNA may Alternatively, the plant promoter may direct expression of be attached and become replicated. Vectors thus include, but a polypeptide, enzyme, protein, e.g. structural or binding are not limited to RNA, autonomous self-replicating circular protein-expressing nucleic acid in a specific tissue, organ or or linear DNA or RNA (e.g., plasmids, viruses, and the like, cell type (i.e. tissue-specific promoters) or may be otherwise see, e.g., U.S. Pat. No. 5.217.879), and include both the 45 under more precise environmental or developmental control expression and non-expression plasmids. Where a recombi or under the control of an inducible promoter. Examples of nant microorganism or cell culture is described as hosting an environmental conditions that may affect transcription “expression vector' this includes both extra-chromosomal include anaerobic conditions, elevated temperature, the pres circular and linear DNA and DNA that has been incorporated ence of light, or sprayed with chemicals/hormones. For into the host chromosome(s). Where a vector is being main 50 example, the invention incorporates the drought-inducible tained by a host cell, the vector may either bestably replicated promoter of maize (Busk (1997) supra); the cold, drought, by the cells during mitosis as an autonomous structure, or is and high salt inducible promoter from potato (Kirch (1997) incorporated within the host’s genome. Plant Mol. Biol. 33:897-909). “Tissue-specific' promoters are transcriptional control ele Tissue-specific promoters can promote transcription only ments that are only active in particular cells or tissues or 55 within a certaintime frame of developmental stage within that organs, e.g., in plants or animals. Tissue-specific regulation tissue. See, e.g., Blazquez (1998) Plant Cell 10:791-800, may beachieved by certain intrinsic factors which ensure that characterizing the Arabidopsis LEAFY gene promoter. See genes encoding proteins specific to a given tissue are also Cardon (1997) Plant J 12:367-77, describing the tran expressed. Such factors are known to exist in and scription factor SPL3, which recognizes a conserved plants so as to allow for specific tissues to develop. 60 sequence motif in the promoter region of the A. thaliana floral The term "plant' includes whole plants, plant parts (e.g., meristem identity gene AP1; and Mandel (1995) Plant leaves, stems, flowers, roots, etc.), plant protoplasts, seeds Molecular Biology, Vol. 29, pp. 995-1004, describing the mer and plant cells and progeny of same. The class of plants which istem promoter eIF4. Tissue specific promoters which are can be used in the method of the invention is generally as active throughout the life cycle of a particular tissue can be broad as the class of higher plants amenable to transformation 65 used. In one aspect, the nucleic acids of the invention are techniques, including angiosperms (monocotyledonous and operably linked to a promoter active primarily only in cotton dicotyledonous plants), as well as gymnosperms. It includes fiber cells. In one aspect, the nucleic acids of the invention are US 8,119,385 B2 119 120 operably linked to a promoter active primarily during the induced at a particular stage of development of the plant. stages of cotton fiber cell elongation, e.g., as described by Thus, the invention also provides for transgenic plants con Rinehart (1996) supra. The nucleic acids can be operably taining an inducible gene encoding for polypeptides of the linked to the Fbl2A gene promoter to be preferentially invention whose host range is limited to target plant species, expressed in cotton fiber cells (Ibid). See also, John (1997) Such as corn, rice, barley, wheat, potato or other crops, induc Proc. Natl. Acad. Sci. USA 89:5769-5773: John, et al., U.S. ible at any stage of development of the crop. Pat. Nos. 5,608,148 and 5,602,321, describing cotton fiber One of skill will recognize that a tissue-specific plant pro specific promoters and methods for the construction of trans moter may drive expression of operably linked sequences in genic cotton plants. Root-specific promoters may also be used tissues other than the target tissue. Thus, a tissue-specific to express the nucleic acids of the invention. Examples of 10 promoter is one that drives expression preferentially in the root-specific promoters include the promoter from the alco target tissue or cell type, but may also lead to Some expression hol dehydrogenase gene (DeLisle (1990) Int. Rev. Cytol. in other tissues as well. 123:39-60). Other promoters that can be used to express the The nucleic acids of the invention can also be operably nucleic acids of the invention include, e.g., ovule-specific, linked to plant promoters which are inducible upon exposure embryo-specific, endosperm-specific, integument-specific, 15 to chemicals reagents. These reagents include, e.g., herbi seed coat-specific promoters, or some combination thereof a cides, synthetic auxins, or which can be applied, leaf-specific promoter (see, e.g., Busk (1997) Plant J. 11:1285 e.g., sprayed, onto transgenic plants. Inducible expression of 1295, describing a leaf-specific promoter in maize); the the polypeptide, enzyme, protein, e.g. structural or binding ORF13 promoter from Agrobacterium rhizogenes (which protein-producing nucleic acids of the invention will allow exhibits high activity in roots, see, e.g., Hansen (1997) Supra); the grower to select plants with the optimal polypeptide, a maize pollen specific promoter (see, e.g., Guerrero (1990) enzyme, protein, e.g. structural orbinding protein, expression Mol. Gen. Genet. 224:161 168); a tomato promoter active and/or activity. The development of plant parts can thus con during fruit ripening, senescence and abscission of leaves trolled. In this way the invention provides the means to facili and, to a lesser extent, offlowers can be used (see, e.g., Blume tate the harvesting of plants and plant parts. For example, in (1997) Plant J. 12:731 746); a pistil-specific promoter from 25 various embodiments, the maize In2-2 promoter, activated by the potato SK2 gene (see, e.g., Ficker (1997) Plant Mol. Biol. benzenesulfonamide herbicide safeners, is used (De Veylder 35:425 431); the Blec4 gene from pea, which is active in (1997) Plant Cell Physiol. 38:568-577); application of differ epidermal tissue of vegetative and floral shoot apices of trans ent herbicide Safeners induces distinct gene expression pat genic alfalfa making it a useful tool to target the expression of terns, including expression in the root, hydathodes, and the foreign genes to the epidermal layer of actively growing 30 shoot apical meristem. Coding sequences of the invention are shoots or fibers; the ovule-specific BEL1 gene (see, e.g., also under the control of a tetracycline-inducible promoter, Reiser (1995) Cell 83:735-742, GenBank No. U39944); and/ e.g., as described with transgenic tobacco plants containing or, the promoter in Klee, U.S. Pat. No. 5,589,583, describing the Avena sativa L. (oat) arginine decarboxylase gene (Mas a plant promoter region is capable of conferring high levels of grau (1997) Plant J. 11:465-473); or, a salicylic acid-respon transcription in meristematic tissue and/or rapidly dividing 35 sive element (Stange (1997) Plant J. 11:1315-1324). cells. In some aspects, proper polypeptide expression may Alternatively, plant promoters which are inducible upon require polyadenylation region at the 3'-end of the coding exposure to plant hormones, such as auxins, are used to region. The polyadenylation region can be derived from the express the nucleic acids of the invention. For example, the natural gene, from a variety of other plant (or animal or other) invention can use the auxin-response elements E1 promoter 40 genes, or from genes in the Agrobacterial T-DNA. fragment (AuXREs) in the Soybean (Glycine max L.) (Liu Expression Vectors and Cloning Vehicles (1997) Plant Physiol. 115:397–407); the auxin-responsive The invention provides expression vectors and cloning Arabidopsis GST6 promoter (also responsive to salicylic acid vehicles comprising nucleic acids of the invention, e.g., and hydrogen peroxide) (Chen (1996) Plant J. 10: 955-966); sequences encoding the polypeptide, enzyme, protein, e.g. the auxin-inducible parC promoter from tobacco (Sakai 45 structural or binding proteins of the invention. Expression (1996) 37:906-913); a plant biotin response element (Streit vectors and cloning vehicles of the invention can comprise (1997) Mol. Plant. Microbe Interact. 10:933-937); and, the viral particles, baculovirus, phage, plasmids, phagemids, promoter responsive to the stress hormone abscisic acid cosmids, fosmids, bacterial artificial chromosomes, viral (Sheen (1996) Science 274: 1900-1902). DNA (e.g., vaccinia, adenovirus, foulpox virus, pseudorabies The nucleic acids of the invention can also be operably 50 and derivatives of SV40), P1-based artificial chromosomes, linked to plant promoters which are inducible upon exposure yeast plasmids, yeast artificial chromosomes, and any other to chemicals reagents which can be applied to the plant. Such vectors specific for specific hosts of interest (Such as bacillus, as herbicides or antibiotics. For example, the maize In2-2 Aspergillus and yeast). Vectors of the invention can include promoter, activated by benzenesulfonamide herbicide safen chromosomal, non-chromosomal and synthetic DNA ers, can be used (De Veylder (1997) Plant Cell Physiol. 55 sequences. Large numbers of Suitable vectors are known to 38:568-577); application of different herbicide safeners those of skill in the art, and are commercially available. induces distinct gene expression patterns, including expres Exemplary vectors are include: bacterial: pCE vectors sion in the root, hydathodes, and the shoot apical meristem. (Qiagen), pBLUESCRIPT plasmids, pNH vectors, (lambda Coding sequence can be under the control of, e.g., a tetracy ZAP vectors (Stratagene); ptrc99a, pKK223-3, plR540, cline-inducible promoter, e.g., as described with transgenic 60 pRIT2T (Pharmacia); Eukaryotic: pXT1, pSG5 (Stratagene), tobacco plants containing the Avena sativa L. (oat) arginine pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). However, decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); any other plasmid or other vector may be used so long as they or, a salicylic acid-responsive element (Stange (1997) Plant J. are replicable and viable in the host. Low copy number or 11:1315-1324). Using chemically- (e.g., hormone- or pesti high copy number vectors may be employed with the present cide-) induced promoters, i.e., promoter responsive to a 65 invention. chemical which can be applied to the transgenic plant in the The expression vector can comprise a promoter, a ribo field, expression of a polypeptide of the invention can be Some binding site for translation initiation and a transcription US 8,119,385 B2 121 122 terminator. The vector may also include appropriate sequences of the invention can be inserted into a plant host sequences for amplifying expression. Mammalian expression cell genome becoming an integral part of the host chromo vectors can comprise an origin of replication, any necessary Somal DNA. Sense or antisense transcripts can be expressed ribosome binding sites, a polyadenylation site, splice donor in this manner. A vector comprising the sequences (e.g., pro and acceptor sites, transcriptional termination sequences, and 5 moters or coding regions) from nucleic acids of the invention 5' flanking non-transcribed sequences. In some aspects, DNA can comprise a marker gene that confers a selectable pheno sequences derived from the SV40 splice and polyadenylation type on a plant cell or a seed. For example, the marker may sites may be used to provide the required non-transcribed encode biocide resistance, particularly antibiotic resistance, genetic elements. Such as resistance to kanamycin, G418, bleomycin, hygro In one aspect, the expression vectors contain one or more 10 mycin, or herbicide resistance, Such as resistance to chloro selectable marker genes to permit selection of host cells con sulfuron or Basta. taining the vector. Such selectable markers include genes Expression vectors capable of expressing nucleic acids and encoding dihydrofolate reductase or genes conferring neo proteins in plants are well known in the art, and can include, mycin resistance for eukaryotic cell culture, genes conferring e.g., vectors from Agrobacterium spp., potato virus X (see, tetracycline or amplicillin resistance in E. coli, and the S. 15 e.g., Angell (1997) EMBO.J. 16:3675-3684), tobacco mosaic cerevisiae TRP1 gene. Promoter regions can be selected from virus (see, e.g., Casper (1996) Gene 173:69-73), tomato any desired gene using chloramphenicol transferase (CAT) bushy stunt virus (see, e.g., Hillman (1989) Virology 169:42 vectors or other vectors with selectable markers. 50), tobacco etch virus (see, e.g., Dolja (1997) Virology 234: Vectors for expressing the polypeptide or fragment thereof 243-252), bean golden mosaic virus (see, e.g., Morinaga in eukaryotic cells can also contain enhancers to increase (1993) Microbiol Immunol. 37:471-476), cauliflowermosaic expression levels. Enhancers are cis-acting elements of DNA virus (see, e.g., Cecchini (1997) Mol. Plant. Microbe Interact. that can be from about 10 to about 300 bp in length. They can 10: 1094-1101), maize Ac/Ds transposable element (see, e.g., act on a promoter to increase its transcription. Exemplary Rubin (1997) Mol. Cell. Biol. 17:6294-6302; Kunze (1996) enhancers include the SV40 enhancer on the late side of the Curr. Top. Microbiol. Immunol. 204:161-194), and the maize replication origin bp 100 to 270, the cytomegalovirus early 25 Suppressor-mutator (Spm) transposable element (see, e.g., promoter enhancer, the polyoma enhancer on the late side of Schlappi (1996) Plant Mol. Biol. 32:717-725); and deriva the replication origin, and the adenovirus enhancers. tives thereof. A nucleic acid sequence can be inserted into a vector by a In one aspect, the expression vector can have two replica variety of procedures. In general, the sequence is ligated to tion systems to allow it to be maintained in two organisms, for the desired position in the vector following digestion of the 30 example in mammalian or insect cells for expression and in a insert and the vector with appropriate restriction endonu prokaryotic host for cloning and amplification. Furthermore, cleases. Alternatively, blunt ends in both the insert and the for integrating expression vectors, the expression vector can vector may be ligated. A variety of cloning techniques are contain at least one sequence homologous to the host cell known in the art, e.g., as described in Ausubel and Sambrook. genome. It can contain two homologous sequences which Such procedures and others are deemed to be within the scope 35 flank the expression construct. The integrating vector can be of those skilled in the art. directed to a specific locus in the host cell by selecting the The vector can be in the form of a plasmid, a viral particle, appropriate homologous sequence for inclusion in the vector. or a phage. Other vectors include chromosomal, non-chro Constructs for integrating vectors are well known in the art. mosomal and synthetic DNA sequences, derivatives of SV40; Expression vectors of the invention may also include a bacterial plasmids, phage DNA, baculovirus, yeast plasmids, 40 selectable marker gene to allow for the selection of bacterial vectors derived from combinations of plasmids and phage strains that have been transformed, e.g., genes which render DNA, viral DNA such as vaccinia, adenovirus, fowl pox the bacteria resistant to drugs such as ampicillin, chloram virus, and pseudorabies. A variety of cloning and expression phenicol, erythromycin, kanamycin, neomycin and tetracy vectors for use with prokaryotic and eukaryotic hosts are cline. Selectable markers can also include biosynthetic genes, described by, e.g., Sambrook. 45 Such as those in the histidine, tryptophan and leucine biosyn Particular bacterial vectors which can be used include the thetic pathways. commercially available plasmids comprising genetic ele The DNA sequence in the expression vector is operatively ments of the well known cloning vector pBR322 (ATCC linked to an appropriate expression control sequence(s) (pro 37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, moter) to direct RNA synthesis. Particular named bacterial Sweden), GEM1 (Promega Biotec, Madison, Wis., USA) 50 promoters include lacI, lac7, T3, T7, gpt, lambda P, P and pOE70, pGE60, pGE-9 (Qiagen), pID10, psiX174 pBLUE trp. Eukaryotic promoters include CMV immediate early, SCRIPT II KS, pNH8A, pNH16a, pNH18A, pNH46A (Strat HSV thymidine kinase, early and late SV40, LTRs from ret agene), ptrc99a, pKK223-3, pKK233-3, DR540, pRIT5 rovirus and mouse metallothionein-I. Selection of the appro (Pharmacia), pKK232-8 and pCM7. Particular eukaryotic priate vector and promoter is well within the level of ordinary vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene) 55 skill in the art. The expression vector also contains a ribosome pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any binding site for translation initiation and a transcription ter other vector may be used as long as it is replicable and viable minator. The vector may also include appropriate sequences in the host cell. for amplifying expression. Promoter regions can be selected The nucleic acids of the invention can be expressed in from any desired gene using chloramphenicol transferase expression cassettes, vectors or viruses and transiently or 60 (CAT) vectors or other vectors with selectable markers. In stably expressed in plant cells and seeds. One exemplary addition, the expression vectors in one aspect contain one or transient expression system uses episomal expression sys more selectable marker genes to provide a phenotypic trait for tems, e.g., cauliflower mosaic virus (CaMV) viral RNA gen selection of transformed host cells such as dihydrofolate erated in the nucleus by transcription of an episomal mini reductase or neomycin resistance for eukaryotic cell culture, chromosome containing Supercoiled DNA, see, e.g., Covey 65 or such as tetracycline or amplicillin resistance in E. coli. (1990) Proc. Natl. Acad. Sci. USA 87: 1633-1637. Alterna Mammalian expression vectors may also comprise an ori tively, coding sequences, i.e., all or Sub-fragments of gin of replication, any necessary ribosome binding sites, a US 8,119,385 B2 123 124 polyadenylation site, splice donor and acceptor sites, tran cereus, Salmonella typhimurium and various species within Scriptional termination sequences and 5' flanking nontran the genera Streptomyces and Staphylococcus. Exemplary scribed sequences. In some aspects, DNA sequences derived insect cells include Drosophila S2 and Spodoptera Sf9. from the SV40 splice and polyadenylation sites may be used Exemplary animal cells include CHO, COS or Bowes mela to provide the required nontranscribed genetic elements. 5 noma or any mouse or human cell line. The selection of an Vectors for expressing the polypeptide or fragment thereof appropriate host is within the abilities of those skilled in the in eukaryotic cells may also contain enhancers to increase art. Techniques for transforming a wide variety of higher expression levels. Enhancers are cis-acting elements of DNA, plant species are well known and described in the technical usually from about 10 to about 300 bp in length that act on a and scientific literature. See, e.g., Weising (1988) Ann. Rev. promoter to increase its transcription. Examples include the 10 Genet. 22:421-477; U.S. Pat. No. 5,750,870. SV40 enhancer on the late side of the replication origin bp The vector can be introduced into the host cells using any 100 to 270, the cytomegalovirus early promoter enhancer, the of a variety of techniques, including transformation, transfec polyoma enhancer on the late side of the replication origin tion, transduction, viral , gene guns, or Ti-mediated and the adenovirus enhancers. gene transfer. Particular methods include calcium phosphate In addition, the expression vectors typically contain one or 15 transfection, DEAE-Dextran mediated transfection, lipofec more selectable marker genes to permit selection of host cells tion, or electroporation (Davis, L., Dibner, M., Battey, I., containing the vector. Such selectable markers include genes Basic Methods in Molecular Biology, (1986)). encoding dihydrofolate reductase or genes conferring neo In one aspect, the nucleic acids or vectors of the invention mycin resistance for eukaryotic cell culture, genes conferring are introduced into the cells for screening, thus, the nucleic tetracycline or ampicillin resistance in E. coli and the S. acids enter the cells in a manner Suitable for Subsequent cerevisiae TRP1 gene. expression of the nucleic acid. The method of introduction is In some aspects, the nucleic acid encoding one of the largely dictated by the targeted cell type. Exemplary methods polypeptides of the invention, or fragments comprising at include CaBO precipitation, liposome fusion, lipofection least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 (e.g., LIPOFECTINTM), electroporation, viral infection, etc. consecutive amino acids thereof is assembled in appropriate 25 The candidate nucleic acids may stably integrate into the phase with a leader sequence capable of directing secretion of genome of the host cell (for example, with retroviral intro the translated polypeptide or fragment thereof. Optionally, duction) or may exist either transiently or stably in the cyto the nucleic acid can encode a fusion polypeptide in which one plasm (i.e. through the use of traditional plasmids, utilizing of the polypeptides of the invention, or fragments comprising standard regulatory sequences, selection markers, etc.). As at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 30 many pharmaceutically important screens require human or consecutive amino acids thereof is fused to heterologous pep model mammalian cell targets, retroviral vectors capable of tides or polypeptides, such as N-terminal identification pep transfecting such targets can be used. tides which impart desired characteristics, such as increased Where appropriate, the engineered host cells can be cul stability or simplified purification. tured in conventional nutrient media modified as appropriate The appropriate DNA sequence may be inserted into the 35 for activating promoters, selecting transformants or amplify vector by a variety of procedures. In general, the DNA ing the genes of the invention. Following transformation of a sequence is ligated to the desired position in the vector fol Suitable host strain and growth of the host strain to an appro lowing digestion of the insert and the vector with appropriate priate cell density, the selected promoter may be induced by restriction . Alternatively, blunt ends in both appropriate means (e.g., temperature shift or chemical induc the insert and the vector may be ligated. A variety of cloning 40 tion) and the cells may be cultured for an additional period to techniques are disclosed in Ausubeletal. Current Protocols in allow them to produce the desired polypeptide or fragment Molecular Biology, John Wiley 503 Sons, Inc. 1997 and thereof. Sambrook et al., Molecular Cloning: A Laboratory Manual Cells can be harvested by centrifugation, disrupted by 2nd Ed., Cold Spring Harbor Laboratory Press (1989. Such physical or chemical means, and the resulting crude extract is procedures and others are deemed to be within the scope of 45 retained for further purification. Microbial cells employed for those skilled in the art. expression of proteins can be disrupted by any convenient The vector may be, for example, in the form of a plasmid, method, including freeze-thaw cycling, Sonication, mechani a viral particle, or a phage. Other vectors include chromo cal disruption, or use of cell lysing agents. Such methods are Somal, nonchromosomal and synthetic DNA sequences, well known to those skilled in the art. The expressed polypep derivatives of SV40; bacterial plasmids, phage DNA, bacu 50 tide or fragment thereof can be recovered and purified from lovirus, yeast plasmids, vectors derived from combinations of recombinant cell cultures by methods including ammonium plasmids and phage DNA, Viral DNA Such as vaccinia, aden Sulfate or ethanol precipitation, acid extraction, anion or cat ovirus, fowlpox virus and pseudorabies. A variety of cloning ion exchange chromatography, phosphocellulose chromatog and expression vectors for use with prokaryotic and eukary raphy, hydrophobic interaction chromatography, affinity otic hosts are described by Sambrook, et al., Molecular Clon 55 chromatography, hydroxylapatite chromatography and lectin ing: A Laboratory Manual, 2nd Ed., Cold Spring Harbor, chromatography. Protein refolding steps can be used, as nec N.Y., (1989). essary, in completing configuration of the polypeptide. If Host Cells and Transformed Cells desired, high performance liquid chromatography (HPLC) The invention also provides a transformed cell comprising can be employed for final purification steps. a nucleic acid sequence of the invention, e.g., a sequence 60 The constructs in host cells can be used in a conventional encoding a polypeptide, enzyme, protein, e.g. structural or manner to produce the gene product encoded by the recom binding protein, of the invention, or a vector of the invention. binant sequence. Depending upon the host employed in a The host cell may be any of the host cells familiar to those recombinant production procedure, the polypeptides pro skilled in the art, including prokaryotic cells, eukaryotic cells, duced by host cells containing the vector may be glycosylated Such as bacterial cells, fungal cells, yeast cells, mammalian 65 or may be non-glycosylated. Polypeptides of the invention cells, insect cells, or plant cells. Exemplary bacterial cells may or may not also include an initial methionine amino acid include E. coli, Streptomyces, , Bacillus residue. US 8,119,385 B2 125 126 Cell-free translation systems can also be employed to pro melanoma and adenoviruses. The selection of an appropriate duce a polypeptide of the invention. Cell-free translation host is within the abilities of those skilled in the art. systems can use mRNAs transcribed from a DNA construct The vector may be introduced into the host cells using any comprising a promoter operably linked to a nucleic acid of a variety of techniques, including transformation, transfec encoding the polypeptide or fragment thereof. In some 5 tion, transduction, viral infection, gene guns, or Ti-mediated aspects, the DNA construct may be linearized prior to con gene transfer. Particular methods include calcium phosphate ducting an in vitro transcription reaction. The transcribed transfection, DEAE-Dextran mediated transfection, lipofec mRNA is then incubated with an appropriate cell-free trans tion, or electroporation (Davis, L., Dibner, M., Battey, I., lation extract, Such as a rabbit reticulocyte extract, to produce Basic Methods in Molecular Biology, (1986)). 10 Where appropriate, the engineered host cells can be cul the desired polypeptide or fragment thereof. tured in conventional nutrient media modified as appropriate The expression vectors can contain one or more selectable for activating promoters, selecting transformants or amplify marker genes to provide a phenotypic trait for selection of ing the genes of the invention. Following transformation of a transformed host cells such as dihydrofolate reductase or Suitable host strain and growth of the host strain to an appro neomycin resistance for eukaryotic cell culture, or Such as 15 priate cell density, the selected promoter may be induced by tetracycline or amplicillin resistance in E. coli. appropriate means (e.g., temperature shift or chemical induc Host cells containing the polynucleotides of interest, e.g., tion) and the cells may be cultured for an additional period to nucleic acids of the invention, can be cultured in conventional allow them to produce the desired polypeptide or fragment nutrient media modified as appropriate for activating promot thereof. ers, selecting transformants or amplifying genes. The culture Cells are typically harvested by centrifugation, disrupted conditions, such as temperature, pH and the like, are those by physical or chemical means and the resulting crude extract previously used with the host cell selected for expression and is retained for further purification. Microbial cells employed will be apparent to the ordinarily skilled artisan. The clones for expression of proteins can be disrupted by any convenient which are identified as having the specified enzyme activity method, including freeze-thaw cycling, Sonication, mechani may then be sequenced to identify the polynucleotide 25 cal disruption, or use of cell lysing agents. Such methods are sequence encoding an enzyme having the enhanced activity. well known to those skilled in the art. The expressed polypep The invention provides a method for overexpressing a tide or fragment thereof can be recovered and purified from recombinant polypeptide, enzyme, protein, e.g. structural or recombinant cell cultures by methods including ammonium binding protein, in a cell comprising expressing a vector Sulfate or ethanol precipitation, acid extraction, anion or cat comprising a nucleic acid of the invention, e.g., a nucleic acid 30 ion exchange chromatography, phosphocellulose chromatog comprising a nucleic acid sequence with at least about 50%, raphy, hydrophobic interaction chromatography, affinity 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, chromatography, hydroxylapatite chromatography and lectin 61%. 62%, 63%, 64%. 65%, 66%, 67%, 68%, 69%, 70%, chromatography. Protein refolding steps can be used, as nec 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, essary, in completing configuration of the polypeptide. If 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 35 desired, high performance liquid chromatography (HPLC) 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more can be employed for final purification steps. sequence identity to an exemplary sequence of the invention Various mammalian cell culture systems can also be over a region of at least about 100 residues, wherein the employed to express recombinant protein. Examples of mam sequence identities are determined by analysis with a malian expression systems include the COS-7 lines of mon sequence comparison algorithm or by visual inspection, or, a 40 key kidney fibroblasts (described by Gluzman, Cell, 23:175, nucleic acid that hybridizes under stringent conditions to a 1981) and other cell lines capable of expressing proteins from nucleic acid sequence of the invention. The overexpression a compatible vector, such as the C127, 3T3, CHO, HeLa and can be effected by any means, e.g., use of a high activity BHK cell lines. promoter, a dicistronic vector or by gene amplification of the The constructs in host cells can be used in a conventional Vector. 45 manner to produce the gene product encoded by the recom The nucleic acids of the invention can be expressed, or binant sequence. Depending upon the host employed in a overexpressed, in any in vitro or in vivo expression system. recombinant production procedure, the polypeptides pro Any cell culture systems can be employed to express, or duced by host cells containing the vector may be glycosylated over-express, recombinant protein, including bacterial, or may be non-glycosylated. Polypeptides of the invention insect, yeast, fungal or mammalian cultures. Over-expression 50 may or may not also include an initial methionine amino acid can be effected by appropriate choice of promoters, enhanc residue. ers, vectors (e.g., use of replicon Vectors, dicistronic vectors Alternatively, the polypeptides of the invention, or frag (see, e.g., Gurtu (1996) Biochem. Biophys. Res. Commun. ments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 229:295-8), media, culture systems and the like. In one 100, or 150 or more consecutive amino acids thereof can be aspect, gene amplification using selection markers, e.g., 55 synthetically produced by conventional peptide synthesizers. glutamine synthetase (see, e.g., Sanders (1987) Dev. Biol. In other aspects, fragments or portions of the polypeptides Stand. 66:55-63), in cell systems are used to overexpress the may be employed for producing the corresponding full polypeptides of the invention. length polypeptide by peptide synthesis; therefore, the frag The host cell may be any of the host cells familiar to those ments may be employed as intermediates for producing the skilled in the art, including prokaryotic cells, eukaryotic cells, 60 full-length polypeptides. mammaliancells, insect cells, or plant cells. As representative Cell-free translation systems can also be employed to pro examples of appropriate hosts, there may be mentioned: bac duce one of the polypeptides of the invention, or fragments terial cells, such as E. coli, Streptomyces, Bacillus subtilis, comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, Bacillus cereus, Salmonella typhimurium and various species or 150 or more consecutive amino acids thereof using within the genera Streptomyces and Staphylococcus, fungal 65 mRNAs transcribed from a DNA construct comprising a pro cells, such as yeast, insect cells Such as Drosophila S2 and moteroperably linked to a nucleic acid encoding the polypep Spodoptera S9, animal cells such as CHO, COS or Bowes tide or fragment thereof. In some aspects, the DNA construct US 8,119,385 B2 127 128 may be linearized prior to conducting an in vitro transcription Press, N.Y. (1990) and PCR STRATEGIES (1995), ed. Inis, reaction. The transcribed mRNA is then incubated with an Academic Press, Inc., N.Y., ligase chain reaction (LCR) (see, appropriate cell-free translation extract, Such as a rabbit e.g., Wu (1989) Genomics 4:560; Landegren (1988) Science reticulocyte extract, to produce the desired polypeptide or 241:1077; Barringer (1990) Gene 89:117); transcription fragment thereof. 5 amplification (see, e.g., Kwoh (1989) Proc. Natl. Acad. Sci. Amplification of Nucleic Acids USA 86: 1173); and, self-sustained sequence replication (see, In practicing the invention, nucleic acids encoding the e.g., Guatelli (1990) Proc. Natl. Acad. Sci. USA87: 1874); Q polypeptides of the invention, or modified nucleic acids, can Beta replicase amplification (see, e.g., Smith (1997) J. Clin. be reproduced by, e.g., amplification. The invention provides Microbiol. 35:1477-1491), automated Q-beta replicase amplification primer sequence pairs for amplifying nucleic 10 amplification assay (see, e.g., Burg (1996) Mol. Cell. Probes acids encoding polypeptides (e.g., enzymes) of the invention. 10:257-271) and other RNA polymerase mediated techniques In one aspect, the primer pairs are capable of amplifying (e.g., NASBA, Cangene, Mississauga, Ontario); see also nucleic acid sequences of the invention, e.g., including the Berger (1987) Methods Enzymol. 152:307-316; Sambrook: exemplary SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQ Ausubel; U.S. Pat. Nos. 4,683,195 and 4,683.202: Sooknanan ID NO:7, SEQ ID NO:9, SEQ ID NO:11, etc., including all 15 (1995) Biotechnology 13:563-564. nucleic acids disclosed in the SEQID listing, which include Determining the Degree of Sequence Identity all odd numbered SEQID NOs: from SEQID NO:1 through The invention provides nucleic acids comprising SEQ ID NO:26,897, or a subsequence thereof, etc. One of sequences having at least about 50%, 51%, 52%. 53%, 54%, skill in the art can design amplification primer sequence pairs 55%, 56%, 57%, 58%, 59%, 60%, 61%. 62%, 63%, 64%, for any part of or the full length of these sequences. 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, In one aspect, the invention provides a nucleic acid ampli 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, fied by a primer pair of the invention, e.g., a primer pair as set 85%. 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%) 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of sequence identity to an exemplary nucleic acid of the inven the invention, and about the first (the 5') 15, 16, 17, 18, 19, 20, 25 tion (e.g., SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQ 21, 22, 23, 24, or 25 or more residues of the complementary ID NO:7, SEQ ID NO:9, SEQ ID NO:11, etc., including all Strand. nucleic acids disclosed in the SEQID listing, which include The invention provides an amplification primer sequence all even numbered SEQID NOs: from SEQID NO:1 through pair for amplifying a nucleic acid encoding a polypeptide SEQ ID NO:26,897, and nucleic acids encoding SEQ ID having an enzyme, structural or binding activity, wherein the 30 NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, SEQID primer pair is capable of amplifying a nucleic acid compris NO:10, etc., and all polypeptides disclosed in the SEQ ID ing a sequence of the invention, or fragments or subsequences listing, which include all even numbered SEQID NOs: from thereof. One or each member of the amplification primer SEQ ID NO:2 through SEQID NO:26,898) over a region of sequence pair can comprise an oligonucleotide comprising at at least about 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, least about 10 to 50 or more consecutive bases of the 35 500, 550, 600, 650, 700, 750, 800, 850,900,950, 1000, 1050, sequence, or about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22. 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550 23, 24, or 25 or more consecutive bases of the sequence. The or more, residues. The invention provides polypeptides com invention provides amplification primer pairs, wherein the prising sequences having at least about 50%, 51%, 52%. 53%, primer pair comprises a first member having a sequence as set 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 40 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, the invention, and a second member having a sequence as set 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 94%. 95%, 96%, 97%, 98%, 99%, or more, or complete 20, 21, 22, 23, 24, or 25 or more residues of the complemen (100%) sequence identity to an exemplary polypeptide of the tary strand of the first member. The invention provides a 45 invention. The extent of sequence identity (homology) may polypeptide, enzyme, protein, e.g. structural or binding pro be determined using any computer program and associated tein, generated by amplification, e.g., polymerase chain reac parameters, including those described herein, such as BLAST tion (PCR), using an amplification primer pair of the inven 2.2.2. or FASTA version 3.0t78, with the default parameters. tion. The invention provides methods of making a As used herein, the terms “computer.” “computer pro polypeptide, enzyme, protein, e.g. structural or binding pro- 50 gram' and “processor are used in their broadest general tein, by amplification, e.g., polymerase chain reaction (PCR). contexts and incorporate all Such devices, as described in using an amplification primer pair of the invention. In one detail, below. aspect, the amplification primer pair amplifies a nucleic acid Nucleic acid sequences of the invention can comprise at from a library, e.g., a gene library, such as an environmental least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200,300,400, library. 55 or 500 or more consecutive nucleotides of an exemplary Amplification reactions can also be used to quantify the sequence of the invention and sequences Substantially iden amount of nucleic acid in a sample (such as the amount of tical thereto. Homologous sequences and fragments of message in a cell sample), label the nucleic acid (e.g., to apply nucleic acid sequences of the invention can refer to a it to an array or a blot), detect the nucleic acid, or quantify the sequence having at least about 50%, 51%, 52%. 53%, 54%, amount of a specific nucleic acid in a sample. In one aspect of 60 55%, 56%, 57%, 58%, 59%, 60%, 61%. 62%, 63%, 64%, the invention, message isolated from a cellor a cDNA library 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, are amplified. 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, The skilled artisan can select and design Suitable oligo 85%. 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, nucleotide amplification primers. Amplification methods are 95%, 96%, 97%, 98%, 99%, or more sequence identity (ho also well known in the art, and include, e.g., polymerase chain 65 mology) to these sequences. Homology (sequence identity) reaction, PCR (see, e.g., PCR PROTOCOLS, A GUIDE TO may be determined using any of the computer programs and METHODS AND APPLICATIONS, ed. Innis, Academic parameters described herein, including FASTA version US 8,119,385 B2 129 130 3.0t78 with the default parameters. Homologous sequences ware Package, Genetics Computer Group, 575 Science Dr. also include RNA sequences in which uridines replace the Madison, Wis.), or by manual alignment and visual inspec in the nucleic acid sequences of the invention. The tion. Other algorithms for determining homology or identity homologous sequences may be obtained using any of the include, for example, in addition to a BLAST program (Basic procedures described herein or may result from the correction Local Alignment Search Tool at the National Center for Bio of a sequencing error. It will be appreciated that the nucleic logical Information), ALIGN, AMAS (Analysis of Multiply acid sequences of the invention can be represented in the Aligned Sequences), AMPS (Protein Multiple Sequence traditional single character format (See the inside back cover Alignment), ASSET (Aligned Segment Statistical Evaluation of Stryer, Lubert. Biochemistry, 3rd Ed., W. H. Freeman & Tool), BANDS, BESTSCOR, BIOSCAN (Biological Co., New York.) or in any other format which records the 10 Sequence Comparative Analysis Node), BLIMPS (BLocks identity of the nucleotides in a sequence. IMProved Searcher), FASTA, Intervals & Points, BMB, Various sequence comparison programs identified else CLUSTAL V, CLUSTAL W, CONSENSUS, LCONSEN where in this patent specification are particularly contem SUS, WCONSENSUS, Smith-Waterman algorithm, DAR plated for use in this aspect of the invention. Protein and/or WIN. Las Vegas algorithm, FNAT (Forced Nucleotide Align nucleic acid sequence homologies may be evaluated using 15 ment Tool), Framealign, Framesearch, DYNAMIC, FILTER, any of the variety of sequence comparison algorithms and FSAP (Fristensky Sequence Analysis Package), GAP (Glo programs known in the art. Such algorithms and programs bal Alignment Program), GENAL, GIBBS, GenQuest, ISSC include, but are by no means limited to, TBLASTN, (Sensitive Sequence Comparison), LALIGN (Local BLASTP, FASTA, TFASTA and CLUSTALW (see, e.g., Sequence Alignment), LCP (Local Content Program), Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85 (8):2444 MACAW (Multiple Alignment Construction & Analysis 2448, 1988: Altschul et al., J. Mol. Biol. 215(3):403-410, Workbench), MAP (Multiple Alignment Program), MBLKP, 1990; Thompson Nucleic Acids Res. 22(2):4673-4680, 1994; MBLKN, PIMA (Pattern-Induced Multi-sequence Align Higgins et al., Methods Enzymol. 266:383-402, 1996; Alts ment), SAGA (Sequence Alignment by Genetic Algorithm) chuletal., J. Mol. Biol. 215(3):403-410, 1990: Altschulet al., and WHAT-IF. Such alignment programs can also be used to Nature Genetics 3:266-272, 1993). 25 screen genome databases to identify polynucleotide Homology or identity is often measured using sequence sequences having Substantially identical sequences. A num analysis Software (e.g., Sequence Analysis Software Package ber of genome databases are available, for example, a Sub of the Genetics Computer Group, University of Wisconsin stantial portion of the human genome is available as part of Biotechnology Center, 1710 University Avenue, Madison, the Human Genome Sequencing Project (Gibbs, 1995). At Wis. 53705). Such software matches similar sequences by 30 least twenty-one other genomes have already been assigning degrees of homology to various deletions, Substi sequenced, including, for example, M. genitalium (Fraser et tutions and other modifications. The terms "homology and al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae “identity” in the context of two or more nucleic acids or (Fleischmann et al., 1995), E. coli (Blattner et al., 1997) and polypeptide sequences, refer to two or more sequences or yeast (S. cerevisiae) (Mewes et al., 1997) and D. melano Subsequences that are the same or have a specified percentage 35 gaster (Adams et al., 2000). Significant progress has also of amino acid residues or nucleotides that are the same when been made in sequencing the genomes of model organism, compared and aligned for maximum correspondence over a Such as mouse, C. elegans and Arabadopsis sp. Several data comparison window or designated region as measured using bases containing genomic information annotated with some any number of sequence comparison algorithms or by manual functional information are maintained by different organiza alignment and visual inspection. 40 tions and may be accessible via internet. For sequence comparison, typically one sequence acts as a One example of a useful algorithm is BLAST and BLAST reference sequence, to which test sequences are compared. 2.0 algorithms, which are described in Altschul et al., Nuc. When using a sequence comparison algorithm, test and ref Acids Res. 25:3389-3402, 1977 and Altschul et al., J. Mol. erence sequences are entered into a computer, Subsequence Biol. 215:403-410, 1990, respectively. Software for perform coordinates are designated, if necessary and sequence algo 45 ing BLAST analyses is publicly available through the rithm program parameters are designated. Default program National Center for Biotechnology Information. This algo parameters can be used, or alternative parameters can be rithm involves first identifying high scoring sequence pairs designated. The sequence comparison algorithm then calcu (HSPs) by identifying short words of length W in the query lates the percent sequence identities for the test sequences sequence, which either match or satisfy some positive-valued relative to the reference sequence, based on the program 50 threshold score T when aligned with a word of the same parameters. length in a database sequence. T is referred to as the neigh A “comparison window', as used herein, includes refer borhood word score threshold (Altschul et al., supra). These ence to a segment of any one of the number of contiguous initial neighborhood word hits act as seeds for initiating positions selected from the group consisting of from 20 to searches to find longer HSPs containing them. The word hits 600, usually about 50 to about 200, more usually about 100 to 55 are extended in both directions along each sequence for as far about 150 in which a sequence may be compared to a refer as the cumulative alignment score can be increased. Cumu ence sequence of the same number of contiguous positions lative scores are calculated using, for nucleotide sequences, after the two sequences are optimally aligned. Methods of the parameters M (reward score for a pair of matching resi alignment of sequence for comparison are well-known in the dues; always >0). Foramino acid sequences, a scoring matrix art. Optimal alignment of sequences for comparison can be 60 is used to calculate the cumulative score. Extension of the conducted, e.g., by the local homology algorithm of Smith & word hits in each direction are halted when: the cumulative Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment score falls off by the quantity X from its maximum alignment algorithm of Needleman & Wunsch, J. Mol. Biol. achieved value; the cumulative score goes to Zero or below, 48:443, 1970, by the search for similarity method of person & due to the accumulation of one or more negative-scoring Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988, by com 65 residue alignments; or the end of either sequence is reached. puterized implementations of these algorithms (GAP, BEST The BLAST algorithm parameters W.T and X determine the FIT, FASTA and TFASTA in the Wisconsin Genetics Soft sensitivity and speed of the alignment. The BLASTN pro US 8,119,385 B2 131 132 gram (for nucleotide sequences) uses as defaults a wordlength acid and polypeptide sequences of the invention. As used (W) of 11, an expectation (E) of 10, M=5, N=-4 and a com herein, the words “recorded and “stored refer to a process parison of both strands. For amino acid sequences, the for storing information on a computer medium. A skilled BLASTP program uses as defaults a wordlength of 3 and artisan can readily adopt any known methods for recording expectations (E) of 10 and the BLOSUM62 scoring matrix 5 information on a computer readable medium to generate (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA manufactures comprising one or more of the nucleic acid 89:10915, 1989) alignments (B) of 50, expectation (E) of 10, and/or polypeptide sequences of the invention. M=5, N=-4 and a comparison of both strands. The polypeptides of the invention include the polypeptide The BLAST algorithm also performs a statistical analysis sequences of the invention, e.g., the exemplary sequences of of the similarity between two sequences (see, e.g., Karlin & 10 the invention, and sequences Substantially identical thereto, Altschul, Proc. Natl. Acad. Sci. USA 90:5873, 1993). One and fragments of any of the preceding sequences. Substan measure of similarity provided by BLAST algorithm is the tially identical, or homologous, polypeptide sequences refer smallest sum probability (P(N)), which provides an indica to a polypeptide sequence having at least 50%, 51%, 52%, tion of the probability by which a match between two nucle 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, otide or amino acid sequences would occur by chance. For 15 63%, 64%. 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, example, a nucleic acid is considered similar to a references 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, sequence if the Smallest Sum probability in a comparison of 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, the test nucleic acid to the reference nucleic acid is less than 93%, 94%, 95%, 96%,97%.98%, 99%, or more, or complete about 0.2, more in one aspect less than about 0.01 and most in (100%) sequence identity (homology) to an exemplary one aspect less than about 0.001. sequence of the invention. In one aspect, protein and nucleic acid sequence homolo Homology (sequence identity) may be determined using gies are evaluated using the Basic Local Alignment Search any of the computer programs and parameters described Tool (“BLAST) In particular, five specific BLAST programs herein. A nucleic acid or polypeptide sequence of the inven are used to perform the following task: tion can be stored, recorded and manipulated on any medium (1) BLASTP and BLAST3 compare an amino acid query 25 which can be read and accessed by a computer. As used sequence against a protein sequence database; herein, the words “recorded and “stored refer to a process (2) BLASTN compares a nucleotide query sequence for storing information on a computer medium. A skilled against a nucleotide sequence database; artisan can readily adopt any of the presently known methods (3) BLASTX compares the six-frame conceptual transla for recording information on a computer readable medium to tion products of a query nucleotide sequence (both 30 generate manufactures comprising one or more of the nucleic strands) against a protein sequence database; acid sequences of the invention, one or more of the polypep (4) TBLASTN compares a query protein sequence against tide sequences of the invention. Another aspect of the inven a nucleotide sequence database translated in all six read tion is a computer readable medium having recorded thereon ing frames (both strands); and at least 2, 5, 10, 15, or or more nucleic acid or polypeptide (5) TBLASTX compares the six-frame translations of a 35 sequences of the invention. nucleotide query sequence against the six-frame trans Another aspect of the invention is a computer readable lations of a nucleotide sequence database. medium having recorded thereon one or more of the nucleic The BLAST programs identify homologous sequences by acid sequences of the invention. Another aspect of the inven identifying similar segments, which are referred to herein as tion is a computer readable medium having recorded thereon “high-scoring segment pairs.” between a query amino or 40 one or more of the polypeptide sequences of the invention. nucleic acid sequence and a test sequence which is in one Another aspect of the invention is a computer readable aspect obtained from a protein or nucleic acid sequence data medium having recorded thereon at least 2, 5, 10, 15, or 20 or base. High-scoring segment pairs are in one aspect identified more of the nucleic acid or polypeptide sequences as set forth (i.e., aligned) by means of a scoring matrix, many of which above. are known in the art. In one aspect, the scoring matrix used is 45 Computer readable media include magnetically readable the BLOSUM62 matrix (Gonnet (1992) Science 256:1443 media, optically readable media, electronically readable 1445; Henikoff and Henikoff (1993) Proteins 17:49-61). Less media and magnetic/optical media. For example, the com in one aspect, the PAM or PAM250 matrices may also be used puter readable media may be a hard disk, a floppy disk, a (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Detecting Distance Relationships. Atlas of Protein Sequence 50 dom. Access Memory (RAM), or Read Only Memory (ROM) and Structure, Washington: National Biomedical Research as well as other types of other media known to those skilled in Foundation). BLAST programs are accessible through the the art. U.S. National Library of Medicine. Aspects of the invention include systems (e.g., internet The parameters used with the above algorithms may be based systems), particularly computer systems which store adapted depending on the sequence length and degree of 55 and manipulate the sequence information described herein. homology studied. In some aspects, the parameters may be One example of a computer system 100 is illustrated in block the default parameters used by the algorithms in the absence diagram form in FIG.1. As used herein, “a computer system” of instructions from the user. refers to the hardware components, Software components and Computer Systems and Computer Program Products data storage components used to analyze a nucleotide To determine and identify sequence identities, structural 60 sequence of a nucleic acid sequence of the invention, or a homologies, motifs and the like in silico, a nucleic acid or polypeptide sequence of the invention. The computer system polypeptide sequence of the invention can be stored, 100 typically includes a processor for processing, accessing recorded, and manipulated on any medium which can be read and manipulating the sequence data. The processor 105 can and accessed by a computer. be any well-known type of central processing unit, such as, Accordingly, the invention provides computers, computer 65 for example, the Pentium III from Intel Corporation, or simi systems, computer readable mediums, computer programs lar processor from Sun, Motorola, Compaq, AMD or Inter products and the like recorded or stored thereon the nucleic national Business Machines. US 8,119,385 B2 133 134 Typically the computer system 100 is a general purpose The process 200 then moves to a state 206 wherein the first system that comprises the processor 105 and one or more sequence stored in the database is read into a memory on the internal data storage components 110 for storing data and one computer. A comparison is then performed at a state 210 to or more data retrieving devices for retrieving the data stored determine if the first sequence is the same as the second on the data storage components. A skilled artisan can readily sequence. It is important to note that this step is not limited to appreciate that any one of the currently available computer performing an exact comparison between the new sequence systems are Suitable. and the first sequence in the database. Well-known methods In one particular aspect, the computer system 100 includes are known to those of skill in the art for comparing two a processor 105 connected to a bus which is connected to a nucleotide or protein sequences, even if they are not identical. main memory 115 (in one aspect implemented as RAM) and 10 For example, gaps can be introduced into one sequence in one or more internal data storage devices 110. Such as a hard order to raise the homology level between the two tested drive and/or other computer readable media having data sequences. The parameters that control whether gaps or other recorded thereon. In some aspects, the computer system 100 features are introduced into a sequence during comparison further includes one or more data retrieving device 118 for are normally entered by the user of the computer system. reading the data stored on the internal data storage devices 15 Once a comparison of the two sequences has been per 110. formed at the state 210, a determination is made at a decision The data retrieving device 118 may represent, for example, state 210 whether the two sequences are the same. Of course, a floppy disk drive, a compact disk drive, a magnetic tape the term “same is not limited to sequences that are absolutely drive, or a modem capable of connection to a remote data identical. Sequences that are within the homology parameters storage system (e.g., via internet) etc. In some aspects, the entered by the user will be marked as “same' in the process internal data storage device 110 is a removable computer 2OO. readable medium Such as a floppy disk, a compact disk, a If a determination is made that the two sequences are the magnetic tape, etc. containing control logic and/or data same, the process 200 moves to a state 214 wherein the name recorded thereon. The computer system 100 may advanta of the sequence from the database is displayed to the user. geously include or be programmed by appropriate Software 25 This state notifies the user that the sequence with the dis for reading the control logic and/or the data from the data played name fulfills the homology constraints that were storage component once inserted in the data retrieving device. entered. Once the name of the stored sequence is displayed to The computer system 100 includes a display 120 which is the user, the process 200 moves to a decision state 218 used to display output to a computer user. It should also be wherein a determination is made whether more sequences noted that the computer system 100 can be linked to other 30 exist in the database. If no more sequences exist in the data computer systems 125A-C in a network or wide area network base, then the process 200 terminates at an end state 220. to provide centralized access to the computer system 100. However, if more sequences do exist in the database, then the Software for accessing and processing the nucleotide process 200 moves to a state 224 wherein a pointer is moved sequences of a nucleic acid sequence of the invention, or a to the next sequence in the database so that it can be compared polypeptide sequence of the invention, (such as search tools, 35 to the new sequence. In this manner, the new sequence is compare tools and modeling tools etc.) may reside in main aligned and compared with every sequence in the database. memory 115 during execution. It should be noted that if a determination had been made at In some aspects, the computer system 100 may further the decision state 212 that the sequences were not homolo comprise a sequence comparison algorithm for comparing a gous, then the process 200 would move immediately to the nucleic acid sequence of the invention, or a polypeptide 40 decision state 218 in order to determine if any other sequences sequence of the invention, stored on a computer readable were available in the database for comparison. medium to a reference nucleotide or polypeptide sequence(s) Accordingly, one aspect of the invention is a computer stored on a computer readable medium. A 'sequence com system comprising a processor, a data storage device having parison algorithm' refers to one or more programs which are stored thereon a nucleic acid sequence of the invention, or a implemented (locally or remotely) on the computer system 45 polypeptide sequence of the invention, a data storage device 100 to compare a nucleotide sequence with other nucleotide having retrievably stored thereon reference nucleotide sequences and/or compounds stored within a data storage sequences or polypeptide sequences to be compared to a means. For example, the sequence comparison algorithm nucleic acid sequence of the invention, or a polypeptide may compare the nucleotide sequences of a nucleic acid sequence of the invention and a sequence comparer for con sequence of the invention, or a polypeptide sequence of the 50 ducting the comparison. The sequence comparer may indi invention, stored on a computer readable medium to reference cate a homology level between the sequences compared or sequences stored on a computer readable medium to identify identify structural motifs in the above described nucleic acid homologies or structural motifs. code a nucleic acid sequence of the invention, or a polypep FIG. 2 is a flow diagram illustrating one aspect of a process tide sequence of the invention, or it may identify structural 200 for comparing a new nucleotide or protein sequence with 55 motifs in sequences which are compared to these nucleic acid a database of sequences in order to determine the homology codes and polypeptide codes. In some aspects, the data Stor levels between the new sequence and the sequences in the age device may have stored thereon the sequences of at least database. The database of sequences can be a private database 2, 5, 10, 15, 20, 25, 30 or 40 or more of the nucleic acid stored within the computer system 100, or a public database sequences of the invention, or the polypeptide sequences of such as GENBANK that is available through the Internet. 60 the invention. The process 200 begins at a start state 201 and then moves Another aspect of the invention is a method for determin to a state 202 wherein the new sequence to be compared is ing the level of homology between a nucleic acid sequence of stored to a memory in a computer system 100. As discussed the invention, or a polypeptide sequence of the invention and above, the memory could be any type of memory, including a reference nucleotide sequence. The method including read RAM or an internal storage device. 65 ing the nucleic acid code or the polypeptide code and the The process 200 then moves to a state 204 wherein a reference nucleotide or polypeptide sequence through the use database of sequences is opened for analysis and comparison. of a computer program which determines homology levels US 8,119,385 B2 135 136 and determining homology between the nucleic acid code or through use of a computer program which identifies differ polypeptide code and the reference nucleotide or polypeptide ences between nucleic acid sequences and identifying differ sequence with the computer program. The computer program ences between the nucleic acid code and the reference nucle may be any of a number of computer programs for determin otide sequence with the computer program. In some aspects, ing homology levels, including those specifically enumerated the computer program is a program which identifies single herein, (e.g., BLAST2N with the default parameters or with nucleotide polymorphisms. The method may be implemented any modified parameters). The method may be implemented by the computer systems described above and the method using the computer systems described above. The method illustrated in FIG. 3. The method may also be performed by may also be performed by reading at least 2, 5, 10, 15, 20, 25, reading at least 2, 5, 10, 15, 20, 25, 30, or 40 or more of the 30 or 40 or more of the above described nucleic acid 10 nucleic acid sequences of the invention and the reference sequences of the invention, or the polypeptide sequences of nucleotide sequences through the use of the computer pro the invention through use of the computer program and deter gram and identifying differences between the nucleic acid mining homology between the nucleic acid codes or polypep codes and the reference nucleotide sequences with the com tide codes and reference nucleotide sequences or polypeptide puter program. Sequences. 15 In other aspects the computer based system may further FIG.3 is a flow diagram illustrating one aspect of a process comprise an identifier for identifying features within a 250 in a computer for determining whether two sequences are nucleic acid sequence of the invention or a polypeptide homologous. The process 250 begins at a start state 252 and sequence of the invention. then moves to a state 254 wherein a first sequence to be An “identifier” refers to one or more programs which iden compared is stored to a memory. The second sequence to be tifies certain features within a nucleic acid sequence of the compared is then stored to a memory at a state 256. The invention, or a polypeptide sequence of the invention. In one process 250 then moves to a state 260 wherein the first char aspect, the identifier may comprise a program which identi acter in the first sequence is read and then to a state 262 fies an open reading frame in a nucleic acid sequence of the wherein the first character of the second sequence is read. It invention. should be understood that if the sequence is a nucleotide 25 FIG. 4 is a flow diagram illustrating one aspect of an sequence, then the character would normally be either A.T. C. identifier process 300 for detecting the presence of a feature in G or U. If the sequence is a protein sequence, then it is in one a sequence. The process 300 begins at a start state 302 and aspect in the single letter amino acid code so that the first and then moves to a state 304 wherein a first sequence that is to be sequence sequences can be easily compared. checked for features is stored to a memory 115 in the com A determination is then made at a decision state 264 30 puter system 100. The process 300 then moves to a state 306 whether the two characters are the same. If they are the same, wherein a database of sequence features is opened. Such a then the process 250 moves to a state 268 wherein the next database would include a list of each feature’s attributes along characters in the first and second sequences are read. A deter with the name of the feature. For example, a feature name mination is then made whether the next characters are the could be “Initiation Codon and the attribute would be same. If they are, then the process 250 continues this loop 35 “ATG'. Another example would be the feature name until two characters are not the same. If a determination is “TAATAA Box” and the feature attribute would be made that the next two characters are not the same, the pro “TAATAA’. An example of such a database is produced by cess 250 moves to a decision state 274 to determine whether the University of Wisconsin Genetics Computer Group. there are any more characters either sequence to read. Alternatively, the features may be structural polypeptide If there are not any more characters to read, then the pro 40 motifs such as alpha helices, beta sheets, or functional cess 250 moves to a state 276 wherein the level of homology polypeptide motifs such as enzymatic active sites, helix-turn between the first and second sequences is displayed to the helix motifs or other motifs known to those skilled in the art. user. The level of homology is determined by calculating the Once the database of features is opened at the state 306, the proportion of characters between the sequences that were the process 300 moves to a state 308 wherein the first feature is same out of the total number of sequences in the first 45 read from the database. A comparison of the attribute of the sequence. Thus, if every character in a first 100 nucleotide first feature with the first sequence is then made at a state 310. sequence aligned with a every character in a second sequence, A determination is then made at a decision state 316 whether the homology level would be 100%. the attribute of the feature was found in the first sequence. If Alternatively, the computer program may be a computer the attribute was found, then the process 300 moves to a state program which compares the nucleotide sequences of a 50 318 wherein the name of the found feature is displayed to the nucleic acid sequence as set forth in the invention, to one or USC. more reference nucleotide sequences in order to determine The process 300 then moves to a decision state 320 wherein whether the nucleic acid code of the invention, differs from a a determination is made whether move features exist in the reference nucleic acid sequence at one or more positions. database. If no more features do exist, then the process 300 Optionally Such a program records the length and identity of 55 terminates at an end state 324. However, if more features do inserted, deleted or substituted nucleotides with respect to the exist in the database, then the process 300 reads the next sequence of either the reference polynucleotide or a nucleic sequence feature at a state 326 and loops back to the state 310 acid sequence of the invention. In one aspect, the computer wherein the attribute of the next feature is compared against program may be a program which determines whether a the first sequence. It should be noted, that if the feature nucleic acid sequence of the invention, contains a single 60 attribute is not found in the first sequence at the decision state nucleotide polymorphism (SNP) with respect to a reference 316, the process 300 moves directly to the decision state 320 nucleotide sequence. in order to determine if any more features exist in the data Accordingly, another aspect of the invention is a method base. for determining whether a nucleic acid sequence of the inven Accordingly, another aspect of the invention is a method of tion, differs at one or more nucleotides from a reference 65 identifying a feature within a nucleic acid sequence of the nucleotide sequence comprising the steps of reading the invention, or a polypeptide sequence of the invention, com nucleic acid code and the reference nucleotide sequence prising reading the nucleic acid code(s) or polypeptide code US 8,119,385 B2 137 138 (s) through the use of a computer program which identifies plary sequence of the invention (e.g., SEQID NO:3, SEQID features therein and identifying features within the nucleic NO:5, SEQID NO:7, SEQ ID NO:9). The stringent condi acid code(s) with the computer program. In one aspect, com tions can be highly stringent conditions, medium stringent puter program comprises a computer program which identi conditions and/or low stringent conditions, including the high fies open reading frames. The method may be performed by and reduced Stringency conditions described herein. In one reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or aspect, it is the stringency of the wash conditions that set forth 40 of the nucleic acid sequences of the invention, or the the conditions which determine whether a nucleic acid is polypeptide sequences of the invention, through the use of the within the scope of the invention, as discussed below. computer program and identifying features within the nucleic “Hybridization” refers to the process by which a nucleic acid codes or polypeptide codes with the computer program. 10 A nucleic acid sequence of the invention, or a polypeptide acid strand joins with a complementary Strand through base sequence of the invention, may be stored and manipulated in pairing. Hybridization reactions can be sensitive and selective a variety of data processor programs in a variety of formats. so that a particular sequence of interest can be identified even For example, a nucleic acid sequence of the invention, or a in samples in which it is present at low concentrations. Suit polypeptide sequence of the invention, may be stored as text 15 ably stringent conditions can be defined by, for example, the in a word processing file, such as Microsoft WORDTM or concentrations of salt or formamide in the prehybridization WORDPERFECTTM or as an ASCII file in a variety of data and hybridization solutions, or by the hybridization tempera base programs familiar to those of skill in the art, such as ture and are well known in the art. In particular, stringency DB2TM, SYBASETM, or ORACLETM. In addition, many com can be increased by reducing the concentration of salt, puter programs and databases may be used as sequence com increasing the concentration of formamide, or raising the parison algorithms, identifiers, or sources of reference nucle hybridization temperature. In alternative aspects, nucleic otide sequences or polypeptide sequences to be compared to acids of the invention are defined by their ability to hybridize a nucleic acid sequence of the invention, or a polypeptide under various stringency conditions (e.g., high, medium, and sequence of the invention. The following list is intended not to low), as set forth herein. limit the invention but to provide guidance to programs and 25 For example, hybridization under high Stringency condi databases which are useful with the nucleic acid sequences of tions could occur in about 50% formamide at about 37° C. to the invention, or the polypeptide sequences of the invention. 42°C. Hybridization could occur under reduced stringency The programs and databases which may be used include, conditions in about 35% to 25% formamide at about 30°C. to but are not limited to: MacPattern (EMBL), DiscoveryBase 35° C. In one aspect, hybridization occurs under high strin (Molecular Applications Group). GeneMine (Molecular 30 gency conditions, e.g., at 42°C. in 50% formamide, 5xSSPE, Applications Group), Look (Molecular Applications Group), 0.3% SDS and 200 n/ml sheared and denatured salmon sperm MacLook (Molecular Applications Group). BLAST and DNA. Hybridization could occur under these reduced strin BLAST2 (NCBI), BLASTN and BLASTX (Altschuletal, J. gency conditions, but in 35% formamide at a reduced tem Mol. Biol. 215: 403, 1990), FASTA (Pearson and Lipman, perature of 35°C. The temperature range corresponding to a Proc. Natl. Acad. Sci. USA, 85: 2444, 1988), FASTDB (Brut 35 particular level of stringency can be further narrowed by lag et al. Comp. App. Biosci. 6:237-245, 1990), Catalyst calculating the purine to pyrimidine ratio of the nucleic acid (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular of interest and adjusting the temperature accordingly. Varia Simulations Inc.), Cerius.DBAccess (Molecular Simula tions on the above ranges and conditions are well known in tions Inc.), HypoCien (Molecular Simulations Inc.), Insight the art. II, (Molecular Simulations Inc.), Discover (Molecular Simu 40 In alternative aspects, nucleic acids of the invention as lations Inc.), CHARMm (Molecular Simulations Inc.), Felix defined by their ability to hybridize under stringent condi (Molecular Simulations Inc.), DelPhi, (Molecular Simula tions can be between about five residues and the full length of tions Inc.), QuanteMM, (Molecular Simulations Inc.), nucleic acid of the invention; e.g., they can be at least 5, 10. Homology (Molecular Simulations Inc.), Modeler (Molecu 15, 20, 25, 30, 35, 40, 50,55, 60, 65,70, 75, 80,90, 100, 150, lar Simulations Inc.), ISIS (Molecular Simulations Inc.), 45 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, Quanta/Protein Design (Molecular Simulations Inc.), 800, 850,900,950, 1000, or more, residues in length. Nucleic WebLab (Molecular Simulations Inc.), WebLab Diversity acids shorter than full length are also included. These nucleic Explorer (Molecular Simulations Inc.), Gene Explorer (Mo acids can be useful as, e.g., hybridization probes, labeling lecular Simulations Inc.), SeqPold (Molecular Simulations probes, PCR oligonucleotide probes, iRNA (single or double Inc.), the MDL. Available Chemicals Directory database, the 50 Stranded), antisense or sequences encoding antibody binding MDL Drug Data Report data base, the Comprehensive peptides (epitopes), motifs, active sites and the like. Medicinal Chemistry database, Derwents's World Drug In one aspect, nucleic acids of the invention are defined by Index database, the BioByteMasterFile database, the Gen their ability to hybridize under high Stringency comprises bank database and the Genseqn database. Many other pro conditions of about 50% formamide at about 37° C. to 42°C. grams and databases would be apparent to one of skill in the 55 In one aspect, nucleic acids of the invention are defined by art given the present disclosure. their ability to hybridize under reduced stringency compris Motifs which may be detected using the above programs ing conditions in about 35% to 25% formamide at about 30° include sequences encoding leucine Zippers, helix-turn-helix C. to 35° C. motifs, glycosylation sites, ubiquitination sites, alpha helices Alternatively, nucleic acids of the invention are defined by and beta sheets, signal sequences encoding signal peptides 60 their ability to hybridize under high Stringency comprising which direct the secretion of the encoded proteins, sequences conditions at 42°C. in 50% formamide, 5xSSPE, 0.3% SDS, implicated in transcription regulation such as homeoboxes, and a repetitive sequence blocking nucleic acid. Such as cot-1 acidic stretches, enzymatic active sites, Substrate binding or salmon sperm DNA (e.g., 200 n/ml sheared and denatured sites and enzymatic cleavage sites. salmon sperm DNA). In one aspect, nucleic acids of the Hybridization of Nucleic Acids 65 invention are defined by their ability to hybridize under The invention provides isolated or recombinant nucleic reduced stringency conditions comprising 35% formamide at acids that hybridize under Stringent conditions to an exem a reduced temperature of 35° C. US 8,119,385 B2 139 140 In nucleic acid hybridization reactions, the conditions used of “low stringency” hybridization conditions is when the to achieve aparticular level of stringency will vary, depending above hybridization is conducted at 45° C. on the nature of the nucleic acids being hybridized. For Alternatively, the hybridization may be carried out in buff example, the length, degree of complementarity, nucleotide ers, such as 6xSSC, containing formamide at a temperature of sequence composition (e.g., GC v. AT content) and nucleic 42°C. In this case, the concentration of formamide in the acid type (e.g., RNA v. DNA) of the hybridizing regions of the hybridization buffer may be reduced in 5% increments from nucleic acids can be considered in selecting hybridization 50% to 0% to identify clones having decreasing levels of conditions. An additional consideration is whether one of the homology to the probe. Following hybridization, the filter nucleic acids is immobilized, for example, on a filter. may be washed with 6xSSC, 0.5% SDS at 50° C. These Hybridization may be carried out under conditions of low 10 conditions are considered to be “moderate' conditions above stringency, moderate Stringency or high Stringency. As an 25% formamide and “low” conditions below 25% forma example of nucleic acid hybridization, a polymer membrane mide. A specific example of “moderate’ hybridization con containing immobilized denatured nucleic acids is first pre ditions is when the above hybridization is conducted at 30% hybridized for 30 minutes at 45° C. in a solution consisting of formamide. A specific example of “low stringency” hybrid 0.9 M. NaCl, 50 mM NaHPO, pH 7.0, 5.0 mM NaFDTA, 15 ization conditions is when the above hybridization is con 0.5% SDS, 10xDenhardt’s and 0.5 mg/ml polyriboadenylic ducted at 10% formamide. acid. Approximately 2x10 cpm (specific activity 4-9x10 However, the selection of a hybridization format is not cpm/ug) of Pend-labeled oligonucleotide probe are then critical—it is the stringency of the wash conditions that set added to the solution. After 12-16 hours of incubation, the forth the conditions which determine whether a nucleic acid membrane is washed for 30 minutes at room temperature in is within the scope of the invention. Wash conditions used to 1xSET (150 mMNaCl, 20 mM Tris hydrochloride, pH 7.8, 1 identify nucleic acids within the scope of the invention mM NaEDTA) containing 0.5% SDS, followed by a 30 include, e.g.: a salt concentration of about 0.02 molar at pH 7 minute wash in fresh 1xSET at T-10°C. for the oligonucle and a temperature of at least about 50° C. or about 55° C. to otide probe. The membrane is then exposed to auto-radio about 60° C.; or, a salt concentration of about 0.15 MNaCl at graphic film for detection of hybridization signals. 25 72°C. for about 15 minutes; or, a salt concentration of about All of the foregoing hybridizations would be considered to 0.2xSSC at a temperature of at least about 50° C. or about 55° be under conditions of high Stringency. C. to about 60° C. for about 15 to about 20 minutes; or, the Following hybridization, a filter can be washed to remove hybridization complex is washed twice with a solution with a any non-specifically bound detectable probe. The stringency salt concentration of about 2xSSC containing 0.1% SDS at used to wash the filters can also be varied depending on the 30 room temperature for 15 minutes and then washed twice by nature of the nucleic acids being hybridized, the length of the 0.1xSSC containing 0.1% SDS at 68° C. for 15 minutes; or, nucleic acids being hybridized, the degree of complementa equivalent conditions. See Sambrook, Tijssen and Ausubel rity, the nucleotide sequence composition (e.g., GC v. AT for a description of SSC buffer and equivalent conditions. content) and the nucleic acid type (e.g., RNA v. DNA). These methods may be used to isolate nucleic acids of the Examples of progressively higher stringency condition 35 invention. For example, the preceding methods may be used washes are as follows: 2xSSC, 0.1% SDS at room tempera to isolate nucleic acids having a sequence with at least about ture for 15 minutes (low stringency); 0.1xSSC, 0.5% SDS at 97%, at least 95%, at least 90%, at least 85%, at least 80%, at room temperature for 30 minutes to 1 hour (moderate strin least 75%, at least 70%, at least 65%, at least 60%, at least gency); 0.1xSSC, 0.5% SDS for 15 to 30 minutes at between 55%, or at least 50% sequence identity (homology) to a the hybridization temperature and 68°C. (high stringency): 40 nucleic acid sequence selected from the group consisting of and 0.15M NaCl for 15 minutes at 72° C. (very high strin one of the sequences of the invention, or fragments compris gency). A final low stringency wash can be conducted in ing at least about 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 0.1 xSSC at room temperature. The examples above are 200, 300, 400, or 500 consecutive bases thereof and the merely illustrative of one set of conditions that can be used to sequences complementary thereto. Sequence identity (ho wash filters. One of skill in the art would know that there are 45 mology) may be measured using the alignment algorithm. For numerous recipes for different stringency washes. Some example, the homologous polynucleotides may have a coding other examples are given below. sequence which is a naturally occurring allelic variant of one In one aspect, hybridization conditions comprise a wash of the coding sequences described herein. Such allelic vari step comprising a wash for 30 minutes at room temperature in ants may have a Substitution, deletion or addition of one or a solution comprising 1x150 mM NaCl, 20 mM Tris hydro 50 more nucleotides when compared to the nucleic acids of the chloride, pH 7.8, 1 mM NaEDTA, 0.5% SDS, followed by a invention. Additionally, the above procedures may be used to 30 minute wash in fresh solution. isolate nucleic acids which encode polypeptides having at Nucleic acids which have hybridized to the probe are iden least about 99%. 95%, at least 90%, at least 85%, at least 80%, tified by autoradiography or other conventional techniques. at least 75%, at least 70%, at least 65%, at least 60%, at least The above procedure may be modified to identify nucleic 55 55%, or at least 50% sequence identity (homology) to a acids having decreasing levels of homology to the probe polypeptide of the invention, or fragments comprising at least sequence. For example, to obtain nucleic acids of decreasing 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive homology to the detectable probe, less Stringent conditions amino acids thereof as determined using a sequence align may be used. For example, the hybridization temperature may ment algorithm (e.g., such as the FASTA version 3.0t78 algo be decreased in increments of 5° C. from 68° C. to 42°C. in 60 rithm with the default parameters). a hybridization buffer having a Na" concentration of approxi Oligonucleotides Probes and Methods for Using Them mately 1M. Following hybridization, the filter may be washed The invention also provides nucleic acid probes that can be with 2xSSC, 0.5% SDS at the temperature of hybridization. used, e.g., for identifying nucleic acids encoding a polypep These conditions are considered to be “moderate' conditions tide with an enzyme, structural or binding activity or frag above 50° C. and “low” conditions below 50° C. A specific 65 ments thereof or for identifying polypeptide, enzyme, pro example of “moderate’ hybridization conditions is when the tein, e.g. structural or binding protein, genes. In one aspect, above hybridization is conducted at 55°C. A specific example the probe comprises at least 10 consecutive bases of a nucleic US 8,119,385 B2 141 142 acid of the invention. Alternatively, a probe of the invention cations 1:25-33, 1991; and Walker G. T. et al., “Strand can be at least about 5, 6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, Displacement Amplification—an Isothermal in vitro DNA 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45,50, 60, 70, 80,90, Amplification Technique'. Nucleic Acid Research 20:1691 100, 110, 120, 130, 150 or about 10 to 50, about 20 to 60 about 1696, 1992). In such procedures, the nucleic acids in the 30 to 70, consecutive bases of a sequence as set forth in a sample are contacted with the probes, the amplification reac nucleic acid of the invention. The probes identify a nucleic tion is performed and any resulting amplification product is acid by binding and/or hybridization. The probes can be used detected. The amplification product may be detected by per in arrays of the invention, see discussion below, including, forming gel electrophoresis on the reaction products and e.g., capillary arrays. The probes of the invention can also be staining the gel with an intercalator Such as ethidium bro used to isolate other nucleic acids or polypeptides. 10 mide. Alternatively, one or more of the probes may be labeled The isolated nucleic acids of the invention, the sequences with a radioactive isotope and the presence of a radioactive complementary thereto, or a fragment comprising at least 10, amplification product may be detected by autoradiography 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200,300, 400, or 500 after gel electrophoresis. consecutive bases of one of the sequences of the invention, or Probes derived from sequences near the ends of the the sequences complementary thereto may also be used as 15 sequences of the invention, may also be used in chromosome probes to determine whether a biological sample, such as a walking procedures to identify clones containing genomic soil sample, contains an organism having a nucleic acid sequences located adjacent to the sequences of the invention. sequence of the invention or an organism from which the Such methods allow the isolation of genes which encode nucleic acid was obtained. In such procedures, a biological additional proteins from the host organism. sample potentially harboring the organism from which the The isolated nucleic acids of the invention, the sequences nucleic acid was isolated is obtained and nucleic acids are complementary thereto, or a fragment comprising at least 10, obtained from the sample. The nucleic acids are contacted 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200,300, 400, or 500 with the probe under conditions which permit the probe to consecutive bases of one of the sequences of the invention, or specifically hybridize to any complementary sequences from the sequences complementary thereto may be used as probes which are present therein. 25 to identify and isolate related nucleic acids. In some aspects, Where necessary, conditions which permit the probe to the related nucleic acids may be cDNAs or genomic specifically hybridize to complementary sequences may be from organisms other than the one from which the nucleic determined by placing the probe in contact with complemen acid was isolated. For example, the other organisms may be tary sequences from samples known to contain the comple related organisms. In Such procedures, a nucleic acid sample mentary sequence as well as control sequences which do not 30 is contacted with the probe under conditions which permit the contain the complementary sequence. Hybridization condi probe to specifically hybridize to related sequences. Hybrid tions, such as the salt concentration of the hybridization ization of the probe to nucleic acids from the related organism buffer, the formamide concentration of the hybridization is then detected using any of the methods described above. buffer, or the hybridization temperature, may be varied to By varying the stringency of the hybridization conditions identify conditions which allow the probe to hybridize spe 35 used to identify nucleic acids, such as cDNAS or genomic cifically to complementary nucleic acids. DNAs, which hybridize to the detectable probe, nucleic acids If the sample contains the organism from which the nucleic having different levels of homology to the probe can be iden acid was isolated, specific hybridization of the probe is then tified and isolated. Stringency may be varied by conducting detected. Hybridization may be detected by labeling the the hybridization at varying temperatures below the melting probe with a detectable agent such as a radioactive isotope, a 40 temperatures of the probes. The melting temperature, T, is fluorescent dye or an enzyme capable of catalyzing the for the temperature (under defined ionic strength and pH) at mation of a detectable product. which 50% of the target sequence hybridizes to a perfectly Many methods for using the labeled probes to detect the complementary probe. Very stringent conditions are selected presence of complementary nucleic acids in a sample are to be equal to or about 5° C. lower than the T for a particular familiar to those skilled in the art. These include Southern 45 probe. The melting temperature of the probe may be calcu Blots, Northern Blots, colony hybridization procedures and lated using the following formulas: dot blots. Protocols for each of these procedures are provided For probes between 14 and 70 nucleotides in length the in Ausubel et al. Current Protocols in Molecular Biology, melting temperature (T,) is calculated using the formula: John Wiley 503 Sons, Inc. (1997) and Sambrook et al., T81.5+16.6(log Na+I)+0.41 (fraction G+C)-(600/N) Molecular Cloning: A Laboratory Manual 2nd Ed., Cold 50 where N is the length of the probe. Spring Harbor Laboratory Press (1989. If the hybridization is carried out in a solution containing Alternatively, more than one probe (at least one of which is formamide, the melting temperature may be calculated using capable of specifically hybridizing to any complementary the equation: T81.5+16.6(log Na+I)+0.41 (fraction sequences which are present in the nucleic acid sample), may G+C)-(0.63% formamide)-(600/N) where N is the length of be used in an amplification reaction to determine whether the 55 the probe. sample contains an organism containing a nucleic acid Prehybridization may be carried out in 6xSSC, 5xDen sequence of the invention (e.g., an organism from which the hardt’s reagent, 0.5% SDS, 100 ug denatured fragmented nucleic acid was isolated). Typically, the probes comprise salmon sperm DNA or 6xSSC, 5xDenhardt's reagent, 0.5% oligonucleotides. In one aspect, the amplification reaction SDS, 100 ug denatured fragmented salmon sperm DNA, 50% may comprise a PCR reaction. PCR protocols are described in 60 formamide. The formulas for SSC and Denhardt’s solutions Ausubel and Sambrook, Supra. Alternatively, the amplifica are listed in Sambrook et al., Supra. tion may comprise a ligase chain reaction, 3SR, or strand Hybridization is conducted by adding the detectable probe displacement reaction. (See Barany, F., “The Ligase Chain to the prehybridization solutions listed above. Where the Reaction in a PCR World, PCR Methods and Applications probe comprises double stranded DNA, it is denatured before 1:5-16, 1991; E. Fahy et al., “Self-sustained Sequence Rep 65 addition to the hybridization solution. The filter is contacted lication (3SR): An Isothermal Transcription-based Amplifi with the hybridization solution for a sufficient period of time cation System Alternative to PCR. PCR Methods and Appli to allow the probe to hybridize to cDNAs or genomic DNAS US 8,119,385 B2 143 144 containing sequences complementary thereto or homologous therapies, e.g., as anti-microbials for, e.g., Salmonella, or to thereto. For probes over 200 nucleotides in length, the hybrid neutralize a biological warfare agent, e.g., anthrax. ization may be carried out at 15-25° C. below the T. For Antisense Oligonucleotides shorter probes, such as oligonucleotide probes, the hybrid The invention provides antisense oligonucleotides capable ization may be conducted at 5-10° C. below the T. In one of binding a polypeptide, enzyme, protein, e.g. structural or aspect, for hybridizations in 6xSSC, the hybridization is con binding protein, message which, in one aspect, can inhibit a ducted at approximately 68°C. Usually, for hybridizations in polypeptide, enzyme, protein, e.g. structural or binding pro 50% formamide containing solutions, the hybridization is tein, activity by targeting mRNA. Strategies for designing conducted at approximately 42°C. antisense oligonucleotides are well described in the scientific 10 and patent literature, and the skilled artisan can design Such a Inhibiting Expression of Polypeptides, Enzymes, Proteins polypeptide, enzyme, protein, e.g. structural or binding pro The invention provides nucleic acids complementary to tein, oligonucleotides using the novel reagents of the inven (e.g., antisense sequences to) the nucleic acids of the inven tion. For example, gene walking/RNA mapping protocols to tion, e.g., nucleic acids comprising antisense, iRNA, screen for effective antisense oligonucleotides are well ribozymes. Nucleic acids of the invention comprising anti 15 known in the art, see, e.g., Ho (2000) Methods Enzymol. sense sequences can be capable of inhibiting the transport, 314:168-183, describing an RNA mapping assay, which is splicing or transcription of polypeptide, enzyme, protein, e.g. based on standard molecular techniques to provide an easy structural or binding protein genes. The inhibition can be and reliable method for potent antisense sequence selection. effected through the targeting of genomic DNA or messenger See also Smith (2000) Eur. J. Pharm. Sci. 11:191-198. RNA. The transcription or function of targeted nucleic acid Naturally occurring nucleic acids are used as antisense can be inhibited, for example, by hybridization and/or cleav oligonucleotides. The antisense oligonucleotides can be of age. In one aspect, inhibitors of the invention include oligo any length; for example, in alternative aspects, the antisense nucleotides which are able to either bind a polypeptide, oligonucleotides are between about 5 to 100, about 10 to 80, enzyme, protein, e.g. structural or binding protein, gene or about 15 to 60, about 18 to 40. The optimal length can be message, in either case preventing or inhibiting the produc 25 determined by routine screening. The antisense oligonucle tion or function of a polypeptide, enzyme, protein, e.g. struc otides can be present at any concentration. The optimal con tural or binding protein. The association can be through centration can be determined by routine Screening. A wide sequence specific hybridization. Another useful class of variety of synthetic, non-naturally occurring nucleotide and inhibitors includes oligonucleotides which cause inactivation nucleic acid analogues are known which can address this or cleavage of a polypeptide, enzyme, protein, e.g. structural 30 potential problem. For example, peptide nucleic acids or binding protein, message. The oligonucleotide can have (PNAS) containing non-ionic backbones, such as N-(2-ami enzyme activity which causes such cleavage, such as noethyl)glycine units can be used. Antisense oligonucle ribozymes. The oligonucleotide can be chemically modified otides having phosphorothioate linkages can also be used, as or conjugated to an enzyme or composition capable of cleav described in WO 97/03211; WO 96/3915.4; Mata (1997) ing the complementary nucleic acid. A pool of many different 35 Toxicol Appl Pharmacol 144:189-197; Antisense Therapeu such oligonucleotides can be screened for those with the tics, ed. Agrawal (Humana Press, Totowa, N.J., 1996). Anti desired activity. Thus, the invention provides various compo sense oligonucleotides having synthetic DNA backbone ana sitions for the inhibition of a polypeptide, enzyme, protein, logues provided by the invention can also include phosphoro e.g. structural or binding protein, expression on a nucleic acid dithioate, methylphosphonate, phosphoramidate, alkyl and/or protein level, e.g., antisense, iRNA and ribozymes 40 phosphotriester, Sulfamate, 3'-thioacetal, methylene(meth comprising a polypeptide, enzyme, protein, e.g. structural or ylimino), 3'-N-carbamate, and morpholino carbamate nucleic binding protein, sequences of the invention and the anti acids, as described above. polypeptide, anti-enzyme, anti-protein, e.g. anti-structural or Combinatorial chemistry methodology can be used to cre anti-binding protein antibodies of the invention. ate vast numbers of oligonucleotides that can be rapidly Inhibition of a polypeptide, enzyme, protein, e.g. structural 45 screened for specific oligonucleotides that have appropriate or binding protein, expression can have a variety of industrial binding affinities and specificities toward any target. Such as applications. For example, inhibition of a polypeptide, the sense and antisense a polypeptide, enzyme, protein, e.g. enzyme, protein, e.g. structural orbinding protein, expression structural or binding protein, sequences of the invention (see, can slow or prevent spoilage. In one aspect, use of composi e.g., Gold (1995).J. of Biol. Chem. 270: 13581-13584). tions of the invention that inhibit the expression and/or activ 50 Inhibitory Ribozymes ity of a polypeptide, enzyme, protein, e.g. structural or bind The invention provides ribozymes capable of binding a ing protein, e.g., antibodies, antisense oligonucleotides, polypeptide, enzyme, protein, e.g. structural or binding pro ribozymes and RNAi are used to slow or prevent spoilage. tein, message. These ribozymes can inhibit a polypeptide, Thus, in one aspect, the invention provides methods and enzyme, protein, e.g. structural orbinding protein, activity by, compositions comprising application onto a plant or plant 55 e.g., targeting mRNA. Strategies for designing ribozymes and product (e.g., a cereal, a grain, a fruit, seed, root, leaf, etc.) selecting the polypeptide, enzyme, protein, e.g. structural or antibodies, antisense oligonucleotides, ribozymes and RNAi binding protein-specific antisense sequence for targeting are of the invention to slow or prevent spoilage. These composi well described in the scientific and patent literature, and the tions also can be expressed by the plant (e.g., a transgenic skilled artisan can design such ribozymes using the novel plant) or another organism (e.g., a bacterium or other micro 60 reagents of the invention. Ribozymes act by binding to a organism transformed with a polypeptide, enzyme, protein, target RNA through the target RNA binding portion of a e.g. structural or binding protein, gene of the invention). ribozyme which is held in close proximity to an enzymatic The compositions of the invention for the inhibition of a portion of the RNA that cleaves the target RNA. Thus, the polypeptide, enzyme, protein, e.g. structural or binding pro ribozyme recognizes and binds a target RNA through tein, expression, e.g., antisense, iRNA (e.g., siRNA, 65 complementary base-pairing, and once bound to the correct miRNA), ribozymes, antibodies, can be used as pharmaceu site, acts enzymatically to cleave and inactivate the target tical compositions, e.g., as anti-pathogen agents or in other RNA. Cleavage of a target RNA in such a manner will destroy US 8,119,385 B2 145 146 its ability to direct synthesis of an encoded protein if the tion are used in gene-silencing therapeutics, see, e.g., Shuey cleavage occurs in the coding sequence. After a ribozyme has (2002) Drug Discov. Today 7:1040-1046. In one aspect, the bound and cleaved its RNA target, it can be released from that invention provides methods to selectively degrade RNA using RNA to bind and cleave new targets repeatedly. the RNAi’s of the invention. The process may be practiced in In some circumstances, the enzymatic nature of a ribozyme vitro, ex vivo or in vivo. In one aspect, the RNAi molecules of can be advantageous over other technologies, such as anti the invention can be used to generate a loss-of-function muta sense technology (where a nucleic acid molecule simply tion in a cell, an organ oran animal. Methods for making and binds to a nucleic acid target to block its transcription, trans using RNAi molecules for selectively degrade RNA are well lation or association with another molecule) as the effective known in the art, see, e.g., U.S. Pat. Nos. 6,506.559; 6.511, concentration of ribozyme necessary to effect a therapeutic 10 treatment can be lower than that of an antisense oligonucle 824; 6,515,109; 6,489,127. otide. This potential advantage reflects the ability of the Modification of Nucleic Acids ribozyme to act enzymatically. Thus, a single ribozyme mol The invention provides methods of generating variants of ecule is able to cleave many molecules of target RNA. In the nucleic acids of the invention, e.g., those encoding a addition, a ribozyme is typically a highly specific inhibitor, 15 polypeptide, enzyme, protein, e.g. structural or binding pro with the specificity of inhibition depending not only on the tein. These methods can be repeated or used in various com base pairing mechanism of binding, but also on the mecha binations to generate a polypeptide, enzyme, protein, e.g. nism by which the molecule inhibits the expression of the structural or binding protein, having an altered or different RNA to which it binds. That is, the inhibition is caused by activity or an altered or different stability from that of a cleavage of the RNA target and so specificity is defined as the polypeptide, enzyme, protein, e.g. structural or binding pro ratio of the rate of cleavage of the targeted RNA over the rate tein, encoded by the template nucleic acid. These methods of cleavage of non-targeted RNA. This cleavage mechanism also can be repeated or used in various combinations, e.g., to is dependent upon factors additional to those involved in base generate variations in gene/message expression, message pairing. Thus, the specificity of action of a ribozyme can be translation or message stability. In another aspect, the genetic greater than that of antisense oligonucleotide binding the 25 composition of a cell is altered by, e.g., modification of a same RNA site. homologous gene ex vivo, followed by its reinsertion into the The ribozyme of the invention, e.g., an enzymatic cell. ribozyme RNA molecule, can be formed in a hammerhead A nucleic acid of the invention can be altered by any means. motif a hairpin motif, as a hepatitis delta virus motif, a group For example, random or stochastic methods, or, non-stochas I intron motif and/oran RNaseP-like RNA in association with 30 tic, or “directed evolution, methods, see, e.g., U.S. Pat. No. an RNA guide sequence. Examples of hammerhead motifs 6,361.974. Methods for random mutation of genes are well are described by, e.g., Rossi (1992) Aids Research and known in the art, see, e.g., U.S. Pat. No. 5,830,696. For Human Retroviruses 8:183; hairpin motifs by Hampel (1989) example, mutagens can be used to randomly mutate a gene. Biochemistry 28:4929, and Hampel (1990) Nuc. Acids Res. Mutagens include, e.g., ultraviolet light orgamma irradiation, 18:299; the hepatitis delta virus motif by Perrotta (1992) 35 or a chemical mutagen, e.g., mitomycin, nitrous acid, photo Biochemistry 31:16; the RNaseP motif by Guerrier-Takada activated psoralens, alone or in combination, to induce DNA (1983) Cell 35:849; and the group I intron by Cech U.S. Pat. breaks amenable to repair by recombination. Other chemical No. 4,987,071. The recitation of these specific motifs is not mutagens include, for example, Sodium bisulfite, nitrous acid, intended to be limiting. Those skilled in the art will recognize hydroxylamine, or formic acid. Other mutagens that a ribozyme of the invention, e.g., an enzymatic RNA 40 are analogues of nucleotide precursors, e.g., nitrosoguani molecule of this invention, can have a specific Substrate bind dine, 5-bromouracil, 2-aminopurine, or . These ing site complementary to one or more of the target gene RNA agents can be added to a PCR reaction in place of the nucle regions. A ribozyme of the invention can have a nucleotide otide precursor thereby mutating the sequence. Intercalating sequence within or Surrounding that Substrate binding site agents such as proflavine, acriflavine, quinacrine and the like which imparts an RNA cleaving activity to the molecule. 45 can also be used. RNA Interference (RNAi) Any technique in molecular biology can be used, e.g., In one aspect, the invention provides an RNA inhibitory random PCR mutagenesis, see, e.g., Rice (1992) Proc. Natl. molecule, a so-called “RNAi molecule, comprising a Acad. Sci. USA 89:5467-5471; or, combinatorial multiple polypeptide, enzyme, protein, e.g. structural or binding pro cassette mutagenesis, see, e.g., Crameri (1995) Biotech tein, sequence of the invention. The RNAi molecule com 50 niques 18:194-196. Alternatively, nucleic acids, e.g., genes, prises a double-stranded RNA (dsRNA) molecule. The RNAi can be reassembled after random, or “stochastic fragmen can inhibit expression of a polypeptide, enzyme, protein, e.g. tation, see, e.g., U.S. Pat. Nos. 6.291,242: 6.287,862; 6,287, structural or binding protein, gene. In one aspect, the RNAi is 861; 5,955,358; 5,830,721; 5,824,514; 5,811,238; 5,605,793. about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex In alternative aspects, modifications, additions or deletions nucleotides in length. While the invention is not limited by 55 are introduced by error-prone PCR, shuffling, oligonucle any particular mechanism of action, the RNAi can enter a cell otide-directed mutagenesis, assembly PCR, sexual PCR and cause the degradation of a single-stranded RNA (ssRNA) mutagenesis, in Vivo mutagenesis, cassette mutagenesis, of similar or identical sequences, including endogenous recursive ensemble mutagenesis, exponential ensemble mRNAs. When a cell is exposed to double-stranded RNA mutagenesis, site-specific mutagenesis, gene reassembly, (dsRNA), mRNA from the homologous gene is selectively 60 Gene Site Saturation Mutagenesis (GSSM), synthetic liga degraded by a process called RNA interference (RNAi). A tion reassembly (SLR), recombination, recursive sequence possible basic mechanism behind RNAi, e.g., siRNA for recombination, phosphothioate-modified DNA mutagenesis, inhibiting transcription and/or miRNA to inhibit translation, uracil-containing template mutagenesis, gapped duplex is the breaking of a double-stranded RNA (dsRNA) matching mutagenesis, point mismatch repair mutagenesis, repair-de a specific gene sequence into short pieces called short inter 65 ficient host strain mutagenesis, chemical mutagenesis, radio fering RNA, which trigger the degradation of mRNA that genic mutagenesis, deletion mutagenesis, restriction-selec matches its sequence. In one aspect, the RNAi’s of the inven tion mutagenesis, restriction-purification mutagenesis, US 8,119,385 B2 147 148 artificial gene synthesis, ensemble mutagenesis, chimeric M13-derived vectors: an efficient and general procedure for nucleic acid multimer creation, and/or a combination of these the production of point mutations in any DNA fragment and other methods. Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) The following publications describe a variety of recursive “Oligonucleotide-directed mutagenesis of DNA fragments recombination procedures and/or methods which can be cloned into M13 vectors' Methods in Enzymol. 100:468 incorporated into the methods of the invention: Stemmer 500; and Zoller (1987) Oligonucleotide-directed mutagen (1999) “Molecular breeding of viruses for targeting and other esis: a simple method using two oligonucleotide primers and clinical properties’ Tumor Targeting 4:1-4; Ness (1999) a single-stranded DNA template' Methods in Enzymol. 154: Nature Biotechnology 17:893-896; Chang (1999)“Evolution 329-350); phosphorothioate-modified DNA mutagenesis of a cytokine using DNA family shuffling Nature Biotech 10 (Taylor (1985) “The use of phosphorothioate-modified DNA nology 17:793-797; Minshull (1999) “Protein evolution by in reactions to prepare nicked DNA'Nucl. molecular breeding Current Opinion in Chemical Biology Acids Res. 13:8749-8764; Taylor (1985) “The rapid genera 3:284-290; Christians (1999) “Directed evolution of thymi tion of oligonucleotide-directed mutations at high frequency dine kinase for AZT phosphorylation using DNA family shuf using phosphorothioate-modified DNA' Nucl. Acids Res. 13: fling Nature Biotechnology 17:259-264; Crameri (1998) 15 8765-8787 (1985): Nakamaye (1986) “Inhibition of restric “DNA shuffling of a family of genes from diverse species tion endonuclease Nici I cleavage by phosphorothioate groups accelerates directed evolution' Nature 391:288-291; Crameri and its application to oligonucleotide-directed mutagenesis' (1997) “Molecular evolution of an arsenate detoxification Nucl. Acids Res. 14:9679–9698; Sayers (1988) “Y-T Exonu pathway by DNA shuffling.” Nature Biotechnology 15:436 cleases in phosphorothioate-based oligonucleotide-directed 438; Zhang (1997) “Directed evolution of an effective fucosi mutagenesis' Nucl. Acids Res. 16:791-802; and Sayers et al. dase from a galactosidase by DNA shuffling and screening (1988) "Strand specific cleavage of phosphorothioate-con Proc. Natl. Acad. Sci. USA94:4504-4509: Patten et al. (1997) taining DNA by reaction with restriction endonucleases in the Applications of DNA Shuffling to Pharmaceuticals and Vac presence of ethidium bromide' Nucl. Acids Res. 16: 803 cines’ Current Opinion in Biotechnology 8:724-733; 814); mutagenesis using gapped duplex DNA (Kramer et al. Crameri et al. (1996) “Construction and evolution of anti 25 (1984) “The gapped duplex DNA approach to oligonucle body-phage libraries by DNA shuffling Nature Medicine otide-directed mutation construction' Nucl. Acids Res. 12: 2:100-103; Gates et al. (1996) “Affinity selective isolation of 9441-9456; Kramer & Fritz (1987) Methods in Enzymol. ligands from peptide libraries through display on a lac repres “Oligonucleotide-directed construction of mutations via sor headpiece dimer' Journal of Molecular Biology 255: gapped duplex DNA' 154:350-367; Kramer (1988) 373-386; Stemmer (1996) “Sexual PCR and Assembly PCR” 30 “Improved enzymatic in vitro reactions in the gapped duplex In: The Encyclopedia of Molecular Biology. VCHPublishers, DNA approach to oligonucleotide-directed construction of New York. pp. 447-457; Crameri and Stemmer (1995)“Com mutations” Nucl. Acids Res. 16: 7207; and Fritz (1988) “Oli binatorial multiple cassette mutagenesis creates all the per gonucleotide-directed construction of mutations: a gapped mutations of mutant and wildtype cassettes’ BioTechniques duplex DNA procedure without enzymatic reactions in vitro 18:194-195; Stemmer et al. (1995) “Single-step assembly of 35 Nucl. Acids Res. 16: 6987-6999). a gene and entire plasmid form large numbers of oligodeox Additional protocols that can be used to practice the inven yribonucleotides' Gene, 164:49-53; Stemmer (1995) “The tion include point mismatch repair (Kramer (1984) “Point Evolution of Molecular Computation” Science 270: 1510; Mismatch Repair Cell 38:879-887), mutagenesis using Stemmer (1995) “Searching Sequence Space” Bio/Technol repair-deficient host strains (Carter et al. (1985) “Improved ogy 13:549-553; Stemmer (1994) “Rapid evolution of a pro 40 oligonucleotide site-directed mutagenesis using M13 vec tein in vitro by DNA shuffling Nature 370:389-391; and tors' Nucl. Acids Res. 13: 4431-4443; and Carter (1987) Stemmer (1994) “DNA shuffling by random fragmentation “Improved oligonucleotide-directed mutagenesis using M13 and reassembly: In vitro recombination for molecular evolu vectors' Methods in Enzymol. 154: 382-403), deletion tion. Proc. Natl. Acad. Sci., USA 91:10747-10751. mutagenesis (Eghtedarzadeh (1986) “Use of oligonucle Mutational methods of generating diversity include, for 45 otides to generate large deletions' Nucl. Acids Res. 14: example, site-directed mutagenesis (Ling et al. (1997) 5115), restriction-selection and restriction-selection and Approaches to DNA mutagenesis: an overview” Anal Bio restriction-purification (Wells et al. (1986) “Importance of chem. 254(2): 157-178; Dale et al. (1996) “Oligonucleotide hydrogen-bond formation in stabilizing the transition state of directed random mutagenesis using the phosphorothioate subtilisin' Phil. Trans. R. Soc. Lond. A 317: 415-423), method Methods Mol. Biol. 57:369-374; Smith (1985) “In 50 mutagenesis by total gene synthesis (Nambiar et al. (1984) vitro mutagenesis' Ann. Rev. Genet. 19:423-462; Botstein & “Total synthesis and cloning of a gene coding for the ribonu Shortle (1985) “Strategies and applications of in vitro clease S protein’ Science 223: 1299-1301: Sakamar and mutagenesis' Science 229:1193-1201; Carter (1986) “Site Khorana (1988) “Total synthesis and expression of a gene for directed mutagenesis' Biochem. J. 237:1-7: and Kunkel the a-subunit of bovine rod outer segment guanine nucle (1987) “The efficiency of oligonucleotide directed mutagen 55 otide-binding protein ()” Nucl. Acids Res. 14: esis' in Nucleic Acids & Molecular Biology (Eckstein, F. and 6361-6372; Wells et al. (1985) “Cassette mutagenesis: an Lilley, D. M. J. eds. Springer Verlag, Berlin)); mutagenesis efficient method for generation of multiple mutations at using uracil containing templates (Kunkel (1985) “Rapid and defined sites' Gene34:315-323; and Grundstrometal. (1985) efficient site-specific mutagenesis without phenotypic selec “Oligonucleotide-directed mutagenesis by microscale shot tion” Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. 60 gun gene synthesis' Nucl. Acids Res. 13: 3305-3316), (1987) “Rapid and efficient site-specific mutagenesis without double-strand break repair (Mandecki (1986); Arnold (1993) phenotypic selection' Methods in Enzymol. 154, 367-382; “Protein engineering for unusual environments’ Current and Bass et al. (1988) “Mutant Trp repressors with new DNA Opinion in Biotechnology 4:450-455. “Oligonucleotide-di binding specificities’ Science 242:240-245); oligonucle rected double-strand break repair in plasmids of Escherichia otide-directed mutagenesis (Methods in Enzymol. 100: 468 65 coli: a method for site-specific mutagenesis' Proc. Natl. 500 (1983); Methods in Enzymol. 154; 329-350 (1987); Acad. Sci. USA, 83:7177-7181). Additional details on many Zoller (1982) “Oligonucleotide-directed mutagenesis using of the above methods can be found in Methods in Enzymol US 8,119,385 B2 149 150 ogy Volume 154, which also describes useful controls for STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES trouble-shooting problems with various mutagenesis meth HAVING DESIRED CHARACTERISTICS” by Selifonov et ods. al., filed Jan. 18, 2000, (PCT/US00/01202) and, e.g. “METH Protocols that can be used to practice the invention are ODS FOR MAKING CHARACTER STRINGS, POLY described, e.g., in U.S. Pat. No. 5,605,793 to Stemmer (Feb. NUCLEOTIDES & POLYPEPTIDES HAVING DESIRED 25, 1997), “Methods for In Vitro Recombination.” U.S. Pat. CHARACTERISTICS” by Selifonov et al., filed Jul.18, 2000 No. 5,811.238 to Stemmeretal. (Sep. 22, 1998)“Methods for (U.S. Ser. No. 09/618,579); “METHODS OF POPULATING Generating Polynucleotides having Desired Characteristics DATA STRUCTURES FOR USE IN EVOLUTIONARY by Iterative Selection and Recombination: U.S. Pat. No. SIMULATIONS''' by Selifonov and Stemmer, filed Jan. 18, 5.830,721 to Stemmer et al. (Nov. 3, 1998), “DNA Mutagen 10 2000 (PCT/US00/01138); and “SINGLE-STRANDED esis by Random Fragmentation and Reassembly: U.S. Pat. NUCLEICACID TEMPLATE-MEDIATED RECOMBINA No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) “End TION AND NUCLEIC ACID FRAGMENT ISOLATION Complementary Polymerase Reaction.” U.S. Pat. No. 5,837, by Affholter, filed Sep. 6, 2000 (U.S. Ser. No. 09/656,549): 458 to Minshull, et al. (Nov. 17, 1998), “Methods and Com and U.S. Pat. Nos. 6,177.263; 6,153,410. positions for Cellular and Metabolic Engineering:” WO 15 Non-stochastic, or “directed evolution, methods include, 95/22625, Stemmer and Crameri, “Mutagenesis by Random e.g., Saturation mutagenesis, such as Gene Site Saturation Fragmentation and Reassembly:” WO 96/33207 by Stemmer Mutagenesis (GSSM), synthetic ligation reassembly (SLR), and Lipschutz “End Complementary Polymerase Chain or a combination thereofare used to modify the nucleic acids Reaction:” WO97/20078 by Stemmer and Crameri “Meth of the invention to generate a polypeptide, enzyme, protein, ods for Generating Polynucleotides having Desired Charac e.g. structural or binding protein, with new or altered proper teristics by Iterative Selection and Recombination: WO ties (e.g., activity under highly acidic or alkaline conditions, 97/35966 by Minshull and Stemmer, “Methods and Compo high or low temperatures, and the like). Polypeptides encoded sitions for Cellular and Metabolic Engineering:” WO by the modified nucleic acids can be screened for an activity 99/41402 by Punnonen et al. “Targeting of Genetic Vaccine before testing for glucan hydrolysis or other activity. Any Vectors:” WO99/41383 by Punnonen et al. “Antigen Library 25 testing modality or protocol can be used, e.g., using a capil Immunization:” WO99/41369 by Punnonen et al. “Genetic lary array platform. See, e.g., U.S. Pat. Nos. 6,361.974; 6,280. Vaccine Vector Engineering.” WO99/41368 by Punnonen et 926; 5,939,250. al. “Optimization of Immunomodulatory Properties of Gene Site Saturation Mutagenesis, or, GSSM Genetic Vaccines;” EP 752008 by Stemmer and Crameri, The invention also provides methods for making enzyme “DNA Mutagenesis by Random Fragmentation and Reas 30 using Gene Site Saturation mutagenesis, or, GSSM, as sembly;” EP 0932670 by Stemmer “Evolving Cellular DNA described herein, and also in U.S. Pat. Nos. 6,171,820 and Uptake by Recursive Sequence Recombination:” WO 6,579,258. 99/23107 by Stemmer et al., “Modification of Virus Tropism In one aspect, codon primers containing a degenerate N.N. and Host Range by Viral Genome Shuffling:” WO99/21979 G/T sequence are used to introduce point mutations into a by Apt et al., “Human Papillomavirus Vectors: WO 35 polynucleotide, e.g., a polypeptide, enzyme, protein, e.g. 98/31837 by del Cardayre et al. “Evolution of Whole Cells structural or binding protein, or an antibody of the invention, and Organisms by Recursive Sequence Recombination: WO So as to generate a set of progeny polypeptides in which a full 98/27230 by Patten and Stemmer, “Methods and Composi range of singleamino acid Substitutions is represented at each tions for Polypeptide Engineering:” WO 98/27230 by Stem amino acid position, e.g., an amino acid residue in an enzyme mer et al., “Methods for Optimization of Gene Therapy by 40 active site or ligand binding site targeted to be modified. Recursive Sequence Shuffling and Selection.” WO 00/00632, These oligonucleotides can comprise a contiguous first “Methods for Generating Highly Diverse Libraries.” WO homologous sequence, a degenerate N.N.G/T sequence, and, 00/09679, “Methods for Obtaining in Vitro Recombined optionally, a second homologous sequence. The downstream Polynucleotide Sequence Banks and Resulting Sequences.” progeny translational products from the use of Such oligo WO 98/42832 by Arnold et al., “Recombination of Poly 45 nucleotides include all possible amino acid changes at each nucleotide Sequences. Using Random or Defined Primers.” amino acid site along the polypeptide, because the degen WO99/29902 by Arnold et al., “Method for Creating Poly eracy of the N.N.G/T sequence includes codons for all 20 nucleotide and Polypeptide Sequences.” WO 98/41653 by amino acids. In one aspect, one Such degenerate oligonucle Vind, “An in Vitro Method for Construction of a DNA otide (comprised of, e.g., one degenerate N.N.G/T cassette) is Library.” WO 98/41622 by Borchert et al., “Method for Con 50 used for Subjecting each original codon in a parental poly structing a Library Using DNA Shuffling,” and WO 98/.42727 nucleotide template to a full range of codon Substitutions. In by Pati and Zarling, “Sequence Alterations using Homolo another aspect, at least two degenerate cassettes are used— gous Recombination.” either in the same oligonucleotide or not, for Subjecting at Protocols that can be used to practice the invention (pro least two original codons in a parental polynucleotide tem viding details regarding various diversity generating meth 55 plate to a full range of codon Substitutions. For example, more ods) are described, e.g., in U.S. patent application Ser. No. than one N.N.G/T sequence can be contained in one oligo 09/.407,800, “SHUFFLING OF CODON ALTERED nucleotide to introduce amino acid mutations at more than GENES’ by Patten et al. filed Sep. 28, 1999; “EVOLUTION one site. This plurality of N.N.G/T sequences can be directly OF WHOLE CELLS AND ORGANISMS BY RECURSIVE contiguous, or separated by one or more additional nucleotide SEQUENCE RECOMBINATION” by del Cardayre et al., 60 sequence(s). In another aspect, oligonucleotides serviceable U.S. Pat. No. 6,379,964; “OLIGONUCLEOTIDE MEDI for introducing additions and deletions can be used either ATED NUCLEICACID RECOMBINATION” by Crameriet alone or in combination with the codons containing an N.N. al., U.S. Pat. Nos. 6,319,714; 6,368,861; 6,376,246; 6,423, G/T sequence, to introduce any combination or permutation 542; 6,426.224 and PCT/US00/01203: “USE OF CODON of amino acid additions, deletions, and/or Substitutions. VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYN 65 In one aspect, simultaneous mutagenesis of two or more THETIC SHUFFLING” by Welchet al., U.S. Pat. No. 6,436, contiguous amino acid positions is done using an oligonucle 675; “METHODS FOR MAKING CHARACTER otide that contains contiguous N.N.G/T triplets, i.e. a degen US 8,119,385 B2 151 152 erate (N.N.G/T)n sequence. In another aspect, degenerate were previously examined—6 single point mutations (i.e. 2 at cassettes having less degeneracy than the N.N.G/T sequence each of three positions) and no change at any position. are used. For example, it may be desirable in some instances In yet another aspect, site-saturation mutagenesis can be to use (e.g. in an oligonucleotide) a degenerate triplet used together with shuffling, chimerization, recombination sequence comprised of only one N, where said N can be in the and other mutagenizing processes, along with screening. This first second or third position of the triplet. Any other bases invention provides for the use of any mutagenizing process including any combinations and permutations thereof can be (es), including saturation mutagenesis, in an iterative manner. used in the remaining two positions of the triplet. Alterna In one exemplification, the iterative use of any mutagenizing tively, it may be desirable in Some instances to use (e.g. in an process(es) is used in combination with screening. oligo) a degenerate N.N.N triplet sequence. 10 The invention also provides for the use of proprietary In one aspect, use of degenerate triplets (e.g., N.N.G/T codon primers (containing a degenerate N.N.N sequence) to triplets) allows for systematic and easy generation of a full introduce point mutations into a polynucleotide, so as to range of possible natural amino acids (for a total of 20 amino generate a set of progeny polypeptides in which a full range of acids) into each and every amino acid position in a polypep single amino acid Substitutions is represented at each amino tide (in alternative aspects, the methods also include genera 15 acid position; e.g., with Gene Site Saturation Mutagenesis tion of less than all possible Substitutions per amino acid (GSSM). The oligos used are comprised contiguously of a residue, or codon, position). For example, for a 100 amino first homologous sequence, a degenerate N.N.N sequence and acid polypeptide, 2000 distinct species (i.e. 20 possible amino in one aspect but not necessarily a second homologous acids per positionX 100 amino acid positions) can be gener sequence. The downstream progeny translational products ated. Through the use of an oligonucleotide or set of oligo from the use of Such oligos include all possible amino acid nucleotides containing a degenerate N.N.G/T triplet, 32 indi changes at each amino acid site along the polypeptide, vidual sequences can code for all 20 possible natural amino because the degeneracy of the N.N.N sequence includes acids. Thus, in a reaction vessel in which a parental poly codons for all 20 amino acids. nucleotide sequence is Subjected to Saturation mutagenesis In one aspect, one such degenerate oligo (comprised of one using at least one Such oligonucleotide, there are generated 32 25 degenerate N.N.N cassette) is used for Subjecting each origi distinct progeny polynucleotides encoding 20 distinct nal codon in a parental polynucleotide template to a full range polypeptides. In contrast, the use of a non-degenerate oligo of codon Substitutions. In another aspect, at least two degen nucleotide in site-directed mutagenesis leads to only one erate N.N.N cassettes are used—either in the same oligo or progeny polypeptide product per reaction vessel. Nondegen not, for Subjecting at least two original codons in a parental erate oligonucleotides can optionally be used in combination 30 polynucleotide template to a full range of codon Substitutions. with degenerate primers disclosed; for example, nondegen Thus, more than one N.N.N sequence can be contained in one erate oligonucleotides can be used to generate specific point oligo to introduce amino acid mutations at more than one site. mutations in a working polynucleotide. This provides one This plurality of N.N.N sequences can be directly contiguous, means to generate specific silent point mutations, point muta or separated by one or more additional nucleotide sequence tions leading to corresponding amino acid changes, and point 35 (s). In another aspect, oligos serviceable for introducing addi mutations that cause the generation of stop codons and the tions and deletions can be used either alone or in combination corresponding expression of polypeptide fragments. with the codons containing an N.N.N sequence, to introduce In one aspect, each Saturation mutagenesis reaction vessel any combination or permutation of amino acid additions, contains polynucleotides encoding at least 20 progeny deletions and/or substitutions. polypeptide (e.g., a polypeptide, enzyme, protein, e.g. struc 40 In a particular exemplification, it is possible to simulta tural or binding protein) molecules such that all 20 natural neously mutagenize two or more contiguous amino acid posi amino acids are represented at the one specific amino acid tions using an oligo that contains contiguous N.N.N triplets, position corresponding to the codon position mutagenized in i.e. a degenerate (N.N.N), sequence. the parental polynucleotide (other aspects use less than all 20 In another aspect, the present invention provides for the use natural combinations). The 32-fold degenerate progeny 45 of degenerate cassettes having less degeneracy than the polypeptides generated from each saturation mutagenesis N.N.N sequence. For example, it may be desirable in some reaction vessel can be subjected to clonal amplification (e.g. instances to use (e.g. in an oligo) a degenerate triplet sequence cloned into a suitable host, e.g., E. coli host, using, e.g., an comprised of only one N, where the N can be in the first expression vector) and Subjected to expression screening. second or third position of the triplet. Any other bases includ When an individual progeny polypeptide is identified by 50 ing any combinations and permutations thereof can be used in screening to display a favorable change in property (when the remaining two positions of the triplet. Alternatively, it compared to the parental polypeptide. Such as increased glu may be desirable in Some instances to use (e.g., in an oligo) a can hydrolysis activity under alkaline or acidic conditions), it degenerate N.N.N triplet sequence, N.N.G/T, oran N.N. G/C can be sequenced to identify the correspondingly favorable triplet sequence. amino acid Substitution contained therein. 55 It is appreciated, however, that the use of a degenerate In one aspect, upon mutagenizing each and every amino triplet (such as N.N.G/T or an N.N. G/C triplet sequence) as acid position in a parental polypeptide using Saturation disclosed in the instant invention is advantageous for several mutagenesis as disclosed herein, favorable amino acid reasons. In one aspect, this invention provides a means to changes may be identified at more than one amino acid posi systematically and fairly easily generate the Substitution of tion. One or more new progeny molecules can be generated 60 the full range of possible amino acids (for a total of 20 amino that contain a combination of all or part of these favorable acids) into each and every amino acid position in a polypep amino acid substitutions. For example, if 2 specific favorable tide. Thus, for a 100 amino acid polypeptide, the invention amino acid changes are identified in each of 3 amino acid provides a way to systematically and fairly easily generate positions in a polypeptide, the permutations include 3 possi 2000 distinct species (i.e., 20 possible amino acids per posi bilities at each position (no change from the original amino 65 tion times 100 amino acid positions). It is appreciated that acid, and each of two favorable changes) and 3 positions. there is provided, through the use of an oligo containing a Thus, there are 3x3x3 or 27 total possibilities, including 7that degenerate N.N.G/T or an N.N. G/C triplet sequence, 32 US 8,119,385 B2 153 154 individual sequences that code for 20 possible amino acids. A/G, A/T, C/G, C/T, G/T, C/G/T, A/G/T, A/C/T, A/C/G, or E, Thus, in a reaction vessel in which a parental polynucleotide where E is any base that is not A, C, G, or T (E can be referred sequence is subjected to Saturation mutagenesis using one to as a designer oligo). Such oligo, there are generated 32 distinct progeny polynucle In a general sense, Saturation mutagenesis is comprised of otides encoding 20 distinct polypeptides. In contrast, the use mutagenizing a complete set of mutagenic cassettes (wherein of a non-degenerate oligo in site-directed mutagenesis leads each cassette is in one aspect about 1-500 bases in length) in to only one progeny polypeptide product per reaction vessel. defined polynucleotide sequence to be mutagenized (wherein This invention also provides for the use of nondegenerate the sequence to be mutagenized is in one aspect from about 15 oligos, which can optionally be used in combination with to 100,000 bases in length). Thus, a group of mutations (rang degenerate primers disclosed. It is appreciated that in some 10 ing from 1 to 100 mutations) is introduced into each cassette situations, it is advantageous to use nondegenerate oligos to to be mutagenized. A grouping of mutations to be introduced generate specific point mutations in a working polynucle into one cassette can be different or the same from a second otide. This provides a means to generate specific silent point grouping of mutations to be introduced into a second cassette mutations, point mutations leading to corresponding amino during the application of one round of Saturation mutagen acid changes and point mutations that cause the generation of 15 esis. Such groupings are exemplified by deletions, additions, stop codons and the corresponding expression of polypeptide groupings of particular codons and groupings of particular fragments. nucleotide cassettes. Thus, in one aspect of this invention, each Saturation Defined sequences to be mutagenized include a whole mutagenesis reaction vessel contains polynucleotides encod gene, pathway, cDNA, an entire open reading frame (ORF) ing at least 20 progeny polypeptide molecules Such that all 20 and entire promoter, enhancer, repressor/transactivator, ori amino acids are represented at the one specific amino acid gin of replication, intron, operator, or any polynucleotide position corresponding to the codon position mutagenized in . Generally, a "defined sequences for this the parental polynucleotide. The 32-fold degenerate progeny purpose may be any polynucleotide that a 15 base-polynucle polypeptides generated from each saturation mutagenesis otide sequence and polynucleotide sequences of lengths reaction vessel can be subjected to clonal amplification (e.g., 25 between 15 bases and 15,000 bases (this invention specifi cloned into a suitable E. coli host using an expression vector) cally names every integer in between). Considerations in and Subjected to expression screening. When an individual choosing groupings of codons include types of amino acids progeny polypeptide is identified by Screening to display a encoded by a degenerate mutagenic cassette. favorable change in property (when compared to the parental In one exemplification a grouping of mutations that can be polypeptide), it can be sequenced to identify the correspond 30 introduced into a mutagenic cassette, this invention specifi ingly favorable amino acid Substitution contained therein. cally provides for degenerate codon Substitutions (using It is appreciated that upon mutagenizing each and every degenerate oligos) that code for 2, 3,4,5,6,7,8,9, 10, 11, 12, amino acid position in a parental polypeptide using Saturation 13, 14, 15, 16, 17, 18, 19 and 20 amino acids at each position mutagenesis as disclosed herein, favorable amino acid and a library of polypeptides encoded thereby. changes may be identified at more than one amino acid posi 35 Synthetic Ligation Reassembly (SLR) tion. One or more new progeny molecules can be generated The invention provides a non-stochastic gene modification that contain a combination of all or part of these favorable system termed 'synthetic ligation reassembly, or simply amino acid substitutions. For example, if 2 specific favorable “SLR.” a “directed evolution process.” to generate polypep amino acid changes are identified in each of 3 amino acid tides, e.g., a polypeptide, enzyme, protein, e.g. structural or positions in a polypeptide, the permutations include 3 possi 40 binding protein, or antibodies of the invention, with new or bilities at each position (no change from the original amino altered properties. SLR is a method of ligating oligonucle acid and each of two favorable changes) and 3 positions. otide fragments together non-stochastically. This method dif Thus, there are 3x3x3 or 27 total possibilities, including 7that fers from stochastic oligonucleotide shuffling in that the were previously examined—6 single point mutations (i.e., 2 nucleic acid building blocks are not shuffled, concatenated or at each of three positions) and no change at any position. 45 chimerized randomly, but rather are assembled non-stochas Thus, in a non-limiting exemplification, this invention pro tically. See, e.g., U.S. Pat. Nos. 6,773,900; 6,740,506: 6,713, vides for the use of Saturation mutagenesis in combination 282; 6,635,449; 6,605,449; 6,537,776. with additional mutagenization processes, such as process In one aspect, SLR comprises the following steps: (a) pro where two or more related polynucleotides are introduced viding a template polynucleotide, wherein the template poly into a suitable host cell such that a hybrid polynucleotide is 50 nucleotide comprises sequence encoding a homologous generated by recombination and reductive reassortment. gene; (b) providing a plurality of building block polynucle In addition to performing mutagenesis along the entire otides, wherein the building block polynucleotides are sequence of a gene, the instant invention provides that designed to cross-over reassemble with the template poly mutagenesis can be use to replace each of any number of nucleotide at a predetermined sequence, and a building block bases in a polynucleotide sequence, wherein the number of 55 polynucleotide comprises a sequence that is a variant of the bases to be mutagenized is in one aspect every integer from 15 homologous gene and a sequence homologous to the template to 100,000. Thus, instead of mutagenizing every position polynucleotide flanking the variant sequence; (c) combining along a molecule, one can Subject every or a discrete number a building block polynucleotide with a template polynucle of bases (in one aspect a subset totaling from 15 to 100,000) otide such that the building block polynucleotide cross-over to mutagenesis. In one aspect, a separate nucleotide is used 60 reassembles with the template polynucleotide to generate for mutagenizing each position or group of positions along a polynucleotides comprising homologous gene sequence polynucleotide sequence. A group of 3 positions to be variations. mutagenized may be a codon. The mutations can be intro SLR does not depend on the presence of high levels of duced using a mutagenic primer, containing a heterologous homology between polynucleotides to be rearranged. Thus, cassette, also referred to as a mutagenic cassette. Exemplary 65 this method can be used to non-stochastically generate librar cassettes can have from 1 to 500 bases. Each nucleotide ies (or sets) of progeny molecules comprised of over 10' position in Such heterologous cassettes be N.A, C, G, T, A/C, different chimeras. SLR can be used to generate libraries US 8,119,385 B2 155 156 comprised of over 10" different progeny chimeras. Thus, In another aspect, the ligation reassembly method is per aspects of the present invention include non-stochastic meth formed systematically. For example, the method is performed ods of producing a set of finalized chimeric nucleic acid in order to generate a systematically compartmentalized molecule shaving an overall assembly order that is chosen by library of progeny molecules, with compartments that can be design. This method includes the steps of generating by screened systematically, e.g. one by one. In other words this design a plurality of specific nucleic acid building blocks invention provides that, through the selective and judicious having serviceable mutually compatible ligatable ends, and use of specific nucleic acid building blocks, coupled with the assembling these nucleic acid building blocks, such that a selective and judicious use of sequentially stepped assembly designed overall assembly order is achieved. reactions, a design can be achieved where specific sets of 10 progeny products are made in each of several reaction vessels. The mutually compatible ligatable ends of the nucleic acid This allows a systematic examination and screening proce building blocks to be assembled are considered to be “ser dure to be performed. Thus, these methods allow a potentially viceable' for this type of ordered assembly if they enable the very large number of progeny molecules to be examined building blocks to be coupled in predetermined orders. Thus, systematically in Smaller groups. Because of its ability to the overall assembly order in which the nucleic acid building 15 perform chimerizations in a manner that is highly flexible yet blocks can be coupled is specified by the design of the ligat exhaustive and systematic as well, particularly when there is able ends. If more than one assembly step is to be used, then a low level of homology among the progenitor molecules, the overall assembly order in which the nucleic acid building these methods provide for the generation of a library (or set) blocks can be coupled is also specified by the sequential order comprised of a large number of progeny molecules. Because of the assembly step(s). In one aspect, the annealed building of the non-stochastic nature of the instant ligation reassembly pieces are treated with an enzyme. Such as a ligase (e.g. T4 invention, the progeny molecules generated in one aspect DNA ligase), to achieve covalent bonding of the building comprise a library of finalized chimeric nucleic acid mol pieces. ecules having an overall assembly order that is chosen by In one aspect, the design of the oligonucleotide building design. The saturation mutagenesis and optimized directed blocks is obtained by analyzing a set of progenitor nucleic 25 evolution methods also can be used to generate different acid sequence templates that serve as a basis for producing a progeny molecular species. It is appreciated that the invention progeny set of finalized chimeric polynucleotides. These provides freedom of choice and control regarding the selec parental oligonucleotide templates thus serve as a source of tion of demarcation points, the size and number of the nucleic sequence information that aids in the design of the nucleic acid building blocks, and the size and design of the couplings. acid building blocks that are to be mutagenized, e.g., chimer 30 It is appreciated, furthermore, that the requirement for inter ized or shuffled. In one aspect of this method, the sequences molecular homology is highly relaxed for the operability of of a plurality of parental nucleic acid templates are aligned in this invention. In fact, demarcation points can even be chosen order to select one or more demarcation points. The demar in areas of little or no intermolecular homology. For example, cation points can be located at an area of homology, and are because of codon wobble, i.e. the degeneracy of codons, comprised of one or more nucleotides. These demarcation 35 nucleotide Substitutions can be introduced into nucleic acid points are in one aspect shared by at least two of the progeni building blocks without altering the amino acid originally tor templates. The demarcation points can thereby be used to encoded in the corresponding progenitor template. Alterna delineate the boundaries of oligonucleotide building blocks tively, a codon can be altered Such that the coding for an to be generated in order to rearrange the parental polynucle originally amino acid is altered. This invention provides that otides. The demarcation points identified and selected in the 40 Such substitutions can be introduced into the nucleic acid progenitor molecules serve as potential chimerization points building block in order to increase the incidence of intermo in the assembly of the final chimeric progeny molecules. A lecular homologous demarcation points and thus to allow an demarcation point can be an area of homology (comprised of increased number of couplings to be achieved among the at least one homologous nucleotide base) shared by at least building blocks, which in turn allows a greater number of two parental polynucleotide sequences. Alternatively, a 45 progeny chimeric molecules to be generated. demarcation point can be an area of homology that is shared In one aspect, the present invention provides a non-sto by at least half of the parental polynucleotide sequences, or, it chastic method termed synthetic gene reassembly, that is can be an area of homology that is shared by at least two thirds Somewhat related to stochastic shuffling, save that the nucleic of the parental polynucleotide sequences. Even more in one acid building blocks are not shuffled or concatenated or chi aspect a serviceable demarcation points is an area of homol 50 merized randomly, but rather are assembled non-stochasti ogy that is shared by at least three fourths of the parental cally. polynucleotide sequences, or, it can be shared by at almost all The synthetic gene reassembly method does not depend on of the parental polynucleotide sequences. In one aspect, a the presence of a high level of homology between polynucle demarcation point is an area of homology that is shared by all otides to be shuffled. The invention can be used to non of the parental polynucleotide sequences. 55 stochastically generate libraries (or sets) of progeny mol In one aspect, a ligation reassembly process is performed ecules comprised of over 10' different chimeras. exhaustively in order to generate an exhaustive library of Conceivably, synthetic gene reassembly can even be used to progeny chimeric polynucleotides. In other words, all pos generate libraries comprised of over 10" different progeny sible ordered combinations of the nucleic acid building chimeras. blocks are represented in the set of finalized chimeric nucleic 60 Thus, in one aspect, the invention provides a non-stochas acid molecules. At the same time, in another aspect, the tic method of producing a set of finalized chimeric nucleic assembly order (i.e. the order of assembly of each building acid molecules having an overall assembly order that is cho block in the 5' to 3 sequence of each finalized chimeric nucleic sen by design, which method is comprised of the steps of acid) in each combination is by design (or non-stochastic) as generating by design a plurality of specific nucleic acid build described above. Because of the non-stochastic nature of this 65 ing blocks having serviceable mutually compatible ligatable invention, the possibility of unwanted side products is greatly ends and assembling these nucleic acid building blocks. Such reduced. that a designed overall assembly order is achieved. US 8,119,385 B2 157 158 The mutually compatible ligatable ends of the nucleic acid ucts are made in each of several reaction vessels. This allows building blocks to be assembled are considered to be “ser a systematic examination and screening procedure to be per viceable' for this type of ordered assembly if they enable the formed. Thus, it allows a potentially very large number of building blocks to be coupled in predetermined orders. Thus, progeny molecules to be examined systematically in Smaller in one aspect, the overall assembly order in which the nucleic groups. acid building blocks can be coupled is specified by the design Because of its ability to perform chimerizations in a man of the ligatable ends and, if more than one assembly step is to ner that is highly flexible yet exhaustive and systematic as be used, then the overall assembly order in which the nucleic well, particularly when there is a low level of homology acid building blocks can be coupled is also specified by the among the progenitor molecules, the instant invention pro sequential order of the assembly step(s). In a one aspect of the 10 vides for the generation of a library (or set) comprised of a invention, the annealed building pieces are treated with an large number of progeny molecules. Because of the non enzyme. Such as a ligase (e.g., T4 DNA ligase) to achieve stochastic nature of the instant gene reassembly invention, the covalent bonding of the building pieces. progeny molecules generated in one aspect comprise a library In a another aspect, the design of nucleic acid building of finalized chimeric nucleic acid molecules having an overall blocks is obtained upon analysis of the sequences of a set of 15 assembly order that is chosen by design. In a particularly progenitor nucleic acid templates that serve as a basis for aspect, such a generated library is comprised of greater than producing a progeny set of finalized chimeric nucleic acid 10 to greater than 10" different progeny molecular spe molecules. These progenitor nucleic acid templates thus cies. serve as a source of sequence information that aids in the In one aspect, a set of finalized chimeric nucleic acid mol design of the nucleic acid building blocks that are to be ecules, produced as described is comprised of a polynucle mutagenized, i.e. chimerized or shuffled. otide encoding a polypeptide. According to one aspect, this In one exemplification, the invention provides for the chi polynucleotide is a gene, which may be a man-made gene. merization of a family of related genes and their encoded According to another aspect, this polynucleotide is a gene family of related products. In a particular exemplification, the pathway, which may be a man-made gene pathway. The encoded products are enzymes. The polypeptide, enzyme, 25 invention provides that one or more man-made genes gener protein, e.g. structural or binding proteins of the present ated by the invention may be incorporated into a man-made invention can be mutagenized in accordance with the meth gene pathway, Such as pathway operable in a eukaryotic ods described herein. organism (including a plant). Thus according to one aspect of the invention, the In another exemplification, the synthetic nature of the step sequences of a plurality of progenitor nucleic acid templates 30 in which the building blocks are generated allows the design (e.g., polynucleotides of the invention) are aligned in order to and introduction of nucleotides (e.g., one or more nucle select one or more demarcation points, which demarcation otides, which may be, for example, codons or introns or points can be located at an area of homology. The demarca regulatory sequences) that can later be optionally removed in tion points can be used to delineate the boundaries of nucleic an in vitro process (e.g., by mutagenesis) or in an in vivo acid building blocks to be generated. Thus, the demarcation 35 process (e.g., by utilizing the gene splicing ability of a host points identified and selected in the progenitor molecules organism). It is appreciated that in many instances the intro serve as potential chimerization points in the assembly of the duction of these nucleotides may also be desirable for many progeny molecules. other reasons in addition to the potential benefit of creating a Typically a serviceable demarcation point is an area of serviceable demarcation point. homology (comprised of at least one homologous nucleotide 40 Thus, according to another aspect, the invention provides base) shared by at least two progenitor templates, but the that a nucleic acid building block can be used to introduce an demarcation point can be an area of homology that is shared intron. Thus, the invention provides that functional introns by at least half of the progenitor templates, at least two thirds may be introduced into a man-made gene of the invention. of the progenitor templates, at least three fourths of the pro The invention also provides that functional introns may be genitor templates and in one aspect at almost all of the pro 45 introduced into a man-made gene pathway of the invention. genitor templates. Even more in one aspect still a serviceable Accordingly, the invention provides for the generation of a demarcation point is an area of homology that is shared by all chimeric polynucleotide that is a man-made gene containing of the progenitor templates. one (or more) artificially introduced intron(s). In a one aspect, the gene reassembly process is performed Accordingly, the invention also provides for the generation exhaustively in order to generate an exhaustive library. In 50 of a chimeric polynucleotide that is a man-made gene path other words, all possible ordered combinations of the nucleic way containing one (or more) artificially introduced intron(s). acid building blocks are represented in the set of finalized In one aspect, the artificially introduced intron(s) are func chimeric nucleic acid molecules. At the same time, the assem tional in one or more host cells for gene splicing much in the bly order (i.e. the order of assembly of each building block in way that naturally-occurring introns serve functionally in the 5' to 3 sequence of each finalized chimeric nucleic acid) in 55 gene splicing. The invention provides a process of producing each combination is by design (or non-stochastic). Because of man-made intron-containing polynucleotides to be intro the non-stochastic nature of the method, the possibility of duced into host organisms for recombination and/or splicing. unwanted side products is greatly reduced. A man-made gene produced using the invention can also In another aspect, the method provides that the gene reas serve as a substrate for recombination with another nucleic sembly process is performed systematically, for example to 60 acid. Likewise, a man-made gene pathway produced using generate a systematically compartmentalized library, with the invention can also serve as a Substrate for recombination compartments that can be screened systematically, e.g., one with another nucleic acid. In one aspect, the recombination is by one. In other words the invention provides that, through the facilitated by, or occurs at, areas of homology between the selective and judicious use of specific nucleic acid building man-made, intron-containing gene and a nucleic acid, which blocks, coupled with the selective and judicious use of 65 serves as a recombination partner. In one aspect, the recom sequentially stepped assembly reactions, an experimental bination partner may also be a nucleic acid generated by the design can be achieved where specific sets of progeny prod invention, including a man-made gene or a man-made gene US 8,119,385 B2 159 160 pathway. Recombination may be facilitated by or may occur approach may also be useful in the study of repetitive DNA at areas of homology that exist at the one (or more) artificially sequences. Finally, this approach may be useful to mutate introduced intron(s) in the man-made gene. ribozymes or aptamers. The synthetic gene reassembly method of the invention In one aspect the invention described herein is directed to utilizes a plurality of nucleic acid building blocks, each of 5 the use of repeated cycles of reductive reassortment, recom which in one aspect has two ligatable ends. The two ligatable bination and selection which allow for the directed molecular ends on each nucleic acid building block may be two blunt evolution of highly complex linear sequences, such as DNA, ends (i.e. each having an overhang of Zero nucleotides), or in RNA or proteins thorough recombination. one aspect one blunt end and one overhang, or more in one Optimized Directed Evolution System 10 The invention provides a non-stochastic gene modification aspect still two overhangs. system termed “optimized directed evolution system' to gen A useful overhang for this purpose may be a 3' overhang or erate polypeptides, e.g., a polypeptide, enzyme, protein, e.g. a 5' overhang. Thus, a nucleic acid building block may have a structural or binding protein, or antibodies of the invention, 3' overhang or alternatively a 5' overhang or alternatively two with new or altered properties. Optimized directed evolution 3' overhangs or alternatively two 5" overhangs. The overall 15 is directed to the use of repeated cycles of reductive reassort order in which the nucleic acid building blocks are assembled ment, recombination and selection that allow for the directed to form a finalized chimeric nucleic acid molecule is deter molecular evolution of nucleic acids through recombination. mined by purposeful experimental design and is not random. Optimized directed evolution allows generation of a large In one aspect, a nucleic acid building block is generated by population of evolved chimeric sequences, wherein the gen chemical synthesis of two single-stranded nucleic acids (also erated population is significantly enriched for sequences that referred to as single-stranded oligos) and contacting them so have a predetermined number of crossover events. as to allow them to anneal to form a double-stranded nucleic A crossover event is a point in a chimeric sequence where acid building block. a shift in sequence occurs from one parental variant to another A double-stranded nucleic acid building block can be of parental variant. Such a point is normally at the juncture of variable size. The sizes of these building blocks can be small 25 where oligonucleotides from two parents are ligated together or large. Exemplary sizes for building block range from 1 to form a single sequence. This method allows calculation of base pair (not including any overhangs) to 100,000 base pairs the correct concentrations of oligonucleotide sequences so (not including any overhangs). Other exemplary size ranges that the final chimeric population of sequences is enriched for are also provided, which have lower limits of from 1 bp to the chosen number of crossover events. This provides more 10,000 bp (including every integer value in between) and 30 control over choosing chimeric variants having a predeter upper limits of from 2 bp to 100,000 bp (including every mined number of crossover events. integer value in between). In addition, this method provides a convenient means for Many methods exist by which a double-stranded nucleic exploring a tremendous amount of the possible protein vari acid building block can be generated that is serviceable for the ant space in comparison to other systems. Previously, if one invention; and these are known in the art and can be readily 35 generated, for example, 10' chimeric molecules during a performed by the skilled artisan. reaction, it would be extremely difficult to test such a high According to one aspect, a double-stranded nucleic acid number of chimeric variants for a particular activity. More building block is generated by first generating two single over, a significant portion of the progeny population would Stranded nucleic acids and allowing them to anneal to form a have a very high number of crossover events which resulted in double-stranded nucleic acid building block. The two strands 40 proteins that were less likely to have increased levels of a of a double-stranded nucleic acid building block may be particular activity. By using these methods, the population of complementary at every nucleotide apart from any that form chimerics molecules can be enriched for those variants that an overhang; thus containing no mismatches, apart from any have a particular number of crossover events. Thus, although overhang(s). According to another aspect, the two strands of one can still generate 10" chimeric molecules during a reac a double-stranded nucleic acid building block are comple 45 tion, each of the molecules chosen for further analysis most mentary at fewer than every nucleotide apart from any that likely has, for example, only three crossover events. Because form an overhang. Thus, according to this aspect, a double the resulting progeny population can be skewed to have a Stranded nucleic acid building block can be used to introduce predetermined number of crossover events, the boundaries on codon degeneracy. In one aspect the codon degeneracy is the functional variety between the chimeric molecules is introduced using the site-saturation mutagenesis described 50 reduced. This provides a more manageable number of vari herein, using one or more N.N.G/T cassettes or alternatively ables when calculating which oligonucleotide from the origi using one or more N.N.N cassettes. nal parental polynucleotides might be responsible for affect The in vivo recombination method of the invention can be ing a particular trait. performed blindly on a pool of unknown hybrids or alleles of One method for creating a chimeric progeny polynucle a specific polynucleotide or sequence. However, it is not 55 otide sequence is to create oligonucleotides corresponding to necessary to know the actual DNA or RNA sequence of the fragments or portions of each parental sequence. Each oligo specific polynucleotide. nucleotide in one aspect includes a unique region of overlap The approach of using recombination within a mixed so that mixing the oligonucleotides together results in a new population of genes can be useful for the generation of any variant that has each oligonucleotide fragment assembled in useful proteins, for example, interleukin I, antibodies, tpA 60 the correct order. Alternatively protocols for practicing these and growth hormone. This approach may be used to generate methods of the invention can be found in U.S. Pat. Nos. proteins having altered specificity or activity. The approach 6,773,900; 6,740,506; 6,713,282; 6,635,449; 6,605,449; may also be useful for the generation of hybrid nucleic acid 6,537,776; 6,361974. sequences, for example, promoter regions, introns, exons, The number of oligonucleotides generated for each paren enhancer sequences, 31 untranslated regions or 51 untrans 65 tal variant bears a relationship to the total number of resulting lated regions of genes. Thus this approach may be used to crossovers in the chimeric molecule that is ultimately created. generate genes having increased rates of expression. This For example, three parental nucleotide sequence variants US 8,119,385 B2 161 162 might be provided to undergo a ligation reaction in order to ing which oligonucleotide from the original parental poly find a chimeric variant having, for example, greateractivity at nucleotides might be responsible for affecting a particular high temperature. As one example, a set of 50 oligonucleotide trait. sequences can be generated corresponding to each portions of In one aspect, the method creates a chimeric progeny poly each parental variant. Accordingly, during the ligation reas nucleotide sequence by creating oligonucleotides corre sembly process there could be up to 50 crossover events sponding to fragments or portions of each parental sequence. within each of the chimeric sequences. The probability that Each oligonucleotide in one aspect includes a unique region each of the generated chimeric polynucleotides will contain of overlap so that mixing the oligonucleotides together results oligonucleotides from each parental variant in alternating in a new variant that has each oligonucleotide fragment 10 assembled in the correct order. See also U.S. Ser. No. 09/332, order is very low. If each oligonucleotide fragment is present 835. in the ligation reaction in the same molar quantity it is likely Determining Crossover Events that in some positions oligonucleotides from the same paren Aspects of the invention include a system and Software that tal polynucleotide will ligate next to one another and thus not receive a desired crossover probability density function result in a crossover event. If the concentration of each oli 15 (PDF), the number of parent genes to be reassembled, and the gonucleotide from each parent is kept constant during any number of fragments in the reassembly as inputs. The output ligation step in this example, there is a /3 chance (assuming 3 of this program is a “fragment PDF that can be used to parents) that an oligonucleotide from the same parental Vari determine a recipe for producing reassembled genes, and the ant will ligate within the chimeric sequence and produce no estimated crossover PDF of those genes. The processing COSSOV. described herein is in one aspect performed in MATLABTM Accordingly, a probability density function (PDF) can be (The Mathworks, Natick, Mass.) a programming language determined to predict the population of crossover events that and development environment for technical computing. are likely to occur during each step in a ligation reaction given Iterative Processes a set number of parental variants, a number of oligonucle In practicing the invention, these processes can be itera otides corresponding to each variant, and the concentrations 25 tively repeated. For example, a nucleic acid (or, the nucleic of each variant during each step in the ligation reaction. The acid) responsible for an altered or new a polypeptide, enzyme, statistics and mathematics behind determining the PDF is protein, e.g. structural or binding protein, phenotype is iden described below. By utilizing these methods, one can calcu tified, re-isolated, again modified, re-tested for activity. This late such a probability density function, and thus enrich the process can be iteratively repeated until a desired phenotype chimeric progeny population for a predetermined number of 30 is engineered. For example, an entire biochemical anabolic or crossover events resulting from a particular ligation reaction. catabolic pathway can be engineered into a cell, including, Moreover, a target number of crossover events can be prede e.g., a polypeptide, enzyme, protein, e.g. structural or binding termined, and the system then programmed to calculate the protein, activity. starting quantities of each parental oligonucleotide during Similarly, if it is determined that a particular oligonucle each step in the ligation reaction to result in a probability 35 otide has no affect at all on the desired trait (e.g., a new a density function that centers on the predetermined number of polypeptide, enzyme, protein, e.g. structural or binding pro crossover events. These methods are directed to the use of tein, phenotype), it can be removed as a variable by synthe repeated cycles of reductive reassortment, recombination and sizing larger parental oligonucleotides that include the selection that allow for the directed molecular evolution of a sequence to be removed. Since incorporating the sequence nucleic acid encoding a polypeptide through recombination. 40 within a larger sequence prevents any crossover events, there This system allows generation of a large population of will no longer be any variation of this sequence in the progeny evolved chimeric sequences, wherein the generated popula polynucleotides. This iterative practice of determining which tion is significantly enriched for sequences that have a prede oligonucleotides are most related to the desired trait, and termined number of crossover events. A crossover event is a which are unrelated, allows more efficient exploration all of point in a chimeric sequence where a shift in sequence occurs 45 the possible protein variants that might be provide a particular from one parental variant to another parental variant. Such a trait or activity. point is normally at the juncture of where oligonucleotides In Vivo Shuffling from two parents are ligated together to form a single In vivo shuffling of molecules is use in methods of the sequence. The method allows calculation of the correct con invention that provide variants of polypeptides of the inven centrations of oligonucleotide sequences so that the final 50 tion, e.g., antibodies, a polypeptide, enzyme, protein, e.g. chimeric population of sequences is enriched for the chosen structural or binding protein, and the like. In vivo shuffling number of crossover events. This provides more control over can be performed utilizing the natural property of cells to choosing chimeric variants having a predetermined number recombine multimers. While recombination in vivo has pro of crossover events. vided the major natural route to molecular diversity, genetic In addition, these methods provide a convenient means for 55 recombination remains a relatively complex process that exploring a tremendous amount of the possible protein vari involves 1) the recognition of homologies; 2) Strand cleavage, ant space in comparison to other systems. By using the meth Strand invasion, and metabolic steps leading to the production ods described herein, the population of chimerics molecules of recombinant chiasma; and finally 3) the resolution of chi can be enriched for those variants that have a particular num asma into discrete recombined molecules. The formation of ber of crossover events. Thus, although one can still generate 60 the chiasma requires the recognition of homologous 1013 chimeric molecules during a reaction, each of the mol Sequences. ecules chosen for further analysis most likely has, for In another aspect, the invention includes a method for example, only three crossover events. Because the resulting producing a hybrid polynucleotide from at, least a first poly progeny population can be skewed to have a predetermined nucleotide and a second polynucleotide. The invention can be number of crossover events, the boundaries on the functional 65 used to produce a hybrid polynucleotide by introducing at variety between the chimeric molecules is reduced. This pro least a first polynucleotide and a second polynucleotide (e.g., vides a more manageable number of variables when calculat one, or both, being an exemplary polypeptide-, enzyme-, US 8,119,385 B2 163 164 protein-, e.g. structural or binding protein-encoding sequence favor the loss of discrete units. Thus, it is preferable with the of the invention) which share at least one region of partial present method that the sequences are in the same orientation. into a Suitable host cell. The regions of Random orientation of quasi-repeated sequences will result partial sequence homology promote processes which result in in the loss of reassortment efficiency, while consistent orien sequence reorganization producing a hybrid polynucleotide. tation of the sequences will offer the highest efficiency. How The term “hybrid polynucleotide', as used herein, is any ever, while having fewer of the contiguous sequences in the nucleotide sequence which results from the method of the same orientation decreases the efficiency, it may still provide present invention and contains sequence from at least two sufficient elasticity for the effective recovery of novel mol original polynucleotide sequences. Such hybrid polynucle ecules. Constructs can be made with the quasi-repeated otides can result from intermolecular recombination events 10 sequences in the same orientation to allow higher efficiency. which promote sequence integration between DNA mol Sequences can be assembled in a head to tail orientation ecules. In addition, such hybrid polynucleotides can result using any of a variety of methods, including the following: from intramolecular reductive reassortment processes which a) Primers that include a poly-Ahead and poly-T tail which utilize repeated sequences to alter a nucleotide sequence when made single-stranded would provide orientation within a DNA molecule. 15 can be utilized. This is accomplished by having the first In vivo reassortment is focused on “inter-molecular pro few bases of the primers made from RNA and hence cesses collectively referred to as “recombination” which in easily removed RNaseH. bacteria, is generally viewed as a “RecA-dependent' phe b) Primers that include unique restriction cleavage sites can nomenon. The invention can rely on recombination processes be utilized. Multiple sites, a battery of unique sequences of a host cell to recombine and re-assort sequences, or the and repeated synthesis and ligation steps would be cells’ ability to mediate reductive processes to decrease the required. complexity of quasi-repeated sequences in the cell by dele c) The inner few bases of the primer could be thiolated and tion. This process of “reductive reassortment’ occurs by an an exonuclease used to produce properly tailed mol “intra-molecular, RecA-independent process. ecules. Therefore, in another aspect of the invention, novel poly 25 The recovery of the re-assorted sequences relies on the nucleotides can be generated by the process of reductive identification of cloning vectors with a reduced repetitive reassortment. The method involves the generation of con index (RI). The re-assorted encoding sequences can then be structs containing consecutive sequences (original encoding recovered by amplification. The products are re-cloned and sequences), their insertion into an appropriate vector and their expressed. The recovery of cloning vectors with reduced RI Subsequent introduction into an appropriate host cell. The 30 can be affected by: reassortment of the individual molecular identities occurs by 1) The use of vectors only stably maintained when the con combinatorial processes between the consecutive sequences struct is reduced in complexity. in the construct possessing regions of homology, or between 2) The physical recovery of shortened vectors by physical quasi-repeated units. The reassortment process recombines procedures. In this case, the cloning vector would be recov and/or reduces the complexity and extent of the repeated 35 ered using standard plasmid isolation procedures and size sequences and results in the production of novel molecular fractionated on either an agarose gel, or column with a low species. Various treatments may be applied to enhance the molecular weight cut off utilizing standard procedures. rate of reassortment. These could include treatment with 3) The recovery of vectors containing interrupted genes ultra-violet light, or DNA damaging chemicals and/or the use which can be selected when insert size decreases. of host cell lines displaying enhanced levels of "genetic insta 40 4) The use of direct selection techniques with an expression bility”. Thus the reassortment process may involve homolo vector and the appropriate selection. gous recombination or the natural property of quasi-repeated Encoding sequences (for example, genes) from related sequences to direct their own evolution. organisms may demonstrate a high degree of homology and Repeated or “quasi-repeated sequences play a role in encode quite diverse protein products. These types of genetic instability. In the present invention, “quasi-repeats' 45 sequences are particularly useful in the present invention as are repeats that are not restricted to their original unit struc quasi-repeats. However, while the examples illustrated below ture. Quasi-repeated units can be presented as an array of demonstrate the reassortment of nearly identical original sequences in a construct; consecutive units of similar encoding sequences (quasi-repeats), this process is not lim sequences. Once ligated, the junctions between the consecu ited to Such nearly identical repeats. tive sequences become essentially invisible and the quasi 50 The following example demonstrates a method of the repetitive nature of the resulting construct is now continuous invention. Encoding nucleic acid sequences (quasi-repeats) at the molecular level. The deletion process the cell performs derived from three (3) unique species are described. Each to reduce the complexity of the resulting construct operates sequence encodes a protein with a distinct set of properties. between the quasi-repeated sequences. The quasi-repeated Each of the sequences differs by a single or a few base pairs at units provide a practically limitless repertoire of templates 55 a unique position in the sequence. The quasi-repeated upon which slippage events can occur. The constructs con sequences are separately or collectively amplified and ligated taining the quasi-repeats thus effectively provide Sufficient into random assemblies such that all possible permutations molecular elasticity that deletion (and potentially insertion) and combinations are available in the population of ligated events can occur virtually anywhere within the quasi-repeti molecules. The number of quasi-repeat units can be con tive units. 60 trolled by the assembly conditions. The average number of When the quasi-repeated sequences are all ligated in the quasi-repeated units in a construct is defined as the repetitive same orientation, for instance head to tail or vice versa, the index (RI). cell cannot distinguish individual units. Consequently, the Once formed, the constructs may, or may not be size frac reductive process can occur throughout the sequences. In tionated on an agarose gel according to published protocols, contrast, when for example, the units are presented head to 65 inserted into a cloning vector and transfected into an appro head, rather than head to tail, the inversion delineates the priate host cell. The cells are then propagated and “reductive endpoints of the adjacent unit so that deletion formation will reassortment' is effected. The rate of the reductive reassort US 8,119,385 B2 165 166 ment process may be stimulated by the introduction of DNA binding protein, coding sequence (e.g., a gene, cDNA or damage if desired. Whether the reduction in RI is mediated by message) of the invention, which can be altered by any means, deletion formation between repeated sequences by an “intra including, e.g., random or stochastic methods, or, non-sto molecular mechanism, or mediated by recombination-like chastic, or “directed evolution, methods, as described above. events through "inter-molecular mechanisms is immaterial. 5 The isolated variants may be naturally occurring. Variant The end result is a reassortment of the molecules into all can also be created in vitro. Variants may be created using possible combinations. genetic engineering techniques such as site directed mutagen Optionally, the method comprises the additional step of esis, random chemical mutagenesis, Exonuclease III deletion screening the library members of the shuffled pool to identify procedures, and standard cloning techniques. Alternatively, individual shuffled library members having the ability to bind 10 Such variants, fragments, analogs, or derivatives may be cre or otherwise interact, or catalyze a particular reaction (e.g., ated using chemical synthesis or modification procedures. Such as catalytic domain of an enzyme) with a predetermined Other methods of making variants are also familiar to those macromolecule. Such as for example a proteinaceous recep skilled in the art. These include procedures in which nucleic tor, an oligosaccharide, virion, or other predetermined com acid sequences obtained from natural isolates are modified to pound or structure. 15 generate nucleic acids which encode polypeptides having The polypeptides that are identified from such libraries can characteristics which enhance their value in industrial or be used for therapeutic, diagnostic, research and related pur laboratory applications. In Such procedures, a large number poses (e.g., catalysts, Solutes for increasing osmolarity of an of variant sequences having one or more nucleotide differ aqueous Solution and the like) and/or can be subjected to one ences with respect to the sequence obtained from the natural or more additional cycles of shuffling and/or selection. isolate are generated and characterized. These nucleotide dif In another aspect, it is envisioned that prior to or during ferences can result in amino acid changes with respect to the recombination or reassortment, polynucleotides generated by polypeptides encoded by the nucleic acids from the natural the method of the invention can be subjected to agents or isolates. processes which promote the introduction of mutations into For example, variants may be created using error prone the original polynucleotides. The introduction of Such muta 25 PCR. In error prone PCR. PCR is performed under conditions tions would increase the diversity of resulting hybrid poly where the copying fidelity of the DNA polymerase is low, nucleotides and polypeptides encoded therefrom. The agents Such that a high rate of point mutations is obtained along the or processes which promote mutagenesis can include, but are entire length of the PCR product. Error prone PCR is not limited to: (+)-CC-1065, or a synthetic analog such as described, e.g., in Leung (1989) Technique 1:11-15) and (+)-CC-1065-(N-3-Adenine (See Sun and Hurley, (1992); an 30 Caldwell (1992) PCR Methods Applic. 2:28-33. Briefly, in N-acetylated or deacetylated 4'-fluoro-4-aminobiphenyl Such procedures, nucleic acids to be mutagenized are mixed adduct capable of inhibiting DNA synthesis (See, for with PCR primers, reaction buffer, MgCl, MnCl, Taq poly example, van de Poll et al. (1992)); or a N-acetylated or merase and an appropriate concentration of dNTPs for deacetylated 4-aminobiphenyl adduct capable of inhibiting achieving a high rate of point mutation along the entire length DNA synthesis (See also, van de Poll et al. (1992), pp. 751 35 of the PCR product. For example, the reaction may be per 758); trivalent chromium, a trivalent chromium salt, a poly formed using 20 fmoles of nucleic acid to be mutagenized, 30 cyclic aromatic hydrocarbon (PAH) DNA adduct capable of pmole of each PCR primer, a reaction buffer comprising 50 inhibiting DNA replication, such as 7-bromomethyl-benza mMKC1, 10 mM Tris HCl (pH 8.3) and 0.01% gelatin, 7 mM anthracene (“BMA'), tris(2,3-dibromopropyl)phosphate MgCl2, 0.5 mM MnCl, 5 units of Taq polymerase, 0.2 mM (“Tris-BP), 1,2-dibromo-3-chloropropane (“DBCP), 40 dGTP, 0.2 mM dATP, 1 mM dCTP and 1 mM dTTP, PCR may 2-bromoacrolein (2BA), benzoapyrene-7,8-dihydrodiol-9- be performed for 30 cycles of 94° C. for 1 min, 45° C. for 1 10-epoxide (“BPDE'), a platinum(II) halogen salt, N-hy min, and 72°C. for 1 min. However, it will be appreciated that droxy-2-amino-3-methylimidazo[4,5-f-quinoline (“N-hy these parameters may be varied as appropriate. The droxy-IQ) and N-hydroxy-2-amino-1-methyl-6- mutagenized nucleic acids are cloned into an appropriate phenylimidazo 4.5-f-pyridine (“N-hydroxy-PhIP). 45 vector and the activities of the polypeptides encoded by the Exemplary means for slowing or halting PCR amplification mutagenized nucleic acids are evaluated. consist of UV light (+)-CC-1065 and (+)-CC-1065-(N3-Ad Variants may also be created using oligonucleotide enine). Particularly encompassed means are DNA adducts or directed mutagenesis to generate site-specific mutations in polynucleotides comprising the DNA adducts from the poly any cloned DNA of interest. Oligonucleotide mutagenesis is nucleotides or polynucleotides pool, which can be released or 50 described, e.g., in Reidhaar-Olson (1988) Science 241:53-57. removed by a process including heating the solution compris Briefly, in such procedures a plurality of double stranded ing the polynucleotides prior to further processing. oligonucleotides bearing one or more mutations to be intro In another aspect the invention is directed to a method of duced into the cloned DNA are synthesized and inserted into producing recombinant proteins having biological activity by the cloned DNA to be mutagenized. Clones containing the treating a sample comprising double-stranded template poly 55 mutagenized DNA are recovered and the activities of the nucleotides encoding a wild-type protein under conditions polypeptides they encode are assessed. according to the invention which provide for the production Another method for generating variants is assembly PCR. of hybrid or re-assorted polynucleotides. Assembly PCR involves the assembly of a PCR product from Producing Sequence Variants a mixture of small DNA fragments. A large number of differ The invention also provides additional methods for making 60 ent PCR reactions occur in parallel in the same vial, with the sequence variants of the nucleic acid (e.g., polypeptide, products of one reaction priming the products of another enzyme, protein, e.g. structural or binding protein) sequences reaction. Assembly PCR is described in, e.g., U.S. Pat. No. of the invention. The invention also provides additional meth 5,965,408. ods for isolating a polypeptide, enzyme, protein, e.g. struc Still another method of generating variants is sexual PCR tural orbinding protein, using the nucleic acids and polypep 65 mutagenesis. In sexual PCR mutagenesis, forced homolo tides of the invention. In one aspect, the invention provides for gous recombination occurs between DNA molecules of dif variants of a polypeptide, enzyme, protein, e.g. structural or ferent but highly related DNA sequence in vitro, as a result of US 8,119,385 B2 167 168 random fragmentation of the DNA molecule based on polypeptides as described in U.S. Pat. No. 5,965,408, filed sequence homology, followed by fixation of the crossover by Jul. 9, 1996, entitled, “Method of DNA Reassembly by Inter primer extension in a PCR reaction. Sexual PCR mutagenesis rupting Synthesis” and U.S. Pat. No. 5,939,250, filed May 22, is described, e.g., in Stemmer (1994) Proc. Natl. Acad. Sci. 1996, entitled, “Production of Enzymes Having Desired USA91:10747-10751. Briefly, in such procedures a plurality Activities by Mutagenesis. of nucleic acids to be recombined are digested with DNase to The variants of the polypeptides of the invention may be generate fragments having an average size of 50-200 nucle variants in which one or more of the amino acid residues of otides. Fragments of the desired average size are purified and the polypeptides of the sequences of the invention are substi resuspended in a PCR mixture. PCR is conducted under con tuted with a conserved or non-conserved amino acid residue ditions which facilitate recombination between the nucleic 10 (in one aspect a conserved amino acid residue) and Such acid fragments. For example, PCR may be performed by Substituted amino acid residue may or may not be one resuspending the purified fragments at a concentration of encoded by the genetic code. 10-30 ng/ul in a solution of 0.2 mM of each dNTP 2.2 mM Conservative substitutions are those that substitute a given MgCl, 50 mM KCL, 110 mM Tris HCl, pH 9.0, and 0.1% amino acid in a polypeptide by another amino acid of like Triton X-100. 2.5 units of Taq polymerase per 100:1 of reac 15 characteristics. Typically seen as conservative Substitutions tion mixture is added and PCR is performed using the follow are the following replacements: replacements of an aliphatic ing regime: 94° C. for 60 seconds, 94° C. for 30 seconds, amino acid such as Alanine, Valine, Leucine and Isoleucine 50-55° C. for 30 seconds, 72° C. for 30 seconds (30-45 times) with another aliphatic amino acid; replacement of a Serine and 72°C. for 5 minutes. However, it will be appreciated that with a Threonine or vice versa; replacement of an acidic these parameters may be varied as appropriate. In some residue such as Aspartic acid and Glutamic acid with another aspects, oligonucleotides may be included in the PCR reac acidic residue; replacement of a residue bearing an amide tions. In other aspects, the Klenow fragment of DNA poly group, such as Asparagine and Glutamine, with another resi merase I may be used in a first set of PCR reactions and Taq due bearing an amide group; exchange of a basic residue Such polymerase may be used in a Subsequent set of PCR reactions. as Lysine and Arginine with another basic residue, and Recombinant sequences are isolated and the activities of the 25 replacement of an aromatic residue such as Phenylalanine, polypeptides they encode are assessed. Tyrosine with another aromatic residue. Variants may also be created by in Vivo mutagenesis. In Other variants are those in which one or more of the amino Some aspects, random mutations in a sequence of interest are acid residues of a polypeptide of the invention includes a generated by propagating the sequence of interest in a bacte Substituent group. rial strain, Such as an E. coli Strain, which carries mutations in 30 Still other variants are those in which the polypeptide is one or more of the DNA repair pathways. Such “mutator associated with another compound. Such as a compound to strains have a higher random mutation rate than that of a increase the half-life of the polypeptide (for example, poly wild-type parent. Propagating the DNA in one of these strains ethylene glycol). will eventually generate random mutations within the DNA. Additional variants are those in which additional amino Mutator strains suitable for use for in vivo mutagenesis are 35 acids are fused to the polypeptide, such as a leader sequence, described in PCT Publication No. WO 91/16427, published a secretory sequence, a proprotein sequence or a sequence Oct. 31, 1991, entitled “Methods for Phenotype Creation which facilitates purification, enrichment, or stabilization of from Multiple Gene Populations”. the polypeptide. Variants may also be generated using cassette mutagenesis. In some aspects, the fragments, derivatives and analogs In cassette mutagenesis a small region of a double stranded 40 retain the same biological function or activity as the polypep DNA molecule is replaced with a synthetic oligonucleotide tides of the invention. In other aspects, the fragment, deriva “cassette' that differs from the native sequence. The oligo tive, or analog includes a proprotein, such that the fragment, nucleotide often contains completely and/or partially ran derivative, or analog can be activated by cleavage of the domized native sequence. proprotein portion to produce an active polypeptide. Recursive ensemble mutagenesis may also be used togen 45 Optimizing Codons to Achieve High Levels of Protein erate variants. Recursive ensemble mutagenesis is an algo Expression in Host Cells rithm for protein engineering (protein mutagenesis) devel The invention provides methods for modifying polypep oped to produce diverse populations of phenotypically related tide-, enzyme-, protein-, e.g. structural or binding protein mutants whose members differ in amino acid sequence. This encoding nucleic acids to modify codon usage. In one aspect, method uses a feedback mechanism to control Successive 50 the invention provides methods for modifying codons in a rounds of combinatorial cassette mutagenesis. Recursive nucleic acid encoding a polypeptide, enzyme, protein, e.g. ensemble mutagenesis is described, e.g., in Arkin (1992) structural or binding protein, to increase or decrease its Proc. Natl. Acad. Sci. USA 89:7811-7815. expression in a host cell. The invention also provides nucleic In some aspects, variants are created using exponential acids encoding a polypeptide, enzyme, protein, e.g. structural ensemble mutagenesis. Exponential ensemble mutagenesis is 55 or binding protein, modified to increase its expression in a a process for generating combinatorial libraries with a high host cell, a polypeptide, enzyme, protein, e.g. structural or percentage of unique and functional mutants, wherein Small binding protein, so modified, and methods of making the groups of residues are randomized in parallel to identify, at modified a polypeptide, enzyme, protein, e.g. structural or each altered position, amino acids which lead to functional binding protein. The method comprises identifying a “non proteins. Exponential ensemble mutagenesis is described, 60 preferred’ or a “less preferred codon in a polypeptide e.g., in Delegrave (1993) Biotechnology Res. 11:1548-1552. enzyme-, protein-, e.g. structural orbinding protein-encoding Random and site-directed mutagenesis are described, e.g., in nucleic acid and replacing one or more of these non-preferred Arnold (1993) Current Opinion in Biotechnology 4:450-455. or less preferred codons with a “preferred codon’ encoding In some aspects, the variants are created using shuffling the same amino acid as the replaced codon and at least one procedures wherein portions of a plurality of nucleic acids 65 non-preferred or less preferred codon in the nucleic acid has which encode distinct polypeptides are fused together to cre been replaced by a preferred codon encoding the same amino ate chimeric nucleic acid sequences which encode chimeric acid. A preferred codon is a codon over-represented in coding US 8,119,385 B2 169 170 sequences in genes in the host cell and a non-preferred or less of transgenic dairy animals; Baguisi (1999) Nat. Biotechnol. preferred codon is a codon under-represented in coding 17:456-461, demonstrating the production of transgenic sequences in genes in the host cell. goats. U.S. Pat. No. 6.211,428, describes making and using Host cells for expressing the nucleic acids, expression cas transgenic non-human mammals which express in their settes and vectors of the invention include bacteria, yeast, brains a nucleic acid construct comprising a DNA sequence. fungi, plant cells, insect cells and mammalian cells. Thus, the U.S. Pat. No. 5,387.742, describes injecting cloned recombi invention provides methods for optimizing codon usage in all nant or synthetic DNA sequences into fertilized mouse eggs, of these cells, codon-altered nucleic acids and polypeptides implanting the injected eggs in pseudo-pregnant females, and made by the codon-altered nucleic acids. Exemplary host growing to term transgenic mice. U.S. Pat. No. 6,187,992, cells include gram negative bacteria, such as : 10 describes making and using a transgenic mouse. gram positive bacteria, Such as Streptomyces sp., Lactobacil “Knockout animals' can also be used to practice the meth lus gasseri, Lactococcus lactis, Lactococcus Cremoris, Bacil ods of the invention. For example, in one aspect, the trans lus subtilis, Bacillus cereus. Exemplary host cells also genic or modified animals of the invention comprise a include eukaryotic organisms, e.g., various yeast, Such as “knockout animal.” e.g., a "knockout mouse.” engineered not Saccharomyces sp., including Saccharomyces cerevisiae, 15 to express an endogenous gene, which is replaced with a gene Schizosaccharomyces pombe, Pichia pastoris, and Kluyvero expressing a polypeptide, enzyme, protein, e.g. structural or myces lactis, Hansenula polymorpha, Aspergillus niger, and binding protein, of the invention, or, a fusion protein com mammalian cells and cell lines and insect cells and cell lines. prising a polypeptide, enzyme, protein, e.g. structural orbind Thus, the invention also includes nucleic acids and polypep ing protein, of the invention. tides optimized for expression in these organisms and spe Transgenic Plants and Seeds C1GS. The invention provides transgenic plants and seeds com For example, the codons of a nucleic acid encoding a prising a nucleic acid, a polypeptide (e.g., a polypeptide, polypeptide, enzyme, protein, e.g. structural or binding pro enzyme, protein, e.g. structural or binding protein), an tein, isolated from a bacterial cell are modified such that the expression cassette or vector or a transfected or transformed nucleic acid is optimally expressed in a bacterial cell different 25 cell of the invention. The invention also provides plant prod from the bacteria from which the polypeptide, enzyme, pro ucts, e.g., oils, seeds, leaves, extracts and the like, comprising tein, e.g. structural or binding protein was derived, a yeast, a a nucleic acid and/or a polypeptide (e.g., a polypeptide, fungi, a plant cell, an insect cell or a mammalian cell. Meth enzyme, protein, e.g. structural or binding protein) of the ods for optimizing codons are well known in the art, see, e.g., invention. The transgenic plant can be dicotyledonous (a U.S. Pat. No. 5,795,737; Baca (2000) Int. J. Parasitol. 30: 113 30 dicot) or monocotyledonous (a monocot). The invention also 118; Hale (1998) Protein Expr. Purif. 12:185-188: Narum provides methods of making and using these transgenic (2001) Infect. Immun.69:7250-7253. See also Narum (2001) plants and seeds. The transgenic plant or plant cell expressing Infect. Immun. 69:7250-7253, describing optimizing codons a polypeptide of the present invention may be constructed in in mouse systems; Outchkourov (2002) Protein Expr. Purif. accordance with any method known in the art. See, for 24:18-24, describing optimizing codons in yeast; Feng 35 example, U.S. Pat. No. 6,309,872. (2000) Biochemistry 39:15399-15409, describing optimiz Nucleic acids and expression constructs of the invention ing codons in E. coli; Humphreys (2000) Protein Expr. Purif. can be introduced into a plant cell by any means. For example, 20:252-264, describing optimizing codon usage that affects nucleic acids or expression constructs can be introduced into secretion in E. coli. the genome of a desired plant host, or, the nucleic acids or Transgenic Non-Human Animals 40 expression constructs can be episomes. Introduction into the The invention provides transgenic non-human animals genome of a desired plant can be such that the host’s a comprising a nucleic acid, a polypeptide (e.g., a polypeptide, polypeptide, enzyme, protein, e.g. structural or binding pro enzyme, protein, e.g. structural or binding protein), an tein, production is regulated by endogenous transcriptional or expression cassette or vector or a transfected or transformed translational control elements. The invention also provides cell of the invention. The invention also provides methods of 45 “knockout plants' where insertion of gene sequence by, e.g., making and using these transgenic non-human animals. homologous recombination, has disrupted the expression of The transgenic non-human animals can be, e.g., goats, the endogenous gene. Means to generate “knockout” plants rabbits, sheep, pigs (including all Swine, hogs and related are well-known in the art, see, e.g., Strepp (1998) Proc Natl. animals), cows, rats and mice, comprising the nucleic acids of Acad. Sci. USA 95:4368-4373; Miao (1995) Plant J 7:359 the invention. These animals can be used, e.g., as in vivo 50 365. See discussion on transgenic plants, below. models to study a polypeptide, enzyme, protein, e.g. struc The nucleic acids of the invention can be used to confer tural or binding protein, activity, or, as models to Screen for desired traits on essentially any plant, e.g., on starch-produc agents that change the polypeptide, enzyme, protein, e.g. ing plants, such as potato, wheat, rice, barley, and the like. structural or binding protein activity in vivo. The coding Nucleic acids of the invention can be used to manipulate sequences for the polypeptides to be expressed in the trans 55 metabolic pathways of a plant in order to optimize or alter genic non-human animals can be designed to be constitutive, host’s expression of polypeptide, enzyme, protein, e.g. struc or, under the control of tissue-specific, developmental-spe tural or binding protein. The can change a polypeptide, cific or inducible transcriptional regulatory factors. Trans enzyme, protein, e.g. structural or binding protein, activity in genic non-human animals can be designed and generated a plant. Alternatively, a polypeptide, enzyme, protein, e.g. using any method known in the art; see, e.g., U.S. Pat. Nos. 60 structural or binding protein, of the invention can be used in 6,211,428; 6,187,992: 6,156,952: 6,118,044: 6,111,166; production of a transgenic plant to produce a compound not 6,107,541; 5,959,171; 5,922,854; 5,892,070; 5,880,327; naturally produced by that plant. This can lower production 5,891,698; 5,639,940; 5,573,933; 5,387,742; 5,087,571, costs or create a novel product. describing making and using transformed cells and eggs and In one aspect, the first step in production of a transgenic transgenic mice, rats, rabbits, sheep, pigs and cows. See also, 65 plant involves making an expression construct for expression e.g., Pollock (1999) J. Immunol. Methods 231:147-157, in a plant cell. These techniques are well known in the art. describing the production of recombinant proteins in the milk They can include selecting and cloning a promoter, a coding US 8,119,385 B2 171 172 sequence for facilitating efficient binding of ribosomes to Nucleic acids, e.g., expression constructs, can also be mRNA and selecting the appropriate gene terminator introduced in to plant cells using recombinant viruses. Plant sequences. One exemplary constitutive promoter is cells can be transformed using viral vectors, such as, e.g., CaMV35S, from the cauliflower mosaic virus, which gener tobacco mosaic virus derived vectors (Rouwendal (1997) ally results in a high degree of expression in plants. Other 5 Plant Mol. Biol. 33:989-999), see Porta (1996). “Use of viral promoters are more specific and respond to cues in the plants replicons for the expression of genes in plants.” Mol. Bio internal or external environment. An exemplary light-induc technol. 5:209-221. ible promoter is the promoter from the cab gene, encoding the Alternatively, nucleic acids, e.g., an expression construct, major chlorophyll afb binding protein. can be combined with suitable T-DNA flanking regions and In one aspect, the nucleic acid is modified to achieve 10 introduced into a conventional Agrobacterium tumefaciens greater expression in a plant cell. For example, a sequence of host vector. The virulence functions of the Agrobacterium the invention is likely to have a higher percentage of A-T tumefaciens host will direct the insertion of the construct and nucleotide pairs compared to that seen in a plant, some of adjacent marker into the plant cell DNA when the cell is which prefer G-C nucleotide pairs. Therefore, A-T nucle infected by the bacteria. Agrobacterium tumefaciens-medi otides in the coding sequence can be substituted with G-C 15 ated transformation techniques, including disarming and use nucleotides without significantly changing the amino acid of binary vectors, are well described in the scientific litera sequence to enhance production of the gene product in plant ture. See, e.g., Horsch (1984) Science 233:496-498; Fraley cells. (1983) Proc. Natl. Acad. Sci. USA 80:4803 (1983); Gene Selectable marker gene can be added to the gene construct Transfer to Plants, Potrykus, ed. (Springer-Verlag, Berlin in order to identify plant cells or tissues that have successfully 1995). The DNA in an A. tumefaciens cell is contained in the integrated the transgene. This may be necessary because bacterial chromosome as well as in another structure known achieving incorporation and expression of genes in plant cells as a Ti (tumor-inducing) plasmid. The Tiplasmid contains a is a rare event, occurring injust a few percent of the targeted stretch of DNA termed T-DNA (~20 kb long) that is trans tissues or cells. Selectable marker genes encode proteins that ferred to the plant cell in the infection process and a series of provide resistance to agents that are normally toxic to plants, 25 Vir (virulence) genes that direct the infection process. A. such as antibiotics or herbicides. Only plant cells that have tumefaciens can only infect a plant through wounds: when a integrated the selectable marker gene will survive when plant root or stem is wounded it gives off certain chemical grown on a medium containing the appropriate antibiotic or signals, in response to which, the vir genes of A. tumefaciens herbicide. As for other inserted genes, marker genes also become activated and direct a series of events necessary for require promoter and termination sequences for proper func 30 the transfer of the T-DNA from the Tiplasmid to the plants tion. chromosome. The T-DNA enters the plant cell through the In one aspect, making transgenic plants or seeds comprises wound. One speculation is that the T-DNA waits until the incorporating sequences of the invention and, optionally, plant DNA is being replicated or transcribed, then inserts marker genes into a target expression construct (e.g., a plas itself into the exposed plant DNA. In order to use A. tumefa mid), along with positioning of the promoter and the termi 35 ciens as a transgene vector, the tumor-inducing section of nator sequences. This can involve transferring the modified T-DNA have to be removed, while retaining the T-DNA bor gene into the plant through a suitable method. For example, a der regions and the vir genes. The transgene is then inserted construct may be introduced directly into the genomic DNA between the T-DNA border regions, where it is transferred to of the plant cell using techniques such as electroporation and the plant cell and becomes integrated into the plant's chro microinjection of plant cell protoplasts, or the constructs can 40 OSOS. be introduced directly to plant tissue using ballistic methods, The invention provides for the transformation of mono Such as DNA particle bombardment. For example, see, e.g., cotyledonous plants using the nucleic acids of the invention, Christou (1997) Plant Mol. Biol. 35:197-203; Pawlowski including important cereals, see Hiei (1997) Plant Mol. Biol. (1996) Mol. Biotechnol. 6:17-30; Klein (1987) Nature 327: 35:205-218. See also, e.g., Horsch, Science (1984) 233:496; 70-73; Takumi (1997) Genes Genet. Syst. 72:63-69, discuss 45 Fraley (1983) Proc. Natl. Acad. Sci. USA 80:4803; Thykjaer ing use of particle bombardment to introduce transgenes into (1997) supra; Park (1996) Plant Mol. Biol. 32:1135-1148, wheat; and Adam (1997) supra, for use of particle bombard discussing T-DNA integration into genomic DNA. See also ment to introduce YACs into plant cells. For example, Rine D'Halluin, U.S. Pat. No. 5,712,135, describing a process for hart (1997) supra, used particle bombardment to generate the stable integration of a DNA comprising a gene that is transgenic cotton plants. Apparatus for accelerating particles 50 functional in a cell of a cereal, or other monocotyledonous is described U.S. Pat. No. 5,015,580; and, the commercially plant. available BioRad (Biolistics) PDS-2000 particle acceleration In one aspect, the third step can involve selection and instrument; see also, John, U.S. Pat. No. 5,608,148; and Ellis, regeneration of whole plants capable of transmitting the U.S. Pat. No. 5,681,730, describing particle-mediated trans incorporated target gene to the next generation. Such regen formation of gymnosperms. 55 eration techniques rely on manipulation of certain phytohor In one aspect, protoplasts can be immobilized and injected mones in a tissue culture growth medium, typically relying on with a nucleic acids, e.g., an expression construct. Although a biocide and/or herbicide marker that has been introduced plant regeneration from protoplasts is not easy with cereals, together with the desired nucleotide sequences. Plant regen plant regeneration is possible in legumes using Somatic eration from cultured protoplasts is described in Evans et al., embryogenesis from protoplast derived callus. Organized tis 60 Protoplasts Isolation and Culture, Handbook of Plant Cell Sues can be transformed with naked DNA using gene gun Culture, pp. 124-176, MacMillilan Publishing Company, technique, where DNA is coated on tungsten microprojec New York, 1983; and Binding, Regeneration of Plants, Plant tiles, shot /100 th the size of cells, which carry the DNA deep Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1985. into cells and organelles. Transformed tissue is then induced Regeneration can also be obtained from plant callus, explants, to regenerate, usually by Somatic embryogenesis. This tech 65 organs, or parts thereof. Such regeneration techniques are nique has been Successful in several cereal species including described generally in Klee (1987) Ann. Rev. of Plant Phys. maize and rice. 38:467-486. To obtain whole plants from transgenic tissues US 8,119,385 B2 173 174 Such as immature embryos, they can be grown under con transgene mRNA or protein in transgenic plants. Means for trolled environmental conditions in a series of media contain detecting and quantitation of mRNAS or proteins are well ing nutrients and hormones, a process known as tissue cul known in the art. ture. Once whole plants are generated and produce seed, Polypeptides and Peptides evaluation of the progeny begins. In one aspect, the invention provides isolated or recombi After the expression cassette is stably incorporated in nant polypeptides having a sequence identity (e.g., at least transgenic plants, it can be introduced into other plants by about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, sexual crossing. Any of a number of Standard breeding tech 59%, 60%, 61%, 62%, 63%, 64%. 65%, 66%, 67%, 68%, niques can be used, depending upon the species to be crossed. 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, Since transgenic expression of the nucleic acids of the inven 10 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, tion leads to phenotypic changes, plants comprising the 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, recombinant nucleic acids of the invention can be sexually 99%, or more, or complete (100%) sequence identity, or crossed with a second plant to obtaina final product. Thus, the homology) to an exemplary sequence of the invention, e.g., seed of the invention can be derived from a cross between two proteins having a sequence as set forth in SEQID NO:2, SEQ transgenic plants of the invention, or a cross between a plant 15 ID NO:4, SEQID NO:6, SEQID NO:8, SEQID NO:10, etc., of the invention and another plant. The desired effects (e.g., and all polypeptides disclosed in the SEQ ID listing, which expression of the polypeptides of the invention to produce a include all even numbered SEQID NOs: from SEQID NO:2 plant in which flowering behavior is altered) can be enhanced through SEQID NO:26,898). The percent sequence identity when both parental plants express the polypeptides (e.g., a can be over the full length of the polypeptide, or, the identity polypeptide, enzyme, protein, e.g. structural or binding pro can be over a region of at least about 50, 60, 70, 80,90, 100, tein) of the invention. The desired effects can be passed to 150, 200,250, 300,350, 400, 450, 500, 550, 600, 650, 700 or future plant generations by Standard propagation means. more residues. The nucleic acids and polypeptides of the invention are “Amino acid' or "amino acid sequence' as used herein expressed in or inserted in any plant or seed. Transgenic refer to an oligopeptide, peptide, polypeptide, or protein plants of the invention can be dicotyledonous or monocoty 25 sequence, or to a fragment, portion, or subunit of any of these ledonous. Examples of monocot transgenic plants of the and to naturally occurring or synthetic molecules. Amino invention are grasses, such as meadow grass (bluegrass, Poa), acid' or "amino acid sequence' include an oligopeptide, pep forage grass such as festuca, lolium, temperate grass, such as tide, polypeptide, or protein sequence, or to a fragment, por Agrostis, and cereals, e.g., wheat, oats, rye, barley, rice, Sor tion, or subunit of any of these, and to naturally occurring or ghum, and maize (corn). Examples of dicot transgenic plants 30 synthetic molecules. The term “polypeptide' as used herein, of the invention are tobacco, legumes, such as lupins, potato, refers to amino acids joined to each other by peptide bonds or sugar beet, pea, bean and soybean, and cruciferous plants modified peptide bonds, i.e., peptide isosteres and may con (family Brassicaceae). Such as cauliflower, rape seed, and the tain modified amino acids other than the 20 gene-encoded closely related model organism Arabidopsis thaliana. Thus, amino acids. The polypeptides may be modified by either the transgenic plants and seeds of the invention include a 35 natural processes, such as post-translational processing, or by broad range of plants, including, but not limited to, species chemical modification techniques which are well known in from the genera Anacardium, Arachis, Asparagus, Atropa, the art. Modifications can occur anywhere in the polypeptide, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, including the peptide backbone, the amino acid side-chains Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, and the amino or carboxyl termini. It will be appreciated that Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, 40 the same type of modification may be present in the same or Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, varying degrees at several sites in a given polypeptide. Also a Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicoti given polypeptide may have many types of modifications. ana, Olea, Oryza, Panieum, Pannisetum, Persea, Phaseolus, Modifications include acetylation, acylation, ADP-ribosyla Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, tion, amidation, covalent attachment of flavin, covalent Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigo 45 attachment of a heme moiety, covalent attachment of a nucle nella, Triticum, Vicia, Vitis, Vigna, and Zea. otide or nucleotide derivative, covalent attachment of a lipid In alternative embodiments, the nucleic acids of the inven or lipid derivative, covalent attachment of a phosphatidyli tion are expressed in plants which contain fiber cells, includ nositol, cross-linking cyclization, disulfide bond formation, ing, e.g., cotton, silk cotton tree (Kapok, Ceiba pentandra), demethylation, formation of covalent cross-links, formation desert willow, creosote bush, winterfat, balsa, ramie, kenaf, 50 of cysteine, formation of pyroglutamate, formylation, hemp, roselle, jute, sisal abaca and flax. In alternative gamma-carboxylation, glycosylation, GPI anchor formation, embodiments, the transgenic plants of the invention can be hydroxylation, iodination, methylation, myristolyation, oxi members of the genus Gossypium, including members of any dation, pegylation, glucan hydrolase processing, phosphory Gossypium species, such as G. arboreum, G. herbaceum, G. lation, , racemization, selenoylation, Sulfation and barbadense, and G. hirsutum. 55 transfer-RNA mediated addition of amino acids to protein The invention also provides for transgenic plants to be used Such as arginylation. (See Creighton, T. E. Proteins—Struc for producing large amounts of the polypeptides (e.g., a ture and Molecular Properties 2nd Ed., W.H. Freeman and polypeptide, enzyme, protein, e.g. structural or binding pro Company, New York (1993); Posttranslational Covalent tein, or antibody) of the invention. For example, see Palmgren Modification of Proteins, B. C. Johnson, Ed., Academic (1997) Trends Genet. 13:348; Chong (1997) Transgenic Res. 60 Press, New York, pp. 1-12 (1983)). The peptides and polypep 6:289-296 (producing human milk protein beta-casein in tides of the invention also include all “mimetic' and "pepti transgenic potato plants using an auxin-inducible, bidirec domimetic' forms, as described in further detail, below. tional mannopine synthase (mas 1'.2) promoter with Agro As used herein, the term "isolated” means that the material bacterium tumefaciens-mediated leaf disc transformation is removed from its original environment (e.g., the natural methods). 65 environment if it is naturally occurring). For example, a natu Using known procedures, one of skill can screen for plants rally-occurring polynucleotide or polypeptide present in a of the invention by detecting the increase or decrease of living animal is not isolated, but the same polynucleotide or US 8,119,385 B2 175 176 polypeptide, separated from some or all of the coexisting Polypeptides and peptides of the invention can be isolated materials in the natural system, is isolated. Such polynucle from natural sources, be synthetic, or be recombinantly gen otides could be part of a vector and/or such polynucleotides or erated polypeptides. Peptides and proteins can be recombi polypeptides could be part of a composition and still be iso nantly expressed in vitro or in vivo. The peptides and lated in that such vector or composition is not part of its polypeptides of the invention can be made and isolated using natural environment. As used herein, the term “purified” does any method known in the art. Polypeptide and peptides of the not require absolute purity; rather, it is intended as a relative invention can also be synthesized, whole or in part, using definition. Individual nucleic acids obtained from a library chemical methods well known in the art. See e.g., Caruthers have been conventionally purified to electrophoretic homo (1980) Nucleic Acids Res. Symp. Ser. 215-223; Horn (1980) 10 Nucleic Acids Res. Symp. Ser. 225-232; Banga, A. K. Thera geneity. The sequences obtained from these clones could not peutic Peptides and Proteins. Formulation, Processing and be obtained directly either from the library or from total Delivery Systems (1995) Technomic Publishing Co., Lan human DNA. The purified nucleic acids of the invention have caster, Pa. For example, peptide synthesis can be performed been purified from the remainder of the genomic DNA in the using various Solid-phase techniques (see e.g., Roberge organism by at least 104-106 fold. However, the term “puri 15 (1995) Science 269:202: Merrifield (1997) Methods Enzy fied also includes nucleic acids which have been purified mol. 289:3-13) and automated synthesis may be achieved, from the remainder of the genomic DNA or from other e.g., using the ABI 431A Peptide Synthesizer (PerkinElmer) sequences in a library or other environment by at least one in accordance with the instructions provided by the manufac order of magnitude, typically two or three orders and more turer. typically four or five orders of magnitude. The peptides and polypeptides of the invention can also be “Recombinant polypeptides or proteins refer to polypep glycosylated. The glycosylation can be added post-transla tides or proteins produced by recombinant DNA techniques: tionally either chemically or by cellular biosynthetic mecha i.e., produced from cells transformed by an exogenous DNA nisms, wherein the later incorporates the use of known gly construct encoding the desired polypeptide or protein. “Syn cosylation motifs, which can be native to the sequence or can thetic' polypeptides or protein are those prepared by chemi 25 be added as a peptide or added in the nucleic acid coding cal synthesis. Solid-phase chemical peptide synthesis meth sequence. The glycosylation can be O-linked or N-linked. ods can also be used to synthesize the polypeptide or The peptides and polypeptides of the invention, as defined fragments of the invention. Such method have been known in above, include all “mimetic' and “peptidomimetic' forms. the art since the early 1960s (Merrifield, R. B., J. Am. Chem. The terms “mimetic' and "peptidomimetic' refer to a syn Soc., 85:2149-2154, 1963) (See also Stewart, J. M. and 30 thetic chemical compound which has Substantially the same Young, J. D., Solid Phase Peptide Synthesis, 2nd Ed., Pierce structural and/or functional characteristics of the polypep Chemical Co., Rockford, Ill., pp. 11-12)) and have recently tides of the invention. The mimetic can be either entirely been employed in commercially available laboratory peptide composed of synthetic, non-natural analogues of amino design and synthesis kits (Cambridge Research Biochemi acids, or, is a chimeric molecule of partly natural peptide cals). Such commercially available laboratory kits have gen 35 amino acids and partly non-natural analogs of amino acids. erally utilized the teachings of H. M. Geysen etal, Proc. Natl. The mimetic can also incorporate any amount of natural Acad. Sci., USA, 81:3998 (1984) and provide for synthesizing amino acid conservative Substitutions as long as such substi peptides upon the tips of a multitude of “rods” or “pins' all of tutions also do not substantially alter the mimetic's structure which are connected to a single plate. and/or activity. As with polypeptides of the invention which Polypeptides of the invention can also be shorter than the 40 are conservative variants or members of a genus of polypep full length of exemplary polypeptides. In alternative aspects, tides of the invention (e.g., having about 50% or more the invention provides polypeptides peptides, fragments) sequence identity to an exemplary sequence of the invention), ranging in size between about 5 and the full length of a routine experimentation will determine whether a mimetic is polypeptide, e.g., an enzyme, such as a polypeptide, enzyme, within the scope of the invention, i.e., that its structure and/or protein, e.g. structural or binding protein; exemplary sizes 45 function is not Substantially altered. Thus, in one aspect, a being of about 5, 10, 15, 20, 25, 30,35, 40, 45, 50, 55, 60, 65, mimetic composition is within the scope of the invention if it 70, 75,80, 85,90, 100,125,150, 175,200,250,300,350, 400, has a polypeptide, enzyme, protein, e.g. structural or binding 450, 500, 550, 600, 650, 700, or more residues, e.g., contigu protein’s activity. ous residues of an exemplary a polypeptide, enzyme, protein, Polypeptide mimetic compositions of the invention can e.g. structural or binding protein, of the invention. Peptides of 50 contain any combination of non-natural structural compo the invention (e.g., a Subsequence of an exemplary polypep nents. In alternative aspect, mimetic compositions of the tide of the invention) can be useful as, e.g., labeling probes, invention include one or all of the following three structural antigens, toleragens, motifs, a polypeptide, enzyme, protein, groups: a) residue linkage groups other than the natural amide e.g. structural or binding protein, active sites (e.g., “catalytic bond ("peptide bond') linkages; b) non-natural residues in domains'), signal sequences and/or prepro domains. 55 place of naturally occurring amino acid residues; or c) resi In alternative aspects, polypeptides of the invention having dues which induce secondary structural mimicry, i.e., to enzyme, structural or binding activity are members of a genus induce or stabilize a secondary structure, e.g., a beta turn, of polypeptides sharing specific structural elements, e.g., gamma turn, beta sheet, conformation, and the amino acid residues, that correlate with enzyme, structural or like. For example, a polypeptide of the invention can be binding activity. These shared structural elements can be used 60 characterized as a mimetic when all or some of its residues are for the routine generation of polypeptide, enzyme, protein, joined by chemical means other than natural peptide bonds. e.g. structural or binding protein, variants. These shared Individual peptidomimetic residues can be joined by peptide structural elements of a polypeptide, enzyme, protein, e.g. bonds, other chemical bonds or coupling means, such as, e.g., structural or binding protein, of the invention can be used as glutaraldehyde, N-hydroxysuccinimide esters, bifunctional guidance for the routine generation of a polypeptide, enzyme, 65 maleimides, N,N'-dicyclohexylcarbodiimide (DCC) or protein, e.g. structural or binding protein, variants within the N,N'diisopropylcarbodiimide (DIC). Linking groups that can Scope of the genus of polypeptides of the invention. be an alternative to the traditional amide bond ("peptide US 8,119,385 B2 177 178 bond') linkages include, e.g., ketomethylene (e.g., by reacting lysinyl with, e.g., Succinic or other carboxylic C(=O)—CH2— for —C(=O) NH ), aminomethyl acid anhydrides. Lysine and other alpha-amino-containing ene (CH NH), ethylene, olefin (CH-CH), ether (CH residue mimetics can also be generated by reaction with imi O), thioether (CH-S), tetrazole (CN ), thiazole, retroa doesters, such as methyl picolinimidate, pyridoxal phos mide, thioamide, or ester (see, e.g., Spatola (1983) in phate, pyridoxal, chloroborohydride, trinitro-benzene Chemistry and Biochemistry of Amino Acids, Peptides and Sulfonic acid, O-methylisourea, 2.4, pentanedione, and Proteins, Vol. 7, pp. 267-357, “Peptide Backbone Modifica transamidase-catalyzed reactions with glyoxylate. Mimetics tions.” Marcell Dekker, N.Y.). of methionine can be generated by reaction with, e.g., A polypeptide of the invention can also be characterized as methionine Sulfoxide. Mimetics of proline include, e.g., pipe a mimetic by containing all or some non-natural residues in 10 colic acid, thiazolidine carboxylic acid, 3- or 4-hydroxy pro place of naturally occurring amino acid residues. Non-natural line, dehydroproline, 3- or 4-methylproline, or 3.3-dimeth residues are well described in the scientific and patent litera ylproline. Histidine residue mimetics can be generated by ture; a few exemplary non-natural compositions useful as reacting histidyl with, e.g., diethylprocarbonate or para-bro mimetics of natural amino acid residues and guidelines are mophenacyl bromide. Other mimetics include, e.g., those described below. Mimetics of aromatic amino acids can be 15 generated by hydroxylation of proline and lysine; phospho generated by replacing by, e.g., D- or L-naphylalanine, D- or rylation of the hydroxyl groups of seryl or threonyl residues; L-phenylglycine; D- or L-2 thieneylalanine; D- or L-1, -2, 3-, methylation of the alpha-amino groups of lysine, arginine and or 4-pyreneylalanine: D- or L-3 thieneylalanine; D- or L-(2- histidine; acetylation of the N-terminal amine; methylation of pyridinyl)-alanine, D- or L-(3-pyridinyl)-alanine, D- or L-(2- main chain amide residues or substitution with N-methyl pyrazinyl)-alanine, D- or L-(4-isopropyl)-phenylglycine; amino acids; or amidation of C-terminal carboxyl groups. D-(trifluoromethyl)-phenylglycine; D-(trifluoromethyl)- A residue, e.g., an amino acid, of a polypeptide of the phenylalanine; D-p-fluoro-phenylalanine; D- or L-p-biphe invention can also be replaced by an amino acid (or peptido nylphenylalanine: D- or L-p-methoxy-biphenylphenylala mimetic residue) of the opposite chirality. Thus, any amino nine: D- or L-2-indole(alkyl)alanines; and, D- or acid naturally occurring in the L-configuration (which can L-alkylainines, where alkyl can be substituted or unsubsti 25 also be referred to as the R or S, depending upon the structure tuted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl. of the chemical entity) can be replaced with the amino acid of iso-butyl, sec-isotyl, iso-pentyl, or a non-acidic amino acids. the same chemical structural type or a peptidomimetic, but of Aromatic rings of a non-natural amino acid include, e.g., the opposite chirality, referred to as the D-amino acid, but also thiazolyl, thiophenyl, pyrazolyl, benzimidazolyl, naphthyl, can be referred to as the R- or S-form. furanyl, pyrrolyl, and pyridyl aromatic rings. 30 The invention also provides methods for modifying the Mimetics of acidic amino acids can be generated by Sub polypeptides of the invention by either natural processes, stitution by, e.g., non-carboxylate amino acids while main Such as post-translational processing (e.g., phosphorylation, taining a negative charge; (phosphono)alanine, Sulfated acylation, etc), or by chemical modification techniques, and threonine. Carboxyl side groups (e.g., aspartyl or glutamyl) the resulting modified polypeptides. Modifications can occur can also be selectively modified by reaction with carbodiim 35 anywhere in the polypeptide, including the peptide backbone, ides (R' N C N R") such as, e.g., 1-cyclohexyl-3(2- the amino acid side-chains and the amino or carboxyl termini. morpholinyl-(4-ethyl) carbodiimide or 1-ethyl-3 (4-azonia-4, It will be appreciated that the same type of modification may 4-dimetholpentyl) carbodiimide. Aspartyl or glutamyl can be present in the same or varying degrees at several sites in a also be converted to asparaginyl and glutaminyl residues by given polypeptide. Also a given polypeptide may have many reaction with ammonium ions. Mimetics of basic amino acids 40 types of modifications. Modifications include acetylation, can be generated by Substitution with, e.g., (in addition to acylation, ADP-ribosylation, amidation, covalent attachment lysine and arginine) the amino acids ornithine, , or of flavin, covalent attachment of a heme moiety, covalent (guanidino)-acetic acid, or (guanidino)alkyl-acetic acid, attachment of a nucleotide or nucleotide derivative, covalent where alkyl is defined above. Nitrile derivative (e.g., contain attachment of a lipid or lipid derivative, covalent attachment ing the CN-moiety in place of COOH) can be substituted for 45 of a phosphatidylinositol, cross-linking cyclization, disulfide asparagine or glutamine. Asparaginyl and glutaminyl resi bond formation, demethylation, formation of covalent cross dues can be deaminated to the corresponding aspartyl or links, formation of cysteine, formation of pyroglutamate, glutamyl residues. Arginine residue mimetics can be gener formylation, gamma-carboxylation, glycosylation, GPI ated by reacting arginyl with, e.g., one or more conventional anchor formation, hydroxylation, iodination, methylation, reagents, including, e.g., phenylglyoxal, 2,3-butanedione, 50 myristolyation, oxidation, pegylation, proteolytic process 1.2-cyclo-hexanedione, or ninhydrin, in one aspect under ing, phosphorylation, prenylation, racemization, selenoyla alkaline conditions. Tyrosine residue mimetics can be gener tion, sulfation, and transfer-RNA mediated addition of amino ated by reacting tyrosyl with, e.g., aromatic diazonium com acids to protein Such as arginylation. See, e.g., Creighton, T. pounds or tetranitromethane. N-acetylimidizol and tetrani E. Proteins—Structure and Molecular Properties 2nd Ed., tromethane can be used to form O-acetyl tyrosyl species and 55 W.H. Freeman and Company, New York (1993); Posttransla 3-nitro derivatives, respectively. Cysteine residue mimetics tional Covalent Modification of Proteins, B. C. Johnson, Ed., can be generated by reacting cysteinyl residues with, e.g., Academic Press, New York, pp. 1-12 (1983). alpha-haloacetates Such as 2-chloroacetic acid or chloroac Solid-phase chemical peptide synthesis methods can also etamide and corresponding amines; to give carboxymethyl or be used to synthesize the polypeptide or fragments of the carboxyamidomethyl derivatives. Cysteine residue mimetics 60 invention. Such method have been known in the art since the can also be generated by reacting cysteinyl residues with, e.g., early 1960's (Merrifield, R. B., J. Am. Chem. Soc., 85:2149 bromo-trifluoroacetone, alpha-bromo-beta-(5-imidozoyl) 2154, 1963) (See also Stewart, J. M. and Young, J. D., Solid propionic acid; chloroacetyl phosphate, N-alkylmaleimides, Phase Peptide Synthesis, 2nd Ed., Pierce Chemical Co., 3-nitro-2-pyridyl disulfide; methyl 2-pyridyl disulfide: Rockford, Ill., pp. 11-12)) and have recently been employed p-chloromercuribenzoate; 2-chloromercuri-4 nitrophenol; 65 in commercially available laboratory peptide design and Syn or, chloro-7-nitrobenzo-Oxa-1,3-diazole. Lysine mimetics thesis kits (Cambridge Research Biochemicals). Such com can be generated (and amino terminal residues can be altered) mercially available laboratory kits have generally utilized the US 8,119,385 B2 179 180 teachings of H. M. Geysen et al. Proc. Natl. Acad. Sci., USA, identified in this way can be used in industry and research to 81:3998 (1984) and provide for synthesizing peptides upon reduce or prevent undesired proteolysis. As with a polypep the tips of a multitude of “rods” or “pins' all of which are tide, enzyme, protein, e.g. structural or binding protein, connected to a single plate. When Such a system is utilized, a inhibitors can be combined to increase the spectrum of activ plate of rods or pins is inverted and inserted into a second ity. plate of corresponding wells or reservoirs, which contain The enzymes of the invention are also useful as research Solutions for attaching or anchoring an appropriate amino reagents to digest proteins or in protein sequencing. For acid to the pins or rod's tips. By repeating such a process example, the polypeptide, enzyme, protein, e.g. structural or step, i.e., inverting and inserting the rods and pin's tips into binding proteins may be used to break polypeptides into appropriate Solutions, amino acids are built into desired pep 10 Smaller fragments for sequencing using, e.g. an automated tides. In addition, a number of available FMOC peptide syn Sequencer. thesis systems are available. For example, assembly of a The invention also provides methods of discovering new a polypeptide or fragment can be carried out on a Solid Support polypeptide, enzyme, protein, e.g. structural or binding pro using an Applied Biosystems, Inc. Model 431 ATM automated tein, using the nucleic acids, polypeptides and antibodies of peptide synthesizer. Such equipment provides ready access to 15 the invention. In one aspect, phagemid libraries are screened the peptides of the invention, either by direct synthesis or by for expression-based discovery of a polypeptide, enzyme, synthesis of a series of fragments that can be coupled using protein, e.g. structural or binding protein. In another aspect, other known techniques. lambda phage libraries are screened for expression-based The polypeptides of the invention include a polypeptide, discovery of a polypeptide, enzyme, protein, e.g. structural or enzyme, protein, e.g. structural or binding protein, in an binding protein. Screening of the phage orphagemid libraries active or inactive form. For example, the polypeptides of the can allow the detection of toxic clones; improved access to invention include proproteins before “maturation' or pro Substrate; reduced need for engineering a host, by-passing the cessing of prepro sequences, e.g., by a proprotein-processing potential for any bias resulting from mass excision of the enzyme. Such as a proprotein convertase to generate an library; and, faster growth at low clone densities. Screening of “active' mature protein. The polypeptides of the invention 25 phage or phagemid libraries can be in liquid phase or in Solid include a polypeptide, enzyme, protein, e.g. structural or phase. In one aspect, the invention provides screening in binding protein, inactive for other reasons, e.g., before “acti liquid phase. This gives a greater flexibility in assay condi Vation” by a post-translational processing event, e.g., an tions; additional substrate flexibility; higher sensitivity for endo- or exo-peptidase or proteinase action, a phosphoryla weak clones; and ease of automation over Solid phase screen tion event, an amidation, a glycosylation or a Sulfation, a 30 ing. dimerization event, and the like. The polypeptides of the The invention provides screening methods using the pro invention include all active forms, including active subse teins and nucleic acids of the invention and robotic automa quences, e.g., catalytic domains or active sites, of the enzyme. tion to enable the execution of many thousands of biocatalytic The invention includes immobilized polypeptides, reactions and Screening assays in a short period of time, e.g., enzymes, proteins, e.g. structural or binding proteins, anti 35 per day, as well as ensuring a high level of accuracy and polypeptides, anti-enzymes, anti-proteins, e.g. anti-structural reproducibility (see discussion of arrays, below). As a result, or anti-binding proteins, antibodies and fragments thereof. a library of derivative compounds can be produced in a matter The invention provides methods for inhibiting a polypeptide, of weeks. For further teachings on modification of molecules, enzyme, protein, e.g. structural or binding protein, activity, including small molecules, see PCT/US94/09174. e.g., using dominant negative mutants or anti-polypeptide, 40 In one aspect, polypeptides or fragments of the invention anti-enzyme, anti-protein, e.g. anti-structural or anti-binding may be obtained through biochemical enrichment or purifi protein antibodies of the invention. The invention includes cation procedures. The sequence of potentially homologous heterocomplexes, e.g., fusion proteins, heterodimers, etc., polypeptides or fragments may be determined by a polypep comprising the polypeptide, enzyme, protein, e.g. structural tide, enzyme, protein, e.g. structural or binding protein, or binding proteins of the invention. 45 assays, gel electrophoresis and/or microSequencing. The Polypeptides of the invention can have an enzyme, struc sequence of the prospective polypeptide or fragment of the tural or binding activity under various conditions, e.g., invention can be compared to an exemplary polypeptide of extremes in pH and/or temperature, oxidizing agents, and the the invention, or a fragment, e.g., comprising at least about 5. like. The invention provides methods leading to alternative a 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more polypeptide, enzyme, protein, e.g. structural or binding pro 50 consecutive amino acids thereof using any of the programs tein, preparations with different catalytic efficiencies and sta described above. bilities, e.g., towards temperature, oxidizing agents and Another aspect of the invention is an assay for identifying changing wash conditions. In one aspect, a polypeptide, fragments or variants of the invention, which retain the enzy enzyme, protein, e.g. structural or binding protein, variants matic function of the polypeptides of the invention. For can be produced using techniques of site-directed mutagen 55 example the fragments or variants of said polypeptides, may esis and/or random mutagenesis. In one aspect, directed evo be used to catalyze biochemical reactions (e.g., production of lution can be used to produce a great variety of a polypeptide, a nootkatone from a Valencene), which indicate that the frag enzyme, protein, e.g. structural or binding protein, variants ment or variant retains the enzymatic activity of a polypeptide with alternative specificities and stability. of the invention. The proteins of the invention are also useful as research 60 An exemplary assay for determining if fragments of vari reagents to identify a polypeptide, enzyme, protein, e.g. struc ants retain the enzymatic activity of the polypeptides of the tural or binding protein, modulators, e.g., activators or inhibi invention includes the steps of contacting the polypeptide tors of a polypeptide, enzyme, protein, e.g. structural or bind fragment or variant with a Substrate molecule under condi ing protein, activity. Briefly, test samples (compounds, tions which allow the polypeptide fragment or variant to broths, extracts, and the like) are added to a polypeptide, 65 function and detecting either a decrease in the level of sub enzyme, protein, e.g. structural or binding protein, assays to strate or an increase in the level of the specific reaction prod determine their ability to inhibit substrate cleavage. Inhibitors uct of the reaction between the polypeptide and substrate. US 8,119,385 B2 181 182 The present invention exploits the unique catalytic proper tural moiety or a group of related structural moieties; and each ties of enzymes. Whereas the use of biocatalysts (i.e., purified biocatalyst reacts with many different small molecules which or crude enzymes, non-living or living cells) in chemical contain the distinct structural moiety. transformations normally requires the identification of a par ticular biocatalyst that reacts with a specific starting com A Polypeptide, Enzyme, Protein, e.g. Structural or Binding pound, the present invention uses selected biocatalysts and Protein, Signal Sequences, Prepro and Catalytic Domains reaction conditions that are specific for functional groups that The invention provides a polypeptide, enzyme, protein, are present in many starting compounds, such as Small mol e.g. structural or binding protein, signal sequences (e.g., sig ecules. Each biocatalyst is specific for one functional group, nal peptides (SPs)), prepro domains and catalytic domains or several related functional groups and can react with many 10 (CDs). The SPs, prepro domains and/or CDs of the invention starting compounds containing this functional group. can be isolated or recombinant peptides or can be part of a The biocatalytic reactions produce a population of deriva fusion protein, e.g., as a heterologous domain in a chimeric tives from a single starting compound. These derivatives can protein. The invention provides nucleic acids encoding these be subjected to another round of biocatalytic reactions to catalytic domains (CDs), prepro domains and signal produce a second population of derivative compounds. Thou 15 sequences (SPs, e.g., a peptide having a sequence compris sands of variations of the original Small molecule or com ing/consisting of amino terminal residues of a polypeptide of pound can be produced with each iteration of biocatalytic derivatization. the invention). Enzymes react at specific sites of a starting compound The invention provides isolated or recombinant signal without affecting the rest of the molecule, a process which is sequences (e.g., signal peptides) consisting of or comprising very difficult to achieve using traditional chemical methods. a sequence as set forth in residues 1 to 14, 1 to 15, 1 to 16, 1 This high degree of biocatalytic specificity provides the to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, means to identify a single active compound within the library. 1 to 25, 1 to 26, 1 to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38, 1 to 40, The library is characterized by the series of biocatalytic reac 25 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46, or 1 to 47, or tions used to produce it, a so-called “biosynthetic history”. more, of a polypeptide of the invention, e.g., SEQID NO:3, Screening the library for biological activities and tracing the SEQID NO:5, SEQID NO:7, SEQIDNO:9, SEQID NO:11, biosynthetic history identifies the specific reaction sequence SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID producing the active compound. The reaction sequence is NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, repeated and the structure of the synthesized compound deter 30 SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID mined. This mode of identification, unlike other synthesis and NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, screening approaches, does not require immobilization tech SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID nologies and compounds can be synthesized and tested free in NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, Solution using virtually any type of screening assay. It is 35 SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID important to note, that the high degree of specificity of NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, enzyme reactions on functional groups allows for the “track SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID ing of specific enzymatic reactions that make up the biocata NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, lytically produced library. SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID Many of the procedural steps are performed using robotic 40 NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, automation enabling the execution of many thousands of SEQ ID NO:97, SEQID NO:99, SEQ ID NO:101, SEQID biocatalytic reactions and screening assays per day as well as NO:103, SEQID NO:105, and all polypeptides disclosed in ensuring a high level of accuracy and reproducibility. As a the SEQID listing, which include all odd numbered SEQID result, a library of derivative compounds can be produced in NOs: from SEQID NO:3 through SEQID NO:26,898. In one a matter of weeks, which would take years to produce using 45 aspect, the invention provides signal sequences comprising current chemical methods. the first 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, In a particular aspect, the invention provides a method for 28, 29, 30, 31, 32,33, 34,35,36, 37,38, 39, 40, 41,42, 43,44, modifying Small molecules, comprising contacting a 45, 46,47, 48,49, 50, 51, 52,53,54, 55,56, 57,58, 59, 60, 61, polypeptide encoded by a polynucleotide described herein or 62, 63, 64, 65, 66, 67, 68, 69, 70 or more amino terminal enzymatically active fragments thereof with a small molecule 50 residues of a polypeptide of the invention. to produce a modified small molecule. A library of modified The invention also provides isolated or recombinant signal small molecules is tested to determine if a modified small sequences comprising/consisting of the signal sequences set molecule is present within the library, which exhibits a forth in Table 4, and polypeptides comprising these signal desired activity. A specific biocatalytic reaction which pro 55 sequences. The polypeptide can be enzyme or protein of the duces the modified small molecule of desired activity is iden invention. For example, reading Table 4, the invention pro tified by systematically eliminating each of the biocatalytic vides an isolated or recombinant signal sequence as set forth reactions used to produce a portion of the library and then by residues 1 to 16 of SEQID NO:10010. This can be deter testing the Small molecules produced in the portion of the mined by reading the second column for the first row, “Prob 60 ability: 0.992 AA1: 16 AA2: 17, wherein the cleavage of library for the presence or absence of the modified small signal sequence takes place between amino acid 16 (AA16) molecule with the desired activity. The specific biocatalytic and amino acid 17 (AA17), with a probability of 0.992 that reactions which produce the modified small molecule of this is the correct cleavage site. Therefore, the signal sequence desired activity is optionally repeated. The biocatalytic reac is predicted to be from the amino acid in position 1 of SEQID tions are conducted with a group of biocatalysts that react 65 NO:10010 up to and including the amino acid in position 16 with distinct structural moieties found within the structure of of SEQID NO:10010. This signal sequence, in one aspect, is a small molecule, each biocatalyst is specific for one struc encoded by a subsequence of SEQID NO:10009. US 8,119,385 B2 183 184 TABLE

SEO ID NO: Signalp Cleavage Site Predicted Signal Sequence OOO 9, 10010 Probabi ity: O. 992 AA AA2 : 17 KSYFILLIFLIPL FA

O111, 101.12 Probabi ity: O. 964 AA AA2 : 18 KYFILWFLTTT FA

013, 1014 Probabi ity: O 584 AA2 : 21 KRWLLAIIGIILAIIWWWG

O147, 10148 Probabi ity: O. 999 AA2 : NKILIFIIISLFS NISA

O157, 1 O158 Probabi ity: O. 941 AA2 : KRIFILSLIAILICSNG

O217, 10218 Probabi ity: O. 999 AA2 : KKISILIIFILST TLS

O3O9, 10310 Probabi ity: O. 994 AA2 : RANLKKSYLIGLL (LFSIA

O327, 10328 Probabi ity: O. 647 AA2 : RYLFSLFIFTTLI FA

O355, 10356 Probabi ity: O. 592 AA2 : TKKWIWLSLIILL FINSS

O441, 10442 Probabi ity: O. 683 AA2 : KRTFLTITAAAFI

O447, 10448 Probabi ity: O. 928 AA2 : KNKLIILFIFSLF LLA

O525, 10526 Probabi ity: O. 728 AA2 : RWLFFIFISLTTL A.

O537, 1 O538 Probabi ity: O. 998 AA2 : KKIILLSTLIFLA NA

O543, 10544 Probabi ity: O. 991 AA2 : KRKWFIFILTALWTIA

O591, 10592 Probabi ity: O. 922 AA2 : FKLLIGIFIFIS WAYS

O659, 10660 Probabi ity: O. 967 AA2 : 21 KDWIIIGAGGAGLSAGLISA

O673, 10674 Probabi ity: O. 711 AA2 : KIWSTIKLWFISLWALWA

O711, 10712 Probabi ity: O 876 AA2 : 17 MKGISPGAALWFLMA

O731, 10732 Probabi ity: O. 997 AA2 : KLLMITILLSTSGWANS

O79, 108O Probabi ity: O. 929 AA2 : 18 RIIKLFALFFLTCACN

O915, 10916 Probabi ity: O. 93 4 AA2 : 18 KSRLLLSGFFIFWLMS

O47, 11048 Probabi ity: O. 53 O AA2 : 17 PEAAFSMSLPSKWFA

O9, 1110 Probabi ity: O. 777 AA2 : 19 KWLLYILILFSGFKSFG

11, 1112 Probabi ity: O. 765 AA2 : 19 KWLLYILILFSGFKSFG

19, 1120 Probabi ity: O. 87 O AA2 : 19 KKLFLILCIFFSWESFS

209 11210 Probabi ity: O. 910 AA2 : KOIILLFSILFIVGKSYS

253, 11254 Probabi ity: O. 987 AA2 : KNIFFFSILLFLSFTGKA

339, 11340 Probabi ity: O. 510 AA2 : KSISLFILITIWTGCSW

37, 1138 Probabi ity: O. 992 AA2 : 19 KILTIWFLVGFFCFVOA

401 11402 Probabi ity: O. 992 AA2 : TISKNKLLIASLLSWAFT

495, 11496 Probabi ity: O. 647 AA2 : 17 RYLFSLFIFTTLIFA

719, 1172O Probabi ity: O. 998 AA2 : 18 KIILLIFFLLISFSFA

745, 11746 Probabi ity: O. 972 AA2 : 19 KYKIIFIAAFMAFSTLV

77, 11.78 Probabi ity: O. 995 AA2 : 21 DOKKSLSLLFLIPAWSWIA

821, 11822 Probabi ity: O. 663 AA2 : 19 SNKSW1STLIISIFFTA

827, 11828 Probabi ity: Of 27 AA2 : YWMKILLLISILFYCLLA

935, 11936 Probabi ity: 1. OOO KKTILIIASLFWAAFIGOA

965, 11966 Probabi ity: O. 999 AA2 : KKILWLSWLLTWCLISFA

2O71, 12O72 Probabi ity: O. 773 AA2 : NKELLSFFSIFIALFWGA US 8,119,385 B2 185 186 TABL 4- Continued

SEO ID NO: Signalp Cleavage Site Predicted Signal Sequence 2157, 12158 Probabi ity: O. 983 AA AA2 : RLILLISLLWYTWFA

2377, 12378 Probabi ity: O. 562 AA AA2 : ASTTMIWSLIWAWA

27 O9, 12710 Probabi ity: O. 993 AA2 : NNLKOILAIWMLLSVTA

3OO5, 13 OO6 Probabi ity: O. 977 AA2 : FLRRLSILILLLFWFFTAK

3017, 13018 Probabi ity: O. 995 AA2 : FKNIIMSLLLGTFLSA

3139, 13140 Probabi ity: O. 849 AA2 : RWWWLWLFSLLHFLFA

33 O7, 133 08 Probabi ity: O. 995 AA2 : KKLILLLILGFSTNLIFS

3347, 13348 Probabi ity: O. 788 AA2 : LILLICAWYSWGCALA

343, 1344 Probabi ity: O. 708 AA2 : KSLIIIFSLILFFTACK

3475, 13476 Probabi ity: O. 998 AA2 : KIILLIFFLLISFSFA

3531, 13532 Probabi ity: O. 651 AA2 : SHLLFSTSWLILLWS

3543, 13544 Probabi ity: O. 995 AA2 : KFILTTLMMAYLILPGMA

36O3, 13604 Probabi ity: O. 734 AA2 : NFKNILYSLLISGCLYG

36O7, 13608 Probabi ity: O. 84 O AA2 : KKIILSLGWATLLLTTNL

3699, 137 OO Probabi ity: O. 544 21 AA2 : 22 MKLHTLISLIFAWLMFIFCM

3711, 13712 Probabi ity: O. 815 AA2 : 21 SNKSWISTLIISIFFTACT

3719, 1372O Probabi ity: 1. OOO AA2 : 21 KLTKIITWFMMWFSLSLMA

3777, 13778 Probabi ity: O. 682 AA2 : KSMRTIFISFLIILLLQG

3829, 1383O Probabi ity: O. 940 AA2 : KNLGLILLWLFLGLISTS

3891, 13892 Probabi ity: O. 993 AA2 : 17 KYFILLILIITTLNA

3915, 13916 Probabi ity: 1. OOO AA2 : 21 KKFFLALFLTSIWTISIAA

3933, 13934 Probabi ity: O. 962 AA2 : FMNKKWYISLITALWWNA

4081, 14082 Probabi ity: O. 918 AA2 : 19 TYLFLAIAIGLITAASK

41.33, 14134 Probabi ity: O989 AA2 : 21 NNLIKLILLITISFSSLLS

4197 141.98 Probabi ity: O. 995 AA2 : 19 KKITLILFAIFTALSMS

4267, 14268 Probabi ity: O. 815 AA2 : 21 SNKSWISTLIISIFFTACT

4369, 14370 Probabi ity: O. 669 AA2 : 18 KKYIIIFCIFSGFLYG

4505, 14506 Probabi ity: O.951 AA2 : 21 IRFGSSSSSILYFFRNTMA.

4573, 14574 Probabi ity: O. 992 AA2 : LRWFILLISWIWCLNWNA

461, 1462 Probabi ity: O. 908 AA2 : KKFLIFCLFLFLNKPLIS

4655, 14656 Probabi ity: O. 773 22 AA2 : 23 AQAVAISIAFFSVLLSLLLFN

47 O5, 147O6 Probabi ity: O. 599 21 AA2 : 22 GGLIAIIILSSRTWAPLGOA

4835, 14836 Probabi ity: O. 999 17 AA2 : 18 WKKLLFLALAFSISFA

4857, 14858 Probabi ity: 1. OOO 21 AA2 : 22 IROKIWLTMLLFCFSLITVA

4863, 14864 Probabi ity: O 990 17 AA2 : 18 RKYFLWLLLFCTSLLS

5045, 15046 Probabi ity: O. 984 21 AA2 : 22 KNIILSTLAFWLALFFSGCT

5 O49, 15 OSO Probabi ity: O. 845 19 AA2 : NFFIMPFLLMFLFIGIFA

5 O55, 15056 Probabi ity: O. 669 15 AA2 : 16 KFNLNSFLMSWSLA.

5111, 15112 Probabi ity: O.835 17 AA2 : 18 IKRLFSIWLSLGLWFN US 8,119,385 B2 187 188 TABL 4- Continued

SEO ID NO: Signalp Cleavage Site Predicted Signal Sequence 5135, 15136 Probabi ity: O853 AA 15 AA2 : 16 KYLLALCIFLILTG

5173, 15174 Probabi ity: O. 513 AA 19 AA2 : KKLNWAIYIWIWILSILFS

51.79, 1518O Probabi ity: O. 645 16 AA2 : 17 RYLFSLFIFTTLIPA

52O1, 152O2 Probabi ity: O. 883 AA2 : 21 KLLGIGSILLOWLLCSWSA

5235, 15236 Probabi ity: O. 792 19 AA2 : NFKQLFLSVILLILTIVLS

5251, 15252 Probabi ity: O. 998 17 AA2 : 18 KIILLIFFLLISFSFA

53, 154 Probabi ity: O. 824 AA2 : 21 IKTIXSLARCIIAFGILNA

5329, 15330 Probabi ity: O. 557 AA2 : 21 KNIYKIILLSLLIISIILG

541, 1542 Probabi ity: 1. OOO AA2 : KRNSLLLWLLALSLFTAA

5473, 15474 Probabi ity: O. 93 4 AA2 : RGTICSILILSFIFLITA

54.75, 15476 Probabi ity: O. 93 4 AA2 : 21 AAGDFFAIFGIFMSLSILA

5495, 15496 Probabi ity: O. 645 AA2 : 17 RYLFSLFIFTTLIFA

5521, 15522 Probabi ity: O. 972 AA2 : 19 IKWSIYIWLLLTSYIHA

5585, 15586 Probabi ity: O. 993 AA2 : 17 KLILLIFLWLLNWNA

5589, 1559 O Probabi ity: O. 967 AA2 : 18 NKKILILMIILGLAWA

5623, 15624 Probabi ity: O. 553 AA2 : 19 SSRWFLTSFLIIWPLTA

5635, 15636 Probabi ity: 1. OOO AA2 : KNILSIALAWLMIGSLHS

5659, 1566O Probabi ity: 1. OOO AA2 : 21 YKFITALISLFLLTTHSYA

5697, 15698 Probabi ity: O. 561. AA2 : 19 ISIKTAIAIILWIWATN

5765, 15766 Probabi ity: O. 936 AA2 : 19 K HKSLLILLILSFIWS

5783, 15784 Probabi ity: O.951 AA2 : 21 KIAWLGAGISGLGSAYLIS

585, 1586 Probabi ity: O. 668 AA2 : MFFTSISIXSXFPXIXLX

5855, 15856 Probabi ity: O. 677 AA2 : 19 KKLKLILGSWLSIWAFT

5873, 15874 Probabi ity: O. 784 AA2 : 17 IFFFIFWLFTFSWA

59 O7, 15908 Probabi ity: O. 998 AA2 : 21 SLKKYIFILTFLFISNLFA

5909, 1591O Probabi ity: O. 935 AA2 : 21 KOKLLKITLLTTLLTSAIA

6OO5, 16 OO6 Probabi ity: O. 932 AA2 : 21 KNLKNILFFLFFLIFCLNH

6O15, 16O16 Probabi ity: O. 541 AA2 : 17 IIIAISALIATTIIA

6171, 16172 Probabi ity: O. 985 AA2 : 21 KLNIGKIFILLIFPIITFA

6175 16176 Probabi ity: O. 95.7 AA2 : 18 MKTFIWFCWMSISIFA

6183, 16184 Probabi ity: O. 999 AA2 : 21 KILLILAIITSGWLS

6237, 16238 Probabi ity: O. 792 AA2 : FLSWLLILTIWLS

6289, 1629O Probabi ity: O. 995 AA2 : 17 RISI LAWWSSIIFA

63, 164 Probabi ity: O 860 AA2 : 21 OINR IWLLLIMISHKNFA

633, 1634 Probabi ity: O. 993 AA2 : KIYWILALLIFSSRSIYS

6339, 1634. O Probabi ity: 1. OOO AA2 : 19 KKLL IYILLISTITFA

6345, 16346 Probabi ity: Of 76 AA2 : GNIKWILWFISLFLIAIT

6373, 16374 Probabi ity: O. 995 AA2 : 17 RISI LAWWSSIIFA

641, 1642 Probabi ity: O 879 AA2 : 19 KKFI FLGFFYLISFFA US 8,119,385 B2 189 190 TABL 4- Continued

SEO ID NO: Signalp Cleavage Site Predicted Signal Sequence 6455, 16456 Probabi ity: O. 890 AA 19 AA2 : KKFNIKLI IIFISSLFLA

6467, 16468 Probabi ity: O. 681 AA AA2 : 21 ERRLFLKGATILASSAWIA.

647, 1648 Probabi ity: O. 812 AA2 : 21 RIKISLII LILFSGINGIA

6487, 16488 Probabi ity: O. 987 AA2 : RIFNYLIM SILLSWTLMA

669, 1670 Probabi ity: O. 999 AA2 : RATFIWLSWLLTSSWMS

6711, 16712 Probabi ity: O. 626 AA2 : KTILTF ILITNIFS

6747, 16748 Probabi ity: O. 628 AA2 : KNIFFL IAWLILSNCKN

6825, 16826 Probabi ity: O. 975 AA2 : FKKALLWFYIFLGITMA

6833, 16834 Probabi ity: O857 AA2 : NNKTKI PILLAMAIWLG

6885, 16886 Probabi ity: O. 993 AA2 :

6967, 16968 Probabi ity: O. 888 AA2 : KPTKLL LFILIFTFTTS

7035, 17036 Probabi ity: O. 977 AA2 : MKKYIIAL ISTFLYA

7O65, 17O66 Probabi ity: O. 982 AA2 : KHFLLCSWLILGWLDA

71, 172 Probabi ity: O. 956 AA2 : KRIIYIIL LPSWAWILSSCT

7157, 17158 Probabi ity: O. 952 AA2 : KILLIWIL FISSILFS

7331, 17332 Probabi ity: O. 981 AA2 : KKLLILT FITTISFA

7347, 17348 Probabi ity: O. 999 AA2 : SKIIILIL SFLIANA

7353, 17354 Probabi ity: O. 993 AA2 : 21 KLKYLLII IIITLGOFWIA

7359, 1736O Probabi ity: O. 932 AA2 : KIKHFILL FLFSLIALYS

7367, 17368 Probabi ity: O. 912 AA2 : 21 KKSKILFL LTLLIIMGIG

749, 1750 Probabi ity: O 990 AA2 : 19 NRIFLIWW FISSTCFS

7537, 17538 Probabi ity: O. 999 AA2 : 18 KFFF1LLI FMFNALS

7547, 17548 Probabi ity: O.959 AA2 : KNIITYL FMLMSLFILS

771, 1772 Probabi ity: O. 931 AA2 : 21 WMKSILGIWSFLIGLSLIA

7751, 17752 Probabi ity: O. 561. AA2 : 19 KYLLILII

7783, 17784 Probabi ity: O. 987 AA2 : WLILSIAL.A.

785, 1786 Probabi ity: O. 716 AA2 : 18 KLLSATFFMWFSWIS

7915, 17916 Probabi ity: O. 898 AA2 : 18 WKIFLSII

8O19, 1802O Probabi ity: O. 993 AA2 : 19 KKITFLLI FWTTFSFS

8039, 1804 O Probabi ity: O867 AA2 : OKVILTLV CIITSFFFOA

8057, 18058 Probabi ity: O874 AA2 : IFSCSKNS

8131, 18132 Probabi ity: 1. OOO AA2 : KKTOIILL ILLISMASHA

8237, 18238 Probabi ity: O. 975 AA2 : 19 KKVLIFYCWLFSLOGFS

8249, 18250 Probabi ity: O. 719 AA2 : 19 KTKTLLTV LTILFSLOS

8329, 1833 O Probabi ity: O.988 AA2 : 18 SKLAWLFL FLACNN

8377, 18378 Probabi ity: O. 983 AA2 : 19 KKARIIIL SFFIGMWAA

84 O3, 184O4 Probabi ity: 1. OOO AA2 : KKTILWLI CLFSISALFA

8435, 18436 Probabi ity: O. 611 AA2 : KIGFILIL SIAICTSCKW

8489, 18490 Probabi ity: O. 914 AA2 : 18 SITLLSFG