US006365390B1

(12) United States Patent
Blum et al.

(10) Patent No.: US 6,365,390 B1
(45) Date of Patent: Apr. 2, 2002

(54) PHENOLIC ACID ESTERASES, CODING SEQUENCES AND METHODS

(75) Inventors: David L. Blum, San Diego, CA (US); Irina Kataeva, Athens, GA (US); Xin-Liang Li, Athens, GA (US); Lars G. Ljungdahl, Athens, GA (US)

(73) Assignee: University of Georgia Research Foundation, Inc., Athens, GA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/390,234

(22) Filed: Sep. 3, 1999

Related U.S. Application Data

(60) Provisional application No. 60/099,136, filed on Sep. 4, 1998.

(51) Int. Cl. ...... C12N 9/18; C07H 21/04

(52) U.S. Cl. ...... 435/197; 435/183; 435/320.1; 435/252.3; 536/23.1; 536/23.2; 530/350

(58) Field of Search ...... 435/197, 183, 435/320.1, 252.3; 536/23.1, 23.2; 530/350

(56) References Cited

U.S. PATENT DOCUMENTS

5,882,905 A 3/1999 Saha et al. ...... 435/105

FOREIGN PATENT DOCUMENTS

EP 0513140 B1 9/1995 ...... D21C/9/10
GB 2301103 11/1996 ...... C12N/9/18
WO 98/46768 10/1998 ...... C12N/15/55

OTHER PUBLICATIONS

Blum et al. (1999) "Characterization of a Feruloyl Esterase from the Anaerobic Fungus Orpinomyces sp. Strain PC-2" Abstracts. 99th General Meeting of the American Society for Microbiology. Chicago, IL. May 30-Jun. 3, 1999. vol. 99, pp. 430-431.

Borneman and Akin (1990) "Lignocellulose Degradation by Rumen Fungi and Bacteria: Ultrastructure and Cell Wall Degrading" In: Microbial and Plant Opportunities to Improve Lignocellulose Utilization by Ruminants. D.E. Akin; L.G. Ljungdahl; J.R. Wilson; and P.J. Harris (Eds.) Elsevier Science Publishing Co. New York, NY. pp. 325-339.*

Borneman et al. (1992) "Purification and Partial Characterization of Two Feruloyl Esterases from the Anaerobic Fungus Neocallimastix Strain MC-2" Applied and Environmental Microbiology 58:3762–3766.*

Borneman et al. (1990) "Assay for trans-p-Coumaroyl Esterase Using a Specific from Plant Cell Walls" Analytical Biochemistry 190:129-133.*

Castanares and Wood (1992) "Purification and Characterization of a Feruloyl/p-Coumaroyl Esterase from Solid State Cultures of the Aerobic Fungus Penicillium pinophilum" Biochemical Society Transactions 20:275S.*

Chen et al. (1995) "A Cyclophilin from the Polycentric Anaerobic Rumen Fungus Orpinomyces sp. Strain PC-2 is Highly Homologous to Vertebrate Cyclophilin B" Proc. Natl. Acad. Sci. USA 92:2587-2591.*

Christov and Prior (1993) "Esterases of Xylan-Degrading Microorganisms: Production, Properties, and Significance" Enzyme Microb. Technol. 15:460-475.*

Dalrymple and Swadling (1997) "Expression of a Butyrivibrio fibrisolvens E14 Gene (cinB) Encoding an Enzyme with Cinnamoyl Ester Activity is Negatively Regulated by the of an Adjacent Gene (cinR)" Microbiology 143:1203–1210.

Dalrymple et al. (1996) "Cloning of a Gene Encoding Cinnamoyl Ester Hydrolase from the Ruminal Bacterium Butyrivibrio fibrisolvens E14 by a Novel Method" FEMS Microbiology Letters 143:115-120.

De Vries et al. (1997) "The faeA Genes from Aspergillus niger and Aspergillus tubingensis Encode Ferulic Acid Esterases Involved in Degradation of Complex Cell Wall Polysaccharides" Applied and Environmental Microbiology 63:4638-4644.

Faulds and Williamson (1991) "The Purification and Characterization of 4-Hydroxy-3-Methoxycinnamic (Ferulic) Acid from Streptomyces olivochromogenes" Journal of General Microbiology 137:2339–2345.

Felix and Ljungdahl (1993) "The Cellulosome: The Exocellular Organelle of Clostridium" Annu. Rev. Microbiol. 47:791-819.

Ferreira et al. (1993) "A Modular Esterase from Pseudomonas fluorescens subsp. cellulosa Contains a Non-Catalytic Cellulose-Binding Domain" Biochemical Journal 294:349-355.

Flint et al. (1993) "A Bifunctional Enzyme, with Separate Xylanase and β(1,3-1,4)-Glucanase Domains, Encoded by the xynD Gene of Ruminococcus flavefaciens" Journal of Bacteriology 175:2943-2951.

Fontes et al. (1995) "Evidence for a General Role for Non-Catalytic Thermostabilizing Domains in Xylanases from Thermophilic Bacteria" Biochem. J. 307:151-158.

Genbank Accession No. AF047761, Clostridium thermocellum Xylanase V and U.

Genbank Accession No. L48074, Aspergillus fumigatus dipeptidyl peptidase.

(List continued on next page.)

Primary Examiner: Rebecca E. Prouty
Assistant Examiner: Richard Hudson
(74) Attorney, Agent, or Firm: Greenlee Winner and Sullivan PC

(57) ABSTRACT

Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

26 Claims, 11 Drawing Sheets

OTHER PUBLICATIONS

Genbank Accession No. M22624, Clostridium thermocellum Xylanase Z.

Genbank Accession No. P31471, Escherichia coli 44.1 kD protein.

Genbank Accession No. P51584, Clostridium thermocellum Xylanase Y.

Genbank Accession No. S58235, Ruminococcus sp. Xylanase.

Genbank Accession No. X83269, Clostridium thermocellum Xylanase Y.

Grépinet et al. (1988) "Nucleotide Sequence and Deletion Analysis of the Xylanase Gene (xynZ) of Clostridium thermocellum" Journal of Bacteriology 170:4582–4588.

Kirby et al. (1998) "Plant Cell Wall Degrading Enzyme Complexes from the Cellulolytic Rumen Bacterium Ruminococcus flavefaciens" Biochemical Society Transactions 26:S169.

MacKenzie and Bilous (1988) "Ferulic Acid Esterase Activity from Schizophyllum commune" Applied and Environmental Microbiology 54:1170–1173.

McDermid et al. (1990) "Esterase Activities of Fibrobacter succinogenes subsp. succinogenes S85" Applied and Environmental Microbiology 56:127-132.

McSweeney et al. (1998) "Butyrivibrio spp. and Other Xylanolytic Microorganisms from the Rumen have Cinnamoyl Esterase Activity" Anaerobe 4:57-65.

Ralph et al. (1995) "Lignin-Ferulate Cross-Links in Grasses: Active Incorporation of Ferulate Polysaccharide Esters Into Ryegrass Lignins" Carbohydrate Research 275:167-178.

Sakka et al. (1996) "Identification and Characterization of Cellulose-Binding Domains in Xylanase A of Clostridium stercorarium" Ann. NY Acad. Sci. 782:241-251.

Sakka et al. (1993) "Nucleotide Sequence of the Clostridium stercorarium xynA Gene Encoding Xylanase A: Identification of Catalytic and Cellulose Binding Domains" Biosci. Biotech. Biochem. 57:273-277.

Arakaki et al. Xylanase 1 from Ruminococcus sp. with a new pattern of domain shuffling, Genbank Accession No.: Z49970, Aug. 1995.

* cited by examiner

U.S. Patent Apr. 2, 2002 Sheet 2 of 11 US 6,365,390 B1

Sheet 2 of 11: drawing; legible labels: XynY, XynZ.

Sheet 3 of 11: FIG. 3. Plot of protein and FAE activity versus elution volume (ml); molecular mass markers (kDa) at 2000, 232, 43 and 13.7.

Sheet 6 of 11: FIG. 6A, horizontal axis: Temperature (°C), 0 to 100; FIG. 6B, horizontal axis: 0 to 12. Vertical axes run from 0 to 100.

Sheet 7 of 11: FIG. 7.

Sheet 8 of 11: plot residue; the vertical axis label is a percentage (%), otherwise illegible.

Sheet 10 of 11: FIG. 10. Labels: signal peptide, FAE domain, unknown domain.

Sheet 11 of 11: plot residue; vertical axis ticks from 0.09 downward.

PHENOLIC ACID ESTERASES, CODING SEQUENCES AND METHODS

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority from United States Provisional Application No. 60/099,136, filed Sep. 4, 1998.

ACKNOWLEDGEMENT OF FEDERAL RESEARCH SUPPORT

This invention was made, at least in part, with funding from the United States Department of Energy (Grant No. DE-FG05 93ER 20127). Accordingly, the United States Government has certain rights in this invention.

BACKGROUND OF THE INVENTION

The field of the present invention is the area of enzymes which degrade plant cell walls, and certain other substrates, in particular, the phenolic acid esterases, feruloyl esterases and/or coumaroyl esterase, nucleotide sequences encoding them and recombinant host cells and methods for producing them.

Plant cell wall material is one of the largest sources of renewable energy on earth. Plant cell walls are composed mainly of cellulose, hemicelluloses, lignin and pectin. Arabinoxylan is one of the main constituents of hemicelluloses. It is composed of a chain of β(1→4) linked xylose units that are substituted by arabinose, acetate, and glucuronic acid. The arabinose has ester linked ferulic and p-coumaric acids [Borneman et al. (1993) In: Hemicellulose and Hemicellulases, Coughlan and Hazlewood, Eds., pp. 85-102]. Ferulic acid has been shown to link hemicellulose and lignin [Ralph et al. (1995) Carbohydrate Research 275:167-178]. Feruloyl esterases are involved in breaking the bond between the arabinose and ferulic acid, thus releasing the covalently bound lignin from hemicelluloses. Feruloyl esterases have been found in many bacteria as well as fungi, but have not been extensively studied nor is there much sequence data available [Christov and Prior (1993) Enzyme. Microb. Technol. 15(6):460–75].

Clostridium thermocellum is a gram-positive bacterium that produces a multienzymatic structure termed the cellulosome. The cellulosome is one of the most active cellulose degrading complexes described to date. The cellulosome has a multi-polypeptide structure, including a scaffolding subunit which has nine cohesins binding to nine catalytic subunits, a dockerin domain for attachment to the cell wall, and a cellulose binding domain [Felix and Ljungdahl (1993) Annu. Rev. Microbiol. 47:791–819]. The catalytic subunits include endoglucanase, cellobiohydrolase, lichenase, and xylanase, many of which have been cloned and sequenced. They all have multidomain structures that include at least a dockerin domain for binding to the scaffolding domain, a linker, and a catalytic domain. They may also contain cellulose binding domains and fibronectin-like domains. There are reports that some enzymatic components may have more than one catalytic domain. Two of these are xylanase Y [XynY, Fontes et al. (1995) Biochem. J. 307:151-158] and xylanase Z [XynZ, Grépinet et al. (1988) J. Bacteriol. 170(10):4582-8]. XynY has a C-terminal domain, whereas XynZ has an N-terminal domain, without any functions determined. Although enzymes with dual catalytic domains (xylanase and β-glucanase) have been found in other bacteria [Flint et al. (1993) J. Bacteriol. 175:2943-2951], neither phenolic acid esterase nor bifunctional enzymes have been found in C. thermocellum.

There is a need in the art for phenolic acid esterases, feruloyl esterases and/or coumaroyl esterases in pure form which degrade plant cell wall materials, and certain other substrates, and for DNA encoding these enzymes to enable methods of producing ferulic acid and/or coumaric acid as well as facilitating degradation of plant cell wall materials.

SUMMARY OF THE INVENTION

The present invention provides novel phenolic acid esterases, having feruloyl esterase and coumaroyl esterase activities, and coding sequences for same.

One phenolic acid esterase of the present invention corresponds to a domain of previously unknown function from xylanase Y of Clostridium thermocellum. The recombinantly expressed domain polypeptide is active and has an amino acid sequence as given in FIG. 1 as "XynY Clotm." The nucleotide sequence encoding the esterase polypeptide is given in Table 5, nucleotides 2383-3219, exclusive of translation start and stop signals. See also SEQ ID NOS:11 and 12.

A second phenolic acid esterase of the present invention corresponds to a domain of previously unknown function of xylanase Z from C. thermocellum. The amino acid sequence of the esterase domain, which also is active when expressed as a recombinant polypeptide, is given in FIG. 1 as "XynZ Clotm." The nucleotide sequence encoding this polypeptide is given in Table 6, nucleotides 58-858. The present invention further provides a phenolic acid esterase polypeptide further comprising a cellulose binding domain. A specifically identified cellulose binding domain has an amino acid sequence as given in Table 6, 289–400, with a corresponding coding sequence as given in Table 6, nucleotides 867-1200. See also SEQ ID NOS:13 and 14.

An additional object of the present invention is a phenolic acid esterase (i.e., a feruloyl esterase) derived from a previously uncharacterized portion of a Ruminococcus xylanase (See FIG. 1). The coding (nucleotides 2164-2895, exclusive of translation start and stop signals) and deduced amino acid sequences (amino acids 546-789) are given in Table 10. See also SEQ ID NOS:15 and 16.

The present invention further provides a feruloyl (phenolic acid) esterase from the anaerobic fungus Orpinomyces PC-2. The coding sequence and deduced amino acid sequences of the mature esterase protein are given in Table 9, and the purification of the Orpinomyces enzyme is described herein below. See also SEQ ID NOS:17 and 18.

A further aspect of the present invention is methods for the recombinant production of the phenolic (especially ferulic) acid esterases of the present invention. Escherichia coli, Bacillus subtilis, Streptomyces sp., Saccharomyces cerevisiae, Aureobasidium pullulans, Pichia pastoris, Trichoderma, Aspergillus nidulans or any other host cell suitable for the production of a heterologous protein can be transfected or transformed with an expression vector appropriate for the chosen host. Compatible combinations of vectors and host cells are well known in the art as are appropriate promoters to be used to direct the expression of a particular coding sequence of interest. The recombinant host cells are cultured under conditions suitable for growth and expression of the phenolic acid esterase and the recombinant esterase is then collected or the recombinant host cells in which the esterase has been produced are collected. The coding sequence of the esterase can be operably linked to a nucleotide sequence encoding a signal peptide which is known in the art and functional in the desired host cell if secretion of the esterase into the culture medium is desired. In that case, the culture medium serves as the source of esterase after growth of the host cells.

It is recognized by those skilled in the art that the DNA sequences may vary due to the degeneracy of the genetic code and codon usage. All DNA sequences which encode a phenolic acid esterase polypeptide having a specifically exemplified amino acid sequence are included in this invention, including DNA sequences encoding them having an ATG preceding the coding region for the mature protein and a translation termination codon (TAA, TGA or TAG) after the coding sequence.

Additionally, it will be recognized by those skilled in the art that allelic variations may occur in the phenolic acid esterase polypeptide coding sequences which will not significantly change activity of the amino acid sequences of the polypeptides which the DNA sequences encode. All such equivalent DNA sequences are included within the scope of this invention and the definition of a phenolic acid esterase. The skilled artisan will understand that the amino acid sequence of an exemplified phenolic acid esterase polypeptide and signal peptide(s) can be used to identify and isolate additional, nonexemplified nucleotide sequences which will encode functional equivalents to the polypeptides defined by the amino acid sequences given herein or an amino acid sequence of greater than 40% identity thereto and having equivalent biological activity. All integer percents between 40 and 100 are encompassed by the present invention. DNA sequences having at least about 75% homology to any of the ferulic acid esterase coding sequences presented herein and encoding polypeptides with the same function are considered equivalent thereto and are included in the definition of "DNA encoding a phenolic acid esterase." Following the teachings herein, the skilled worker will be able to make a large number of operative embodiments having equivalent DNA sequences to those listed herein.

The present invention encompasses feruloyl esterase proteins which are characterized by at least a portion having at least about 40% amino acid sequence identity with an amino acid sequence as given in SEQ ID NO:18, amino acids 227 to 440 (within the feruloyl esterase protein of Orpinomyces PC-2 of the present invention). All integer percent identities between 40 and 100% are also within the scope of the present invention. Similarly, the present invention encompasses feruloyl esterase proteins having from about 40% to about 100% identity with an amino acid sequence from the group comprising amino acids 581 to 789 of SEQ ID NO:16, amino acids 845 to 1075 of SEQ ID NO:12, amino acids 69 to 286 of SEQ ID NO:14, amino acids 69 to 307 of SEQ ID NO:14, and amino acids 69 to 421 of SEQ ID NO:14. Specifically exemplified feruloyl esterases of the present invention are characterized by amino acid sequences from the group comprising amino acids 227 to 440 of SEQ ID NO:18, amino acids 581 to 789 of SEQ ID NO:16, amino acids 845 to 1075 of SEQ ID NO:12, amino acids 69 to 286 of SEQ ID NO:14, amino acids 69 to 307 of SEQ ID NO:14, and amino acids 69 to 421 of SEQ ID NO:14. Feruloyl esterase proteins of the present invention include those having the following amino acid sequences: SEQ ID NO:18, amino acids 1 to 530; SEQ ID NO:12, amino acids 795 to 1077; SEQ ID NO:16, amino acids 546 to 789; SEQ ID NO:14, amino acids 20 to 286; SEQ ID NO:14, amino acids 20 to 307; and SEQ ID NO:14, amino acids 20 to 421.

Specifically exemplified nucleotide sequences encoding the feruloyl esterase proteins of the present invention include the following: SEQ ID NO:17, nucleotides 1 to 1590; SEQ ID NO:11, nucleotides 2582–3430; SEQ ID NO:15, nucleotides 2164 to 2895; SEQ ID NO:13, nucleotides 158 to 958; SEQ ID NO:13, nucleotides 158 to 1021; SEQ ID NO:13, nucleotides 158 to 1363.

The phenolic acid esterase coding sequences, including or excluding that encoding a signal peptide of this invention, can be used to express a phenolic acid esterase of the present invention in recombinant fungal host cells as well as in bacteria, including without limitation, Bacillus spp., Streptomyces sp. and Escherichia coli. Any host cell in which the signal sequence is expressed and processed may be used. Preferred host cells are Aureobasidium species, Aspergillus species, Trichoderma species and Saccharomyces cerevisiae, as well as other yeasts known to the art for fermentation, including Pichia pastoris [See, e.g., Sreekrishna, K. (1993) In: Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz, R. H., et al. (Eds.) ASM Press, Washington, D.C. 119-126]. Filamentous fungi such as Aspergillus, Trichoderma, Penicillium, etc. are also useful host organisms for expression of the DNA of this invention [Van den Handel, C. et al. (1991) In: Bennett, J. W. and Lasure, L. L. (eds.), More Gene Manipulations in Fungi, Academy Press, Inc., New York, 397–428].

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows amino acid sequence alignment of the exemplified phenolic acid esterases. Sequences are xylanase Z [XynZ Clotm, Grépinet et al. (1988) supra], xylanase Y [XynY Clotm, Fontes et al. (1995) supra] of C. thermocellum, xylanase A (XynA Rumin) of a Ruminococcus sp., and a hypothetical 44-kDa protein of E. coli (Genbank Accession Number P31471) (SEQ ID NO:19). Amino acid numbering was the same as in the databases. Dots represent gaps introduced to optimize alignment, and are treated as mismatched in calculations of sequence relatedness (similarity or identity). The partial amino acids are derived from SEQ ID NO:20, SEQ ID NO:12, SEQ ID NO:16, SEQ ID NO:19 and SEQ ID NO:18.

FIG. 2 shows the domain organizations of two cellulosomal components, xylanase Y and xylanase Z of C. thermocellum.

FIG. 3 illustrates the results of Superose 6 gel filtration of proteins eluted from Avicel adsorption of C. thermocellum culture supernatant. Fractions (0.5 ml) were collected and assayed for protein and feruloyl esterase activity. Molecular mass standards (Sigma Chemical Company, St. Louis, Mo.) including blue dextran (2,000 kDa), catalase (232 kDa), ovalbumin (43 kDa), and A (13.7 kDa) were run under identical conditions and their elution positions were indicated.

FIG. 4 presents amino acid sequence alignment of family VI cellulose binding domains. Sequences are xylanase U (XynU Clotm), xylanase V (XynV Clotm) (Fernandes et al., 1998, Genbank Accession Number AF047761), and xylanase Z [XynZ Clotm, Grépinet et al. (1988) supra] of C. thermocellum and xylanase A [XynA Closr, Sakka et al. (1993) Biosci. Biotech. Biochem. 57:273-277; Sakka et al. (1996) Ann. NY Acad. Sci. 782:741–751] of C. stercorarium. The sequences presented are portions of those sequences presented in SEQ ID NO:12, SEQ ID NO:14 and SEQ ID NO:24.

FIG. 5 shows the results of SDS-PAGE analysis of the C. thermocellum XynZ ferulic acid esterase-cellulose binding domain (FAE/CBD) over-expressed in E. coli. Lane M, low range protein standard markers (Bio-Rad Laboratories, Hercules, Calif.) including phosphorylase B (97.4 kDa), serum albumin (66.2), ovalbumin (45 kDa), and carbonic anhydrase (31 kDa); lane 1, E. coli cell free extract; lane 2, heat-treated cell free extract.

FIGS. 6A and 6B, respectively, illustrate the effects of temperature and pH on feruloyl esterase activity of the C. thermocellum XynZ FAE/CBD. Buffer used for evaluating temperature effects was 50 mM sodium citrate, pH 6.0. Assay mixtures with a pH range from 2 to 10 were formulated by using a universal phosphate buffer system.

FIG. 7 illustrates the results of SDS-PAGE analysis of the purified feruloyl esterase from the culture supernatant of Orpinomyces sp. strain PC-2 (lane 1); molecular mass markers are in lane 2.

FIGS. 8A and 8B show the temperature and pH activity profiles, respectively, of the Orpinomyces sp. strain PC-2 feruloyl esterase.

FIG. 9 shows alignment of protein sequences exhibiting homology to the Orpinomyces feruloyl esterase. Sequences are: faea Orpin, Orpinomyces sp. strain PC-2 FaeA; xyna rumin, xylanase from Ruminococcus sp. (Genbank Accession Number S58235); yiel ecoli, hypothetical 44 kDa protein from E. coli (Genbank Accession Number P31471); xyny clotm, xylanase Y from C. thermocellum (Genbank Accession Number P51584); xynz clotm, xylanase Z from C. thermocellum (Genbank Accession Number M22624); dppV asprf, dipeptidyl peptidase from A. fumigatus (Genbank Accession Number L48074) (SEQ ID NO:20). The partial sequences are taken from SEQ ID NO:18, SEQ ID NO:16, SEQ ID NO:22, SEQ ID NO:12, SEQ ID NO:14 and SEQ ID NO:20.

FIG. 10 is a schematic diagram of the faeA gene from Orpinomyces PC-2.

FIG. 11 illustrates the synergistic effects of the Orpinomyces FaeA and XynA on the release of ferulic acid from wheat bran as substrate.

DETAILED DESCRIPTION OF THE INVENTION

The amino acids which occur in the various amino acid sequences referred to in the specification have their usual three- and one-letter abbreviations routinely used in the art: A, Ala, Alanine; C, Cys, Cysteine; D, Asp, Aspartic Acid; E, Glu, Glutamic Acid; F, Phe, Phenylalanine; G, Gly, Glycine; H, His, Histidine; I, Ile, Isoleucine; K, Lys, Lysine; L, Leu, Leucine; M, Met, Methionine; N, Asn, Asparagine; P, Pro, Proline; Q, Gln, Glutamine; R, Arg, Arginine; S, Ser, Serine; T, Thr, Threonine; V, Val, Valine; W, Trp, Tryptophan; and Y, Tyr, Tyrosine.

Additional abbreviations used in the present specification include the following: aa, amino acid(s); bp, base pair(s); CD, catalytic domain(s); GCG, Genetics Computer Group, Madison, Wis.; CMC, carboxymethyl cellulose; FPase, filter paper-ase; HMWC, high-molecular weight complex(es); IPTG, isopropyl-β-D-thiogalactoside; OSX, oat spelt xylan; ORF, open reading frame; RBB, remazol brilliant blue; pfu, plaque forming units; FAXX, (O-{5-O-[(E)-feruloyl]-α-L-arabinofuranosyl}-(1→3)-O-β-D-xylopyranosyl-(1→4)-D-xylopyranose).

Genes encoding feruloyl esterase (faeA) have been cloned from Aspergillus niger and Aspergillus tubingensis and the deduced amino acid sequences bear close similarity to [de Vries et al. (1997) Appl. Environ. Microbiol. 63:4638-4644]. Expression of these gene products is regulated by the xlnR gene product [van Peij et al. (1998) Appl. Environ. Microbiol. 64:3615-3619]. Other genes include the xylD gene from Pseudomonas fluorescens subsp. cellulosa, the gene product of which has a higher specificity for acetyl groups than feruloyl groups [Ferreira et al. (1993) Biochemical J. 294:349-355] and two genes from Butyrivibrio fibrisolvens termed cinA and cinB [Dalrymple and Swadling (1997) Microbiology 143:1203–1210; Dalrymple et al. (1996) FEMS Microbiol. Lett. 143:115-120]. These genes are believed to be regulated by the cinR gene product which may itself be regulated by FAXX [Dalrymple and Swadling (1997) supra]. Esterase activity has also been studied in Streptomyces olivochromogenes [Faulds and Williamson (1991) J. Gen. Microbiol. 137:2339–2345], Schizophyllum commune [MacKenzie and Bilous (1988) Appl. Environ. Microbiol. 54:1170–1173], Penicillium pinophilum [Castanares and Wood (1992) Biochem. Soc. Trans. 20:275S], and Fibrobacter succinogenes [McDermid et al. (1990) Appl. Environ. Microbiol. 56:127-132].

As described herein, feruloyl esterases are found as part of xylanases from the Clostridium thermocellum cellulosome or as an individual enzyme, for example, from Orpinomyces sp. PC-2. Xylanases Y and Z from C. thermocellum are composed of a xylanase domain, a linker domain, and other domains as well as a domain to which no function has been assigned. We found partial sequence homology between these enzymes and the feruloyl esterase of Orpinomyces in the region of the previously unknown domains and demonstrated that these domains indeed encode feruloyl esterases. Herein, we also report the purification, cloning, and partial characterization of the feruloyl esterase from Orpinomyces sp. strain PC-2.

Anaerobic fungi produce high levels of phenolic esterases [Borneman and Akin (1990) In: Microbial and Plant Opportunities to Improve Lignocellulose Utilization by Ruminants. D. E. Akin, L. G. Ljungdahl, J. R. Wilson, and P. J. Harris (Eds.). Elsevier Science Publishing Co. New York, pp. 325-340] and two feruloyl esterases of the anaerobic fungus Neocallimastix MC-2 were purified and characterized [Borneman et al. (1992) Appl. Environ. Microbiol. 58:3762–3766]. A cDNA coding for a feruloyl esterase (FaeA) of the anaerobic fungus Orpinomyces PC-2 was cloned and sequenced by the present inventors. FASTA and BLAST searches showed that the catalytic domain of the Orpinomyces FaeA was over 30% identical to sequences coding for unknown domains (UD) in the databases including the carboxy terminal region of XynY [Fontes et al. (1995) supra], the amino terminal region of XynZ [Grépinet et al. (1988) supra], a hypothetical polypeptide of E. coli (Genbank Accession Number P31471), and the carboxy terminal region of a Ruminococcus xylanase [Genbank Accession No. S58235] (FIG. 1). No function had been previously assigned to the sequences homologous to the Orpinomyces FaeA. XynY consists of multiple domains including a family F xylanase domain, followed by a putative thermostability domain, a dockerin, and the UD [Fontes et al. (1995) supra]. Similarly, XynZ is also a multi-domain enzyme containing the UD, a family VI cellulose binding domain, a dockerin, and a family 10 xylanase domain [Grépinet et al. (1988) supra; Tomme et al. (1995) In: Enzymatic Degradation of Insoluble Carbohydrates. J. N. Saddler, M. H. Panner (Eds.), ACS Symposium Series, American Chemical Society, Washington, D.C., pp. 142-163]. Both XynY and XynZ are believed to be components of the cellulosome (FIG. 2). The Orpinomyces FaeA together with those homologous sequences, however, failed to show significant homology to the recently published feruloyl esterases (FaeA) of Aspergillus niger and A. tubingensis [de Vries et al. (1997) supra]. The sequence analysis implies that a new type of feruloyl esterase is encoded by the Orpinomyces cDNA and the homologous sequences described above.

We have determined that C. thermocellum produces feruloyl esterase activity under the conditions when the cellulosome production is induced. The bacterium was cultivated on low concentration (0.2%, w/v) of Avicel, and under this growth condition, most of the substrate was consumed and cellulosomes released into culture medium, as indicated by the activities on Avicel and xylan (Table 2). Most of the feruloyl esterase activity (97.9%) was found in the culture medium (Table 2). It is well documented that cellulosomes of C. thermocellum are readily adsorbed to cellulose [Morag et al. (1992) Enzyme Microb. Technol. 14:289–292; Choi and Ljungdahl (1996) Biochemistry 35:4897-4905], and thus Avicel adsorption was used to assess association of the feruloyl activity with cellulosomes. As shown in Table 2, 97.1% of total feruloyl activity was removed from the culture medium by Avicel treatment, even higher than the percentages of cellulase (80.5%) and xylanase (73.3%) activities removed. These data indicate that feruloyl esterases produced by C. thermocellum possess cellulose binding ability through either a cellulose-binding domain or the cellulosomes. XynZ has a family VI cellulose binding domain [Grépinet et al. (1988) supra; Tomme et al. (1995) supra] and a docking domain between the CBD and the dockerin, whereas XynY contains a docking domain.

Cellulosomes eluted from Avicel adsorption were analyzed by gel filtration chromatography using a Superose 6 column to assess the sizes of proteins containing feruloyl esterase activity in the native state. The majority of the proteins were eluted in fractions containing molecules with sizes around 2.0 million daltons (FIG. 3), characteristic of cellulosomes eluted from gel filtration [Choi and Ljungdahl (1996) supra]. Feruloyl esterase activity in the fractions correlated well with fractions of cellulosomes. No activity was found in fractions with protein molecules less than 200 kDa, indicating that feruloyl esterase activity resides in the cellulosome.

The UD coding region of XynY and various regions of XynZ were over-expressed in E. coli using the pRSET system (Invitrogen, Carlsbad, Calif.). Constructs spanning the XynY UD sequence, XynZ UD alone, and UD plus the CBD sequence in pRSET gave high levels of feruloyl esterase activity whereas cell-free extracts of E. coli harboring the pET-21b recombinant plasmid failed to hydrolyze FAXX. Constructs with 20 and 40 amino acid residues deleted from the C-terminus of the XynZ UD did not hydrolyze FAXX, indicating that XynZ sequence from the end of the signal peptide up to amino acid 288 was required to form an active feruloyl esterase. The heterologous protein band of the UD constructs without IPTG induction on SDS-PAGE analysis reached 40-50% of total protein. Both growth rates and levels of feruloyl activity of the constructs with the XynY and XynZ sequences were lower with IPTG induction than without induction. Without wishing to be bound by theory, it is believed that low level of T7 polymerase in E. coli BL21 (DE3) strain was ideal for the expression of the inserted genes in pRSET B, and over expression of T7 polymerase gene by IPTG induction resulted in toxic levels of feruloyl esterase production.

Amino acid residues 328 to 419 of XynZ were homologous to two repeated CBDs of C. stercorarium XynA [Sakka et al. (1993) supra; Sakka et al. (1995) supra] (FIG. 4). This domain has been recently classified as a family VI CBD [Tomme et al. (1995) supra]. Constructs containing the UD alone and both the UD plus the putative CBD of XynZ were purified from recombinant E. coli cultures. The majority of feruloyl esterase activity of the polypeptide containing both domains was removed by Avicel and acid swollen cellulose adsorption but not with the UD alone, indicating that strong cellulose binding capability resides in the family VI cellulose binding domain of XynZ. Cellulose-binding ability was confirmed with native gel retardation analysis.

The polypeptide of the Fae domain plus CBD (FAE/CBD) has been purified from E. coli cell free extract to almost homogeneity after a single step of heating at 70° C. for 30 min. Over 200 milligrams of the FAE/CBD were obtained from 2.5 gram crude proteins (Table 3). The purified FAE/CBD had a mass of 45 kDa as revealed by SDS-PAGE (FIG. 6), consistent with the calculated size (46.5 kDa). This size was also consistent with what was seen on gel filtration. There was no evidence for aggregation of the recombinant polypeptides produced in E. coli.

The purified protein had a Vmax of 13.5 μmol ferulic acid released min⁻¹ mg⁻¹ and Km of 3.2 μM using FAX as substrate. The enzyme had the highest specific activity toward FAXX, but it was almost as active toward FAX (Table 4). The protein released low levels of ferulic acid from ethyl ferulic acid, ground wheat bran, and Coastal Bermuda grass and p-coumaroyl acid from PAX and ethyl p-coumaroyl acid. The protein lacked activity toward CMC, Avicel, p-nitrophenyl (pNP)-arabinopyranoside, pNP-glucopyranoside, pNP-xylopyranoside, and pNP-acetate.

The recombinant FAE/CBD enzyme had high levels of activity between pH 3.8 and 7 and temperatures between 37 and 65° C. (FIG. 6). The FAE/CBD was stable at temperatures up to 65° C. for 6 hours.

In order to understand how microorganisms break down plant cell wall material, we chose to study enzymes from Clostridium thermocellum. In particular, XynY and XynZ from this organism were originally thought to contain a xylanase domain and a second domain of unknown function. We have now demonstrated that the function of this domain is that of a feruloyl esterase. We believe this is the first report of a phenolic acid esterase in the cellulosome. Feruloyl esterases are important for the complete degradation of plant cell wall material. These enzymes are produced by several organisms, but they have not been found in a bifunctional enzyme.

A feruloyl esterase from Orpinomyces PC-2 was purified and internal fragments of the enzyme were used to screen the Orpinomyces PC-2 cDNA library. A partial clone was sequenced and showed homology to XynZ. A BLAST analysis showed that this enzyme, along with XynY, had domains of unknown function.

The high temperature stability of the enzyme is surprising because no other thermophilic feruloyl esterases have been reported until the present disclosure of the C. thermocellum thermotolerant feruloyl esterases. The Orpinomyces PC-2 enzyme has substrate specificity for both feruloyl and p-coumaroyl esterified substrates. The clostridial enzymes are the first from bacteria to have such a dual role. Although the Orpinomyces enzyme is not a true p-coumaroyl esterase, no p-coumaric acid esterases have been found in bacteria to date.

Applications for the enzymes of the present invention include producing ferulic acid from wheat bran or agricultural byproducts, using the enzyme to treat grasses or other plant materials used in the pulp and paper industries, feed processing, and as a food additive. These thermostable enzymes have advantages over other enzymes since they are economically and easily purified, they have high temperature optima, good thermostability, and they are stable over a wide range of pH values.

the FAE/CBD, they both exhibited feruloyl esterase activity. Thus, the removal of the 114 amino acids of the CBD did

Feruloyl esterases and xylanase act synergistically to the release of ferulic acid and reducing sugars from lignocellulosic material Borneman et al.
(1993) Supra). In C. ther not have a detrimental effect on the activity. XynlZ FAE/ mocellum XynY and Xynz, we hypothesize that this is more CBD bound to acid Swollen cellulose very weakly, while the efficient due to the incorporation of both enzymes into one. other constructs missing the CBD did not bind acid Swollen We believe there is a multicutting event catalyzed by these cellulose at all. FAE227 was an inactive but expressed enzymes much like the multicutting event in the celluloSome enzyme. The data here show that neither the CBD nor the itself which leads to more efficient hydrolysis of plant cell linker is necessary for activity, but amino acids 247–266 are wall material. The Substrate, arabinoxylan could be passed necessary for generation of an active enzyme. Since neither from one to another, which would eliminate the the linker region nor the CBD is necessary for activity, we process of each of two enzymes having to bind to the used the Smallest construct which Still retained activity, Substrate and then release it for the other enzyme to attack. FAE, for Subsequent experiments. XynY and XynZ are enzymatic components of the The XynZ, FAE/CBD polypeptide was purified from E. coli cell free extract after a Single Step of heat treatment at Clostridium thermocellum cellulosome. These components 70° C. for 30 min. Over 200 mg of the Xynz FAE/CBD were have a multi-domain Structure which includes a dockerin 15 obtained from 2.5 gram of crude protein (Table 3). The domain, a catalytic Xylanase domain, and a domain of purified XynZ, FAE/CBD had a mass as stated previously of unknown function. The previously unknown domains in 45 kDa as revealed by SDS-PAGE (FIG. 5), consistent with XynY and XynZ have been found to have phenolic esterase the calculated size (46.5 kDa). There was no evidence for activity. These domains have Some amino acid homology to aggregation of the feruloyl esterase produced in E. 
coli, and that of a phenolic esterase from the anaerobic fungus Orpi SDS-PAGE gels showed that protein which was removed nomyces sp. Strain PC-2. Secondly, purified cellulosomes from the cell free extract by centrifugation had no insoluble from C. thermocellum hydrolyze (O-5-O-(E)-feruloyl-(- protein which could be attributed to inclusion bodies. L-arabinofuranosyl-(1(3)-O-(-D-xylopyranosyl-(1(4)-D- The purified protein had a Vmax of 12.5 limol ferulic acid xylopyranose) (FAXX) and 5-O-(E)-feruloyl-O-(-D- released min-1 mg-1 and Km of 5 mM using FAX3 as xylopyranosyl-(1(2)-O-(-L-arabinofuranosyl-1(3)}-O-(- 25 Substrate. The enzyme had the highest Specific activity D-Xylopyranosyl-1(4)-D-xylopyranose (FAX) yielding towards FAXX but was almost as active toward FAX3 ferulic acid as a product, thus indicating the presence of a (Table 4). The protein was able to release low levels of FA phenolic acid esterase. Intracellular and extracellular frac from ethyl ferulic acid, ground wheat bran, and Coastal tions lacking cellulosomes had insignificant amounts of Bermuda grass and p-coumaric acid (PCA) from PAX3 and phenolic acid esterase activity which confirmed that the ethyl-p-coumarate. The protein lacked activity toward activity resided with the cellulosome. The final proof was CMC, Avicel, p-nitrophenyl (pNP)-arabinopyranoside, obtained by cloning the domains of XynY and XynZ into pNP-glucopyranoside, pNP-Xylopyranoside, and pNP Escherichia coli. The domains were expressed and found to acetate. Isoelectric focusing gel electrophoresis showed that possess phenolic acid esterase activities with FAXX and the protein had a pl of 5.8. FAX as Substrates. 35 The FAE polypeptide of XynZ was also expressed and Nucleotides corresponding to regions of DNA encoding purified to homogeneity. A purification Scheme is shown in amino acids in XynZ (Genbank Accession Number Table 3B. 
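The kinetic constants reported for XynZ FAE/CBD with FAX3 (Vmax of 12.5 µmol ferulic acid released per minute per milligram, Km of 5 mM) can be turned into expected initial rates. The following is a minimal sketch that assumes simple Michaelis-Menten behavior; that assumption, the function name, and the chosen substrate concentrations are illustrative and are not taken from the disclosure. Product inhibition by ferulic acid is ignored.

```python
def michaelis_menten_rate(substrate_mm: float, vmax: float, km_mm: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]); result has the units of vmax."""
    if substrate_mm < 0 or km_mm <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return vmax * substrate_mm / (km_mm + substrate_mm)


# Constants as reported in the text for XynZ FAE/CBD acting on FAX3.
VMAX_FAE_CBD = 12.5  # umol ferulic acid released per min per mg protein
KM_FAE_CBD = 5.0     # mM

if __name__ == "__main__":
    # Hypothetical substrate concentrations chosen only for illustration.
    for s in (1.0, 5.0, 20.0):
        rate = michaelis_menten_rate(s, VMAX_FAE_CBD, KM_FAE_CBD)
        print(f"[FAX3] = {s:5.1f} mM -> v = {rate:.2f} umol/min/mg")
```

At a substrate concentration equal to Km the rate is half of Vmax, which is a quick sanity check on any fitted constants.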
The protein was expressed in a manner similar to M22624) from 20-421 and in XynY (Genbank Accession that for XynZ, FAE/CBD. The heat treatment step also Number X83269) from 795-1077 were overexpressed in E. resulted in 200 mg of protein, but the protein was not pure. coli using the pET and pRSET systems respectively. The 40 An additional Step involving gel filtration resulted in a pure XynZ sequence will henceforth be referred to as XynZ enzyme with a Vmax of 28.2 umol ferulic acid released FAE/CBD since it incorporates the family VI CBD, and the min-1 mg-1 and Km of 10.5 mM using FAX3 as Substrate. XynY protein is XynY FAE since it only contains a catalytic FAE was inhibited by ferulic acid but not by xylose or domain. The cell free extracts containing the expressed arabinose. The FAE had a temperature optimum between proteins each hydrolyzed FAXX with release of ferulic acid 45 30° and 70° C. (FIG. 6A) and had high level activity (FA) which Suggests that these proteins are feruloyl between pH 4 and 7 (FIG. 6B) The enzyme was stable at esterases. The expressed protein from the construct contain temperatures up at 70° C. for 6 hours, and in a similar ing XynY FAE had a molecular weight of 31 kDa, consistent experiment, FAE/CBD also was stable at 70° C. At 80° C., with the Sequence data. Constructs containing XynZ, FAE/ the relative activity of FAE decreased to around 50% after CBD produced a protein with a molecular mass of 45 kDa 50 three hours of incubation, and most of the relative activity as analyzed by SDS-PAGE. The protein was expressed was destroyed after 1 hour of incubation at 90° C. without IPTG induction at a level of 8% of the total protein. Anaerobic microorganisms do not readily degrade lignin, Levels of feruloyl esterase activity of the constructs with the but are able to solubilize it. 
Anaerobic fungi are able to XynY FAE and XynZFAE/CBD sequences were lower with Solubilize but not metabolize lignin, and it is Suggested that IPTG induction than without induction. Since these proteins 55 the released lignin was carbohydrate linked McSweeney et had similar Sequences and Similar function coupled with the al. (1994) Appl. Environ. Microbiol. 60:2985-2989). The fact that XynZ had higher expression levels than XynY, we data herein indicate that feruloyl esterases are responsible decided to focus our attention on XynZ and Subsequent for lignin solubilization. Most studies of the cellulosome of experiments will refer to that protein. C. thermocellum has been directed toward its cellulolytic Constructs were made which corresponded to proteins 60 activity. It also has Xylanases which we have shown are with amino acids from the original C. thermocellum XynZ bifunctional enzymes with feruloyl esterase activity. The sequence of 20–307 (FAE287), 20–286 (FAE) and 20-247 cellulosome should be efficient in the degradation of arabi (FAE227) (with reference to SEQ ID NO. 14 and FIG. 2). noxylan. It has been previously shown that CloStridium FAE287 is missing the CBD, but contains a proline rich xylanolyticum released aromatics into the culture medium linker which separates the CBD from the FAE domain while 65 when grown on lignocellulosic material Rogers et al. FAE does not contain this linker. When these constructs (1992) International Biodeterioration & Biodegradation were expressed in E. coli in the same manner as XynZ 29:3-17). US 6,365,390 B1 11 12 XynY and XynZeach contain a glycosyl hydrolase family 4:57-65), and another Ruminococcus Xylanase has been 10 catalytic domain in addition to the FAE catalytic domain. shown to be a bifunctional enzyme with Xylanase and acetyl The Xylanase domain of XynZ has been well studied, that Xylan esterase activity Kirby et al. 
(1998) Biochemical construct has been crystallized, and the three dimensional Society Transactions 26:S169). No feruloyl esterase activity structure solved Dominguez et al. (1995) Nat. Struct. Biol. has been observed in E. coli. The gene from E. coli may 2:569–576; Souchon et al. (1994) J. Mol. Biol. encode a dipeptidase instead, because homology exists 235:1348-1350). In general, Xylanases are thought to be between a dipeptidase from Aspergillus fumigatus and feru Sterically hindered by groups Substituted on the Xylan back loyl esterases. The data Suggest a common ancestral encod bone. Feruloyl esterase and Xylanase have been shown to act ing feruloyl esterases from Orpinomyces, C. thermocellum, Synergistically for the release of ferulic acid and reducing and RuminococcuS. Sugars from lignocellulosic material Borneman et al. (1993) Potential applications for the enzymes of the present Supra). In XynY and XynZ, we hypothesize that this event invention include producing ferulic acid from wheat bran or has been made more efficient by the incorporation of both agricultural byproducts, using the enzyme to treat grasses FAE and Xylanase catalytic domains into one enzyme. which are used in the pulp and paper industry, feed Without wishing to be bound by theory, we believe that there 15 processing, and as a food additive. These thermostable is a multicutting event catalyzed by these enzymes much enzymes have advantages over other enzymes because they like the multicutting event in the cellulosome itself which are easy to purify, have high temperature optima and are leads to more efficient hydrolysis of plant cell wall material. Bifunctional enzymes like XynY and Xynz form a Stable over a wide pH range. dumbbell-like shape which attacks the arabinoxylan The feruloyl esterase domain of Xyn Z was highly polysaccharide and the Substrate is passed from one active expressed in E. 
coli and the esterase comprised 40-50% of Site to another, eliminating the relatively inefficient two the total cell protein. The recombinant esterase of XynZ was enzyme proceSS in which one has to bind to the Substrate and purified to almost homogeneity by heat treatment. The protein had a molecular mass of 45 kDa, consistent with the then release it for the other enzyme to attack. The existence Size of the predicted deduced amino acid Sequence. Of the of multidomain enzymes Such as the Sea whip coral 25 peroxidase-lipoxygenase Koljak et al. (1997) Science Substrates tested, the expressed protein had high Specific 277: 1994-1996 and a Xylanase-f(1,3-1,4)-glucanase from activity towards FAXX and FAXs. With FAX as a substrate Ruminococcus flavifaciens Flint et al. (1993) J. Bacteriol. Km and Vmax values were 3.2 mM and 13.5 umol ferulic 175:2943-2951) suggests an evolutionary importance of acid released mind mg-I respectively at pH 6.0 at 60° C. having two or more catalytic domains in one enzyme. XynZ Several phenolic esterified substrates were hydrolyzed and contains a contains a family VI CBD, which does not bind the Specific activities with those containing feruloyl groups cellulose significantly. However, representatives of CBDs of were higher than were those with p-coumaroyl groups this family usually efficiently bind Xylan. The CBD of XynZ confirming that the previously unknown domain of XynZ is may participate in a tight association of the catalytic a feruloyl esterase. The enzyme released mainly ferulic acid from wheat bran and Coastal Bermuda grass (CBG) with a domains with the substrate. This is consistent with the higher 35 Smaller amount of p-coumaroyl groups released from CBS. Km of FAE as compared to that of XynZ, FAE/CBD. This Study represents the first demonstration of esterases in Both FAE/CBD and FAE are highly thermostable. 
They the cellulosome of CloStridium thermocellum and of are active against both feruloyl and p-coumaroyl esterified enzymes from the cellulosome with two different activities. substrates, and they represent the first FAE from bacteria to The present work also provides a phenolic acid esterase hydrolyze p-coumaroyl esters. The high Km of FAE versus 40 XynZ FAE/CBD indicates that the CBD is important in derived from a Xylanase from RuminococcuS and as an binding the Substrate before enzyme . enzyme produced by Orpinomyces PC-2. The FAE domains of XynZ and XynY are homologous to A Summary of the purification of FAE from Orpinomyces each other and to the Orpinomyces FaeA. The Orpinomyces sp stain PC-2 is presented in Table 7. The Q-Sepharose FaeA, together with those homologous Sequences, however, 45 column Separated two peaks of esterase activity. Proteins failed to show significant homology to the recently pub which eluted in the first peak had higher activity against lished feruloyl esterases (FaeA) of Aspergillus niger and A. ethyl-pCA while proteins eluting in the Second peak had tubingensis de Vries et al. (1997) supra as well as CinA and greater activity against FAXX. These data Suggest that a Cin B from Butyrivibriofibrisolvens Dalrymple et al. (1996) p-coumaroyl esterase eluted in the first peak while the FEMS Microbiol. Lett. 143:115-120; Dalrymple and Swa 50 feruloyl esterase eluted in the Second. The first peak was not dling (1997) Microbiology 143: 1203–1210 and XylD from studied further, but the fractions in peak 2 were further Pseudomonas fluorescens subsp. cellulosa Ferreira et al. purified resulting in a purified enzyme which had an (1993) Biochemical Journal 294:349-355). The sequence approximate molecular mass of 50 kDa as Visualized by analysis implies that a new type of feruloyl esterase is SDS-PAGE analysis (FIG. 7). There was a decrease in encoded by the Orpinomyces gene and the homologous C. 
55 specific activity after the MonoO step which could not be thermocellum sequences described above. The Orpinomyces explained. FaeA, and the FAE domains of XynZ and XynY were also Temperature and pH optima experiments showed that the shown to be homologous to a hypothetical polypeptide of E. enzyme had a temperature optimum of 50° C. (FIG. 8A) and coli (Genbank Accession Number P31471) and the carboxy had activity over a pH range between 5.2 and 8 (FIG. 8B). terminal region of a RuminococcuS Sp. Xylanase earlier 60 The purified enzyme was stable at 4 C. for over 18 months. designated as a UD Genbank Accession Number S58235). The purified enzyme was Subjected to N-terminal Sequenc No function had been assigned to those Sequences of E. coli ing giving the sequence ETTYGITLRDTKEKFTVFKD and Ruminococcus. Without wishing to be bound by theory, (SEQ ID NO:21). The protein was also subjected to internal the present inventors believe that these Sequences also Sequencing which resulted in four peptide fragments (Table encode feruloyl esterases and that the RuminococcuS Xyla 65 8) which were used to create degenerate PCR primers. nase is also bifunctional. RuminococcuS has been shown to Two of the peptide fragments from the internal amino acid produce FAE activity McSweeney et al. (1998) Anaerobe Sequencing were used to create degenerate aglionucleotide US 6,365,390 B1 13 14 primers which are listed in the materials and methods It will be understood by those skilled in the art that other Section. These primers were used to amplify regions of DNA nucleic acid Sequences besides those disclosed herein for the in the Orpinomyces PC-2 cDNA library. A 216 bp PCR phenolic acid esterases, i.e. feruloyl esterases, will function product was generated. The PCR product was labeled with as coding Sequences Synonymous with the exemplified cod 5 ing Sequences. 
Nucleic acid Sequences are Synonymous if digoxygenin-UTP and used as a probe to screen the cDNA the amino acid Sequences encoded by those nucleic acid library. After Screening 50,000 phage, one positive plaque Sequences are the same. The degeneracy of the genetic code was obtained and its DNA was sequenced using T3 and T7 is well known to the art. For many amino acids, there is more universal primerS. Sequencing using the T3 primer did not than one nucleotide triplet which Serves as the codon for a reveal any ORFs, however, Sequencing using the T7 reverse particular amino acid, and one of ordinary skill in the art primer gave the C-terminal end of the gene. Based on the understands nucleotide or codon Substitutions which do not Sequence data and restriction fragment analyses, but without affect the amino acid(s) encoded. It is further understood in wishing to be bound by theory, we have concluded that the the art that codon Substitutions to conform to common codon faeA gene in this cdNA was truncated and furthermore that usage in a particular recombinant host cell is Sometimes the insert comprises multiple genes. These other genes were 15 desirable not studied further. The deduced amino acid Sequence of the Specifically included in this invention are Sequences from insert matched the data from the peptide Sequencing. The other Strains of Clostridium and from other microorganisms insert had a size of 1074 bp and encoded a protein of 358 which hybridize to the sequences disclosed for feruloyl and amino acids. Since the Size of the encoded protein did not coumaryl esterases under Stringent conditions. 
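The definition of synonymous coding sequences (two nucleic acid sequences are synonymous if they encode the same amino acid sequence) can be checked mechanically. The sketch below assumes the standard genetic code; the two DNA strings are hypothetical examples written for this illustration and are not taken from the sequence listing. They both encode ETTYG, the first five residues of the N-terminal sequence given in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    seq = dna.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def are_synonymous(dna_a: str, dna_b: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    # Hypothetical coding sequences that differ at third codon positions only.
    seq_a = "GAAACCACCTATGGC"
    seq_b = "GAGACTACGTACGGT"
    print(translate(seq_a), translate(seq_b), are_synonymous(seq_a, seq_b))
```

A host-specific codon usage table could be layered on top of this to choose among synonymous codons; that step is not shown here.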
Stringent match that of the purified enzyme and the N-terminal conditions refer to conditions understood in the art for a Sequence, including a Signal peptide and lack of a start given probe length and nucleotide composition and capable codon, another round of Screening was performed using the of hybridizing under Stringent conditions means annealing to entire Sequence as a probe after digoxygenin labeling. After a Subject nucleotide Sequence, or its complementary Strand, Screening an additional 50,000 phage, one positive clone under Standard conditions (i.e., high temperature and/or low 25 Salt content) which tend to disfavor annealing of unrelated was obtained which had a size of 1673 bp with the largest Sequences, (indicating about 95-100% nucleotide sequence open reading frame comprising a protein of 530 amino acids. identity). Also specifically included in this invention are The Sequence of this insert is believed to be an incomplete Sequences from other Strains of Orpinomyces Species and one since no 5' UTR was found and the (putative) signal other anaerobic fungi which hybridize to the Sequences Sequence has only four amino acids. Most signal Sequences disclosed for the esterase Sequences under moderately Strin found in hydrolytic enzymes from anaerobic fungi are at gent conditions. Moderately Stringent conditions refer to least 20 amino acids long. The insert was found to be in a conditions understood in the art for a given probe Sequence reverse orientation with respect to the Lac7, promoter. The and “conditions of medium Stringency means hybridization upstream lac promoter should direct synthesis of the inserted and wash conditions of 50–65 C., 1XSSC and 0.1% SDS gene, but no activity was found in lysed E. coli cells 35 (indicating about 80–95% similarity). Also specifically harboring the recombinant plasmid. The FaeA gene in E. 
coli included in this invention are Sequences from other Strains of was expressed using the peT System (Novagen) in the Orpinomyces, from other anaerobic fungi, and from other correct orientation. The recombinant FaeA released ferulic organisms, including bacteria, which hybridize to the acid from FAXX as well as other Substrates which were Sequences disclosed for the esterase Sequences under highly esterified with phenolic groups. The enzyme had the highest 40 Stringent conditions. Highly Stringent conditions refer to activity against FAXX, which demonstrates that it is a true conditions understood in the art for a given probe Sequence feruloyl esterase (Table 10). In addition, when the enzyme and “conditions of high Stringency' means hybridization and was incubated with a recombinant Xylanase, there was a 80 wash conditions of 65-68 C., 0.1xSSC and 0.1% SDS fold increase in FA released over FaeA alone. (indicating about 95-100% similarity). Hybridization assays The nucleotide and deduced amino acid Sequence of the 45 and conditions are further described in Sambrook et al. faeA gene are shown in Table 9. A BLAST analysis of the (1989). encoded protein showed homology to Several enzymes. A method for identifying other nucleic acids encoding These enzymes included domains of unknown function from feruloyl esterase- and/or coumaryl esterase-homologous Xylanase Z and Xylanase Y of CloStridium thermocellum, a enzymes is also provided wherein nucleic acid molecules domain of unknown function in a Xylanase from Rumino 50 encoding phenolic acid esterases are isolated from an coccuS spp. and a 44 kDa hypothetical protein from E. coli, anaerobic fungus, including but not limited to Orpinomyces and a dipeptidyl peptidase from Aspergillus fumigatus (FIG. or an anaerobic bacterium, Such as Clostridium or 9). All proteins had at least 20% identity with the C-terminal Ruminococcus, among others, and nucleic acid hybridiza 300 amino acids of the protein. 
The N-terminal part of the tion is performed with the nucleic acid molecules and a enzyme did not show homology to any enzyme in the 55 labeled probe having a nucleotide Sequence that includes all BLAST analysis and the function of this domain is or part of a FAE coding Sequence as given in Table 5, 6, 9 unknown. Although FAE activity has been demonstrated in and/or 10 herein. By this method, phenolic acid esterase the cellulase/hemicellulase complex from Orpinomyces, this genes Similar to the exemplified feruloyl and coumaryl protein does not contain a non-catalytic repeated peptide esterases can be identified and isolated from other Strains of domain (NCRPD). Analysis of C-terminal coding region 60 Clostridium or other anaerobic microorganisms. All or part indicated a typical Signature Sequence found in lipases and of a nucleotide Sequence referS Specifically to all continuous other esterases of GXSXG at residues 341-345 as well as an nucleotides of a nucleotide Sequence, or e.g. 1000 continu aspartic acid at residue 403 and a histidine at residue 436 ous nucleotides, 500 continuous nucleotides, 100 continuous which would make up the . A Search of the nucleotides, 25 continuous nucleotides, and 15 continuous Sequence revealed two N-glycosylation sites at amino acids 65 nucleotides. 300 and 488 (of SEQ ID NO:18) and a 16mer poly A tail in Sequences included in this invention are those amino acid the 3' UTR. sequences which are 40 to 100% identical to the amino acid US 6,365,390 B1 15 16 Sequences encoded by the exemplified C. thermocellum Each reference and patent document cited in the present Strain feruloyl esterase, amino acids proteins truncated from application is incorporated by reference herein to the extent the XynY or XynZ proteins or the Ruminococcus FAE that it is not inconsistent with the present disclosure. polypeptide or the Orpinomyces PC-2 FAE polypeptide, all The following examples are provided for illustrative Specifically identified herein. 
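The GXSXG signature mentioned above for lipases and other esterases (glycine, any residue, serine, any residue, glycine) is easy to locate in a protein sequence. This is a minimal sketch; the toy sequence is hypothetical and is not SEQ ID NO:18, and the function name is an assumption made for illustration. Positions are reported 1-based, matching the residue numbering convention used in the text.

```python
import re

# Lookahead so that overlapping occurrences of the motif are all reported.
GXSXG = re.compile(r"(?=(G.S.G))")


def find_gxsxg(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched pentapeptide) for each GXSXG motif."""
    return [(m.start() + 1, m.group(1)) for m in GXSXG.finditer(protein.upper())]


if __name__ == "__main__":
    toy_sequence = "MKTAGHSMGGLVA"  # hypothetical sequence for illustration only
    print(find_gxsxg(toy_sequence))
```

Applied to the full deduced sequence, a hit starting at residue 341 would correspond to the motif location stated in the text.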
Sequences included in this invention are also those amino acid Sequences which are 40, purposes, and is not intended to limit the Scope of the 50, 60, 70, 75, 80, 85, 90, 95 to 100%, and all integers invention as claimed herein. Any variations: in the exem between 40% and 100%, identical to the amino acid plified articles which occur to the skilled artisan are intended Sequences encoded by an exemplified phenolic acid esterase to fall within the Scope of the present invention. coding Sequence and corresponding to or identifying 1O encoded proteins which exhibit feruloyl esterase activity. In EXAMPLES comparisons of protein or nucleic acid Sequences, gaps Example 1 introduced into either query or reference Sequence to opti mize alignment are treated as mismatches. In amino acid Bacterial Strains, Vectors, and Culture Media Sequence comparisons to identify feruloyl esterase proteins, 15 the reference Sequence is, desirably, amino acids 227 to 440 C. thermocellum JW20 was cultivated in prereduced of SEQ ID NO:18 (FAE of Orpinomyces PC-2). liquid medium Wiegel and Dykstra (1984) Appl. Microbiol. It is well-known in the biological arts that certain amino Biotechnol. 20:59-65 at 60° C. under an atmosphere of acid Substitutions may be made in protein Sequences without nitrogen. AVicel (microcrystalline cellulose, 0.4% w/v, affecting the function of the protein. Generally, conservative Baker TLC, 2–20 micron particle size) was used as the amino acid Substitutions or Substitutions of Similar amino carbon source. E. coli strain BL21 (DE3) (Stratagene, La acids are tolerated without affecting protein function. Simi Jolla, Calif.) and plasmid pRSET B (Invitrogen, Carlsbad, lar amino acids can be those that are Similar in size and/or Calif.) were used the host strain and the vector for protein charge properties, for example, aspartate and glutamate, and expression. Improved results were obtained using plasmid isoleucine and Valine, are both pairs of Similar amino acids. 
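The identity rule stated above, in which gaps introduced to optimize the alignment are treated as mismatches, can be sketched as follows. The sketch assumes the two sequences have already been aligned by some external tool and uses "-" as the gap character. The aligned strings are hypothetical, and taking the full alignment length as the denominator is one reasonable reading of the rule rather than a formula given in the text.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns; gap columns count as mismatches."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, r in zip(aligned_query.upper(), aligned_reference.upper())
        if q == r and q != gap
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Hypothetical pre-aligned fragments used only to exercise the function.
    query = "ETTYG-ITLR"
    reference = "ETSYGAITLK"
    identity = percent_identity(query, reference)
    print(f"{identity:.1f}% identical; within 40-100% range: {identity >= 40.0}")
```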
25 pET-21b (Novagen, Madison, Wis.). The recombinant E. Similarity between amino acid pairs has been assessed in the coli were Selected for by growing in Luria-Bertani medium art in a number of ways. For example, Dayhoff et al. (1978) containing 100 lug/ml amplicillin. in Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, Chapter 22, pp. 345-352, which is incorpo Example 2 rated by reference herein provides frequency tables for Amplification and Cloning of Sequences Coding amino acid Substitutions which can be employed as a measure of amino acid Similarity. Dayhoff et al.'s frequency for Different Domains of C. thermocellum XynY tables are based on comparisons of amino acid sequences for and XynZ proteins having the same function from a variety of evolu Genomic DNA was isolated from C. thermocellum as tionarily different Sources. 35 previously described Maniatis et al. (1982) Supra). PCR Monoclonal or polyclonal antibodies, preferably primers were designed (Table 1) and Synthesized on an monoclonal, Specifically reacting with the phenolic acid Applied Biosystems (Foster City, Calif.) DNA sequencer. To esterases of the present invention may be made by methods facilitate the insertion of DNA sequence into or p T-21b or known in the art. See, e.g., Harlow and Lane (1988) Anti pRSETB, BamHI (for pET216) or NdeI for pRSET B, and bodies: A Laboratory Manual, Cold Spring Harbor Labora 40 HindIII sites were added to forward and reverse primers, tories; Goding (1986) Monoclonal Antibodies. Principles respectively (Table 1). PCRs were carried out on a Perkin and Practice, 2d ed., Academic Press, New York. Elmer 480 Thermocycler for 30 cycles with each cycle on Standard techniques for cloning, DNA isolation, amplifi 95° C. for 1 min, 48°C. for 1 min, and 72°C. for 3 min. 
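The cycling program described for the C. thermocellum amplifications (30 cycles of 95° C. for 1 min, 48° C. for 1 min, and 72° C. for 3 min) can be written down as data to estimate the hold time. This is a small sketch; the step labels (denaturation, annealing, extension) are the conventional interpretation and are not stated in the text, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # conventional label, assumed for illustration
    temp_c: float    # hold temperature in degrees Celsius
    minutes: float   # hold time in minutes


# One cycle as described in the text.
CYCLE = (
    Step("denaturation", 95.0, 1.0),
    Step("annealing", 48.0, 1.0),
    Step("extension", 72.0, 3.0),
)


def total_cycling_minutes(cycle: tuple[Step, ...], n_cycles: int) -> float:
    """Total hold time across all cycles, excluding ramping between temperatures."""
    return n_cycles * sum(step.minutes for step in cycle)


if __name__ == "__main__":
    minutes = total_cycling_minutes(CYCLE, 30)
    print(f"{minutes:.0f} min ({minutes / 60:.1f} h) of hold time")
```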
PCR cation and purification, for enzymatic reactions involving products and the plasmid were digested with BamHI (or DNA , DNA polymerase, restriction 45 Ndel) and HindIII, purified with a Biol(01 Geneclean kit, and the like, and various Separation techniques are those ligated with T4 ligase. E. coli BL21 (DE3) was transformed known and commonly employed by those skilled in the art. with the ligation mixture and at least four colonies of each A number of Standard techniques are described in Sambrook construct were picked for analyzing feruloyl esterase expres et al. (1989) Molecular Cloning, Second Edition, Cold Sion. The inserted Sequences were Sequenced to Verify the Spring Harbor Laboratory, Plainview, N.Y.; Maniatis et al. 50 lack of unwanted mutations. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, Two internal Sequences were used to create degenerate Plainview, N.Y.; Wu (ed.) (1993) Meth. Enzymol. 218, Part oligonucleotide primers for PCR in order to amplify the I; Wu (ed.) (1979) Meth. Enzymol. 68; Wu et al. (eds.) (1983) feruloyl esterase coding Sequence in the cDNA library in Meth. Enzymol. 100 and 101; Grossman and Moldave (eds.) Orpinomyces. The Orpinomyces PC-2 cDNA library is Meth. Enzymol. 65; Miller (ed.) (1972) Experiments in 55 described in the ZAPII vector (Stratagene, La Jolla, Calif.) Molecular Genetics, Cold Spring Harbor Laboratory, Cold in E. coli host cells is described in Chen et al. (1995) Proc. Spring Harbor, New York; Old and Primrose (1981) Prin Natl. Acad. Sci. 92:2587-2591. Positive clone(s) are sub ciples of Gene Manipulation, University of California Press, cloned a pBlueScript vector (Stratagene, La Jolla, Calif.). Berkeley; Schleif and Wensink (1982) Practical Methods in The amplified product was cloned into pCRII (Invitrogen, Molecular Biology; Glover (ed.) (1985) DNA Cloning Vol. 60 Carlsbad, Calif.) using the TA cloning kit and Sequenced I and II, IRL Press, Oxford, UK; Hames and Higgins (eds.) 
using an automatic PCR sequencer (Applied BioSystems, (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK; Foster City, Calif.) using M13 reverse primer. The resulting and Setlow and Hollaender (1979) Genetic Engineering: PCR product was used to screen the cDNA library after Principles and Methods, Vols. 1-4, Plenum Press, New being labeled with digoxigenin (Boehringer Mannheim, York. Abbreviations and nomenclature, where employed, 65 Indianapolis, Ind.). The digoxigenin probe was bound to are deemed Standard in the field and commonly used in plaques which were lifted from a nitrocellulose blot. Anti professional journals Such as those cited herein. bodies conjugated to alkaline showed a single US 6,365,390 B1 17 18 positive clone which hybridized to the PCR product. The Bermuda grass were ground in a Wiley mill to pass through product was sequenced and found to contain the C-terminal a 250 um Screen. Plant Samples of ten milligram were 358 amino acids of the enzyme (See Table 9). A second incubated for one hour in 400 ul of 50 mM Na-citrate buffer probe which incorporated those 339 amino acids was used as pH, 6.0 plus 25 ul of enzyme. After the addition of 25 ul of a probe to Screen the library in the same manner as before. 20% formic acid to Stop the reaction, the Samples were A Second clone was isolated which contained the C-terminal centrifuged at 16,000xg in a microfuge and then assayed for region plus an additional 172 amino acids making a FA and pCA by HPLC. polypeptide of 530 amino acids. Confirmation of the ASSays with p-nitrophenol Substrates were performed in Sequence came from N-terminal and internal protein microtiter plate wells. Two hundred microliters of substrate Sequence data from the purified enzyme which matched that at a concentration of 100 uM was preincubated in wells of the cloned cDNA product. Expression cloning of this heated to 40C. 
Enzyme (10 ul) was added to the reaction coding Sequence, which lacks an ATG translation Start Site, mixture and the absorbance was followed continuously at a can be achieved by expressing it, in frame, as a fusion wavelength of 405 nm. p-Nitrophenol was used as standard. protein using any one of a number of fusion protein vectors Xylanase and AVicelase activities were measured by reduc known to the art or an ATG translation start codon and/or 15 ing Sugar assays using dinitrosalicylate Miller, G. L. (1959) ribosome upstream of the ATG can be added Anal. Chem. 31:127-132). using methodology well known to and readily accessible to Unless otherwise noted, all Orpinomyces enzyme assays the art in an expression vector appropriate to the choice of were performed at 40°C. in 50 mM Bis-Tris Propane buffer, recombinant host cell. pH 6.0. One unit of enzyme activity is defined as the amount Example 3 that released 1 umol of product min-1, and Specific activity is given in units per milligram of protein. Protein was Isolation and Analysis of the Cellulosome determined by the method of Bradford Bradford, M. (1976) Anal. Biochem. 72:248-254). Feruloyl esterase activity was The cellulosomes were isolated from 10L of culture fluid assayed by the method of Borneman et al. (1990) Supra after complete Substrate exhaustion by the affinity digestion 25 which involved measuring the release of ferulic acid from method Morag et al. (1992) Supra). This preparation was FAXX via HPLC using a mobile phase of 10 mM used directly for gel filtration using a Fast Protein Liquid Na-formate pH 3 and 30% (vol/vol) methanol. FAXX was Chromatography (FPLC) system with a Superose 6 column purified from wheat bran as previously described Borneman (Pharmacia, Piscataway, N.J.). Proteins were eluted in 50 et al. (1990) Supra). For assay using ethyl-p-coumarate mM Tris-HCl, 100 mM NaCl at a flow rate of 0.2 ml/min. 
(ethyl-pCA), the substrate (10 mM) was used with 30% Fractions of 0.5 ml were collected and stored at 4 C. for methanol in the same mobile phase. Samples were run on a further analysis. Cell extracts were prepared by first growing Hewlett Packard 1100 Series instrument equipped with an the organism in the presence of 0.2% cellobiose for 2 days. autoSampler and diode array detector. Ferulic acid and Cells were then Separated by centrifugation, resuspended in p-coumaric acid were used as Standards. The appropriately 50 mM Tris-HCl buffer, pH 7.5, and sonicated. Culture 35 diluted protein sample (25ul) was added to 400 ul of buffer medium was concentrated to 5 ml using a Millipore con containing 750 uM FAXX. Samples were incubated at 40 centrator (Millipore, Bedford, Mass.). To adsorb cellulo C. for 30 min. and the reaction was stopped by adding 25 ul somes from the medium, 0.5 mg of Avicel was added and the of 20% formic acid. pH optimum assays were carried out in suspension was stirred at 4 C. for 4 hours. Avicel was 100 mM citrate phosphate buffer in the range of 2.6-7.0, 100 removed by centrifugation (Avicel-treated medium). All 40 fractions were tested for AVicelase, Xylanase, and ferulic mM phosphate in the range of pH 5.7-6.3, and 100 mM Tris acid esterase activities. in the range of pH 7.0-9.0. For temperature optimum determination, purified esterase were incubated for 30 min Unless otherwise noted, all C. thermocellum enzyme utes at the appropriate temperature within the range of 20 assays were performed at 60° C. in 50 mM Na-citrate buffer, to 70° C. pH 6.0. One unit of enzyme activity was defined as the 45 amount of enzyme that released 1 limol of product min-1, All reactions to test the Specificity of the Orpinomyces and Specific activity is given in units per milligram of PC-2 enzyme were carried out in 50 mM citrate buffer pH protein. Feruloyl esterase activity was measured using a 6.0. 
FAXX, FAX3,Et-FA and Et-pCA were assayed for 5 modified version of the assay described by Borneman et al. min. at 40 C. at a concentration of 10 mM. Enzyme solution Borneman et al. (1990) Anal. Biochem. 190:129-133). The 50 (L) was added 400 ul of substrate solution. The reaction was appropriately diluted protein sample (251) was added to 400 stopped with 25 ul of 20% formate. For studies on wheat All of buffer plus 8 mM of Substrate. Samples were incubated bran, crude recombinant Fae A (50 ul) equaling 0.7 units of at 60° C. for 5 min. and the reaction was stopped by adding activity against FAXX, Xyn A (50 ul) equaling 300 units of 25 ul of 20% formic acid. Release of ferulic acid was activity against birchwood Xylan or both was added to a total measured via HPLC using a mobile phase of 10 mM 55 reaction Volume of 1 ml also containing 10 mg of destarched Na-formate pH 3 and 30% (vol/vol) methanol. For routine wheat bran. The reaction was carried out for 40 min at 40 assays, FAXX and FAX3 purified from wheat bran were C. and stopped by adding 50 ul of 20% formate. used as substrates Borneman et al. (1990) Supra). Ethyl ferulate and ethyl-p-coumarate esters were a gift from D. E. Example 4 Akin (USDA, Athens, Ga.). The hydrolysis of these (10 60 mM) were determined similarly to that of FAXX, but the Enzyme Purification HPLC analyses were performed with 50% methanol. HPLC One liter of recombinant E. coli expressing the C. ther runs were with a Hewlett Packard 1100 Series instrument mocellum XynZ-derived FAE was grown in Luria broth equipped with an autoSampler and diode array detector. containing 100 lig/ml amplicillin until ODoo=0.5 and then Ferulic acid and p-coumaric acid were used as Standards. To 65 grown an additional 4–6 hours. 
Cells were harvested by determine the amount of feruloyl and p-coumaroyl groups centrifugation, resuspended at a concentration of 1 g per 3 released from plant cell walls, wheat bran and Coastal ml in 50 mM Tris-HCl (pH 7.5) and lysed in a French US 6,365,390 B1 19 20 preSSure cell. Cell debris was removed by centrifugation at The purity of the Orpinomyces FAE protein was verified 100,000xg. The cell extract was heat treated for 30 min. at by SDS-PAGE analysis and Coomassie blue staining. The 70° C. Denatured protein was removed by centrifugation at enzyme had a molecular mass of approximately 50 kDa. 100,000xg. The Supernatant was run on a MonoO HR 10/10 Purified enzyme was blotted onto a polyvinylidene difloride ion exchange chromatography column (Pharmacia, Piscataway, N.J.) equilibrated with 50 mM sodium citrate (PVDF) membrane and stained according to the manufac buffer, pH 6.0. MonoO (Pharmacia, Piscataway, N.J.) is a turer's instructions. The band corresponding to the purified Strong anion eXchange resin, hydrophilic and in bead form. enzyme was cut out, and the excised band was digested with A linear gradient of 1M NaCl in the same buffer over 40 ml Protease Lys-C (Boehringer Mannheim, Indianapolis, Ind.). was used to elute the purified protein. Protein Samples were Peptides were separated by HPLC using a C8 reverse phase stored at 4 C. column. The intact protein and its peptides were Subjected to Alternatively, the 100,000xg Supernatant after the heat N-terminal amino acid Sequencing. treatment was concentrated to a Volume of 2 ml with a For internal Sequencing, the enzyme was run on SDS Centricon 10 concentrator (Amicon, Millipore, Bedford, 15 PAGE and then blotted onto a PVDF membrane which was Mass.) and then applied to a TSK3000SW column Stained according to the manufacturer's instructions. The (Tosohaas) which was run with 50 mM Tris pH 7.5 and 5% band corresponding to the purified enzyme was cut out with glycerol as Solvent. 
The purified enzyme was stored at 4 C. a razor blade and digested with Protease Lys-C (Boehringer in the elution buffer and was stable for at least a month with Mannheim). Peptides were separated on High Performance minimal loSS. Liquid Chromatography with a C8 reverse phase column. A feruloyl esterase was purified from culture Supernatant The intact protein and its peptides were Subjected to of Orpinomyces sp. strain PC-2 (Barichievicz and Calza N-terminal amino acid Sequencing using an Applied Bio medium Barichievicz and Calza (1990) Appl. Environ. Systems model 477A gas-phase Sequencer equipped with an Microbiol. 56:43-48 with 0.2% Avicel as carbon source). automatic on-line phenylthiohydantoin analyzer. The enzyme was obtained from a 60 liter culture of the 25 fungus. The culture was grown under an atmosphere of CO2 for 6 days. The fungal mycelia were removed by filtration Example 6 through Miracloth (Calbiochem, San Diego, Calif.) The culture Supernatant was concentrated 120 fold using a Pel licon system (Millipore, Bedford, Mass.) and a 10 kDa C. thermocellum Enzyme Stability Experiments membrane. The concentrate was loaded onto a Q Sepharose (Pharmacia, Piscataway, N.J.) column equilibrated with 20 Purified enzyme at a concentration of 13 lug/ml was mM Tris-HCl pH 7.5, and proteins were eluted with a placed in a water bath at the appropriate temperature and gradient of 1M NaCl in the same buffer. The active fractions incubated at intervals of one hour. Enzyme aliquots (25 ul) were detected by their ability to release ferulic acid from 35 were removed and assays were performed in triplicate using FAXX as measured by HPLC. The active fractions were FAX3 as a Substrate as described above. FAE/CBD was combined and ammonium Sulfate was added to a concen tested attemperatures of 50, 60°, and 70° C. while FAE was tration of 1.7M. The Solution was filtered and then loaded tested at 70, 80° and 90° C. 
onto a Phenyl Sepharose High Performance Chromatogra phy (Pharmacia) column equilibrated with 20 mM Tris-HCl 40 Table 5 taken from Fontes et al. (1995) supra presents pH 7.5 and 1.7 Mammonium sulfate. The protein was eluted the nucleotide Sequence and deduced amino acid Sequence by a negative gradient of buffer without ammonium Sulfate. (amino acids 808-1061 of Xyly) of C. thermocellum xyl Y, Active fractions were concentrated using a Centricon 10 unit which is Xylanase Y. The starting points of the five domains (Amicon, Millipore, Bedford, Mass.) and Subsequently are marked A to it, with arrows. The Sequence is available applied to a TSK 3000SW column (Tosohaas, 45 under Accession Number X 83269, EMBL database. Montgomeryville, Pa.) which was equilibrated with 20 mM Tris-HCl pH 7.5 and 200 mM NaCl. Fractions with activity Table 6taken from Grépinet et al. (1988) supra presents were combined and loaded directly onto an anion eXchange the nucleotide and deduced amino acid Sequences (amino (MonoO HR 5/5, Pharmacia, Piscataway, N.J.) column acids 30-274 of XynZ) of the C. thermocellum XynZ and its equilibrated with 20 mM Tris-HCl pH 7.5. The purified 50 gene product. enzyme was eluted using a gradient of 0.5 M NaCl. The purification is summarized in Table 7. Table 9 presents the deduced amino acid Sequence and cDNA coding Sequence of the mature phenolic acid esterase Example 5 of Orpinomyces PC-2. 55 Other Analytical Procedures FIG. 1 provides the amino acid Sequence for a phenolic acid esterase (feruloyl esterase) which corresponds to a Enzyme purity was monitored using Sodium dodecyl previously uncharacterized RuminococcuS Xylanase. The sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) Sequence of the complete coding Sequence of that Xylanase carried out according to the method of Laemmli Laemmli 60 is available under Accession No. S58235 (Gen Bank)(See (1970) Nature (London) 227:680-685). Proteins were Table 10). 
The coding sequence of the phenolic acid esterase stained with Coomassie blue. The isoelectric point of the C. thermocellum XynZ-derived FAE protein was determined polypeptide is nucleotide 2164-2895, exclusive of transla by running the protein on a precast IEF gel (Serva). Each gel tion Start and Stop codons. was run at 12 W constant power for 45 min. 65 Catalytically active polypeptides were produced in Protein concentrations in liquid Samples were determined recombinant E. coli after the PCR amplification and cloning as described by Bradford, M. (1976) supra). as described in Example 2 herein above. US 6,365,390 B1 21 22

TABLE 1

Primers used in amplifying various regions of xynY and xynZ of C. thermocellum

Name     Sequence                          Gene  Direction  Position   SEQ ID NO:
XYF1Bam  TAGGATCCCCTGTAGCAGAAAATCCTTC      xynY  Forward    795-800    1
XYF1     TACATATGCCTGTAGCAGAAAATCCTTC      xynY  Forward    795-800    2
XYR1     GAGGAAGCTTTTACATGGAAGAAATATGGAAG  xynY  Reverse    1071-1077  3
XZF1     TACATATGCTTGTCACAATAAGCAGTACA     xynZ  Forward    20-26      4
XZF1Bam  TAGGATCCCTTGTCACAATAAGCAGTACA     xynZ  Forward    20-26      5
XZR1     GAGGAAGCTTTTAGTTGTTGGCAACGCAATA   xynZ  Reverse    242-247    6
XZR2     GAGGAAGCTTACTTCCACACATTAAAATC     xynZ  Reverse    261-266    7
XZR3     GAGGAAGCTTAGTTTCCATCCCTCGTCAA     xynZ  Reverse    281-286    8
XZR4     GAGGAAGCTTAGTCATAATCTTCCGCTTC     xynZ  Reverse    302-307    9
XZR5     GAGGAAGCTTAAACGCCAAAAGTGAACCAGTC  xynZ  Reverse    414-421    10

Restriction sites NdeI and HindIII are underlined and double-underlined, respectively.
Restriction site BamHI is underlined.
Amino acid positions are according to xylanase sequences in the data banks.
XYF1 or XYF1Bam and XYR1 are the forward and reverse primers used to amplify the feruloyl esterase domain from xylY (xynY) of C. thermocellum (see Fontes et al. (1995) supra).
XZF1 is the forward primer and XZR1-XZR5 are the reverse primers used in the amplification of the feruloyl esterase portion of the xynZ of C. thermocellum.
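Because the underlining that marked the restriction sites in Table 1 does not survive in plain text, the sites can be located by a simple string search. The following is a minimal sketch, assuming the standard recognition sequences for BamHI (GGATCC), NdeI (CATATG) and HindIII (AAGCTT); the dictionary and function names are hypothetical, and only a subset of the Table 1 primers is included.

```python
# Recognition sequences for the three enzymes named in Example 2.
SITES = {"BamHI": "GGATCC", "NdeI": "CATATG", "HindIII": "AAGCTT"}

# Primer sequences copied from Table 1 (5' to 3').
PRIMERS = {
    "XYF1Bam": "TAGGATCCCCTGTAGCAGAAAATCCTTC",
    "XYF1": "TACATATGCCTGTAGCAGAAAATCCTTC",
    "XYR1": "GAGGAAGCTTTTACATGGAAGAAATATGGAAG",
    "XZF1": "TACATATGCTTGTCACAATAAGCAGTACA",
    "XZF1Bam": "TAGGATCCCTTGTCACAATAAGCAGTACA",
    "XZR1": "GAGGAAGCTTTTAGTTGTTGGCAACGCAATA",
}


def find_sites(primer):
    """Return {enzyme: 0-based offset} for every recognition site in the primer."""
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}
for name, hits in site_map.items():
    print(name, hits)
```

In this subset each forward primer carries exactly one BamHI or NdeI site near its 5' end and each reverse primer carries one HindIII site, consistent with the cloning strategy described in Example 2.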

TABLE 2

Distribution of proteins and hydrolytic activities in C. thermocellum culture grown on Avicel

                        Protein       Feruloyl esterase  Avicelase      Xylanase
Fraction                mg/ml  %      U/ml    %          U/ml    %      U/ml   %
Cell-associated         0.09   39.1   0.005   2.1        0.001   2.4    0.49   5.3
Culture medium          0.14   60.9   0.238   97.9       0.04    97.6   8.72   94.7
After Avicel treatment  0.11   47.8   0.002   0.8        0.004   9.7    1.56   16.9
Avicel-bound            0.03   13.2   0.24    97.1       0.033   80.5   6.75   73.3

TABLE 3A

Purification of the FAE/CBD polypeptide from E. coli cell free extract.

                   Protein  Total activity  Specific Activity             Purification
Sample             (mg)     (U)             (U/mg)             Yield (%)  Fold
Cell free extract  2,597    3,253           1.25               100        1
Heat treatment     219.8    2,827           12.9               86.9       10.3

The protein sample was obtained from 1.0 liter E. coli culture.

TABLE 3B

Purification of the XynZ FAE polypeptide from E. coli cell free extract.

                   Protein  Total activity  Specific Activity             Purification
Sample             (mg)     (U)             (U/mg)             Yield (%)  Fold
Cell free extract  532.6    1520            2.9                100        1
Heat treatment     212.5    1629            7.7                107        2.7
TSK3000SW          30.9     823             26.6               54         9.7

The protein sample was obtained from 1.0 liter E. coli culture.

TABLE 4

Substrate specificity of the feruloyl esterase in C. thermocellum XynZ.

Substrate               Specific activity (U/mg)
FAXX                    12.5
FAX3                    11.8
PAX                     14
Ethyl-FA                0.066
Ethyl-pCA               0.022
CMC                     0
PNP-arabinopyranoside   0
PNP-glucopyranoside     0
PNP-xylopyranoside      0
Wheat bran              0.06
Coastal Bermuda grass   0.1

Calculated value based on substrate concentration used in the assay.
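The derived columns of the purification tables (specific activity, yield and purification fold) follow directly from total protein and total activity. As a check, the sketch below recomputes them for the two steps of Table 3A; the function name and the rounding choices are assumptions made for illustration, not part of the original method.

```python
def purification_table(steps):
    """Derive specific activity, yield and fold for each purification step.

    steps: list of (name, total_protein_mg, total_activity_U); the first
    entry is taken as the reference (100% yield, 1-fold).
    """
    _, base_protein, base_activity = steps[0]
    base_specific = base_activity / base_protein
    rows = []
    for name, protein_mg, activity_u in steps:
        specific = activity_u / protein_mg  # U per mg protein
        rows.append({
            "step": name,
            "specific_activity": round(specific, 2),
            "yield_pct": round(100 * activity_u / base_activity, 1),
            "fold": round(specific / base_specific, 1),
        })
    return rows


# Total protein (mg) and total activity (U) taken from Table 3A.
table_3a = purification_table([
    ("Cell free extract", 2597.0, 3253.0),
    ("Heat treatment", 219.8, 2827.0),
])
for row in table_3a:
    print(row)
```

The recomputed heat-treatment row (yield 86.9%, 10.3-fold) agrees with the values printed in Table 3A.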

TABLE 5

Nucleotide and Deduced Amino Acid Sequences of Clostridium thermocellum Xylanase Y.

-200 TAAGAAACTTTAAAACACCCTTTATAAAAATACAAAGAATTACAGGCAATTATAGTGTAA

-100 TGTGGATTTTAACTAAAATGGAAGGAGGAATGTAATTGGTAATAGATATTATGATATAAT


TABLE 5-continued

Nucleotide and Deduced Amino Acid Sequences of Clostridium thermocellum Xylanase Y.

3200 GGCGCCACTCACTGGTGGGGATACGTAAGACATTATATTTATGATGCACTTCCATATTTC
G A T H W W G Y V R H Y I Y D A L P Y F

TTCCATGAATGAATGAGAAAGAAAAACATGATTGAGTTTCTAATCAATAAAAAAAGGAA
F H E

3300 TTTTTTAGTGGTGTCCAGGTTATTGAA

Nucleotide sequence of xynY

The nucleotide sequence of xynY and the deduced primary structure of XYLY are shown. The locations of the first residues of domains A, B, C, D and E are indicated with the corresponding letters. The positions of the two primers used to amplify the region of xynY coding for the catalytic domain of the xylanase (pCF2/ 3) are indicated by overlining. The 5' and 3' nucleotides of truncated forms of xynY are indicated by a downward arrow and the plasmids that encode the derivatives of the xylanase gene. The nucleotide sequence has been submitted to the EMBL database under the accession number X83269.


TABLE 7

Purification of a Feruloyl Esterase from Orpinomyces PC-2 Culture Supernatant

                     Total Activity  Total Protein  Specific Activity  Purification
Step                 (U)             (mg)           (U mg⁻¹)           Fold
Culture supernatant  32.38           5,830          5.6E-3             1
Concentrate          7.9             1460           5.42E-3            0.96
Q Sepharose          2.58            181            1.43E-2            2.55
Phenyl Sepharose HP  1.68            28.2           5.96E-2            10.6
TSK3000SW            0.85            0.62           1.39               253
Mono Q HR 5/5        0.26            0.24           1.087              198

TABLE 8

Substrate specificity of Orpinomyces FaeA

Sample                   μmole FA released min⁻¹ mg enzyme⁻¹
FAXX                     2.05
FAX3                     1.80
Ethyl-ferulate           0.07
Ethyl-p-coumarate        0.02
Wheat bran FaeA          0.0002
Wheat bran FaeA + XynA   0.013

All reactions were carried out in 50 mM citrate buffer pH 6.0. FAXX, FAX3, Et-FA and Et-pCA were assayed for 5 min at 40° C. at a concentration of 10 mM. Enzyme solution (μL) was added to 400 μL of substrate solution. The reaction was stopped with 25 μL of 20% formate. For studies on wheat bran, crude recombinant FaeA (50 μL) equaling 0.7 units of activity against FAXX, XynA (50 μL) equaling 300 units of activity against birchwood xylan, or both was added to a total reaction volume of 1 ml also containing 10 mg of destarched wheat bran. The reaction was carried out for 40 min at 40° C. and stopped by adding 50 μL of 20% formate.

TABLE 9 Nucleotide and Deduced Amino Acid Sequence for Feruloyl Esterase from Orpinomyces PC-2.

GGTTGTTTCTTGTGAAACTACTTACGGTATTACTTTACGTGATACTA
1 V V S C E T T Y G I T L R D T K

AGGAAAAATTCACTGTATTCAAAGACGGTTCCGCTGCTACTGATATTGTTGAATCAGAAG
17 E K F T V F K D G S A A T D I V E S E D

ATGGTTCCGTTTCTTGGATTGCTACTGCTGCCGGTGGTGCTGGTGGTGGTGTTGCCTTCT
37 G S V S W I A T A A G G A G G G V A F Y

ATGTTAAGGCTAACAAGGAAGAAATTAACATTGCTAACTATGAATCTATCGATATTGAAA
57 V K A N K E E I N I A N Y E S I D I E M

TGGAATACACTCCAGTTGAAAACAAATGGAATGATGCTGCTAAGAACCCAAGTTTCTGTA
77 E Y T P V E N K W N D A A K N P S F C M

TGAGAATTCTTCCATGGGATTCCACTGGTATGTTCGGTGGTTACGAAGATCTTGAATACT
97 R I L P W D S T G M F G G Y E D L E Y F

TCGATACTCCAGCAAAATCTGGTAATTTCAAATACACTATTAAGATTCCTTCCTTCTTTG
117 D T P A K S G N F K Y T I K I P S F F A

CTGATAAGATTTTATCTAGCTCTGATCTCGATTCTATCTTAAGTTTTGCTATCAAGTTCA
137 D K I L S S S D L D S I L S F A I K F N

ACGATTATGAAAGAGGTAACACGGACGGTGACCAAATTAAGATTCAATTAAAGAATGTTA
157 D Y E R G N T D G D Q I K I Q L K N V K

AATTCAACCCAAAGGAAAATGCTCCAGAAGATAAGGCTTTCGATGATGGTTTAAGGGATT
177 F N P K E N A P E D K A F D D G L R D S

CTCAACGTGGTACTGTCGTTGAAATGAAATACTCATCTAGAGATTACACCGTCAAGGAAT
197 Q R G T V V E M K Y S S R D Y T V K E S

CTGAAGCTGACAAATACGAAAAGCACGCTTGGGTTTACCTTCCAGCTGGTTATGAAGCTG
217 E A D K Y E K H A W V Y L P A G Y E A D

ATAACAAGGATAAGAAATACCCATTAGTTGTTTTACTTCACGGTTATGGTCAAAATGAAA
237 N K D K K Y P L V V L L H G Y G Q N E N

ACACTTGGGGTCTTTCCAACAAGGGTCGTGGTGGTAAGATCAAGGGTTACATGGACAGAG
257 T W G L S N K G R G G K I K G Y M D R G

GTATGGCTAGTGGTAATGTTGAAAAGTTTGTTCTTGTTGCCGCTACTGGTGTTGCCAGTA
277 M A S G N V E K F V L V A A T G V A S K
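The deduced amino acid rows of Table 9 can be cross-checked against the nucleotide rows by translating them with the standard genetic code. The sketch below is illustrative only: the helper names are hypothetical, the standard code is assumed to apply to this cDNA, and the reading frame of the first row (offset of one base) is inferred from the printed residues rather than stated in the source.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate complete codons of dna starting at the given 0-based frame offset."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# First nucleotide row of Table 9; the trailing base belongs to a codon that
# is completed on the next row, so only 15 complete codons are translated.
first_row = "GGTTGTTTCTTGTGAAACTACTTACGGTATTACTTTACGTGATACTA"
peptide = translate(first_row, frame=1)
print(peptide)
```

Applying the same function row by row, while carrying the split codon over from the end of one row to the start of the next, gives a way to confirm each printed residue line.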


TABLE 10-continued Nucleotide and Deduced Amino Acid Sequence for Ruminococcus sp. Xylanase (Xynl) 2701 ccaggcaagg atatgttcat ggagc acco a ggctgitatgc aggagagcga aat gaagttc 2761 agaga.cgttg gacct gagcc gaatgtatto atgataa.cag gC gg cacaaa cqacg gcqtc 2821 gtaggaacat tocccaa.gca gtacago gat atccttacaa gaaacgg.cgt to ac caacgt. 2881 ttaccagtct atc.cctaacg gC ggacacga cqcaggctot gtaaag.cctic atcto tacac 2941 attcatgaga tacgcattca aataatgata tagttgacat atgaaggaca gcqctittatg 3001 c.gctgtctitt cittitttgttgc aaaaagaaaa gocatttgag cittittgaagic toaaatggct 3061 tatatttata atagtatago ttattotgtt citgag agcct coaca

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 24

<210> SEQ ID NO 1
<211> LENGTH: 28
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 1
taggatcccc tgtagcagaa aatccttc 28

<210> SEQ ID NO 2
<211> LENGTH: 28
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 2
tacatatgcc tgtagcagaa aatccttc 28

<210> SEQ ID NO 3
<211> LENGTH: 32
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 3
gaggaagctt ttacatggaa gaaatatgga ag 32

<210> SEQ ID NO 4
<211> LENGTH: 29
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 4
tacatatgct tgtcacaata agcagtaca 29

<210> SEQ ID NO 5
<211> LENGTH: 29
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 5
taggatccct tgtcacaata agcagtaca 29

<210> SEQ ID NO 6
<211> LENGTH: 31
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 6
gaggaagctt ttagttgttg gcaacgcaat a 31

<210> SEQ ID NO 7
<211> LENGTH: 29
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 7
gaggaagctt acttccacac attaaaatc 29

<210> SEQ ID NO 8
<211> LENGTH: 29
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 8
gaggaagctt agtttccatc cctcgtcaa 29

<210> SEQ ID NO 9
<211> LENGTH: 29
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 9
gaggaagctt agtcataatc ttccgcttc 29

<210> SEQ ID NO 10
<211> LENGTH: 32
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: oligonucleotide used in polymerase chain reaction.
<400> SEQUENCE: 10
gaggaagctt aaacgccaaa agtgaaccag tc 32

<210> SEQ ID NO 11
<211> LENGTH: 3507


-continued gaa citt gag cct ttg att gta gta aCa cc c a cit ttcaac ggc gga aac 2920 Glu Telu Glu Pro Teu Ile Wall Wall Thr Pro Phe Asin Gly Gly Asn 895 9 OO 905 tgc acg gcc Cala aac titt tat cag gaa titc. gg Caa aat gto att cct 2968 Cys Thr Ala Glin Asn Phe Glin Glu Phe rg Glin Asn Wall Ile Pro 910 915 92.0 titt gtg gaa agc aag tac tct act tat gca a.a. tca aCa acc cca cag Phe Wall Glu Ser Lys Ser Thr Ala lu Ser Thr Thr Pro Glin 925 930 935 gga ata gcc gct toa aga atg cac aga ggit t to ggc gga titc. toa atg Gly Ile Ala Ala Ser Arg Met His Arg Gly P he Gly Gly Phe Ser Met 940 945 950 955 gga gga ttg aCa a Ca tgg tat gta atg gtt ac toc citt gat tac gtt 3112 Gly Gly Telu Thr Thr Trp Wall Met Wall sn Cys Telu Asp Tyr Wall 96.O 965 970 gca tat titt atg cct tta agc ggit gac tac gg tat gga aac agt cc.g 3160 Ala Phe Met Pro Teu Ser Gly Asp T rp Tyr Gly Asn Ser Pro 975 98O 985 cag gat aag gct aat toa att gct gaa gca tit aac aga to c gga citt 3208 Glin Asp Lys Ala Asn Ser Ile Ala Glu Ala s l e Asn Arg Ser Gly Teu 99 O 995 10 OO toa aag agg gag tat titc. gta titt gCg gcc to c gac cat att 32.56 Ser Lys Arg Glu Phe Wall Phe Ala Ala hir Gly Ser Asp His Ile 1005 1010 10 15 gca tat gct aat atg aat cct Cala att gaa g ct atg aag gct ttg cc.g 3304 Ala Tyr Ala Asn Met Asn Pro Glin Ile Glu A la Met Lys Ala Teu Pro 1020 1025 1030 1035

Cat titt gat tat act tog gat titt to c a.a.a. g git aat titt tac titt citt 3352 His Phe Asp Tyr Thr Ser Asp Phe Ser Lys G ly Asn Phe Tyr Phe Teu 1040 1045 105 O

gta gct gcc act cac tgg tgg gga t ac gta aga cat tat att 34 OO Wall Ala Pro Gly Ala Thr His Trp Trp Gly T yr Val Arg His Tyr Ile 1055 1060 1065 tat gat gca citt coa tat titc. titc. cat gaa t gaatgagaa agaaaaac at 3450 Tyr Asp Ala Lieu Pro Tyr Phe Phe His Glu 1070 1075

gattgagttt gtaatcaata aaaaaaggaa ttttittagt g g to to caggt t attgaa 35. Of

<210> SEQ ID NO 12
<211> LENGTH: 1077
<212> TYPE: PRT
<213> ORGANISM: Clostridium thermocellum

<400> SEQUENCE: 12

Met Lys Asn Lys Arg Wall Teu Ala Ile Thir Ala Telu Wall Wall Teu 1 5 10 15

Teu Gly Wall Phe Phe Wall Teu Pro Ser Asn I le Ser Glin Teu Ala 2O 25 30

Asp Tyr Glu Wall Wall His Asp Thr Phe Glu W all Asn Phe Asp Gly Trp 35 40 45

Asn Telu Gly Wall Asp Thr Telu Thr A la Wall Glu Asn Glu Gly 5 O 55 60

Asn Asn Gly Thr Arg Gly Met Met Wall Ile A sin Arg Ser Ser Ala Ser 65 70 75

Asp Gly Ala Ser Glu Gly Phe Tyr L. eu Asp Gly Gly Wall Glu 85 90 95

Ser Wall Phe Wall His Asn G ly Thr Gly Thr Glu Thr 100 105 110 US 6,365,390 B1 SS


Phe Telu Ser Wall Ser Telu Asp Ser G lu Thr Glu Glu Glu Asn 115 120 125

Glu Wall Ile Ala Thr Lys Asp Wall Wall A la Gly Glu Trp Thr Glu 130 135 1 4 0

Ile Ser Ala Lys Ala Pro Thr A la Wall Asn. Ile Thr Leu 145 15 O 155 160

Ser Ile Thr Thr Asp Ser Thr Wall Asp Phe I le Phe Asp Asp Val Thr 1.65 170 175

Ile Thr Arg Lys Gly Met Ala Glu Ala Asn Thr Val Tyr Ala Ala Asn 18O 185 190

Ala Wall Telu Asp Met Ala Asn Tyr P he Arg Val Gly Ser Val 195 200 2O5

Teu Asn Ser Gly Thr Wall Asn Asn Ser Ser I le Lys Ala Lieu. Ile Leu 210 215 220

Arg Glu Phe Asn Ser Ile Thr Glu Asn G lu Met Lys Pro Asp Ala 225 230 235 240

Thr Telu Wall Glin Ser Gly Ser Thr Asn Thr A sn Ile Arg Val Ser Lieu 245 250 255

Asn Arg Ala Ala Ser Ile Teu Asn Phe A la Glin Asn. Asn. Ile Ala 260 265 27 O

Wall Arg Gly His Thr Teu Wall Trp His Ser G lin Thr Pro Gln Trp Phe 275 280 285

Phe Lys Asp Asn Phe Glin Asp Asn Gly Asn T rp Wal Ser Glin Ser Wall 29 O 295 3OO

Met Asp Glin Arg Leu Glu Ser Ile Lys A Sn Met Phe Ala Glu Ile 305 310 315 320

Glin Arg Glin Pro Ser Teu Asn Telu Tyr A la Tyr Asp Val Val Asn 325 330 335

Glu Ala Wall Ser Asp Asp Ala Asn Arg Thr A rig Tyr Tyr Gly Gly Ala 340 345 350

Arg Glu Pro Gly Tyr Gly Asn Gly Arg Ser P ro Trp Val Glin Ile Tyr 355 360 365

Gly Asp Asn Phe Ile Glu Ala Phe Thr Tyr Ala Arg Lys Tyr 370 375 38O

Ala Pro Ala Asn Lys Teu Asn A sp Tyr Asin Glu Tyr Trp 385 390 395 400

Asp His Arg Asp Ile Ala Ser Ile Cys Ala Asn Lieu. Tyr Asn 405 410 415

Gly Telu Telu Asp Gly Wall Gly Met Glin S er His Ile Asn Ala Asp 420 425 430

Met Asn Gly Phe Ser Gly Ile Glin Asn Tyr L. ys Ala Ala Leu Gln Lys 435 4 40 4 45

Ile Asn Ile Gly Cys Asp Wall Glin Ile Thr Glu Lieu. Asp Ile Ser 450 455 460

Thr Glu Asn Gly Lys Phe Ser Telu Glin Glin G lin Ala Asp Lys Tyr Lys 465 470 475 480

Ala Wall Phe Glin Ala Ala Wall Asp Ile Asn Thr Ser Ser Lys Gly 485 490 495

Wall Thr Ala Wall Wall Trp Gly Pro Asp Ala Asn. Thir Trp 5 OO 505 510

Teu Gly Ser Glin Asn Ala Pro Telu Telu Phe Ala Asn. Asn Glin Pro 515 52O 525 US 6,365,390 B1 57 58

-continued Pro Ala Asn Ala Wall Ala Ser Ile I le Pro Glin Ser Glu Trp 530 535 540

Gly Asp Gly Asn Asn Pro Ala Gly Gly Gly G ly Gly Gly Lys Pro Glu 545 550 555 560

Glu Pro Asp Ala Asn Gly His A sp Thr Phe Glu Gly Ser 565 570 575

Wall Gly Glin Trp Thr Ala Arg Gly Pro Ala Glu Val Lieu Lleu Ser Gly 585 59 O

Arg Thr Ala Gly Ser Glu Ser Telu L. eu Val Arg Asn Arg Thr 595 600 605

Ala Ala Trp Asn Ala Glin Arg Ala Telu A sin Pro Arg Thr Phe Val 610 615 62O

Pro Gly Asn Thr Cys Phe Ser Wall Wall A la Ser Phe Ile Glu Gly 625 630 635 640

Ala Ser Ser Thr Thr Phe Met Telu G ln Tyr Val Asp Gly Ser 645 650 655

Gly Thr Glin Arg Asp Thr Ile Asp Met L ys Thr Val Gly Pro Asn 660 665 670

Glin Trp Wall His Teu Asn Pro Glin A rig Ile Pro Ser Asp Ala 675 680 685

Thr Asp Met Wall Wall Glu Thr Ala A sp Asp Thir Ile Asin Phe 69 O. 695 7 OO

Tyr Ile Asp Glu Ala Ile Gly Ala Wall Ala G ly Thr Val Ile Glu Gly 705 710 715 720

Pro Ala Pro Glin Pro Thr Glin Pro Pro Wall L. eu Lieu Gly Asp Wall Asn 725 730 735

Gly Asp Gly Thr Ile Asn Ser Thr Asp Telu Thr Met Leu Lys Arg Ser 740 745 750

Wall Telu Arg Ala Ile Thr Teu Thr Asp Asp A la Lys Ala Arg Ala Asp 755 760 765

Wall Asp Asn Gly Ser Ile Asn Ser Thr A sp Wall Leu Lleu Lleu Ser 770 775 78O

Arg Telu Telu Arg Wall Ile Asp Phe Pro Wall Ala Glu Asn Pro 785 790 795 8OO

Ser Ser Ser Phe Lys Glu Ser Ala Wall G ln Tyr Arg Pro Ala Pro 805 810 815

Asp Ser Telu Asn Pro Pro Glin Ala G ly Arg Ile Val Lys Glu 820 825 830

Thr Thr Gly Ile Asn Gly Thr Ser L eu Asn Val Tyr Leu Pro 835 840 845

Gly Asp Pro Asn Lys Asn I le Phe Tyr Leu Met His 85 O 855 860

Gly Gly Gly Glu Asn Glu Asn Thr Ile Phe S er Asn Asp Wall Lys Lieu 865 870 875 88O

Glin Asn Ile Telu Asp His Ala Ile Met Asn G ly Glu Lieu Glu Pro Leu 885 890 895

Ile Wall Wall Thr Pro Thr Phe Asn Gly Gly A sin Cys Thr Ala Glin Asn 9 OO 905 910

Phe Glin Glu Phe Arg Glin Asn Wall Ile Pro Phe Val Glu Ser Lys 915 920 925

Ser Thr Ala Glu Ser Thr Thr Pro G lin. Gly Ile Ala Ala Ser 930 935 940

Arg Met His Arg Gly Phe Gly Gly Phe Ser M. et Gly Gly Leu Thir Thr US 6,365,390 B1 59 60


945 950 955 96.O Trp Tyr Val Met Val Asn Cys Leu Asp Tyr V all Ala Tyr Phe Met Pro 965 970 975 Leu Ser Gly Asp Tyr Trp Tyr Gly Asn. Ser Pro Glin Asp Lys Ala Asn 98O 985 99 O Ser Ile Ala Glu Ala Ile Asn Arg Ser Gly L. eu Ser Lys Arg Glu Tyr 995 1 OOO 1005 Phe Val Phe Ala Ala Thr Gly Ser Asp His I le Ala Tyr Ala Asn Met 1010 10 15 1020 Asn Pro Glin Ile Glu Ala Met Lys Ala Leu Pro His Phe Asp Tyr Thr O25 1030 1035 1040 Ser Asp Phe Ser Lys Gly Asn Phe Tyr Phe L eu Val Ala Pro Gly Ala 1045 1050 1055 Thr His Trp Trp Gly Tyr Val Arg His Tyr I le Tyr Asp Ala Leu Pro 1060 1065 1 OFO Tyr Phe Phe His Glu 1075

<210> SEQ ID NO 13 &2 11s LENGTH 222 &212> TYPE DNA <213> ORGANISM: Clostridium thermocellum &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (101) . . (2611) <400 SEQUENCE: 13 atatataaat aaggg tatta attctgcaaa aagaaaagtig tittgctacat g aggtocatt 60 aatttittatt ttatat cata aatcaaaaag gaggagaaac atg tda a ga aaa citt 115 Met Ser Arg Lys Lieu 1 5 titc agit gta tta citt gtt g g c titg atg citt a tig aca tog titg citt gtc 163 Phe Ser Val Leu Leu Val Gly Leu Met Leu M et Thr Ser Leu Leu Val 10 15 20 aca ata agc agt aca to a gcg gCatcc ttg c ca acc at g cc.g. cct tcg 211 Thir Ile Ser Ser Thir Ser Ala Ala Ser Leu Pro Thr Met Pro Pro Ser 25 30 35 gga tat gac cag gta agg aac ggc gtt cog a ga ggg cag gtc gta aat 259 Gly Tyr Asp Glin Val Arg Asn Gly Val Pro Arg Gly Glin Val Val Asn 40 45 5 O att tot tat titc. tcc acg gcc acc aac agt a cc agg cc g gca aga gtt 3 Of Ile Ser Tyr Phe Ser Thr Ala Thr Asn Ser T hr Arg Pro Ala Arg Val 55 60 65 tat ttg cc g cc g g ga tat toa aag gac aaa a aa tac agt gtt ttg tat 355 Tyr Lieu Pro Pro Gly Tyr Ser Lys Asp Llys Lys Tyr Ser Val Lieu. Tyr 70 75 8O 85 citc tta cac ggc ata ggc ggit agt gala aac g ac togg titc gaa gqg gga 403 Leu Lieu. His Gly Ile Gly Gly Ser Glu Asn. A sp Trp Phe Glu Gly Gly 90 95 100 ggc aga gcc aat gtt att gcc gac aat citg a tit gcc gag gga aaa atc 451 Gly Arg Ala Asn Val Ile Ala Asp Asn Lieu. I le Ala Glu Gly Lys Ile 105 110 115 aag coc citg ata att gta aca cog aat act a ac goc goc ggit cog gga 499 Lys Pro Leu Ile Ile Val Thr Pro Asn Thr A sn Ala Ala Gly Pro Gly 120 125 130 ata gC g gac ggit tat gala aat titc aca aaa g at ttg citc aac agt citt 547 Ile Ala Asp Gly Tyr Glu Asn. Phe Thr Lys A. sp. Leu Lieu. Asn. Ser Lieu 135 1 4 0 145


775 785 aat acc titt gta atg tgg gga titc. aCa gat a a tac aCa tgg att cog 2515 Asn Thr Phe Wall Met Trp Gly Phe Thr Asp L. ys Thr Trp Ile Pro 790 795 8 OO 805 gga act titc. cca gga ggC aat cca ttg a tit gac agc aat tac 2563 Gly Thr Phe Pro Gly Gly Asn Pro Telu I le Asp Ser Asn Tyr 810 815 820 aat cc.g a.a.a. cc.g gca aat gca ata aag gct citt at g g g c tat 2611 Asn Pro Lys Pro Ala Asn Ala Ile Lys Ala Telu Met Gly Tyr 825 830 835 tgataatticc gaaaagctga gcagataatg atgcc.gtaaa gocggctitct g aattalagag 2671

cc.ggctttac ggagatatac tttittacggc agaatacct g ttattitc.cat g 222

<210> SEQ ID NO 14 <211> LENGTH: 837 <212> TYPE: PRT <213> ORGANISM: Clostridium thermocellum

<400> SEQUENCE: 14

Met Ser Arg Lys Leu Phe Ser Val Leu Leu Val Gly Leu Met Leu Met 1 5 10 15

Thr Ser Leu Leu Val Thr Ile Ser Ser Thr Ser Ala Ala Ser Leu Pro 20 25 30

Thr Met Pro Pro Ser Gly Tyr Asp Gln Val Arg Asn Gly Val Pro Arg 35 40 45

Gly Gln Val Val Asn Ile Ser Tyr Phe Ser Thr Ala Thr Asn Ser Thr 50 55 60

Arg Pro Ala Arg Val Tyr Leu Pro Pro Gly Tyr Ser Lys Asp Lys Lys 65 70 75 80

Tyr Ser Val Leu Tyr Leu Leu His Gly Ile Gly Gly Ser Glu Asn Asp 85 90 95

Trp Phe Glu Gly Gly Gly Arg Ala Asn Val Ile Ala Asp Asn Leu Ile 100 105 110

Ala Glu Gly Lys Ile Lys Pro Leu Ile Ile Val Thr Pro Asn Thr Asn 115 120 125

Ala Ala Gly Pro Gly Ile Ala Asp Gly Tyr Glu Asn Phe Thr Lys Asp 130 135 140

Leu Leu Asn Ser Leu Ile Pro Ile Glu Ser Asn Tyr Ser Val Tyr 145 150 155 160

Thr Asp Arg Glu His Arg Ala Ile Ala Gly Leu Ser Met Gly Gly Gly 165 170 175

Gln Ser Phe Asn Ile Gly Leu Thr Asn Leu Asp Lys Phe Ala Tyr Ile 180 185 190

Gly Pro Ile Ser Ala Ala Pro Asn Thr Pro Asn Glu Arg Leu Phe 195 200 205

Pro Asp Gly Gly Ala Ala Arg Glu Lys Leu Lys Leu Leu Phe Ile 210 215 220

Ala Gly Thr Asn Asp Ser Leu Ile Gly Phe Gly Gln Arg Val His 225 230 235 240

Glu Val Ala Asn Asn Ile Asn His Val Tyr Trp Leu Ile Gln 245 250 255

Gly Gly Gly His Asp Phe Asn Val Trp Lys Pro Gly Leu Trp Asn Phe 260 265 270

Leu Gln Met Ala Asp Glu Ala Gly Leu Thr Arg Asp Gly Asn Thr Pro 275 280 285

Val Pro Thr Pro Ser Pro Lys Pro Ala Asn Thr Arg Ile Glu Ala Glu 290 295 300

Asp Asp Gly Ile Asn Ser Ser Ser Ile Glu Ile Ile Gly Val Pro 305 310 315 320

Pro Gly Gly Arg Gly Ile Gly Tyr Ile Thr Ser Gly Asp Tyr Leu 325 330 335

Val Ser Ile Asp Phe Gly Asn Gly Ala Thr Ser Phe Lys Ala 340 345 350

Val Ala Asn Ala Asn Thr Ser Asn Ile Glu Leu Arg Leu Asn Gly 355 360 365

Pro Asn Gly Thr Leu Ile Gly Thr Leu Ser Val Lys Ser Thr Gly Asp 370 375 380

Trp Asn Thr Glu Glu Gln Thr Ser Ile Ser Lys Val Thr Gly 385 390 395 400

Ile Asn Asp Leu Tyr Leu Val Phe Gly Pro Val Asn Ile Asp Trp 405 410 415

Phe Thr Phe Gly Val Glu Ser Ser Ser Thr Gly Leu Gly Asp Leu Asn 420 425 430

Gly Asp Gly Asn Ile Asn Ser Ser Asp Leu Gln Ala Leu Lys Arg His 435 440 445

Leu Leu Gly Ile Ser Pro Leu Thr Gly Glu Ala Leu Leu Arg Ala Asp 450 455 460

Val Asn Arg Ser Gly Lys Val Asp Ser Thr Asp Tyr Ser Val Leu Lys 465 470 475 480

Arg Ile Leu Arg Ile Ile Thr Glu Phe Pro Gly Gln Gly Asp Val 485 490 495

Gln Thr Pro Asn Pro Ser Val Thr Pro Thr Gln Thr Pro Ile Pro Thr 500 505 510

Ile Ser Gly Asn Ala Leu Arg Asp Ala Glu Ala Arg Gly Ile 515 520 525

Ile Gly Thr Val Asn Tyr Pro Phe Tyr Asn Asn Ser Asp Pro Thr 530 535 540

Tyr Asn Ser Ile Leu Gln Arg Glu Phe Ser Met Val Val Cys Glu Asn 545 550 555 560

Glu Met Phe Asp Ala Leu Gln Pro Arg Gln Asn Val Phe Asp Phe 565 570 575

Ser Gly Asp Gln Leu Leu Ala Phe Ala Glu Arg Asn Gly Met Gln 580 585 590

Met Arg Gly His Thr Leu Ile Trp His Asn Gln Asn Pro Ser Trp Leu 595 600 605

Thr Asn Gly Asn Trp Asn Arg Asp Ser Leu Leu Ala Val Met Lys Asn 610 615 620

His Ile Thr Thr Val Met Thr His Lys Gly Lys Ile Val Glu Trp 625 630 635 640

Asp Val Ala Asn Glu Met Asp Asp Ser Gly Asn Gly Leu Arg Ser 645 650 655

Ser Ile Trp Arg Asn Val Ile Gly Gln Asp Tyr Leu Asp Tyr Ala Phe 660 665 670

Arg Ala Arg Glu Ala Asp Pro Asp Ala Leu Leu Phe Tyr Asn Asp 675 680 685

Asn Ile Glu Asp Leu Gly Pro Ser Asn Ala Val Phe Asn Met 690 695 700

Ile Ser Met Glu Arg Gly Val Pro Ile Asp Gly Val Gly Phe 705 710 715 720

Gln His Phe Ile Asn Gly Met Ser Pro Glu Tyr Leu Ala Ser Ile 725 730 735

Asp Gln Asn Ile Arg Ala Glu Ile Gly Val Ile Val Ser Phe 740 745 750

Thr Glu Ile Asp Ile Arg Ile Pro Gln Ser Glu Asn Pro Ala Thr Ala 755 760 765

Phe Gln Val Gln Ala Asn Asn Glu Leu Met Lys Ile Cys Leu 770 775 780

Ala Asn Pro Asn Asn Thr Phe Val Met Trp Gly Phe Thr Asp Lys 785 790 795 800

Thr Trp Ile Pro Gly Thr Phe Pro Gly Tyr Gly Asn Pro Leu Ile 805 810 815

Asp Ser Asn Asn Pro Pro Ala Tyr Asn Ala Ile Lys Glu 820 825 830

Ala Leu Met Gly 835

SEQ ID NO 15 LENGTH 31 O5 TYPE DNA ORGANISM: Ruminococcus sp. FEATURE: NAME/KEY: CDS LOCATION: (529).. (2895) <400 SEQUENCE: 15 gatctttittc ataagtatgc ccc cattatt aagttttitta gatgcttgcc t attaattitcc 60 cittctggittt tgttgaactitc ttaacggtoa gagttcacac tittctittata t attgttctat 120 attataatgt atattgtagt aataataitac caaaattittc ctittaagtaa c aatat ctitt 18O acco tattta gcaatttitta acgatattitt ataatttgat tatttittaaa c tatacagtg 240 taaatacitat tatttaaaaa. gtoccaccalaa aatgtaaaat acaatgatat c ttaaacgta aaaacctgta caatgattgt tdatc.tttitt acattattgt tatatatogt c ttggtatag 360 tdag caatitt ttagtcaaga tatacaaggit cc.gcaaattt taacttgcaa t taa.caggto 420 agatgttitta taatgatato atagaaataa aaggagcact toggct cotta t ggggattac 480 tgaaatcata agtttgctitt ttittctaaaa. aacaaaggag to attgaa gtg aala aaa. 537 Wall Lys Lys 1 a Ca gtt a.a.a. Cala titc. atc agc agt gcc gtt a ca gcg tta atg gtg gct 585 Thr Wall Glin Phe Ile Ser Ser Ala Wall Thir Ala Leu Met Wall Ala 10 15 gca agc citg cct gcc gtt cct to c gtg a.a. C. g ca gcc gac goc cag cag 633 Ala Ser Telu Pro Ala Wall Pro Ser Wall Asn A la Ala Asp Ala Glin Glin 2O 25 35 aga ggC aat atc ggC ggit titc. gat tac gaa tg tog aac cag aac ggit 681 Arg Gly Asn Ile Gly Gly Phe Asp Glu e t Trp Asn Glin Asn Gly 40 45 5 O cag gga cag gta toa atg acg cct aag gca gc tot titc acc tgc tica 729 Glin Gly Glin Wall Ser Met Thr Pro Lys Ala Ser Phe Thr Cys Ser 55 60 65 tgg agc a.a. C. att gaa aac titc. citc. gca cgt. tg ggc aag aac tac gac 777 Trp Ser Asn Ile Glu Asn Phe Telu Ala Arg e t Gly Lys Asn Tyr Asp 70 75 8O agc cag a.a.a. aag aac tac aag gct titc. gga g ac att acc citc toc tac 825


acc cca ggC aag gat atg titc. atg gag cac C. ca. ggC tgt atg Cag gag 2745 Thr Pro Gly Lys Asp Met Phe Met Glu His P ro Gly Cys Met Glin Glu 725 730 735 agc gaa atg aag titc. aga gac gtt gga cct g ag cc.g aat gta titc atg 2793 Ser Glu Met Lys Phe Arg Asp Wall Gly Pro G lu Pro Asn Wall Phe Met 740 745 750 755 alta aCa ggC ggC a Ca aac gac ggC gtc gta aCa titc. ccc. aag Cag 2841 Ile Thr Gly Gly Thr Asn Asp Gly Wall Wall Thr Phe Pro Lys Glin 760 765 770

agc gat atc citt a Ca aga a.a. C. ggC gtt Cala cgt tta coa gtc 2889 Ser Asp Ile Teu Thr Arg Asn Gly Wall Glin Arg Leu Pro Val 775 78O 785

cc c taacgg.cgga cacgacgcag gCtctgtaaa gC ct catcto it acacattca 2.945 Pro tgagatacgc attcaaataa toatatagitt gacatatgaa gacago got t tatgcgctg 30 O5 totttcttitt tatgcaaaaa gaaaag.ccat ttgagcttitt gaagctoaaa t g g cittatat 3O 65 ttataatagt atagottatt citgttctgag agcctccaca 3105

<210> SEQ ID NO 16 <211> LENGTH: 789 <212> TYPE: PRT <213> ORGANISM: Ruminococcus sp.

<400> SEQUENCE: 16

Val Lys Lys Thr Val Gln Phe Ile Ser Ala Val Thr Ala Leu 1 5 10 15

Met Val Ala Ala Ser Leu Pro Ala Val Pro Val Asn Ala Ala Asp 20 25 30

Ala Gln Gln Arg Gly Asn Ile Gly Gly Phe Glu Met Trp Asn 35 40 45

Gln Asn Gly Gln Gly Gln Val Ser Met Thr Lys Ala Gly Ser Phe 50 55 60

Thr Ser Trp Ser Asn Ile Glu Asn Phe Ala Arg Met Gly Lys 65 70 75 80

Asn Asp Ser Gln Asn Lys Ala Phe Gly Asp Ile Thr 85 90 95

Leu Ser Asp Val Glu Thr Pro Asn Ser Tyr Met Cys 100 105 110

Val Gly Trp Thr Arg Asn Pro Leu Met Tyr Tyr Ile Val Glu 115 120 125

Gly Trp Gly Asp Trp Arg Pro Pro Gly Asn Asp Gly Glu Asn Lys Gly 130 135 140

Thr Val Thr Leu Asn Gly Asn Thr Asp Ile Arg Lys Thr Met Arg 145 150 155 160

Asn Gln Pro Ser Leu Asp Gly Thr Ala Thr Phe Pro Gln Tyr Trp 165 170 175

Ser Val Arg Gln Ser Gly Ser Gln Asn Asn Thr Thr Asn Tyr Met 180 185 190

Gly Thr Ile Ser Val Ser Lys His Phe Asp Ala Trp Ser Lys Ala 195 200 205

Gly Leu Asp Met Ser Gly Thr Leu Glu Val Ser Leu Asn Ile Glu 210 215 220

Gly Arg Ser Ser Gly Asn Ala Asn Val Lys Ala Ile Ser Phe Asp 225 230 235 240

Gly Ser Ile Pro Glu Pro Thr Ser Glu Pro Val Thr Gln Pro Val Val 245 250 255

Ala Glu Pro Asp Ala Asn Gly Tyr Phe Lys Glu Lys Phe Glu 260 265 270

Ser Gly Ala Gly Asp Trp Ser Ala Arg Gly Thr Gly Ala Lys Val Thr 275 280 285

Ser Ser Asp Gly Phe Asn Gly Ser Gly Ile Leu Val Ser Gly Arg 290 295 300

Gly Asp Asn Trp His Gly Ala Gln Leu Thr Leu Asp Ser Ser Ala Phe 305 310 315 320

Thr Ala Gly Glu Thr Ser Phe Gly Ala Leu Val Lys Gln Asp Gly 325 330 335

Glu Ser Ser Thr Ala Met Leu Thr Leu Gln Tyr Asn Asp Ala Ser 340 345 350

Gly Thr Ala Asn Asp Val Ala Glu Phe Thr Ala Pro Lys Gly 355 360 365

Glu Trp Val Asp Leu Ser Asn Thr Ser Phe Thr Ile Pro Ser Gly Ala 370 375 380

Ser Asp Leu Ile Leu Tyr Val Glu Ala Pro Asp Ser Leu Thr Asp Phe 385 390 395 400

Ile Asp Asn Ala Phe Gly Gly Ile Lys Asn Thr Ser Pro Leu Glu 405 410 415

Asp Val Gly Ser His Thr Ile Ser Thr Pro Gly Ser Glu Thr Thr Thr 420 425 430

Val Thr Thr Ala Ser Asn Gly Ile Arg Gly Asp Ile Asn Gly Asp 435 440 445

Gly Val Ile Asn Ser Phe Asp Leu Ala Pro Leu Arg Arg Gly Ile Leu 450 455 460

Lys Met Met Ser Gly Ser Gly Ser Thr Pro Glu Asn Ala Asp Val Asn 465 470 475 480

Gly Asp Gly Thr Val Asn Val Ala Asp Leu Leu Leu Leu Gln Lys Phe 485 490 495

Ile Leu Gly Met Glu Ser Phe Pro Asp Pro Val Thr Thr Thr Thr 500 505 510

Thr Pro Ile Thr Thr Thr Thr Glu Ile Val Thr Thr Thr Thr 515 520 525

Ser Ser Ser Ser Ser Ser Ser Gly Asn Leu Asn Ala Asp Ile Arg 530 535 540

Lys Asp Met Pro Thr Ser Val Pro Gly Gly Asn Glu Lys Ser Gly Gly 545 550 555 560

Val Glu Lys Thr Asn Cys Lys Phe Thr Gly Gly Gln 565 570 575

Ser Asn Val Ile Leu Pro Pro Asn Tyr Ser Ala Ser Lys Gln 580 585 590

Pro Val Met Val Leu His Gly Ile Gly Gly Asn Glu Gly Ser 595 600 605

Met Val Ser Gly Met Gly Val Gln Glu Leu Leu Ala Gly Leu Thr Ala 610 615 620

Asn Gly Lys Ala Glu Glu Met Ile Ile Val Leu Pro Ser Gln Tyr Thr 625 630 635 640

Ser Asn Gly Asn Gln Gly Gly Gly Phe Gly Ile Asn Gln Glu Val 645 650 655

Cys Ala Ala Tyr Asp Asn Phe Leu Tyr Asp Ile Ser Asp Ser Leu Ile 660 665 670

Pro Phe Ile Glu Ala Asn Tyr Pro Val Thr Gly Arg Glu Asn Arg 675 680 685

Ala Ile Thr Gly Phe Ser Met Gly Gly Glu Ala Ile Tyr Ile Gly 690 695 700

Leu Met Arg Pro Asp Leu Phe Ala Val Gly Gly Ala Cys Pro Ala 705 710 715 720

Pro Gly Ile Thr Pro Gly Lys Asp Met Phe Met Glu His Pro Gly Cys 725 730 735

Met Gln Glu Ser Glu Met Lys Phe Arg Asp Val Gly Pro Glu Pro Asn 740 745 750

Val Phe Met Ile Thr Gly Gly Thr Asn Asp Gly Val Val Gly Thr Phe 755 760 765

Pro Lys Gln Tyr Ser Asp Ile Leu Thr Arg Asn Gly Val Asp Gln Arg 770 775 780

Leu Pro Val Tyr Pro 785

SEQ ID NO 17 LENGTH 1662 TYPE DNA ORGANISM: Orpinomyces sp. PC-2 FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1590) <400 SEQUENCE: 17 gtt gtt tot togt gaa act act tac ggit att ct tta cqt gat act aag 48 Wal Wall Ser Cys Glu Thir Thr Gly Ile hir Lieu Arg Asp Thr Lys 1 5 10 15 gaa aaa titc act gta ttcaaa gac ggit to c cit gct act gat att gtt 96 Glu Lys Phe Thr Val Phe Lys Asp Gly Ser la Ala Thr Asp Ile Wall 2O 25 30 gala to a gaa gat ggit toc gtt tot tgg att ct act gct gcc ggit ggit 144 Glu Ser Glu Asp Gly Ser Val Ser Trp Ile la Thr Ala Ala Gly Gly 35 40 45 gct ggt ggt gtt gcc titc tat gtt aag cit aac aag gala gaa att 192 Ala G y Gly Gly Val Ala Phe Wall Lys la Asn Lys Glu Glu Ile 5 O 55 60 aac gct aac tat gaa tot atc gat att aa at g gala tac act coa 240 Asn. I e Ala Asn Tyr Glu Ser Ile Asp Ile lu Met Glu Tyr Thr Pro 65 70 gtt aac aaa tog aat gat gct gct aag ac cca. agt titc tdt atg 288 Wall G u Asn Lys Trp Asn Asp Ala Ala Lys sn. Pro Ser Phe Cys Met 85 90 95 aga citt coa tagg gat tcc act ggit atg to ggt ggit tac gaa gat 336 Arg I e Lieu Pro Trp Asp Ser Thr Gly Met he Gly Gly Tyr Glu Asp 100 105 110 citt gala tac titc gat act coa gca a.a.a. tot gt aat ttcaaa tac act 384 Teu G u Tyr Phe Asp Thr Pro Ala Ser ly Asin Phe Lys Tyr Thr 115 120 125 att aag att cost toc titc titt gct gat aag tit tita tot agc tot gat 432 Ile Lys Ile Pro Ser Phe Phe Ala Asp Lys le Leu Ser Ser Ser Asp 1 30 135 1 4 0 citc. gat tot atc tita agt titt gct atc aag to aac gat tat gaa aga 480 Leu Asp Ser Ile Leu Ser Phe Ala Ile Lys he Asin Asp Tyr Glu Arg 145 15 O 155 160

a.a. C. acg gac gac caa att aag att aa tta aag aat gtt aaa 528 US 6,365,390 B1 81 82

-continued Gly Asn Thr Asp Gly Asp Ile Ile Lieu Lys Asn. Wall Lys 1.65 170 175 titc. a.a. C. cca aag gaa aat cca gaa gat gct titc gat gat ggit 576 Phe Asn Pro Lys Glu Asn Pro Glu Asp L ys Ala Phe Asp Asp Gly 18O 185 190 tta agg gat tot Cala cgt. act gtc gtt a.a. atg aaa tac toa tot 624 Teu Arg Asp Ser Glin Arg Thr Wall Wall Met Lys Tyr Ser Ser 195 200 2O5 aga gat tac acc gto aag tot gaa gct ac aaa tac gala aag cac 672 Arg Asp Thr Wall Lys Ser Glu Ala sp Lys Tyr Glu Lys His 210 220 gct tgg gtt tac citt cca ggit tat gaa gat aac aag gat aag 720 Ala Trp Wall Teu Pro Gly Glu Asp Asn Lys Asp Lys 225 230 235 240 a.a.a. tac cca tta gtt gtt tta citt cac ggit ggit caa aat gala aac 768 Pro Telu Wall Wall Teu Telu His Gly Gly Glin Asn. Glu Asn 245 250 255 act tgg ggit citt to c aac aag ggit cgt. ggit aag atc aag ggit tac 816 Thr Trp Gly Telu Ser Asn Lys Gly Arg Gly Lys Ile Lys Gly Tyr 260 265 27 O atg gac aga ggit atg gct agt ggit aat gtt gg ly aag titt gtt citt gtt 864 Met Asp Arg Gly Met Ala Ser Gly Asn Wall ll Lys Phe Val Leu Val 275 280 285 gcc gct act ggit gtt gcc agt aag aat tgg cca aac ggit tot got 912 Ala Ala Thr Gly Wall Ala Ser Lys Asn Trp g ly Pro Asn Gly Ser Gly 29 O 295 gtt gat citt gat ggit titc. aat gct titc. ggit g gt gaa citc aga aac gat 96.O Wall Asp Telu Asp Gly Phe Asn Ala Phe Gly G ly Glu Lieu Arg Asn Asp 305 310 315 32O tta citc. cca tac att aga gct cac titc. aat aag gtc gat cqt gat OO8 Teu Telu Pro Ile Arg Ala His Phe Asn Lys Val Asp Arg Asp 325 330 335 cac act gct tta gct ggit citt to c atg ggit ggit caa act atc agt His Thr Ala Telu Ala Gly Teu Ser Met Gly g I y Gly Glin Thr Ile Ser 340 345 350 att ggit att ggit gaa act citt gat gaa atc aac tac ggit tot titc 104 Ile Gly Ile Gly Glu Thr Teu Asp Glu Ile S e Asn Tyr Gly Ser Phe 355 360 365 tct cca gct tta titc. Cala act gct gaa gaa ttic ggit aag gtt aag 152 Ser Pro Ala Telu Phe Glin Thr Ala Glu Glu P he Phe Gly Lys Wall Lys 370 375 ggit a.a. C. titc. 
aag gaa gaa citt aga att cac citt tac atg act tdt 200 Gly Asn Phe Lys Glu Glu Teu Arg Ile His A sn Leu Tyr Met Thr Cys 385 390 395 400 ggit gat gct gat act tta gtt tac gat act cca agt tac gtt gaa 248 Gly Asp Ala Asp Thr Teu Wall Asp Thr Pro Ser Tyr Val Glu 405 410 415 gct tta aag aat tgg gat gct gtt gaa titc. aag gala tac act tac 296 Ala Telu Lys Asn Trp Asp Ala Wall Glu Phe e t Lys Glu Tyr Thr Tyr 420 425 430 cca ggit ggit act cac gat titc. cca gtt tgg t ac aga ggt titc aac gaa 344 Pro Gly Gly Thr His Asp Phe Pro Wall Trp T yr Arg Gly Phe Asn. Glu 435 4 40 4 45 titc. att Cala att gtt titc. a.a.a. aat Cala a.a.a. tt aag gala gaa cca att 392 Phe Ile Glin Ile Wall Phe Lys Asn Glin 3 al Lys Glu Glu Pro Ile 450 455 460 cat gct gat cca gta gaa gac cca tot gat g aa cca gtt agt gtt gat 4 40 His Ala Asp Pro Wall Glu Asp Pro Ser Asp G lu Pro Val Ser Val Asp 465 470 475 480 US 6,365,390 B1 83 84


cca tot gtt tot gto gaa gaa cca aat gac a git gaa tot toc tct gaa 1488 Pro Ser Wall Ser Wall Glu Glu Pro Asn Asp S er Glu Ser Ser Ser Glu 485 490 495

gat gaa cca gtg gtt a.a.a. a.a.a. act att aag c ac acc att gct aag aag 1536 Asp Glu Pro Wall Wall Lys Thr Ile Lys is Thir Ile Ala Lys Lys 5 OO 505 510 aag cca tot aag act aga act gtt acc aag a ag gtc att aag aag aag 1584 Lys Pro Ser Lys Thr Arg Thr Wall Thr Lys L ys Val Ile Lys Lys Lys 515 52O 525 aat a.a. C. taagaaagtt tagttagtac agtagt gtaa aaaaaaaaaa a aaatcaaaa 1640 Asn Asn 530

agaaactcgt gcc gaatticg at 1662

<210> SEQ ID NO 18 <211> LENGTH: 530 <212> TYPE: PRT <213> ORGANISM: Orpinomyces sp. PC-2

<400> SEQUENCE: 18

Val Val Ser Cys Glu Thr Thr Tyr Gly Ile Thr Leu Arg Asp Thr Lys 1 5 10 15

Glu Lys Phe Thr Val Phe Lys Asp Gly Ser Ala Ala Thr Asp Ile Val 20 25 30

Glu Ser Glu Asp Gly Ser Val Ser Trp Ile Ala Thr Ala Ala Gly Gly 35 40 45

Ala Gly Gly Gly Val Ala Phe Val Ala Asn Lys Glu Glu Ile 50 55 60

Asn Ile Ala Asn Glu Ser Ile Asp Ile Glu Met Glu Tyr Thr Pro 65 70 75 80

Val Glu Asn Trp Asn Asp Ala Ala Lys Asn Pro Ser Phe Cys Met 85 90 95

Arg Ile Leu Pro Trp Asp Ser Thr Gly Met Phe Gly Gly Tyr Glu Asp 100 105 110

Leu Glu Tyr Phe Asp Thr Pro Ala Ser Gly Asn Phe Lys Thr 115 120 125

Ile Lys Ile Pro Ser Phe Phe Ala Asp Ile Leu Ser Ser Ser Asp 130 135 140

Leu Asp Ser Ile Leu Ser Phe Ala Ile Phe Asn Asp Tyr Glu Arg 145 150 155 160

Gly Asn Thr Asp Gly Asp Gln Ile Ile Gln Leu Lys Asn Val Lys 165 170 175

Phe Asn Pro Lys Glu Asn Ala Pro Glu Asp Lys Ala Phe Asp Asp Gly 180 185 190

Leu Arg Asp Ser Gln Arg Gly Thr Val Val Glu Met Lys Tyr Ser Ser 195 200 205

Arg Asp Thr Val Glu Ser Glu Ala Asp Lys Tyr Glu Lys His 210 215 220

Ala Trp Val Leu Pro Ala Gly Glu Ala Asp Asn Lys Asp Lys 225 230 235 240

Pro Leu Val Val Leu Leu His Gly Tyr Gly Gln Asn Glu Asn 245 250 255

Thr Trp Gly Leu Ser Asn Gly Arg Gly Gly Lys Ile Lys Gly Tyr 260 265 270

Met Asp Arg Gly Met Ala Ser Gly Asn Val Glu Lys Phe Val Leu Val 275 280 285

Ala Ala Thr Gly Val Ala Ser Lys Asn Trp Gly Pro Asn Gly Ser Gly 290 295 300

Val Asp Leu Asp Gly Phe Asn Ala Phe Gly Gly Glu Leu Arg Asn Asp 305 310 315 320

Leu Leu Pro Tyr Ile Arg Ala His Phe Asn Val Lys Val Asp Arg Asp 325 330 335

His Thr Ala Leu Ala Gly Leu Ser Met Gly Gly Gly Gln Thr Ile Ser 340 345 350

Ile Gly Ile Gly Glu Thr Leu Asp Glu Ile Ser Asn Tyr Gly Ser Phe 355 360 365

Ser Pro Ala Leu Phe Gln Thr Ala Glu Glu Phe Phe Gly Lys Val Lys 370 375 380

Gly Asn Phe Lys Glu Glu Leu Arg Ile His Asn Leu Tyr Met Thr Cys 385 390 395 400

Gly Asp Ala Asp Thr Leu Val Thr Tyr Pro Ser Tyr Val Glu 405 410 415

Ala Leu Lys Asn Trp Asp Ala Val Glu Phe Met Lys Glu Tyr Thr Tyr 420 425 430

Pro Gly Gly Thr His Asp Phe Pro Val Trp Tyr Arg Gly Phe Asn Glu 435 440 445

Phe Ile Gln Ile Val Phe Lys Asn Gln Lys Val Lys Glu Glu Pro Ile 450 455 460

His Ala Asp Pro Val Glu Asp Pro Ser Asp Glu Pro Val Ser Val Asp 465 470 475 480

Pro Ser Val Ser Val Glu Glu Pro Asn Asp Ser Glu Ser Ser Ser Glu 485 490 495

Asp Glu Pro Val Val Lys Lys Thr Ile Lys His Thr Ile Ala Lys Lys 500 505 510

Lys Pro Ser Lys Thr Arg Thr Val Thr Lys Lys Val Ile Lys Lys Lys 515 520 525

Asn Asn 530
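
The CDS locations and protein lengths stated in the feature lines of this listing can be cross-checked against each other: a coding region spanning positions start..end (1-based, inclusive) should hold a whole number of codons equal to the length of the translated sequence. The sketch below is only an illustrative consistency check, not part of the patent; the function name `codon_count` and the dictionary `LISTED` are hypothetical, and the numbers are copied from the LOCATION and LENGTH fields of SEQ ID NOS 13/14, 15/16 and 17/18.

```python
# Illustrative consistency check (not part of the patent text).
# Each triple is (CDS start, CDS end, listed protein length), copied from
# the LOCATION and LENGTH fields of the listing above.

def codon_count(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive CDS span."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"CDS span {start}..{end} is not a multiple of 3")
    return span // 3

LISTED = {
    "SEQ ID NO 13 -> 14": (101, 2611, 837),
    "SEQ ID NO 15 -> 16": (529, 2895, 789),
    "SEQ ID NO 17 -> 18": (1, 1590, 530),
}

for label, (start, end, length) in LISTED.items():
    n = codon_count(start, end)
    status = "consistent" if n == length else "MISMATCH"
    print(f"{label}: {n} codons vs listed length {length} ({status})")
```

All three pairs come out consistent, which suggests the stated CDS boundaries exclude the stop codon.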

<210> SEQ ID NO 19 <211> LENGTH: 400 <212> TYPE: PRT <213> ORGANISM: Escherichia coli

<400> SEQUENCE: 19

Met Val Met Glu Leu Asn Glu Arg Asn Ile Thr Met Asn Ile Lys Ile 1 5 10 15

Ala Ala Leu Thr Leu Ala Ile Ala Ser Gly Ile Ser Ala Gln Trp Ala 20 25 30

Ile Ala Ala Asp Met Pro Ala Ser Pro Ala Thr Ile Pro Val 35 40 45

Gln Tyr Val Thr Gln Val Asn Ala Asp Asn Val Thr Phe Arg 50 55 60

Phe Ala Pro Gly Ala Lys Asn Val Ser Val Val Val Gly Val Pro Val 65 70 75 80

Pro Asp Asn Ile His Pro Met Thr Lys Asp Ala Gly Val Trp Ser 85 90 95

Trp Arg Thr Pro Ile Leu Lys Gly Asn Leu Tyr Glu Tyr Phe Phe Asn 100 105 110

Val Asp Gly Val Arg Ser Ile Asp Thr Gly Thr Ala Met Thr Asn Pro 115 120 125

Gln Arg Gln Val Asn Ser Ser Met Ile Leu Val Pro Gly Ser Tyr Leu 130 135 140

Asp Thr Arg Ser Val Ala His Gly Asp Leu Ile Ala Ile Thr Tyr His 145 150 155 160

Ser Asn Ala Leu Gln Ser Glu Arg Gln Met Tyr Val Trp Thr Pro Pro 165 170 175

Gly Thr Gly Met Gly Glu Pro Leu Pro Val Leu Tyr Phe Tyr His 180 185 190

Gly Phe Gly Asp Thr Gly Arg Ser Ala Ile Asp Gln Gly Arg Ile Pro 195 200 205

Gln Ile Met Asp Asn Leu Leu Ala Glu Gly Lys Ile Lys Pro Met Leu 210 215 220

Val Val Ile Pro Asp Thr Glu Thr Asp Ala Lys Gly Ile Ile Pro Glu 225 230 235 240

Asp Phe Val Pro Gln Glu Arg Arg Val Phe Tyr Pro Leu Asn Ala 245 250 255

Ala Ala Asp Arg Glu Leu Met Asn Asp Ile Ile Pro Leu Ile Ser 260 265 270

Arg Phe Asn Val Arg Asp Ala Asp Gly Arg Ala Leu Ala Gly 275 280 285

Leu Ser Gln Gly Gly Gln Ala Leu Val Ser Gly Met Asn His Leu 290 295 300

Glu Ser Phe Gly Trp Leu Ala Thr Phe Ser Gly Val Thr Thr Thr Thr 305 310 315 320

Val Pro Asp Glu Gly Val Ala Ala Arg Leu Asn Asp Pro Ala Ala Ile 325 330 335

Asn Gln Gln Leu Arg Asn Phe Thr Val Val Val Gly Asp Lys Asp Val 340 345 350

Val Thr Gly Asp Ile Ala Gly Leu Thr Glu Leu Glu Gln Lys 355 360 365

Ile Asn Phe Asp Gln Glu Pro Gly Leu Asn His Glu Met 370 375 380

Asp Val Trp Arg Pro Ala Ala Ala Phe Val Gln Lys Leu Phe Lys 385 390 395 400

<210> SEQ ID NO 20 <211> LENGTH: 721 <212> TYPE: PRT <213> ORGANISM: Aspergillus fumigatus

<400> SEQUENCE: 20

Met Gly Ala Phe Arg Trp Leu Ser Ile Ala Ala Ala Ala Ser Thr Ala 1 5 10 15

Leu Ala Leu Thr Pro Glu Gln Leu Ile Thr Ala Pro Arg Arg Ser Glu 20 25 30

Ala Ile Pro Asp Pro Ser Gly Lys Val Ala Val Phe Ser Thr Ser Gln 35 40 45

Ser Phe Glu Thr His Lys Arg Thr Ser Trp Trp Ser Leu Leu Asp 50 55 60

Leu Thr Gly Gln Thr Val Leu Thr Asn Asp Ser Ser Val Ser 65 70 75 80

Glu Ile Val Trp Leu Ser Asp Asp Ser Ile Leu Tyr Val Asn Ser Thr 85 90 95

Asn Ala Asp Ile Pro Gly Gly Val Glu Leu Trp Val Thr Gln Ala Ser 100 105 110

Ser Phe Ala Gly Tyr Ala Ala Ser Leu Pro Ala Ser Phe Ser 115 120 125

Gly Leu Ala Ala Thr Ser Gly Asp Ile Arg Phe Val Ala 130 135 140

Tyr Gly Gln Ser Pro Asn Gly Thr Ala Tyr Asn Glu Glu Leu Ala 145 150 155 160

Thr Ala Pro Leu Ser Ser Ala Arg Ile Tyr Asp Ser Ile Tyr Val Arg 165 170 175

His Trp Asp Tyr Trp Leu Ser Thr Thr Phe Asn Ala Val Phe Ser Gly 180 185 190

Thr Leu Lys Gly His Gly Lys Asn Gly Tyr Ser Leu Asp Gly Glu 195 200 205

Leu Lys Asn Leu Val Ser Pro Val Asn Ala Glu Ser Pro Tyr Pro 210 215 220

Pro Phe Gly Gly Ala Ser Asp Asp Leu Ser Pro Asp Gly Lys Trp 225 230 235 240

Val Ala Phe Ser Ala Pro Glu Leu Pro Lys Ala Asn Phe Thr 245 250 255

Thr Ser Ile Leu Val Pro His Asp Ala Ser Glu Thr Ala Arg 260 265 270

Pro Ile Asn Gly Pro Asp Ser Pro Gly Thr Pro Lys Gly Ile Lys Gly 275 280 285

Asp Ser Ser Ser Pro Val Phe Ser Pro Asn Gly Asp Lys Leu Ala Tyr 290 295 300

Phe Gln Met Arg Asp Glu Thr Glu Ser Asp Arg Ala Leu Leu Tyr 305 310 315 320

Val Ser Leu Gly Ser Thr Ile Pro Ser Val Ala Gly Asp 325 330 335

Trp Asp Arg Ser Pro Asp Ser Val Lys Trp Thr Pro Asp Gly Lys Thr 340 345 350

Leu Ile Val Gly Ser Glu Asp Leu Gly Arg Thr Arg Leu Phe Ser Leu 355 360 365

Pro Ala Asn Ala Asp Asp Pro Lys Asn Phe Thr Asp Gly 370 375 380

Gly Ser Val Ser Ala Tyr Phe Leu Pro Asp Ser Ser Leu Leu Val 385 390 395 400

Thr Gly Ser Ala Leu Trp Asn Trp Asn Val Tyr Thr Ala Lys Pro 405 410 415

Glu Gly Val Ile Ile Ala Ser Ala Asn Glu Ile Asp Pro 420 425 430

Glu Leu Lys Gly Leu Gly Pro Ser Asp Ile Ser Glu Phe Tyr Phe Gln 435 440 445

Gly Asn Phe Thr Asp Ile His Ala Trp Val Ile Tyr Pro Glu Asn Phe 450 455 460

Asp Ser Tyr Pro Leu Ile Phe Phe Ile His Gly Gly Pro 465 470 475 480

Gln Gly Asn Trp Ala Asp Gly Trp Ser Thr Arg Trp Asn Pro Lys Ala 485 490 495

Trp Ala Asp Gln Gly Tyr Val Val Val Ala Pro Asn Pro Thr Gly Ser 500 505 510

Thr Gly Phe Gly Gln Ala Leu Thr Thr Ala Ile Gln Asn Asn Trp Gly 515 520 525

Gly Ala Pro Tyr Asp Asp Leu Val Lys Cys Trp Glu Tyr Val His Glu 530 535 540

Asn Leu Asp Tyr Val Asp Thr Asp His Gly Val Ala Ala Gly Ala Ser 545 550 555 560

Tyr Gly Gly Phe Met Ile Asn Trp Ile Gln Gly Ser Pro Leu Gly Arg 565 570 575

Lys Phe Lys Ala Leu Val Ser His Asp Gly Thr Phe Val Ala Asp Ala 580 585 590

Lys Val Ser Thr Glu Glu Leu Trp Phe Met Gln Arg Glu Phe Asn Gly 595 600 605

Thr Phe Trp Asp Ala Arg Asp Asn Tyr Arg Arg Trp Asp Pro Ser Ala 610 615 620

Pro Glu Arg Ile Leu Gln Phe Ala Thr Pro Met Leu Val Ile His Ser 625 630 635 640

Asp Lys Asp Tyr Arg Leu Pro Val Ala Glu Gly Leu Ser Leu Phe Asn 645 650 655

Val Leu Gln Glu Arg Gly Val Pro Ser Arg Phe Leu Asn Phe Pro Asp 660 665 670

Glu Asn His Trp Val Val Asn Pro Glu Asn Ser Leu Val Trp His Gln 675 680 685

Gln Ala Leu Gly Trp Ile Asn Lys Tyr Ser Gly Val Glu Lys Ser Asn 690 695 700

Pro Asn Ala Val Ser Leu Glu Asp Thr Val Val Pro Val Val Asn Tyr 705 710 715 720

Asn

<210> SEQ ID NO 21 <211> LENGTH: 20 <212> TYPE: PRT <213> ORGANISM: Orpinomyces sp. PC-2 <220> FEATURE: <223> OTHER INFORMATION: Description of Artificial Sequence: N-terminal amino acid sequence of a feruloyl esterase of Orpinomyces PC-2.

<400> SEQUENCE: 21

Glu Thr Thr Tyr Gly Ile Thr Leu Arg Asp Thr Lys Glu Lys Phe Thr 1 5 10 15

Val Phe Lys Asp 20
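
For comparison with database entries, the three-letter residue codes used in this listing are often converted to the one-letter code. The following is a minimal sketch of that conversion applied to the 20-residue N-terminal sequence of SEQ ID NO 21; the names `THREE_TO_ONE`, `to_one_letter` and `SEQ_ID_NO_21` are illustrative and do not appear in the patent. The mapping is the standard one-letter code for the twenty common amino acids.

```python
# Illustrative helper (not part of the patent): convert a space-separated
# three-letter amino acid sequence to the standard one-letter code.

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(residues: str) -> str:
    """Map e.g. 'Glu Thr Thr' to 'ETT'; raises KeyError on unknown codes."""
    return "".join(THREE_TO_ONE[code] for code in residues.split())

# N-terminal sequence of SEQ ID NO 21, as listed above.
SEQ_ID_NO_21 = (
    "Glu Thr Thr Tyr Gly Ile Thr Leu Arg Asp "
    "Thr Lys Glu Lys Phe Thr Val Phe Lys Asp"
)

print(to_one_letter(SEQ_ID_NO_21))  # ETTYGITLRDTKEKFTVFKD
```

Because unknown codes raise `KeyError`, the same helper doubles as a quick check for residue names damaged in transcription.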

<210> SEQ ID NO 22 <211> LENGTH: 400 <212> TYPE: PRT <213> ORGANISM: Escherichia coli

<400> SEQUENCE: 22

Met Val Met Glu Leu Asn Glu Arg Asn Ile Thr Met Asn Ile Lys Ile 1 5 10 15

Ala Ala Leu Thr Leu Ala Ile Ala Ser Gly Ile Ser Ala Gln Trp Ala 20 25 30

Ile Ala Ala Asp Met Pro Ala Ser Pro Ala Pro Thr Ile Pro Val Lys 35 40 45

Gln Tyr Val Thr Gln Val Asn Ala Asp Asn Ser Val Thr Phe Arg Tyr 50 55 60

Phe Ala Pro Gly Ala Lys Asn Val Ser Val Val Val Gly Val Pro Val 65 70 75 80

Pro Asp Asn Ile His Pro Met Thr Lys Asp Glu Ala Gly Val Trp Ser 85 90 95

Trp Arg Thr Pro Ile Leu Lys Gly Asn Leu Tyr Glu Tyr Phe Phe Asn 100 105 110

Val Asp Gly Val Arg Ser Ile Asp Thr Gly Thr Ala Met Thr Asn Pro 115 120 125

Gln Arg Gln Val Asn Ser Ser Met Ile Leu Val Pro Gly Ser Tyr Leu 130 135 140

Asp Thr Arg Ser Val Ala His Gly Asp Leu Ile Ala Ile Thr Tyr His 145 150 155 160

Ser Asn Ala Leu Gln Ser Glu Arg Gln Met Tyr Val Trp Thr Pro Pro 165 170 175

Gly Tyr Thr Gly Met Gly Glu Pro Leu Pro Val Leu Tyr Phe Tyr His 180 185 190

Gly Phe Gly Asp Thr Gly Arg Ser Ala Ile Asp Gln Gly Arg Ile Pro 195 200 205

Gln Ile Met Asp Asn Leu Leu Ala Glu Gly Lys Ile Lys Pro Met Leu 210 215 220

Val Val Ile Pro Asp Thr Glu Thr Asp Ala Lys Gly Ile Ile Pro Glu 225 230 235 240

Asp Phe Val Pro Gln Glu Arg Arg Lys Val Phe Tyr Pro Leu Asn Ala 245 250 255

Lys Ala Ala Asp Arg Glu Leu Met Asn Asp Ile Ile Pro Leu Ile Ser 260 265 270

Lys Arg Phe Asn Val Arg Lys Asp Ala Asp Gly Arg Ala Leu Ala Gly 275 280 285

Leu Ser Gln Gly Gly Tyr Gln Ala Leu Val Ser Gly Met Asn His Leu 290 295 300

Glu Ser Phe Gly Trp Leu Ala Thr Phe Ser Gly Val Thr Thr Thr Thr 305 310 315 320

Val Pro Asp Glu Gly Val Ala Ala Arg Leu Asn Asp Pro Ala Ala Ile 325 330 335

Asn Gln Gln Leu Arg Asn Phe Thr Val Val Val Gly Asp Lys Asp Val 340 345 350

Val Thr Gly Lys Asp Ile Ala Gly Leu Lys Thr Glu Leu Glu Gln Lys 355 360 365

Lys Ile Asn Phe Asp Tyr Gln Glu Tyr Pro Gly Leu Asn His Glu Met 370 375 380

Asp Val Trp Arg Pro Ala Tyr Ala Ala Phe Val Gln Lys Leu Phe Lys 385 390 395 400

<210> SEQ ID NO 23 &2 11s LENGTH 2.364 &212> TYPE DNA <213> ORGANISM: Clostridium stercorarium &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (440) . . (1975) <400 SEQUENCE: 23 aag cittaatt tatttggitat accittgctitt atgttcaatc acgttctogt c attaaacaa 60 cc catataag citgctoccitg accggaaagt tdaac attga ttcttgcatt c cqaatctgc 120 to caataaaa catttctgaa titt.cgagacg gcaaaaaatg atgcc.gcttic c atttcaa.ca 18O US 6,365,390 B1 95 96

-continued gtaacacago cittctgcaat cotttitcgto agctitcctitt aaattittaag t ttgttctatt 240 gacaaaacta aaaactgtaa ttactataaa aatatalacta ataaattaca t titttalacat cattatgggg tactggtaaa gacgtgatag ttattaataa atttaacaaa t aataa.caca 360 citgctatott cg accqtaaa tttactatgt citctaatgta atatgacata a ataatataa 420 gtaaaggagg taaaagttt at g aag cqt aag gtt aag aa g atg gCa gct atg 472 Met Lys Arg Lys Wall Lys Lys Met Ala Ala Met 1 5 10 gca acg agt ata att atg gct atc atg atc a to cta cat agt ata coa Ala Thir Ser Ile Ile Met Ala Ile Met Ile I le Leu. His Ser Ile Pro 15 20 25 gta citc gcc ggg cga ata att tac gac aat gag aca ggc aca cat gga 568 Wall Leu Ala Gly Arg Ile Ile Tyr Asp Asn. Glu Thr Gly Thr His Gly 30 35 40 ggc tac gac tat gag citc togg aaa. gac tac g ga aat acg att atg gaa 616 Gly Tyr Asp Glu Leu Trp Lys Asp Tyr Gly Asn Thr Ile Met Glu 45 50 55 citt aac gac ggit ggit act titt agt tgt caa t gg agt aat atc ggit aat 664 Lieu. Asn Asp Gly Gly Thr Phe Ser Cys Glin Trp Ser Asn. Ile Gly Asn 60 65 70 75 gca cita titt aga a.a.a. ggg aga aaa. titt aat it co gac aaa acc tat caa 12 Ala Leu Phe Arg Lys Gly Arg Lys Phe Asin Ser Asp Llys Thr Tyr Glin 8O 85 9 O gala tta gga gac alta gta gtt gala tat ggc t gt gat tac aat coa aac 760 Glu Lieu Gly Asp Ile Wal Wall Glu Tyr Gly Cys Asp Tyr Asn. Pro Asn 95 100 105 gga aat to c tat ttg tgt gtt tac ggit togg a ca aga aat coa citg gtt 808 Gly Asn Ser Leu Cys Val Tyr Gly Trp Arg ASn Pro Leu Val 110 115 120 gaa tat tac att gta gala agc togg ggC agc t gg cqt coa cct gga gca 856 Glu Tyr Tyr Ile Wall Glu Ser Trp Gly Ser rp Arg Pro Pro Gly Ala 125 130 135 a Ca CCC aaa. gga acc atc aca cag tgg at g g ca ggt act tat gaa ata 904 Thr Pro Lys Gly Thr Ile Thr Glin Trp Met A la Gly Thr Tyr Glu Ile 1 4 0 145 15 O 155 tat gaa act acc Cgg gta aat cag cct tcc. a. t c gat gga act gcg aca 952 Tyr Glu Thr Thr Arg Wall Asn Glin Pro Ser I l e Asp Gly Thr Ala Thr 160 1.65 17 O titc. cala caa. 
tat tgg agt gtt cgt aca tcc aag aga aca agc gga aca 1000
Phe Gln Gln Tyr Trp Ser Val Arg Thr Ser Lys Arg Thr Ser Gly Thr
175 180 185

ata tct gtc act gaa cat ttt aaa cag tgg gaa aga atg ggc atg cga 1048
Ile Ser Val Thr Glu His Phe Lys Gln Trp Glu Arg Met Gly Met Arg
190 195 200

atg ggt aag atg tat gaa gtt gct ctt acc gtt gaa ggt tat cag agc 1096
Met Gly Lys Met Tyr Glu Val Ala Leu Thr Val Glu Gly Tyr Gln Ser
205 210 215

agt ggg tac gct aat gta tac aag aat gaa atc aga ata ggt gca aat 1144
Ser Gly Tyr Ala Asn Val Tyr Lys Asn Glu Ile Arg Ile Gly Ala Asn
220 225 230 235

cca act cct gcc cca tct caa agc cca att aga aga gat gca ttt tca 1192
Pro Thr Pro Ala Pro Ser Gln Ser Pro Ile Arg Arg Asp Ala Phe Ser
240 245 250

ata atc gaa gcg gaa gaa tat aac agc aca aat tcc tcc act tta caa 1240
Ile Ile Glu Ala Glu Glu Tyr Asn Ser Thr Asn Ser Ser Thr Leu Gln
255 260 265

gtg att gga acg cca aat aat ggc aga gga att ggt tat att gaa aat 1288
Val Ile Gly Thr Pro Asn Asn Gly Arg Gly Ile Gly Tyr Ile Glu Asn
270 275 280

ggt aat acc gta act tac agc aat ata gat ttt ggt agt ggt gca aca 1336
Gly Asn Thr Val Thr Tyr Ser Asn Ile Asp Phe Gly Ser Gly Ala Thr
285 290 295

ggg ttc tct gca act gtt gca acg gag gtt aat acc tca att caa atc 1384
Gly Phe Ser Ala Thr Val Ala Thr Glu Val Asn Thr Ser Ile Gln Ile
300 305 310 315

cgt tct gac agt cct acc gga act cta ctt ggt acc tta tat gta agt 1432
Arg Ser Asp Ser Pro Thr Gly Thr Leu Leu Gly Thr Leu Tyr Val Ser
320 325 330

tct acc ggc agc tgg aat aca tat caa acc gta tct aca aac atc agc 1480
Ser Thr Gly Ser Trp Asn Thr Tyr Gln Thr Val Ser Thr Asn Ile Ser
335 340 345

aaa att acc ggc gtt cat gat att gta ttg gta ttc tca ggt cca gtc 1528
Lys Ile Thr Gly Val His Asp Ile Val Leu Val Phe Ser Gly Pro Val
350 355 360

aat gtg gac aac ttc ata ttt agc aga agt tca cca gtg cct gca cct 1576
Asn Val Asp Asn Phe Ile Phe Ser Arg Ser Ser Pro Val Pro Ala Pro
365 370 375

ggt gat aac aca aga gac gca tat tct atc att cag gcc gag gat tat 1624
Gly Asp Asn Thr Arg Asp Ala Tyr Ser Ile Ile Gln Ala Glu Asp Tyr
380 385 390 395

gac agc agt tat ggt ccc aac ctt caa atc ttt agc tta cca ggt ggt 1672
Asp Ser Ser Tyr Gly Pro Asn Leu Gln Ile Phe Ser Leu Pro Gly Gly
400 405 410

ggc agc gcc att ggc tat att gaa aat ggt tat tcc act acc tat aaa 1720
Gly Ser Ala Ile Gly Tyr Ile Glu Asn Gly Tyr Ser Thr Thr Tyr Lys
415 420 425

aat att gat ttt ggt gac ggc gca acg tcc aca gca aga gta gct 1768
Asn Ile Asp Phe Gly Asp Gly Ala Thr Ser Val Thr Ala Arg Val Ala
430 435 440

acc cag aat gct act acc att cag gta aga ttg gga agt cca tcg ggt 1816
Thr Gln Asn Ala Thr Thr Ile Gln Val Arg Leu Gly Ser Pro Ser Gly
445 450 455

aca tta ctt gga aca att tac gtg ggg tcc aca gga agc ttt gat act 1864
Thr Leu Leu Gly Thr Ile Tyr Val Gly Ser Thr Gly Ser Phe Asp Thr
460 465 470 475

tat agg gat gta tcc gct acc att agt aat act gcg ggt gta aaa gat 1912
Tyr Arg Asp Val Ser Ala Thr Ile Ser Asn Thr Ala Gly Val Lys Asp
480 485 490

att gtt ctt gta ttc tca ggt cct gtt aat gtt gac tgg ttt gta ttc 1960
Ile Val Leu Val Phe Ser Gly Pro Val Asn Val Asp Trp Phe Val Phe
495 500 505

tca aaa tca gga act taagggtata gaccctaatg tggagtacaa aatctggtat 2015
Ser Lys Ser Gly Thr
510

ggcatatata aaaaaagact tggaattgta ccagtgcgac atataatggc tttgtaaaat 2075

attctgatta aaacggaatg tttaaggata ggaaaagaaa gtattctttt cctgtctttt 2135

ttatgtaacc ttaaaaatta cagccaatta ttcaataaaa taatttctgt aaatcagtta 2195

ttcttgaacc aatattaaaa gaatttcccc aaggtcttta atgtctggcc ggattacatt 2255

atcttctcct gtcattttaa aaaacagtta aatcaagctt ttgtcgcaat agaatgaatt 2315

attatttggg attccaaacc aaagacatat cattaagcag ttgtaaaaa 2364

<210> SEQ ID NO 24
<211> LENGTH: 512
<212> TYPE: PRT
<213> ORGANISM: Clostridium stercorarium

<400> SEQUENCE: 24

Met Lys Arg Lys Val Met Ala Ala Met Ala Thr Ser Ile Ile
1 5 10 15

Met Ala Ile Met Ile Ile Leu His Ser Ile Pro Val Leu Ala Gly Arg
20 25 30

Ile Ile Tyr Asp Asn Glu Thr Gly Thr His Gly Gly Tyr Asp Tyr Glu
35 40 45

Leu Trp Asp Gly Asn Thr Ile Met Glu Leu Asn Asp Gly Gly
50 55 60

Thr Phe Ser Gln Trp Ser Asn Ile Gly Asn Ala Leu Phe Arg Lys
65 70 75 80

Gly Arg Phe Asn Ser Asp Thr Tyr Gln Glu Leu Gly Asp Ile
85 90 95

Val Val Glu Tyr Gly Asp Asn Pro Asn Gly Asn Ser Tyr Leu
100 105 110

Val Tyr Gly Trp Thr Arg Asn Pro Leu Val Glu Tyr Tyr Ile Val
115 120 125

Glu Ser Trp Gly Ser Trp Arg Pro Pro Gly Ala Thr Pro Lys Gly Thr
130 135 140

Ile Thr Gln Trp Met Ala Gly Thr Glu Ile Tyr Glu Thr Thr Arg
145 150 155 160

Val Asn Gln Pro Ser Ile Asp Gly Thr Ala Thr Phe Gln Gln Tyr Trp
165 170 175

Ser Val Arg Thr Ser Lys Arg Thr Ser Gly Thr Ile Ser Val Thr Glu
180 185 190

His Phe Lys Gln Trp Glu Arg Met Gly Met Arg Met Gly Lys Met Tyr
195 200 205

Glu Val Ala Leu Thr Val Glu Gly Tyr Gln Ser Ser Gly Tyr Ala Asn
210 215 220

Val Tyr Lys Asn Glu Ile Arg Ile Gly Ala Asn Pro Thr Pro Ala Pro
225 230 235 240

Ser Gln Ser Pro Ile Arg Arg Asp Ala Phe Ser Ile Ile Glu Ala Glu
245 250 255

Glu Tyr Asn Ser Thr Asn Ser Ser Thr Leu Gln Val Ile Gly Thr Pro
260 265 270

Asn Asn Gly Arg Gly Ile Gly Tyr Ile Glu Asn Gly Asn Thr Val Thr
275 280 285

Tyr Ser Asn Ile Asp Phe Gly Ser Gly Ala Thr Gly Phe Ser Ala Thr
290 295 300

Val Ala Thr Glu Val Asn Thr Ser Ile Gln Ile Arg Ser Asp Ser Pro
305 310 315 320

Thr Gly Thr Leu Leu Gly Thr Leu Tyr Val Ser Ser Thr Gly Ser Trp
325 330 335

Asn Thr Tyr Gln Thr Val Ser Thr Asn Ile Ser Lys Ile Thr Gly Val
340 345 350

His Asp Ile Val Leu Val Phe Ser Gly Pro Val Asn Val Asp Asn Phe
355 360 365

Ile Phe Ser Arg Ser Ser Pro Val Pro Ala Pro Gly Asp Asn Thr Arg
370 375 380

Asp Ala Tyr Ser Ile Ile Gln Ala Glu Asp Tyr Asp Ser Ser Tyr Gly
385 390 395 400

Pro Asn Leu Gln Ile Phe Ser Leu Pro Gly Gly Gly Ser Ala Ile Gly
405 410 415


Tyr Ile Glu Asn Gly Tyr Ser Thr Thr Tyr Lys Asn Ile Asp Phe Gly
420 425 430

Asp Gly Ala Thr Ser Val Thr Ala Arg Val Ala Thr Gln Asn Ala Thr
435 440 445

Thr Ile Gln Val Arg Leu Gly Ser Pro Ser Gly Thr Leu Leu Gly Thr
450 455 460

Ile Tyr Val Gly Ser Thr Gly Ser Phe Asp Thr Tyr Arg Asp Val Ser
465 470 475 480

Ala Thr Ile Ser Asn Thr Ala Gly Val Lys Asp Ile Val Leu Val Phe
485 490 495

Ser Gly Pro Val Asn Val Asp Trp Phe Val Phe Ser Lys Ser Gly Thr
500 505 510

What is claimed is:

1. A recombinant DNA molecule comprising a vector sequence and a sequence encoding a feruloyl esterase protein, wherein said feruloyl esterase protein is characterized by an amino acid sequence having at least 75% amino acid sequence identity with amino acids 227 to 440 of SEQ ID NO:18.

2. The recombinant DNA molecule of claim 1, wherein said feruloyl esterase protein is characterized by the amino acid sequence given in amino acids 227 to 440 of SEQ ID NO:18.

3. The recombinant DNA molecule of claim 2, wherein the feruloyl esterase comprises the amino acid sequence given in SEQ ID NO:18, amino acids 5 to 530.

4. The recombinant DNA molecule of claim 3, wherein the sequence encoding the feruloyl esterase protein comprises the sequence given in SEQ ID NO:17, nucleotides 13 to 1590.

5. The recombinant DNA molecule of claim 3, wherein the feruloyl esterase comprises the sequence given in SEQ ID NO:18, amino acids 1 to 530.

6. The recombinant DNA molecule of claim 5, wherein the sequence encoding the feruloyl esterase protein comprises the sequence given in SEQ ID NO:17, nucleotides 1 to 1590.

7. A recombinant host cell comprising the recombinant DNA molecule of claim 1.

8. The recombinant host cell of claim 7, wherein said feruloyl esterase protein is characterized by the amino acid sequence given as amino acids 227 to 440 of SEQ ID NO:18.

9. The recombinant host cell of claim 8 wherein the feruloyl esterase comprises the amino acid sequence given in SEQ ID NO:18, amino acids 5 to 530.

10. The recombinant host cell of claim 9 wherein the sequence encoding the feruloyl esterase protein comprises the sequence given in SEQ ID NO:17, nucleotides 13 to 1590.

11. The recombinant host cell of claim 9 wherein the feruloyl esterase comprises the sequence given in SEQ ID NO:18, amino acids 1 to 530.

12. The recombinant host cell of claim 11, wherein the sequence encoding the feruloyl esterase protein comprises the sequence given in SEQ ID NO:17, nucleotides 1 to 1590.

13. A recombinant DNA molecule comprising a vector sequence and a sequence encoding a feruloyl esterase protein, wherein the feruloyl esterase protein consists of an amino acid sequence selected from the group consisting of amino acids 581 to 789 of SEQ ID NO:16, amino acids 346 to 789 of SEQ ID NO:16, amino acids 795 to 1077 of SEQ ID NO:12, amino acids 20 to 286 of SEQ ID NO:14, amino acids 20 to 307 of SEQ ID NO:14, and amino acids 20 to 421 of SEQ ID NO:14.

14. The recombinant DNA molecule of claim 13, wherein the feruloyl esterase consists of an amino acid sequence as given in SEQ ID NO:12, amino acids 795 to 1077.

15. The recombinant DNA molecule of claim 14, wherein the sequence encoding the feruloyl esterase protein is given in SEQ ID NO:11, nucleotides 2582–3430.

16. The recombinant DNA molecule of claim 13, wherein the feruloyl esterase consists of the amino acid sequence as given in SEQ ID NO:16, amino acids 546 to 789.

17. The recombinant DNA molecule of claim 16, wherein the sequence encoding the feruloyl esterase protein is given in SEQ ID NO:15, nucleotides 2164 to 2895.

18. The recombinant DNA molecule of claim 13, wherein the feruloyl esterase consists of an amino acid sequence as given in SEQ ID NO:14, amino acids 20 to 286.

19. The recombinant DNA molecule of claim 18, wherein the sequence encoding the feruloyl esterase protein is given in SEQ ID NO:13, nucleotides 158 to 958.

20. The recombinant DNA molecule of claim 13, wherein the feruloyl esterase consists of the amino acid sequence given in SEQ ID NO:14, amino acids 20 to 307.

21. The recombinant DNA molecule of claim 20, wherein the sequence encoding the feruloyl esterase protein is given in SEQ ID NO:13, nucleotides 158 to 1021.

22. The recombinant DNA molecule of claim 13, wherein the feruloyl esterase consists of the amino acid sequence given in SEQ ID NO:14, amino acids 20 to 421.

23. The recombinant DNA molecule of claim 22, wherein the sequence encoding the feruloyl esterase protein is given in SEQ ID NO:13, nucleotides 158 to 1363.

24. A recombinant host cell comprising the recombinant DNA molecule of claim 13.

25. A method for the recombinant production of a feruloyl esterase protein comprising the step of culturing a recombinant host cell comprising a vector sequence and a sequence encoding a feruloyl esterase protein, wherein the feruloyl esterase protein consists of an amino acid sequence selected from the group consisting of amino acids 581 to 789 of SEQ ID NO:16, amino acids 795 to 1077 of SEQ ID NO:12, amino acids 20 to 286 of SEQ ID NO:14, amino acids 20 to 307 of SEQ ID NO:14, amino acids 20 to 421 of SEQ ID NO:14, amino acids 5 to 530 of SEQ ID NO:18 and an amino acid sequence of at least 75% amino acid sequence identity with amino acids 227 to 440 of SEQ ID NO:18, under conditions of nutrition, time and temperature such that a feruloyl esterase protein is produced via expression of the sequence encoding the feruloyl esterase protein contained within the recombinant DNA molecule within said host cell.

26. The method of claim 25, wherein the feruloyl esterase protein consists of an amino acid sequence from the group consisting of amino acids 581 to 789 of SEQ ID NO:16, amino acids 795 to 1077 of SEQ ID NO:12, amino acids 845 to 1075 of SEQ ID NO:12, amino acids 20 to 286 of SEQ ID NO:14, amino acids 20 to 307 of SEQ ID NO:14, amino acids 20 to 421 of SEQ ID NO:14, amino acids 1 to 530 of SEQ ID NO:18, and amino acids 5 to 530 of SEQ ID NO:18.

* * * * *

UNITED STATES PATENT AND TRADEMARK OFFICE
CERTIFICATE OF CORRECTION

PATENT NO. : 6,365,390 B1 Page 1 of 2 DATED : April 2, 2002 INVENTOR(S) : Blum et al.

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Column 7
Line 24, delete “Tonmme” and replace with -- Tomme --.

Column 10
Line 2, delete “C13D” and replace with -- CBD --.
Line 3, delete “XynlZ” and replace with -- XynZ --.
Line 34, delete “pl” and replace with -- p --.

Column 12
Line 28, delete “mind mg-I” and replace with -- min-1 mg-1 --.
Line 35, delete “CBS” and replace with -- CBG --.
Line 67, delete “aglionucleotide” and replace with -- oligonucleotide --.

Column 21
Table 1, rows starting with XYR1, XZR1, XZR2, XZR3, XZR4, and XZR5, delete “GGAAGCTT” and replace with -- GGAAGCTT --.

Column 23
Table 5 continued, nucleotide position 317, delete the amino acid code “MM” and replace with -- HN --.
Table 5 continued, nucleotide positions 442 and 457, delete the amino acid code “X” and replace with -- K --.
Table 5 continued, nucleotide position 689, delete the amino acid code “P” and replace with -- F --.

Column 25
Table 5 continued, beneath the first line, delete “2400”.
Table 5 continued, nucleotide positions 1127 and 1793, delete the amino acid code “X” and replace with -- K --.
Table 5 continued, nucleotide positions 1281 and 2027, delete the amino acid code “K” and replace with -- H --.
Table 5 continued, nucleotide positions 1709, delete the amino acid code “N” and replace with -- H --.

Column 27
Table 5 continued, nucleotide position 2656, delete the amino acid code “N” and replace with -- H --.
Table 5 continued, nucleotide position 2944, delete the amino acid code “N” and replace with -- W --.
Table 5 continued, nucleotide position 3075, delete the amino acid code “M” and replace with -- N --.

Column 29
Table 5 continued, nucleotide position 3174, delete the amino acid code “M” and replace with -- H --.

Column 32
Table 6, amino acid position 224, delete “ILU” and replace with -- ILE --.
Table 6, amino acid position 254, delete “YYR” and replace with -- TYR --.
Table 6, amino acid position 300, delete the “300” over ILE.

Column 33
Table 6 continued, amino acid position 353, place the pCT 1223 arrow one spot to the right, so that it is between “LYS” and “VAL”.
Table 6 continued, nucleotide position 1105-1107, please replace “CGG” with -- CCG --.

Column 38
Table 9, amino acid position 25, delete “= 10” and shift amino acids 25-36 three nucleotide spaces to the left.

Column 101
Line 67, delete “364” and replace with -- 546 --.

Signed and Sealed this Twenty-sixth Day of November, 2002

Attest:

Attesting Officer

JAMES E. ROGAN
Director of the United States Patent and Trademark Office