US009 187739B2

(12) United States Patent (10) Patent No.: US 9,187,739 B2 Lange et al. (45) Date of Patent: Nov. 17, 2015

(54) POLYPEPTIDES HAVING CI2N 9/42 (2013.01); CI2P 7/14 (2013.01); CELLOBO HYDROLASE ACTIVITY AND CI2Y 302/01091 (2013.01); Y02E 50/17 POLYNUCLEOTDESENCODING SAME (2013.01) (58) Field of Classification Search (71) Applicant: Novozymes Aktieselskab. Bagsvaerd CPC ...... C12N 1/00; C12N 1/22; C12N 9/2405: (DK) C12N 9/2408; C12N 9/2334; C12N 9/2437; C12N 9/2445; C12N 15/70; C12N 15/74; (72) Inventors: Lene Lange, Valby (DK); Wenping Wu, C12N 15779 Beijing (CN); Dominique Aubert, See application file for complete search history. Copenhagen (DK); Sara Landvik, Holte (DK); Kirk Matthew Schnorr, Holte (56) References Cited (DK); Ib Groth Clausen, Birkeroed (DK) U.S. PATENT DOCUMENTS 5, 120,463 A 6/1992 Bjork et al. (73) Assignee: Novozymes A/S, Bagsvaerd (DK) 5.487,989 A 1/1996 Fowler et al. 5,955,270 A 9, 1999 Radford et al. (*) Notice: Subject to any disclaimer, the term of this 6,184,019 B1 2/2001 Miettinen-Oinonen et al. patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. FOREIGN PATENT DOCUMENTS WO 95,02675 A1 1, 1995 (21) Appl. No.: 14/668,532 WO 97.14804 A1 4f1997 WO 99.06574 A1 2, 1999 (22) Filed: Mar. 25, 2015 WO 01.04284 A1 1, 2001 WO O1,25468 A1 4/2001 (65) Prior Publication Data WO O1,79507 A2 10, 2001 US 2015/0218542 A1 Aug. 6, 2015 OTHER PUBLICATIONS Azevedo et al., Nucleic Acids Research, vol. 18, No.3, p. 668 (1990). Emalfarb, Geneseq Accession No. ABA92722 (2002). Related U.S. Application Data Gams et al., Transacations British Mycological Society, vol. 59, No. (60) Continuation of application No. 14/064,398, filed on 3, pp. 519-522 (1972). Oct. 28, 2013, now Pat. No. 8,993,299, which is a Ganju et al., Biochimica et Biophysica Acta, vol. 993, pp. 266-274 (1989). division of application No. 13/681,490, filed on Nov. Gielkens et al., Applied and Environmental Microbiology, vol. 65. 20, 2012, now Pat. No. 8,603,794, which is a division No. 10, pp. 4340-4345 (1999). of application No. 13/646,980, filed on Oct. 8, 2012, Hong et al., EBI Accession No. AF421954 (2001). now Pat. No. 8,507,238, which is a division of Hong et al., EBI Accession No. Q96UR5 (2001). application No. 13/483,389, filed on May 30, 2012, Hong et al., EBI Accession No. AF478686 (2002). now Pat. No. 8,603.793, which is a division of Hong et al., Uniprot Accession No. Q8TG37 (2002). application No. 12/818,861, filed on Jun. 18, 2010, Kvachadze et al., Biosis Accession No. PREV 1998.00120135 (1997). Kvachadze et al., Mikrobiologiya, vol. 66, No. 5, pp. 644-649 (1997). now Pat. No. 8.338,156, which is a continuation of Li et al., Journal of Applied Microbiology, vol. 106, pp. 1867-1875 application No. 10/481,179, filed as application No. (2009). PCT/DK02/00429 on Jun. 26, 2002, now Pat. No. Radford et al., EMBL Accession No. X17258 (1990). 7,785,853. Riske et al., Applied and Environmental Microbiology, vol. 56, No. 11, pp. 3261-3265 (1990). (30) Foreign Application Priority Data Takada et al., Journal of Fermentation and Bioengineering, vol. 85, No. 1 pp. 1-9 (1998). Jun. 26, 2001 (DK) ...... 2001 O1OOO Taleb et al., Gene, vol. 161, pp. 137-138 (1995). Azevedo et al., Journal of General Microbiology, vol. 136, pp. 2569 (51) Int. Cl. 2576 (1990). CI2P 7/06 (2006.01) CI2P 7/08 (2006.01) Primary Examiner — Jana Hines CI2P 7/10 (2006.01) (74) Attorney, Agent, or Firm — Eric J. Fechter CI2P 7/14 (2006.01) CI2P 19/34 (2006.01) (57) ABSTRACT CI2N 9/26 (2006.01) The present invention relates to polypeptides having cello CID 3/386 (2006.01) biohydrolase I activity and polynucleotides having a nucle CI2N 9/42 (2006.01) otide sequence which encodes for the polypeptides. The CI2N 9/24 (2006.01) invention also relates to nucleic acid constructs, vectors, and (52) U.S. Cl. host cells comprising the nucleic acid constructs as well as CPC ...... CI2N 9/2408 (2013.01); CI ID3/386 methods for producing and using the polypeptides. (2013.01); CIID3/38645 (2013.01); CI2N 9/2437 (2013.01); C12N 9/2477 (2013.01); 23 Claims, No Drawings US 9, 187,739 B2 1. 2 POLYPEPTDES HAVING SUMMARY OF THE INVENTION CELLOBOHYDROLASE ACTIVITY AND POLYNUCLEOTDESENCODING SAME In a first aspect the present invention relates to a polypep tide having cellobiohydrolase I activity, selected from the CROSS-REFERENCE TO RELATED group consisting of: APPLICATIONS (a) a polypeptide comprising an amino acid sequence Selected from the group consisting of: This application is a continuation of U.S. application Ser. an amino acid sequence which has at least 80% identity No. 14/064,398 filed on Oct. 28, 2013, now U.S. Pat. No. with amino acids 1 to 526 of SEQID NO:2, 8,993,299, which is a divisional of U.S. application Ser. No. 10 an amino acid sequence which has at least 80% identity 13/681,490 filed on Nov. 20, 2012, now U.S. Pat. No. 8,603, with amino acids 1 to 529 of SEQID NO:4, 794, which is a divisional of U.S. application Ser. No. 13/646, an amino acid sequence which has at least 80% identity 980 filed on Oct. 8, 2012, now U.S. Pat. No. 8,507,238, which with amino acids 1 to 451 of SEQID NO:6, is a divisional of U.S. application Ser. No. 13/483,389 filed on an amino acid sequence which has at least 80% identity May 30, 2012, now U.S. Pat. No. 8,603.793, which is a 15 with amino acids 1 to 457 of SEQID NO:8, divisional of U.S. application Ser. No. 12/818,861 filed on an amino acid sequence which has at least 80% identity Jun. 18, 2010, now U.S. Pat. No. 8.338,156, which is a con with amino acids 1 to 538 of SEQID NO:10, tinuation of U.S. application Ser. No. 10/481,179 filed Dec. an amino acid sequence which has at least 70% identity 17, 2003, now U.S. Pat. No. 7,785,853, which is a 35 U.S.C. with amino acids 1 to 415 of SEQID NO:12, 371 national application of international application no. PCT/ an amino acid sequence which has at least 70% identity DK02/000429 filed Jun. 26, 2002, which claims priority or with amino acids 1 to 447 of SEQID NO:14, the benefit under 35 U.S.C. 119 of Danish application no. PA an amino acid sequence which has at least 80% identity 2001 01000 filed on Jun. 26, 2001. The contents of these with amino acids 1 to 452 of SEQID NO:16, applications are fully incorporated herein by reference. an amino acid sequence which has at least 80% identity 25 with amino acids 1 to 454 of SEQID NO:38, FIELD OF THE INVENTION an amino acid sequence which has at least 80% identity with amino acids 1 to 458 of SEQID NO:40, The present invention relates to polypeptides having cel an amino acid sequence which has at least 80% identity lobiohydrolase I (also referred to as CBH I or CBH 1) activity with amino acids 1 to 450 of SEQID NO:42, and polynucleotides having a nucleotide sequence which 30 an amino acid sequence which has at least 80% identity encodes for the polypeptides. The invention also relates to with amino acids 1 to 446 of SEQID NO:44, nucleic acid constructs, vectors, and host cells comprising the an amino acid sequence which has at least 80% identity nucleic acid constructs as well as methods for producing and with amino acids 1 to 527 of SEQID NO:46, using the polypeptides. an amino acid sequence which has at least 80% identity 35 with amino acids 1 to 455 of SEQID NO:48, BACKGROUND OF THE INVENTION an amino acid sequence which has at least 80% identity with amino acids 1 to 464 of SEQID NO:50, Cellulose is an important industrial raw material and a an amino acid sequence which has at least 80% identity Source of renewable energy. The physical structure and mor with amino acids 1 to 460 of SEQID NO:52, phology of native cellulose are complex and the fine details of 40 an amino acid sequence which has at least 80% identity its structure have been difficult to determine experimentally. with amino acids 1 to 450 of SEQID NO:54, However, the chemical composition of cellulose is simple, an amino acid sequence which has at least 80% identity consisting of D-glucose residues linked by beta-1,4-glyco with amino acids 1 to 532 of SEQID NO:56, sidic bonds to form linear polymers with chains length of over an amino acid sequence which has at least 80% identity 10,000 glycosidic residues. 45 with amino acids 1 to 460 of SEQID NO:58, In order to be efficient, the digestion of cellulose requires an amino acid sequence which has at least 80% identity several types of enzymes acting cooperatively. At least three with amino acids 1 to 525 of SEQID NO:60, and categories of enzymes are necessary to convert cellulose into an amino acid sequence which has at least 80% identity glucose: endo(1,4)-beta-D-glucanases (EC 3.2.1.4) that cut with amino acids 1 to 456 of SEQID NO:66: the cellulose chains at random; cellobiohydrolases (EC 50 (b) a polypeptide comprising an amino acid sequence 3.2.1.91) which cleave cellobiosyl units from the cellulose selected from the group consisting of: chain ends and beta-glucosidases (EC 3.2.1.21) that convert an amino acid sequence which has at least 80% identity cellobiose and soluble cellodextrins into glucose. Among with the polypeptide encoded by the cellobiohydrolase I these three categories of enzymes involved in the biodegra encoding part of the nucleotide sequence present in Acremo dation of cellulose, cellobiohydrolases are the key enzymes 55 nium thermophilum, for the degradation of native crystalline cellulose. an amino acid sequence which has at least 80% identity Exo-cellobiohydrolases (Cellobiohydrolase I, or CBH I) with the polypeptide encoded by the cellobiohydrolase I refer to the cellobiohydrolases which degrade cellulose by encoding part of the nucleotide sequence present in Chaeto hydrolyzing the cellobiose from the reducing end of the cel mium thermophilum, lulose polymer chains. 60 an amino acid sequence which has at least 80% identity It is an object of the present invention to provide improved with the polypeptide encoded by the cellobiohydrolase I polypeptides having cellobiohydrolase I activity and poly encoding part of the nucleotide sequence present in Scyta nucleotides encoding the polypeptides. The improved lidium sp., polypeptides may have improved specific activity and/or an amino acid sequence which has at least 80% identity improved stability in particular improved thermostability. 65 with the polypeptide encoded by the cellobiohydrolase I The polypeptides may also have an improved ability to resist encoding part of the nucleotide sequence present in Scyta inhibition by cellobiose. lidium thermophilum, US 9, 187,739 B2 3 4 an amino acid sequence which has at least 80% identity an amino acid sequence encoded by the cellobiohydrolase with the polypeptide encoded by the cellobiohydrolase I I encoding part of the nucleotide sequence present in Humi encoding part of the nucleotide sequence present in Ther cola nigrescens CBS 819.73, moaSCuS aurantiacus, an amino acid sequence encoded by the cellobiohydrolase an amino acid sequence which has at least 80% identity I encoding part of the nucleotide sequence present in Clad with the polypeptide encoded by the cellobiohydrolase I orrhinum foecundissimum CBS 427.97, encoding part of the nucleotide sequence present in Thielavia an amino acid sequence encoded by the cellobiohydrolase australiensis, I encoding part of the nucleotide sequence present in Diplo an amino acid sequence which has at least 70% identity dia gossypina CBS 247.96, with the polypeptide encoded by the cellobiohydrolase I 10 encoding part of the nucleotide sequence present in Verticil an amino acid sequence encoded by the cellobiohydrolase lium tenerum, I encoding part of the nucleotide sequence present in Myce an amino acid sequence which has at least 70% identity liophthora thermophila CBS 117.65, with the polypeptide encoded by the cellobiohydrolase I an amino acid sequence encoded by the cellobiohydrolase encoding part of the nucleotide sequence present in Neoter 15 I encoding part of the nucleotide sequence present in Rhizo File:S CdStanett.S, mucorpusillus CBS 109471, an amino acid sequence which has at least 80% identity an amino acid sequence encoded by the cellobiohydrolase with the polypeptide encoded by the cellobiohydrolase I I encoding part of the nucleotide sequence present in Meripi encoding part of the nucleotide sequence present in Melano lus giganteus CBS 521.95, carpus albomyces, an amino acid sequence encoded by the cellobiohydrolase an amino acid sequence which has at least 80% identity I encoding part of the nucleotide sequence present in Exidia with the polypeptide encoded by the cellobiohydrolase I glandulosa CBS 2377.96, encoding part of the nucleotide sequence present in Acremo an amino acid sequence encoded by the cellobiohydrolase nium sp., I encoding part of the nucleotide sequence present in Xylaria an amino acid sequence which has at least 80% identity 25 hypoxylon CBS 284.96, with the polypeptide encoded by the cellobiohydrolase I an amino acid sequence encoded by the cellobiohydrolase encoding part of the nucleotide sequence present in Chaeto I encoding part of the nucleotide sequence present in Tri midium pingtungium, chophaea saccata CBS 804.70, an amino acid sequence which has at least 80% identity an amino acid sequence encoded by the cellobiohydrolase with the polypeptide encoded by the cellobiohydrolase I 30 encoding part of the nucleotide sequence present in Sporot I encoding part of the nucleotide sequence present in Chaeto richum pruinosum, mium sp., an amino acid sequence which has at least 80% identity an amino acid sequence encoded by the cellobiohydrolase with the polypeptide encoded by the cellobiohydrolase I I encoding part of the nucleotide sequence present in Myce encoding part of the nucleotide sequence present in Diplodia 35 liophthora hinnulea, gOSSpina, an amino acid sequence encoded by the cellobiohydrolase an amino acid sequence which has at least 80% identity I encoding part of the nucleotide sequence present in Thiella with the polypeptide encoded by the cellobiohydrolase I via cf. microspora, encoding part of the nucleotide sequence present in Tri an amino acid sequence encoded by the cellobiohydrolase chophaea Saccata, 40 I encoding part of the nucleotide sequence present in an amino acid sequence which has at least 80% identity Aspergillus sp., with the polypeptide encoded by the cellobiohydrolase I an amino acid sequence encoded by the cellobiohydrolase encoding part of the nucleotide sequence present in Myce I encoding part of the nucleotide sequence present in Scopu liophthora thermophila, lariopsis sp., an amino acid sequence which has at least 80% identity 45 an amino acid sequence encoded by the cellobiohydrolase with the polypeptide encoded by the cellobiohydrolase I I encoding part of the nucleotide sequence present in encoding part of the nucleotide sequence present in Exidia sp., glandulosa, an amino acid sequence encoded by the cellobiohydrolase an amino acid sequence which has at least 80% identity I encoding part of the nucleotide sequence present in Verti with the polypeptide encoded by the cellobiohydrolase I 50 cillium sp., and encoding part of the nucleotide sequence present in Xylaria an amino acid sequence encoded by the cellobiohydrolase hypoxylon, I encoding part of the nucleotide sequence present in Phy an amino acid sequence which has at least 80% identity tophthora infestans, with the polypeptide encoded by the cellobiohydrolase I (c) a polypeptide comprising an amino acid sequence encoding part of the nucleotide sequence present in Poitrasia 55 selected from the group consisting of: circinans, an amino acid sequence which has at least 80% identity an amino acid sequence which has at least 80% identity with the polypeptide encoded by nucleotides 1 to 1578 of with the polypeptide encoded by the cellobiohydrolase I SEQID NO:1, encoding part of the nucleotide sequence present in Coprinus an amino acid sequence which has at least 80% identity cinereus, 60 with the polypeptide encoded by nucleotides 1 to 1587 of an amino acid sequence which has at least 80% identity SEQID NO:3, with the polypeptide encoded by the cellobiohydrolase I an amino acid sequence which has at least 80% identity encoding part of the nucleotide sequence present in with the polypeptide encoded by nucleotides 1 to 1353 of Pseudoplectania nigrella, SEQID NO:5, an amino acid sequence encoded by the cellobiohydrolase 65 an amino acid sequence which has at least 80% identity I encoding part of the nucleotide sequence present in Tri with the polypeptide encoded by nucleotides 1 to 1371 of chothecium roseum IFO 5372, SEQID NO:7, US 9, 187,739 B2 5 6 an amino acid sequence which has at least 80% identity nucleotides 1 to 1353 of SEQID NO:41, with the polypeptide encoded by nucleotides 1 to 1614 of nucleotides 1 to 1341 of SEQID NO:43, SEQID NO:9, nucleotides 1 to 1584 of SEQID NO:45, an amino acid sequence which has at least 70% identity nucleotides 1 to 1368 of SEQID NO:47, with the polypeptide encoded by nucleotides 1 to 1245 of nucleotides 1 to 1395 of SEQID NO:49, SEQID NO:11, nucleotides 1 to 1383 of SEQID NO:51, an amino acid sequence which has at least 70% identity nucleotides 1 to 1353 of SEQID NO:53, with the polypeptide encoded by nucleotides 1 to 1341 of nucleotides 1 to 1599 of SEQID NO:55, SEQID NO:13, nucleotides 1 to 1383 of SEQID NO:57, an amino acid sequence which has at least 80% identity 10 nucleotides 1 to 1578 of SEQID NO:59, and with the polypeptide encoded by nucleotides 1 to 1356 of nucleotides 1 to 1371 of SEQID NO:65; SEQID NO:15, (ii) the complementary strand of the nucleotides selected an amino acid sequence which has at least 80% identity from the group consisting of with the polypeptide encoded by nucleotides 1 to 1365 of nucleotides 1 to 500 of SEQID NO:1, SEQID NO:37, 15 nucleotides 1 to 500 of SEQID NO:3, an amino acid sequence which has at least 80% identity nucleotides 1 to 500 of SEQID NO:5, with the polypeptide encoded by nucleotides 1 to 1377 of nucleotides 1 to 500 of SEQID NO:7, SEQID NO:39, nucleotides 1 to 500 of SEQID NO:9, an amino acid sequence which has at least 80% identity nucleotides 1 to 500 of SEQID NO:11, with the polypeptide encoded by nucleotides 1 to 1353 of nucleotides 1 to 500 of SEQID NO:13, SEQID NO:41, nucleotides 1 to 500 of SEQID NO:15, an amino acid sequence which has at least 80% identity nucleotides 1 to 500 of SEQID NO:37, with the polypeptide encoded by nucleotides 1 to 1341 of nucleotides 1 to 500 of SEQID NO:39, SEQID NO:43, nucleotides 1 to 500 of SEQID NO:41, an amino acid sequence which has at least 80% identity 25 nucleotides 1 to 500 of SEQID NO:43, with the polypeptide encoded by nucleotides 1 to 1584 of nucleotides 1 to 500 of SEQID NO:45, SEQID NO:45, nucleotides 1 to 500 of SEQID NO:47, an amino acid sequence which has at least 80% identity nucleotides 1 to 500 of SEQID NO:49, with the polypeptide encoded by nucleotides 1 to 1368 of nucleotides 1 to 500 of SEQID NO:51, SEQID NO:47, 30 nucleotides 1 to 500 of SEQID NO:53, an amino acid sequence which has at least 80% identity nucleotides 1 to 500 of SEQID NO:55, with the polypeptide encoded by nucleotides 1 to 1395 of nucleotides 1 to 500 of SEQID NO:57, SEQID NO:49, nucleotides 1 to 500 of SEQID NO:59, an amino acid sequence which has at least 80% identity nucleotides 1 to 500 of SEQID NO:65, with the polypeptide encoded by nucleotides 1 to 1383 of 35 nucleotides 1 to 221 of SEQID NO:17, SEQID NO:51, nucleotides 1 to 239 of SEQID NO:18, an amino acid sequence which has at least 80% identity nucleotides 1 to 199 of SEQID NO:19, with the polypeptide encoded by nucleotides 1 to 1353 of nucleotides 1 to 191 of SEQID NO:20, SEQID NO:53, nucleotides 1 to 232 of SEQID NO:21, an amino acid sequence which has at least 80% identity 40 nucleotides 1 to 467 of SEQID NO:22, with the polypeptide encoded by nucleotides 1 to 1599 of nucleotides 1 to 534 of SEQID NO:23, SEQID NO:55, nucleotides 1 to 563 of SEQID NO:24, an amino acid sequence which has at least 80% identity nucleotides 1 to 218 of SEQID NO:25, with the polypeptide encoded by nucleotides 1 to 1383 of nucleotides 1 to 492 of SEQID NO:26, SEQID NO:57, 45 nucleotides 1 to 481 of SEQID NO:27, an amino acid sequence which has at least 80% identity nucleotides 1 to 463 of SEQID NO:28, with the polypeptide encoded by nucleotides 1 to 1578 of nucleotides 1 to 513 of SEQID NO:29, SEQID NO:59, and nucleotides 1 to 579 of SEQID NO:30, an amino acid sequence which has at least 80% identity nucleotides 1 to 514 of SEQID NO:31, with the polypeptide encoded by nucleotides 1 to 1371 of 50 nucleotides 1 to 477 of SEQID NO:32, SEQID NO:65; nucleotides 1 to 500 of SEQID NO:33, (d) a polypeptide which is encoded by a nucleotide nucleotides 1 to 470 of SEQID NO:34, sequence which hybridizes under high Stringency conditions nucleotides 1 to 491 of SEQID NO:35, with a polynucleotide probe selected from the group consist nucleotides 1 to 221 of SEQID NO:36, ing of 55 nucleotides 1 to 519 of SEQID NO:61, (i) the complementary strand of the nucleotides selected nucleotides 1 to 497 of SEQID NO:62, from the group consisting of nucleotides 1 to 498 of SEQID NO:63, nucleotides 1 to 1578 of SEQID NO:1, nucleotides 1 to 525 of SEQID NO:64, and nucleotides 1 to 1587 of SEQID NO:3, nucleotides 1 to 951 of SEQID NO:67; and nucleotides 1 to 1353 of SEQID NO:5, 60 (iii) the complementary strand of the nucleotides selected nucleotides 1 to 1371 of SEQID NO:7, from the group consisting of nucleotides 1 to 1614 of SEQID NO:9, nucleotides 1 to 200 of SEQID NO:1, nucleotides 1 to 1245 of SEQID NO:11, nucleotides 1 to 200 of SEQID NO:3, nucleotides 1 to 1341 of SEQID NO:13, nucleotides 1 to 200 of SEQID NO:5, nucleotides 1 to 1356 of SEQID NO:15, 65 nucleotides 1 to 200 of SEQID NO:7, nucleotides 1 to 1365 of SEQID NO:37, nucleotides 1 to 200 of SEQID NO:9, nucleotides 1 to 1377 of SEQID NO:39, nucleotides 1 to 200 of SEQID NO:11, US 9, 187,739 B2 7 8 nucleotides 1 to 200 of SEQID NO:13, weight, at the most 4% at the most 3% by weight, at the most nucleotides 1 to 200 of SEQID NO:15, 2% by weight, at the most 1% by weight, and at the most 96% nucleotides 1 to 200 of SEQID NO:37, by weight). Thus, it is preferred that the substantially pure nucleotides 1 to 200 of SEQID NO:39, polypeptide is at least 92% pure, i.e., that the polypeptide nucleotides 1 to 200 of SEQID NO:41, constitutes at least 92% by weight of the total polypeptide nucleotides 1 to 200 of SEQID NO:43, material present in the preparation, and higher percentages nucleotides 1 to 200 of SEQID NO:45, are preferred such as at least 94% pure, at least 95% pure, at nucleotides 1 to 200 of SEQID NO:47, least 96% pure, at least 96% pure, at least 97% pure, at least nucleotides 1 to 200 of SEQID NO:49, 98% pure, at least 99%, and at the most 99.5% pure. The nucleotides 1 to 200 of SEQID NO:51, 10 polypeptides disclosed herein are preferably in a substantially nucleotides 1 to 200 of SEQID NO:53, pure form. In particular, it is preferred that the polypeptides nucleotides 1 to 200 of SEQID NO:55, disclosed herein are in “essentially pure form', i.e., that the nucleotides 1 to 200 of SEQID NO:57, polypeptide preparation is essentially free of other polypep nucleotides 1 to 200 of SEQID NO:59, and tide material with which it is natively associated. This can be nucleotides 1 to 200 of SEQID NO:65; and 15 accomplished, for example, by preparing the polypeptide by (e) a fragment of (a), (b) or (c) that has cellobiohydrolase I means of well-known recombinant methods. Herein, the term activity. “substantially pure polypeptide' is synonymous with the In a second aspect the present invention relates to a poly terms "isolated polypeptide' and “polypeptide in isolated nucleotide having a nucleotide sequence which encodes for form. the polypeptide of the invention. Cellobiohydrolase I Activity: In a third aspect the present invention relates to a nucleic The term “cellobiohydrolase I activity” is defined hereinas acid construct comprising the nucleotide sequence, which a cellulose 1,4-beta-cellobiosidase (also referred to as Exo encodes for the polypeptide of the invention, operably linked glucanase, Exo-cellobiohydrolase or 1,4-beta-cellobiohydro to one or more control sequences that direct the production of lase) activity, as defined in the enzyme class EC 3.2.1.91, the polypeptide in a Suitable host. 25 which catalyzes the hydrolysis of 1,4-beta-D-glucosidic link In a fourth aspect the present invention relates to a recom ages in cellulose and cellotetraose, releasing cellobiose from binant expression vector comprising the nucleic acid con the reducing ends of the chains. struct of the invention. For purposes of the present invention, cellobiohydrolase I In a fifth aspect the present invention relates to a recombi activity may be determined according to the procedure nant host cell comprising the nucleic acid construct of the 30 described in Example 2. invention. In an embodiment, cellobiohydrolase I activity may be In a sixth aspect the present invention relates to a method determined according to the procedure described in Desh for producing a polypeptide of the invention, the method pande et al., Methods in Enzymology, pp. 126-130 (1988): comprising: “Selective Assay for Exo-1,4-Beta-Glucanases”. According (a) cultivating a strain, which in its wild-type form is 35 to this procedure, one unit of cellobiohydrolase I activity capable of producing the polypeptide, to produce the (agluconic bond cleavage activity) is defined as 1.0 micro polypeptide; and mole of p-nitrophenol produced per minute at 50° C. pH 5.0. (b) recovering the polypeptide. The polypeptides of the present invention should prefer In a seventh aspect the present invention relates to a method ably have at least 20% of the cellobiohydrolase I activity of a for producing a polypeptide of the invention, the method 40 polypeptide consisting of an amino acid sequence selected comprising: from the group consisting of SEQ ID NO:2, SEQ ID NO:4, (a) cultivating a recombinant host cell of the invention SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID under conditions conducive for production of the polypep NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, tide; and SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID (b) recovering the polypeptide. 45 NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, In an eight aspect the present invention relates to a method SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID for in-situ production of a polypeptide of the invention, the NO:60, and SEQIDNO:66. In a particular preferred embodi method comprising: ment, the polypeptides should have at least 40%. Such as at (a) cultivating a recombinant host cell of the invention least 50%, preferably at least 60%, such as at least 70%, more under conditions conducive for production of the polypep 50 preferably at least 80%, such as at least 90%, most preferably tide; and at least 95%, such as about or at least 100% of the cellobio (b) contacting the polypeptide with a desired substrate hydrolase I activity of the polypeptide consisting of the amino without prior recovery of the polypeptide. acid sequence selected from the group consisting of amino Other aspects of the present invention will be apparent acids 1 to 526 of SEQID NO:2, amino acids 1 to 529 of SEQ from the below description and from the appended claims. 55 ID NO:4, amino acids 1 to 451 of SEQID NO:6, amino acids 1 to 457 of SEQID NO:8, amino acids 1 to 538 of SEQ ID DEFINITIONS NO:10, amino acids 1 to 415 of SEQID NO:12, amino acids 1 to 447 of SEQID NO:14, amino acids 1 to 452 of SEQID Prior to discussing the present invention in further details, NO:16, amino acids 1 to 454 of SEQID NO:38, amino acids the following terms and conventions will first be defined: 60 1 to 458 of SEQID NO:40, amino acids 1 to 450 of SEQID Substantially Pure Polypeptide: NO:42, amino acids 1 to 446 of SEQID NO:44, amino acids In the present context, the term “substantially pure 1 to 527 of SEQID NO:46, amino acids 1 to 455 of SEQID polypeptide' means a polypeptide preparation which con NO:48, amino acids 1 to 464 of SEQID NO:50, amino acids tains at the most 10% by weight of other polypeptide material 1 to 460 of SEQID NO:52, amino acids 1 to 450 of SEQID with which it is natively associated (lower percentages of 65 NO:54, amino acids 1 to 532 of SEQID NO:56, amino acids other polypeptide material are preferred, e.g., at the most 8% 1 to 460 of SEQID NO:58, amino acids 1 to 525 of SEQID by weight, at the most 6% by weight, at the most 5% by NO:60, and amino acids 1 to 456 of SEQID NO:66. US 9, 187,739 B2 9 10 Identity: SEQID NO:56, amino acids 200 to 434 of SEQID NO:58, In the present context, the homology between two amino amino acids 200 to 434 of SEQID NO:60, and amino acids acid sequences or between two nucleotide sequences is 200 to 434 of SEQID NO:66. described by the parameter “identity”. Allelic Variant: For purposes of the present invention, the degree of identity In the present context, the term “allelic variant denotes between two amino acid sequences is determined by using the any of two or more alternative forms of a gene occupying the program FASTA included in version 2.0x of the FASTA pro same chromosomal locus. Allelic variation arises naturally gram package (see Pearson and Lipman, 1988, “Improved through mutation, and may result in polymorphism within Tools for Biological Sequence Analysis, PNAS 85:2444 populations. Gene mutations can be silent (no change in the 2448; and Pearson, 1990, “Rapid and Sensitive Sequence 10 encoded polypeptide) or may encode polypeptides having Comparison with FASTP and FASTA, Methods in Enzymol altered amino acid sequences. An allelic variant of a polypep ogy 183:63-98). The scoring matrix used was BLOSUM50, tide is a polypeptide encoded by an allelic variant of a gene. gap penalty was -12, and gap extension penalty was -2. Substantially Pure Polynucleotide: The degree of identity between two nucleotide sequences is The term “substantially pure polynucleotide' as used determined using the same algorithm and Software package 15 herein refers to a polynucleotide preparation, wherein the as described above. The scoring matrix used was the identity polynucleotide has been removed from its natural genetic matrix, gap penalty was -16, and gap extension penalty was milieu, and is thus free of other extraneous or unwanted -4. coding sequences and is in a form suitable for use within Fragment: genetically engineered protein production systems. Thus, a When used herein, a “fragment of a sequence selected substantially pure polynucleotide contains at the most 10% from the group consisting of SEQID NO:2, SEQ ID NO:4, by weight of other polynucleotide material with which it is SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID natively associated (lower percentages of other polynucle NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, otide material are preferred, e.g., at the most 8% by weight, at SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID the most 6% by weight, at the most 5% by weight, at the most NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, 25 4% at the most 3% by weight, at the most 2% by weight, at the SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID most 1% by weight, and at the most /3% by weight). A NO:60, and SEQ ID NO:66 is a polypeptide having one or Substantially pure polynucleotide may, however, include more amino acids deleted from the amino and/or carboxyl naturally occurring 5' and 3' untranslated regions, such as terminus of this amino acid sequence. Preferably, a fragment promoters and terminators. It is preferred that the substan is a polypeptide having the amino acid sequence deleted 30 tially pure polynucleotide is at least 92% pure, i.e., that the corresponding to the “cellulose-binding domain and/or the polynucleotide constitutes at least 92% by weight of the total “linker domain” of Trichoderma reesei cellobiohydrolase I as polynucleotide material present in the preparation, and higher described in SWISS-PROT accession number P00725. More percentages are preferred such as at least 94% pure, at least preferably, a fragment comprises the amino acid sequence 95% pure, at least 96% pure, at least 96% pure, at least 97% corresponding to the “catalytic domain of Trichoderma 35 pure, at least 98% pure, at least 99%, and at the most 99.5% reesei cellobiohydrolase I as described in SWISS-PROT pure. The polynucleotides disclosed herein are preferably in a accession number P00725. Most preferably, a fragment con substantially pure form. In particular, it is preferred that the tains at least 434 amino acid residues, e.g., the amino acid polynucleotides disclosed herein are in “essentially pure residues selected from the group consisting of amino acids 1 form', i.e., that the polynucleotide preparation is essentially to 434 of SEQ ID NO:2, amino acids 1 to 434 of SEQ ID 40 free of other polynucleotide material with which it is natively NO:4, amino acids 1 to 434 of SEQID NO:6, amino acids 1 associated. Herein, the term "substantially pure polynucle to 434 of SEQ ID NO:8, amino acids 1 to 434 of SEQ ID otide' is synonymous with the terms “isolated polynucle NO:10, amino acids 1 to 434 of SEQID NO:14, amino acids otide' and “polynucleotide in isolated form'. 1 to 434 of SEQID NO:16, amino acids 1 to 434 of SEQID Modification(s): NO:38, amino acids 1 to 434 of SEQID NO:40, amino acids 45 In the context of the present invention the term “modifica 1 to 434 of SEQID NO:42, amino acids 1 to 434 of SEQID tion(s) is intended to mean any chemical modification of a NO:44, amino acids 1 to 434 of SEQID NO:46, amino acids polypeptide consisting of an amino acid sequence selected 1 to 434 of SEQID NO:48, amino acids 1 to 434 of SEQID from the group consisting of SEQ ID NO:2, SEQ ID NO:4, NO:50, amino acids 1 to 434 of SEQID NO:52, amino acids SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID 1 to 434 of SEQID NO:54, amino acids 1 to 434 of SEQID 50 NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, NO:56, amino acids 1 to 434 of SEQID NO:58, amino acids SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID 1 to 434 of SEQID NO:60, and amino acids 1 to 434 of SEQ NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, ID NO:66. In particular, a fragment contains at least 215 SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID amino acid residues, e.g., the amino acid residues selected NO:60, and SEQID NO:66, as well as genetic manipulation from the group consisting of amino acids 200 to 434 of SEQ 55 of the DNA encoding that polypeptide. The modification(s) ID NO:2, amino acids 200 to 434 of SEQ ID NO:4, amino can be replacement(s) of the amino acid side chain(s). Sub acids 200 to 434 of SEQID NO:6, amino acids 200 to 434 of stitution(s), deletion(s) and/or insertions(s) in or at the amino SEQ ID NO:8, amino acids 200 to 434 of SEQ ID NO:10, acid(s) of interest. amino acids 200 to 415 of SEQID NO:12, amino acids 200 to Artificial Variant: 434 of SEQ ID NO:14, amino acids 200 to 434 of SEQ ID 60 When used herein, the term “artificial variant’ means a NO:16, amino acids 200 to 434 of SEQ ID NO:38, amino polypeptide having cellobiohydrolase I activity, which has acids 200 to 434 of SEQID NO:40, amino acids 200 to 434 of been produced by an organism which is expressing a modified SEQ ID NO:42, amino acids 200 to 434 of SEQID NO:44, gene as compared to SEQID NO:1, SEQID NO:3, SEQID amino acids 200 to 434 of SEQID NO:46, amino acids 200 to NO:5, SEQIDNO:7, SEQIDNO:9, SEQID NO:11, SEQID 434 of SEQ ID NO:48, amino acids 200 to 434 of SEQ ID 65 NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, NO:50, amino acids 200 to 434 of SEQ ID NO:52, amino SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID acids 200 to 434 of SEQID NO:54, amino acids 200 to 434 of NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, US 9, 187,739 B2 11 12 SEQID NO:55, SEQID NO:57, SEQID NO:59, or SEQ ID but not limited to, transcription, post-transcriptional modifi NO:65. The modified gene, from which said variant is pro cation, translation, post-translational modification, and secre duced when expressed in a Suitable host, is obtained through tion. human intervention by modification of a nucleotide sequence Expression Vector: selected from the group consisting of SEQID NO:1, SEQID 5 In the present context, the term “expression vector' covers NO:3, SEQID NO:5, SEQID NO:7, SEQID NO:9, SEQID a DNA molecule, linear or circular, that comprises a segment NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, encoding a polypeptide of the invention, and which is oper SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID ably linked to additional segments that provide for its tran NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, Scription. SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID 10 Host Cell: NO:59, and SEQID NO:65. The term “host cell, as used herein, includes any cell type cDNA: which is susceptible to transformation with a nucleic acid COnStruct. The term “cDNA” when used in the present context, is The terms “polynucleotide probe'. “hybridization” as well intended to cover a DNA molecule which can be prepared by 15 as the various stringency conditions are defined in the section reverse transcription from a mature, spliced, mRNA molecule entitled “Polypeptides Having Cellobiohydrolase I Activity”. derived from a eukaryotic cell. cDNA lacks the intron Thermostability: sequences that are usually present in the corresponding The term “thermostability', as used herein, is measured as genomic DNA. The initial, primary RNA transcript is a pre described in Example 2. cursor to mRNA and it goes through a series of processing events before appearing as mature spliced mRNA. These DETAILED DESCRIPTION OF THE INVENTION events include the removal of intron sequences by a process called splicing. When cINA is derived from mRNA it there Polypeptides Having Cellobiohydrolase I Activity fore lacks intron sequences. In a first embodiment, the present invention relates to Nucleic Acid Construct: 25 polypeptides having cellobiohydrolase I activity and where When used herein, the term “nucleic acid construct” means the polypeptides comprises, preferably consists of an amino a nucleic acid molecule, either single- or double-stranded, acid sequence which has a degree of identity to an amino acid which is isolated from a naturally occurring gene or which has sequence selected from the group consisting of SEQ ID been modified to contain segments of nucleic acids in a man NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, SEQID ner that would not otherwise exist in nature. The term nucleic 30 NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, acid construct is synonymous with the term “expression cas SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID sette' when the nucleic acid construct contains the control NO:44, SEQ ID NO:46, SEQID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID sequences required for expression of a coding sequence of the NO:58, SEQID NO:60, and SEQID NO:66 (i.e., the mature present invention. 35 polypeptide) of at least 65%, preferably at least 70%, e.g., at Control Sequence: least 75%, more preferably at least 80%, such as at least 85%, The term “control sequences” is defined herein to include even more preferably at least 90%, most preferably at least all components, which are necessary or advantageous for the 95%, e.g., at least 96%, such as at least 97%, and even most expression of a polypeptide of the present invention. Each preferably at least 98%, such as at least 99% (hereinafter control sequence may be native or foreign to the nucleotide 40 “homologous polypeptides’). In an interesting embodiment, sequence encoding the polypeptide. Such control sequences the amino acid sequence differs by at the most ten amino acids include, but are not limited to, a leader, polyadenylation (e.g., by ten amino acids), in particular by at the most five sequence, propeptide sequence, promoter, signal peptide amino acids (e.g., by five amino acids). Such as by at the most sequence, and transcription terminator. At a minimum, the four amino acids (e.g., by four amino acids), e.g., by at the control sequences include a promoter, and transcriptional and 45 most three amino acids (e.g., by three amino acids) from an translational stop signals. The control sequences may be pro amino acid sequence selected from the group consisting of vided with linkers for the purpose of introducing specific SEQID NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, restriction sites facilitating ligation of the control sequences SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID with the coding region of the nucleotide sequence encoding a NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, polypeptide. 50 SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID Operably Linked: NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, The term “operably linked' is defined herein as a configu SEQ ID NO:58, SEQID NO:60, and SEQ ID NO:66. In a ration in which a control sequence is appropriately placed at particular interesting embodiment, the amino acid sequence a position relative to the coding sequence of the DNA differs by at the most two amino acids (e.g., by two amino sequence Such that the control sequence directs the expres 55 acids). Such as by one amino acid from an amino acid sion of a polypeptide. sequence selected from the group consisting of SEQ ID Coding Sequence: NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, SEQID When used herein the term “coding sequence' is intended NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, to cover a nucleotide sequence, which directly specifies the SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID amino acid sequence of its protein product. The boundaries of 60 NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, the coding sequence are generally determined by an open SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID reading frame, which usually begins with the ATG start NO:58, SEQID NO:60, and SEQID NO:66. codon. The coding sequence typically include DNA, cDNA, Preferably, the polypeptides of the present invention com and recombinant nucleotide sequences. prise an amino acid sequence selected from the group con Expression: 65 sisting of SEQID NO:2, SEQID NO:4, SEQID NO:6, SEQ In the present context, the term "expression' includes any ID NO:8, SEQID NO:10, SEQID NO:12, SEQID NO:14, step involved in the production of the polypeptide including, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID US 9, 187,739 B2 13 14 NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, In an interesting embodiment of the invention, the amino SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID acid changes are of Such a nature that the physico-chemical NO:56, SEQID NO:58, SEQID NO:60, and SEQID NO:66; properties of the polypeptides are altered. For example, an allelic variant thereof; or a fragment thereof that has cel amino acid changes may be performed, which improve the lobiohydrolase I activity. In another preferred embodiment, thermal stability of the polypeptide, which alter the substrate the polypeptide of the present invention consists of an amino specificity, which changes the pH optimum, and the like. acid sequence selected from the group consisting of SEQID Preferably, the number of such substitutions, deletions and/ NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, SEQID or insertions as compared to an amino acid sequence selected NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID 10 SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:58, SEQID NO:60, and SEQID NO:66. NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, The polypeptide of the invention may be a wild-type cel SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID lobiohydrolase I identified and isolated from a natural source. 15 NO:60, and SEQID NO:66 is at the most 10, such as at the Such wild-type polypeptides may be specifically screened for most 9, e.g., at the most 8, more preferably at the most 7, e.g., by standard techniques known in the art, Such as molecular at the most 6, Such as at the most 5, most preferably at the most screening as described in Example 1. Furthermore, the 4, e.g., at the most 3, Such as at the most 2, in particular at the polypeptide of the invention may be prepared by the DNA most 1. shuffling technique, such as described in Ness et al., Nature The present inventors have isolated nucleotide sequences Biotechnology 17: 893-896 (1999). Moreover, the polypep encoding polypeptides having cellobiohydrolase I activity tide of the invention may be an artificial variant which com from the microorganisms selected from the group consisting prises, preferably consists of an amino acid sequence that has of Acremonium thermophilum, Chaetomium thermophilum, at least one Substitution, deletion and/or insertion of an amino Scytalidium sp., Scytalidium thermophilum, Thermoascus acid as compared to an amino acid sequence selected from the 25 aurantiacus, Thielavia australiensis, Verticillium tenerum, group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID Melanocarpus albomyces, Poitrasia circinans, Coprinus NO:6, SEQID NO:8, SEQID NO:10, SEQID NO:12, SEQ cinereus, Trichothecium roseum, Humicola nigrescens, Cla ID NO:14, SEQID NO:16, SEQID NO:38, SEQID NO:40, dorrhinum foecundissimum, Diplodia gossypina, Mycelioph SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID thora thermophila, Rhizomucor pusillus, Meripilus gigan NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, 30 teus, Exidia glandulosa, Xylaria hypoxylon, Trichophaea SEQID NO:56, SEQID NO:58, SEQID NO:60, and SEQID saccata, Acremonium sp., Chaetomium sp., Chaetomidium NO:66. Such artificial variants may be constructed by stan pingtungium, Myceliophthora thermophila, Myceliophthora dard techniques known in the art, Such as by site-directed/ hinnulea, Sporotrichum pruinosum, Thielavia cf. random mutagenesis of the polypeptide comprising an amino microspora, Aspergillus sp., Scopulariopsis sp., Fusarium acid sequence selected from the group consisting of SEQID 35 sp., Verticillium sp., Pseudoplectania nigrella, and Phytoph NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, SEQID thora infestans; and from the gut of the termite larvae Neo NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, termes castaneus. Thus, in a second embodiment, the present SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID invention relates to polypeptides comprising an amino acid NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, sequence which has at least 65% identity with the polypeptide SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID 40 encoded by the cellobiohydrolase I encoding part of the NO:58, SEQID NO:60, and SEQID NO:66. In one embodi nucleotide sequence present in an organism selected from the ment of the invention, amino acid changes (in the artificial group consisting of Acremonium thermophilum, Chaeto variant as well as in wild-type polypeptides) are of a minor mium thermophilum, Scytalidium sp., Scytalidium thermo nature, that is conservative amino acid substitutions that do philum, Thermoascus aurantiacus, Thielavia australiensis, not significantly affect the folding and/or activity of the pro 45 Verticillium tenerum, Neotermes castaneus, Melanocarpus tein; Small deletions, typically of one to about 30 amino acids; albomyces, Poitrasia circinans, Coprinus cinereus, Trichoth Small amino- or carboxyl-terminal extensions. Such as an ecium roseum IFO 5372, Humicola nigrescens CBS 819.73, amino-terminal methionine residue; a small linker peptide of Cladorrhinum foecundissimum CBS 427.97, Diplodia gos up to about 20-25 residues; or a small extension that facili spina CBS 247.96, Myceliophthora thermophila CBS tates purification by changing net charge or another function, 50 117.65, Rhizomucorpusillus CBS 109471, Meripilus gigan Such as a poly-histidine tract, an antigenic epitope or a bind teus CBS 521.95, Exidia glandulosa CBS 2377.96, Xylaria ing domain. hypoxylon CBS 284.96, Trichophaea saccata CBS 804.70, Examples of conservative substitutions are within the Acremonium sp., Chaetomium sp., Chaetomidium ping group of basic amino acids (arginine, lysine and histidine), tungium, Myceliophthora thermophila, Myceliophthora hin acidic amino acids (glutamic acid and aspartic acid), polar 55 nulea, Sporotrichum pruinosum, Thielavia cf. microspora, amino acids (glutamine and asparagine), hydrophobic amino Aspergillus sp., Scopulariopsis sp., Fusarium sp., Verticil acids (leucine, isoleucine, Valine and methionine), aromatic lium sp., Pseudoplectania nigrella, and Phytophthora amino acids (phenylalanine, tryptophan and tyrosine), and infestans. In an interesting embodiment of the invention, the Small amino acids (glycine, alanine, serine and threonine). polypeptide comprises an amino acid sequence which has at Amino acid Substitutions which do not generally alter the 60 least 70%, e.g., at least 75%, preferably at least 80%, such as specific activity are known in the art and are described, for at least 85%, more preferably at least 90%, most preferably at example, by H. Neurath and R. L. Hill, 1979, In, The Proteins, least 95%, e.g., at least 96%, such as at least 97%, and even Academic Press, New York. The most commonly occurring most preferably at least 98%, such as at least 99% identity exchanges are Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, with the polypeptide encoded by the cellobiohydrolase I Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/ 65 encoding part of the nucleotide sequence present in an organ Arg, Asp/ASn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly as well ism selected from the group consisting of Acremonium ther as these in reverse. mophilum, Chaetonium thermophilum, Scytalidium sp., US 9, 187,739 B2 15 16 Scytalidium thermophilum, Thermoascus aurantiacus, 0582, CGMCC No. 0583, CBS 109513, DSM 14348, Thielavia australiensis, Verticillium tenerum, Neotermes cas CGMCC No. 0580, DSM 15064, DSM 15065, DSM 15066, taneus, Melanocarpus albomyces, Poitrasia circinans, DSM 15067, CGMCC No. 0747, CGMCC No. 0748, Coprinus cinereus, Trichothecium roseum IFO 5372, Humi CGMCC No. 0749, and CGMCC No. 0750. In another pre cola nigrescens CBS 819.73, Cladorrhinum foecundissimum ferred embodiment, the polypeptide of the present invention CBS 427.97, Diplodia gossypina CBS 247.96, Mycelioph consists of the amino acid sequence of the polypeptide thora thermophila CBS 117.65, Rhizomucor pusillus CBS encoded by the cellobiohydrolase I encoding part of the 109471, Meripilus giganteus CBS 521.95, Exidia glandulosa nucleotide sequence inserted into a plasmid present in a CBS 2377.96, Xylaria hypoxylon CBS 284.96, Trichophaea deposited microorganism selected from the group consisting saccata CBS 804.70, Acremonium sp., Chaetomium sp., Cha 10 of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. etomidium pingtungium, Myceliophthora thermophila, 0585, CGMCC No. 0582, CGMCC No. 0583, CBS 109513, Myceliophthora hinnulea, Sporotrichum pruinosum, Thiella DSM 14348, and CGMCC No. 0580, DSM 15064, DSM via cf. microspora, Aspergillus sp., Scopulariopsis sp., 15065, DSM 15066, DSM 15067, CGMCC No. 0747, Fusarium sp., Verticillium sp., Pseudoplectania nigrella, and CGMCC No. 0748, CGMCC No. 0749, and CGMCC No. Phytophthora infestans (hereinafter "homologous polypep 15 O750. tides’). In an interesting embodiment, the amino acid In a similar way as described above, the polypeptide of the sequence differs by at the most ten amino acids (e.g., by ten invention may be an artificial variant which comprises, pref amino acids), in particular by at the most five amino acids erably consists of anamino acid sequence that has at least one (e.g., by five amino acids), such as by at the most four amino Substitution, deletion and/or insertion of an amino acid as acids (e.g., by four amino acids), e.g., by at the most three compared to the amino acid sequence encoded by the cello amino acids (e.g., by three amino acids) from the polypeptide biohydrolase I encoding part of the nucleotide sequence encoded by the cellobiohydrolase I encoding part of the inserted into a plasmid present in a deposited microorganism nucleotide sequence present in an organism selected from the selected from the group consisting of CGMCC No. 0584, group consisting of Acremonium thermophilum, Chaeto CGMCC No. 0581, CGMCC No. 0585, CGMCC No. 0582, mium thermophilum, Scytalidium sp., Scytalidium thermo 25 CGMCC No. 0583, CBS 109513, DSM 14348, and CGMCC philum, Thermoascus aurantiacus, Thielavia australiensis, No. 0580, DSM 15064, DSM 15065, DSM 15066, DSM Verticillium tenerum, Neotermes Castaneus, Melanocarpus 15067, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. albomyces, Poitrasia circinans, Coprinus cinereus, Trichoth 0749, and CGMCC No. 0750. ecium roseum IFO 5372, Humicola nigrescens CBS 819.73, In a third embodiment, the present invention relates to Cladorrhinum foecundissimum CBS 427.97, Diplodia gos 30 polypeptides having cellobiohydrolase I activity which are spina CBS 247.96, Myceliophthora thermophila CBS encoded by nucleotide sequences which hybridize under very 117.65, Rhizomucorpusillus CBS 109471, Meripilus gigan low stringency conditions, preferably under low stringency teus CBS 521.95, Exidia glandulosa CBS 2377.96, Xylaria conditions, more preferably under medium stringency condi hypoxylon CBS 284.96, Trichophaea saccata CBS 804.70, tions, more preferably under medium-high Stringency condi Acremonium sp., Chaetonium sp., Chaetomidium ping 35 tions, even more preferably under high Stringency conditions, tungium, Myceliophthora thermophila, Myceliophthora hin and most preferably under very high Stringency conditions nulea, Sporotrichum pruinosum, Thielavia cf. microspora, with a polynucleotide probe selected from the group consist Aspergillus sp., Scopulariopsis sp., Fusarium sp., Verticil ing of lium sp., Pseudoplectania nigrella, and Phytophthora (i) the complementary strand of the nucleotides selected infestans. In a particular interesting embodiment, the amino 40 from the group consisting of acid sequence differs by at the most two amino acids (e.g., by nucleotides 1 to 1578 of SEQID NO:1, two amino acids). Such as by one amino acid from the nucleotides 1 to 1587 of SEQID NO:3, polypeptide encoded by the cellobiohydrolase I encoding part nucleotides 1 to 1353 of SEQID NO:5, of the nucleotide sequence present in an organism selected nucleotides 1 to 1371 of SEQID NO:7, from the group consisting of Acremonium thermophilum, 45 nucleotides 1 to 1614 of SEQID NO:9, Chaetomium thermophilum, Scytalidium sp., Scytalidium nucleotides 1 to 1245 of SEQID NO:11, thermophilum, Thermoascus aurantiacus, Thielavia aus nucleotides 1 to 1341 of SEQID NO:13, traliensis, Verticillium tenerum, Neotermes castaneus, Mel nucleotides 1 to 1356 of SEQID NO:15, anocarpus albomyces, Poitrasia circinans, Coprinus nucleotides 1 to 1365 of SEQID NO:37, cinereus, Trichothecium roseum IFO 5372, Humicola nigre 50 nucleotides 1 to 1377 of SEQID NO:39, scens CBS 819.73, Cladorrhinum foecundissimum CBS nucleotides 1 to 1353 of SEQID NO:41, 427.97, Diplodia gossypina CBS 247.96, Myceliophthora nucleotides 1 to 1341 of SEQID NO:43, thermophila CBS 117.65, Rhizomucorpusillus CBS 109471, nucleotides 1 to 1584 of SEQID NO:45, Meripilus giganteus CBS 521.95, Exidia glandulosa CBS nucleotides 1 to 1368 of SEQID NO:47, 2377.96, Xvlaria hypoxylon CBS 284.96, Trichophaea sac 55 nucleotides 1 to 1395 of SEQID NO:49, cata CBS 804.70, Acremonium sp., Chaetomium sp., Chaeto nucleotides 1 to 1383 of SEQID NO:51, midium pingtungium, Myceliophthora thermophila, Myce nucleotides 1 to 1353 of SEQID NO:53, liophthora hinnulea, Sporotrichum pruinosum, Thielavia cf. nucleotides 1 to 1599 of SEQID NO:55, microspora, Aspergillus sp., Scopulariopsis sp., Fusarium nucleotides 1 to 1383 of SEQID NO:57, sp., Verticillium sp., Pseudoplectania nigrella, and Phytoph 60 nucleotides 1 to 1578 of SEQID NO:59, and thora infestans. nucleotides 1 to 1371 of SEQID NO:65; Preferably, the polypeptides of the present invention com (ii) the complementary strand of the nucleotides selected prise the amino acid sequence of the polypeptide encoded by from the group consisting of the cellobiohydrolase I encoding part of the nucleotide nucleotides 1 to 500 of SEQID NO:1, sequence inserted into a plasmid presentina deposited micro 65 nucleotides 1 to 500 of SEQID NO:3, organism selected from the group consisting of CGMCCNo. nucleotides 1 to 500 of SEQID NO:5, 0584, CGMCC No. 0581, CGMCC No. 0585, CGMCC No. nucleotides 1 to 500 of SEQID NO:7, US 9, 187,739 B2 17 18 nucleotides 1 to 500 of SEQID NO:9, In another embodiment, the present invention relates to nucleotides 1 to 500 of SEQID NO:11, polypeptides having cellobiohydrolase I activity which are nucleotides 1 to 500 of SEQID NO:13, encoded by the cellobiohydrolase I encoding part of the nucleotides 1 to 500 of SEQID NO:15, nucleotide sequence present in a microorganism selected nucleotides 1 to 500 of SEQID NO:37, from the group consisting of nucleotides 1 to 500 of SEQID NO:39, a microorganism belonging to Zygomycota, preferably nucleotides 1 to 500 of SEQID NO:41, belonging to the Mucorales, more preferably belonging to the nucleotides 1 to 500 of SEQID NO:43, family Mucoraceae, most preferably belonging to the genus nucleotides 1 to 500 of SEQID NO:45, Rhizomucor (e.g., Rhizomucorpusillus), or the family Choa nucleotides 1 to 500 of SEQID NO:47, 10 nephoraceae, most preferably belonging to the genus Poitra nucleotides 1 to 500 of SEQID NO:49, sia (e.g., Poitrasia circinans), nucleotides 1 to 500 of SEQID NO:51, a microorganism belonging to the Oomycetes, preferably nucleotides 1 to 500 of SEQID NO:53, to the order Pythiales, more preferably to the family Pythi nucleotides 1 to 500 of SEQID NO:55, aceae, most preferably to the genus Phytophthora (e.g., Phy nucleotides 1 to 500 of SEQID NO:57, 15 nucleotides 1 to 500 of SEQID NO:59, tophthora infestans), nucleotides 1 to 500 of SEQID NO:65, a microorganism belonging to Auriculariales (an order of nucleotides 1 to 221 of SEQID NO:17, the Basidiomycota, Hymenomycetes), preferably belonging nucleotides 1 to 239 of SEQID NO:18, to the family Exidiaceae, more preferably belonging to the nucleotides 1 to 199 of SEQID NO:19, genus Exidia (e.g., Exidia glandulosa), nucleotides 1 to 191 of SEQID NO:20, a microorganism belonging to Xylariales (an order of the nucleotides 1 to 232 of SEQID NO:21, , ), preferably belonging to the nucleotides 1 to 467 of SEQID NO:22, family Xylariaceae, more preferably belonging to the genus nucleotides 1 to 534 of SEQID NO:23, Xylaria (e.g., Xylaria hypoxylon), nucleotides 1 to 563 of SEQID NO:24, 25 a microorganism belonging to Dothideales (an order of the nucleotides 1 to 218 of SEQID NO:25, Ascomycota, Dothideomycetes), preferably belonging to the nucleotides 1 to 492 of SEQID NO:26, family Dothideaceae, more preferably belonging to the genus nucleotides 1 to 481 of SEQID NO:27, Diplodia (e.g., Diplodia gossypina), nucleotides 1 to 463 of SEQID NO:28, a microorganism belonging to Pezizales (an order of the nucleotides 1 to 513 of SEQID NO:29, 30 Ascomycota), preferably belonging to the family Pyrone nucleotides 1 to 579 of SEQID NO:30, mataceae, more preferably belonging to the genus Tri nucleotides 1 to 514 of SEQID NO:31, chophaea (e.g., Trichophaea saccata), or the family Sarco nucleotides 1 to 477 of SEQID NO:32, Somataceae, more preferably belonging to the genus nucleotides 1 to 500 of SEQID NO:33, Pseudoplectania (e.g., Pseudoplectania nigrella), nucleotides 1 to 470 of SEQID NO:34, 35 a microorganism belonging to the family Rigidiporaceae nucleotides 1 to 491 of SEQID NO:35, (under Basidiomycota, Hymenomycetes, Hymenomyc nucleotides 1 to 221 of SEQID NO:36, etales), more preferably belonging to the genus Meripilus nucleotides 1 to 519 of SEQID NO:61, (e.g., Meripilus giganteus), nucleotides 1 to 497 of SEQID NO:62, a microorganism belonging to the family Meruliaceae (un nucleotides 1 to 498 of SEQID NO:63, 40 der Basidiomycota, Hymenomycetes, Sterealesales), more nucleotides 1 to 525 of SEQID NO:64, and preferably belonging to the genus Sporothrichum (Sporothri nucleotides 1 to 951 of SEQID NO:67; and chum sp.), (iii) the complementary strand of the nucleotides selected a microorganism belonging to the family Agaricaceae (un from the group consisting of der Basidiomycota, Hymenomycetes, Agaricales), more nucleotides 1 to 200 of SEQID NO:1, 45 preferably belonging to the genus Coprinus (e.g., Coprinus nucleotides 1 to 200 of SEQID NO:3, cinereus), nucleotides 1 to 200 of SEQID NO:5, a microorganism belonging to the family Hypocreaceae nucleotides 1 to 200 of SEQID NO:7, (under Ascomycota, Sordariomycetes, ), more nucleotides 1 to 200 of SEQID NO:9, preferably belonging to the genus Acremonium (e.g., Acre nucleotides 1 to 200 of SEQID NO:11, 50 monium thermophilum, Acremonium sp.) or the (mitosporic) nucleotides 1 to 200 of SEQID NO:13, genus Verticillium (e.g., Verticillium tenerum), nucleotides 1 to 200 of SEQID NO:15, a microorganism belonging to the genus Cladorrhinum nucleotides 1 to 200 of SEQID NO:37, (under Ascomycota, Sordariomycetes, Sordariales, Sordari nucleotides 1 to 200 of SEQID NO:39, aceae) e.g., Cladorrhinum foecundissimum, nucleotides 1 to 200 of SEQID NO:41, 55 a microorganism belonging to the genus Myceliophthora nucleotides 1 to 200 of SEQID NO:43, (under Ascomycota, Sordariomycetes, Sordariales, Sordari nucleotides 1 to 200 of SEQID NO:45, aceae) e.g., Myceliophthora thermophila or Myceliophthora nucleotides 1 to 200 of SEQID NO:47, hinnulae, nucleotides 1 to 200 of SEQID NO:49, a microorganism belonging to the genus Chaetomium (un nucleotides 1 to 200 of SEQID NO:51, 60 der Ascomycota, Sordariomycetes, Sordariales, Chaetomi nucleotides 1 to 200 of SEQID NO:53, aceae) e.g., Chaetomium thermophilum, nucleotides 1 to 200 of SEQID NO:55, a microorganism belonging to the genus Chaetomidium nucleotides 1 to 200 of SEQID NO:57, (under Ascomycota, Sordariomycetes, Sordariales, Chaeto nucleotides 1 to 200 of SEQID NO:59, and miaceae) e.g., Chaetomidium pingtungium, nucleotides 1 to 200 of SEQID NO:65 65 a microorganism belonging to the genus Thielavia (under (Sambrook et al., 1989, Molecular Cloning, A Laboratory Ascomycota, Sordariomycetes, Sordariales, Chaetomiaceae) Manual, 2d edition, Cold Spring Harbor, N.Y.). e.g., Thielavia australiensis or Thielavia microspora, US 9, 187,739 B2 19 20 a microorganism belonging to the genus Thermoascus (un stringency conditions. Molecules to which the polynucle der Ascomycota, Eurotiomycetes, Eurotiales, Trichoco otide probe hybridizes under these conditions may be moaceae) e.g., Thermoascus aurantiacus, detected using X-ray film or by any other method known in a microorganism belonging to the genus Trichothecium the art. Whenever the term “polynucleotide probe' is used in (mitosporic Ascomycota) e.g., Trichothecium roseum, and the present context, it is to be understood that such a probe a microorganism belonging to the species Humicola nigre contains at least 15 nucleotides. SCa2S. In an interesting embodiment, the polynucleotide probe is A nucleotide sequence selected from the group consisting the complementary strand of the nucleotides selected from of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID the group consisting of NO:7, SEQID NO:9, SEQID NO:11, SEQID NO:13, SEQ 10 nucleotides 1 to 1578 of SEQID NO:1, ID NO:15, SEQID NO:37, SEQID NO:39, SEQID NO:41, nucleotides 1 to 1302 of SEQID NO:1, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID nucleotides 1 to 1587 of SEQID NO:3, NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, nucleotides 1 to 1302 of SEQID NO:3, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:65, SEQ ID nucleotides 1 to 1353 of SEQID NO:5, NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, 15 nucleotides 1 to 1302 of SEQID NO:5, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID nucleotides 1 to 1371 of SEQID NO:7, NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, nucleotides 1 to 1302 of SEQID NO:7, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID nucleotides 1 to 1614 of SEQID NO:9, NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, nucleotides 1 to 1302 of SEQID NO:9, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:61, SEQ ID nucleotides 1 to 1245 of SEQID NO:11, NO:62, SEQID NO:63, SEQID NO:64, and SEQID NO:67, nucleotides 1 to 1341 of SEQID NO:13, or a Subsequence thereof, as well as an amino acid sequence nucleotides 1 to 1302 of SEQID NO:13, selected from the group consisting of SEQID NO:2, SEQID nucleotides 1 to 1356 of SEQID NO:15, NO:4, SEQID NO:6, SEQIDNO:8, SEQID NO:10, SEQID nucleotides 1 to 1302 of SEQID NO:15, NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, 25 nucleotides 1 to 1365 of SEQID NO:37, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID nucleotides 1 to 1302 of SEQID NO:37, NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, nucleotides 1 to 1377 of SEQID NO:39, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID nucleotides 1 to 1302 of SEQID NO:39, NO:60, and SEQID NO:66, or a fragment thereof, may be nucleotides 1 to 1353 of SEQID NO:41, used to design a polynucleotide probe to identify and clone 30 nucleotides 1 to 1302 of SEQID NO:41, DNA encoding polypeptides having cellobiohydrolase I nucleotides 1 to 1341 of SEQID NO:43, activity from strains of different genera or species according nucleotides 1 to 1302 of SEQID NO:43, to methods well known in the art. In particular, such probes nucleotides 1 to 1584 of SEQID NO:45, can be used for hybridization with the genomic or cDNA of nucleotides 1 to 1302 of SEQID NO:45, the genus or species of interest, following standard Southern 35 nucleotides 1 to 1368 of SEQID NO:47, blotting procedures, in order to identify and isolate the cor nucleotides 1 to 1302 of SEQID NO:47, responding gene therein. Such probes can be considerably nucleotides 1 to 1395 of SEQID NO:49, shorter than the entire sequence, but should be at least 15, nucleotides 1 to 1302 of SEQID NO:49, preferably at least 25, more preferably at least 35 nucleotides nucleotides 1 to 1383 of SEQID NO:51, in length, Such as at least 70 nucleotides in length. It is, 40 nucleotides 1 to 1302 of SEQID NO:51, however, preferred that the polynucleotide probe is at least nucleotides 1 to 1353 of SEQID NO:53, 100 nucleotides in length. For example, the polynucleotide nucleotides 1 to 1302 of SEQID NO:53, probe may be at least 200 nucleotides in length, at least 300 nucleotides 1 to 1599 of SEQID NO:55, nucleotides in length, at least 400 nucleotides in length or at nucleotides 1 to 1302 of SEQID NO:55, least 500 nucleotides in length. Even longer probes may be 45 nucleotides 1 to 1383 of SEQID NO:57, used, e.g., polynucleotide probes which are at least 600 nucle nucleotides 1 to 1302 of SEQID NO:57, otides in length, at least 700 nucleotides in length, at least 800 nucleotides 1 to 1578 of SEQID NO:59, nucleotides in length, or at least 900 nucleotides in length. nucleotides 1 to 1302 of SEQID NO:59, Both DNA and RNA probes can be used. The probes are nucleotides 1 to 1371 of SEQID NO:65, and typically labeled for detecting the corresponding gene (for 50 nucleotides 1 to 1302 of SEQID NO:65; example, with P. H. S., biotin, or avidin). or the complementary strand of the nucleotides selected from Thus, a genomic DNA or cDNA library prepared from such the group consisting of other organisms may be screened for DNA which hybridizes nucleotides 1 to 500 of SEQID NO:1, with the probes described above and which encodes a nucleotides 1 to 500 of SEQID NO:3, polypeptide having cellobiohydrolase I activity. Genomic or 55 nucleotides 1 to 500 of SEQID NO:5, other DNA from such other organisms may be separated by nucleotides 1 to 500 of SEQID NO:7, agarose or polyacrylamide gel electrophoresis, or other sepa nucleotides 1 to 500 of SEQID NO:9, ration techniques. DNA from the libraries or the separated nucleotides 1 to 500 of SEQID NO:11, DNA may be transferred to, and immobilized, on nitrocellu nucleotides 1 to 500 of SEQID NO:13, lose or other suitable carrier materials. In order to identify a 60 nucleotides 1 to 500 of SEQID NO:15, clone or DNA which is homologous with SEQID NO:1 the nucleotides 1 to 500 of SEQID NO:37, carrier material with the immobilized DNA is used in a South nucleotides 1 to 500 of SEQID NO:39, ern blot. nucleotides 1 to 500 of SEQID NO:41, For purposes of the present invention, hybridization indi nucleotides 1 to 500 of SEQID NO:43, cates that the nucleotide sequence hybridizes to a labeled 65 nucleotides 1 to 500 of SEQID NO:45, polynucleotide probe which hybridizes to the nucleotide nucleotides 1 to 500 of SEQID NO:47, sequence shown in SEQID NO:1 under very low to very high nucleotides 1 to 500 of SEQID NO:49, US 9, 187,739 B2 21 22 nucleotides 1 to 500 of SEQID NO:51, nucleotides 1 to 200 of SEQID NO:61, nucleotides 1 to 500 of SEQID NO:53, nucleotides 1 to 200 of SEQID NO:62, nucleotides 1 to 500 of SEQID NO:55, nucleotides 1 to 200 of SEQID NO:63, nucleotides 1 to 500 of SEQID NO:57, nucleotides 1 to 200 of SEQID NO:64, and nucleotides 1 to 500 of SEQID NO:59, nucleotides 1 to 200 of SEQID NO:67. nucleotides 1 to 500 of SEQID NO:65, In another interesting embodiment, the polynucleotide nucleotides 1 to 221 of SEQID NO:17, probe is the complementary Strand of the nucleotide sequence nucleotides 1 to 239 of SEQID NO:18, which encodes a polypeptide selected from the group con nucleotides 1 to 199 of SEQID NO:19, sisting of SEQID NO:2, SEQID NO:4, SEQID NO:6, SEQ nucleotides 1 to 191 of SEQID NO:20, 10 ID NO:8, SEQID NO:10, SEQID NO:12, SEQID NO:14, nucleotides 1 to 232 of SEQID NO:21, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID nucleotides 1 to 467 of SEQID NO:22, NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, nucleotides 1 to 534 of SEQID NO:23, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID nucleotides 1 to 563 of SEQID NO:24, NO:56, SEQID NO:58, SEQID NO:60, and SEQID NO:66. nucleotides 1 to 218 of SEQID NO:25, 15 In a furtherinteresting embodiment, the polynucleotide probe nucleotides 1 to 492 of SEQID NO:26, is the complementary Strand of a nucleotide sequence nucleotides 1 to 481 of SEQID NO:27, selected from the group consisting of SEQID NO:1, SEQID nucleotides 1 to 463 of SEQID NO:28, NO:3, SEQID NO:5, SEQID NO:7, SEQID NO:9, SEQID nucleotides 1 to 513 of SEQID NO:29, NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, nucleotides 1 to 579 of SEQID NO:30, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID nucleotides 1 to 514 of SEQID NO:31, NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, nucleotides 1 to 477 of SEQID NO:32, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID nucleotides 1 to 500 of SEQID NO:33, NO:59, and SEQ ID NO:65. In another interesting embodi nucleotides 1 to 470 of SEQID NO:34, ment, the polynucleotide probe is the complementary strand nucleotides 1 to 491 of SEQID NO:35, 25 of the nucleotide sequence contained in a plasmid which is nucleotides 1 to 221 of SEQID NO:36, contained in a deposited microorganism selected from the nucleotides 1 to 519 of SEQID NO:61, group consisting of CGMCC No. 0584, CGMCC No. 0581, nucleotides 1 to 497 of SEQID NO:62, CGMCC No. 0585, CGMCC No. 0582, CGMCC No. 0583, nucleotides 1 to 498 of SEQID NO:63, CGMCC No. 0580, CBS 109513, DSM 14348, DSM 15064, nucleotides 1 to 525 of SEQID NO:64, and 30 DSM 15065, DSM 15066, DSM 15067, CGMCC No. 0747, nucleotides 1 to 951 of SEQID NO:67: CGMCC No. 0748, CGMCC No. 0749, and CGMCC No. or the complementary strand of the nucleotides selected from O750. the group consisting of For long probes of at least 100 nucleotides in length, very nucleotides 1 to 200 of SEQID NO:1, low to very high stringency conditions are defined as prehy nucleotides 1 to 200 of SEQID NO:3, 35 bridization and hybridization at 42° C. in 5xSSPE, 1.0% nucleotides 1 to 200 of SEQID NO:5, SDS, 5x Denhardt’s solution, 100 micrograms/ml sheared nucleotides 1 to 200 of SEQID NO:7, and denatured salmon sperm DNA, following standard nucleotides 1 to 200 of SEQID NO:9, Southern blotting procedures. Preferably, the long probes of nucleotides 1 to 200 of SEQID NO:11, at least 100 nucleotides do not contain more than 1000 nucle nucleotides 1 to 200 of SEQID NO:13, 40 otides. For long probes of at least 100 nucleotides in length, nucleotides 1 to 200 of SEQID NO:15, the carrier material is finally washed three times each for 15 nucleotides 1 to 200 of SEQID NO:37, minutes using 2xSSC, 0.1% SDS at 42°C. (very low strin nucleotides 1 to 200 of SEQID NO:39, gency), preferably washed three times each for 15 minutes nucleotides 1 to 200 of SEQID NO:41, using 0.5xSSC, 0.1% SDS at 42°C. (low stringency), more nucleotides 1 to 200 of SEQID NO:43, 45 preferably washed three times each for 15 minutes using nucleotides 1 to 200 of SEQID NO:45, 0.2xSSC, 0.1% SDS at 42° C. (medium stringency), even nucleotides 1 to 200 of SEQID NO:47, more preferably washed three times each for 15 minutes using nucleotides 1 to 200 of SEQID NO:49, 0.2xSSC, 0.1% SDS at 55° C. (medium-high stringency), nucleotides 1 to 200 of SEQID NO:51, most preferably washed three times each for 15 minutes using nucleotides 1 to 200 of SEQID NO:53, 50 0.1 xSSC, 0.1% SDS at 60°C. (high stringency), in particular nucleotides 1 to 200 of SEQID NO:55, washed three times each for 15 minutes using 0.1 xSSC, 0.1% nucleotides 1 to 200 of SEQID NO:57, SDS at 68°C. (very high stringency). nucleotides 1 to 200 of SEQID NO:59, Although not particularly preferred, it is contemplated that nucleotides 1 to 200 of SEQID NO:65, shorter probes, e.g., probes which are from about 15 to 99 nucleotides 1 to 200 of SEQID NO:22, 55 nucleotides in length, such as from about 15 to about 70 nucleotides 1 to 200 of SEQID NO:23, nucleotides in length, may be also be used. For Such short nucleotides 1 to 200 of SEQID NO:24, probes, Stringency conditions are defined as prehybridization, nucleotides 1 to 200 of SEQID NO:26, hybridization, and washing post-hybridization at 5°C. to 10° nucleotides 1 to 200 of SEQID NO:27, C. below the calculated T, using the calculation according to nucleotides 1 to 200 of SEQID NO:28, 60 Bolton and McCarthy (1962, Proceedings of the National nucleotides 1 to 200 of SEQID NO:29, Academy of Sciences USA 48:1390) in 0.9 M. NaCl, 0.09 M nucleotides 1 to 200 of SEQID NO:30, Tris-HCl pH 7.6, 6 mM EDTA, 0.5% NP-40, 1XDenhardt’s nucleotides 1 to 200 of SEQID NO:31, Solution, 1 mM Sodium pyrophosphate, 1 mM Sodium nucleotides 1 to 200 of SEQID NO:32, monobasic phosphate, 0.1 mM ATP and 0.2 mg of yeast RNA nucleotides 1 to 200 of SEQID NO:33, 65 per ml following standard Southern blotting procedures. nucleotides 1 to 200 of SEQID NO:34, For short probes which are about 15 nucleotides to 99 nucleotides 1 to 200 of SEQID NO:35, nucleotides in length, the carrier material is washed once in US 9, 187,739 B2 23 24 6xSCC plus 0.1% SDS for 15 minutes and twice each for 15 Myceliophthora hinnulea, Sporotrichum pruinosum, Thiella minutes using 6xSSC at 5° C. to 10°C. below the calculated via cf. microspora, Aspergillus sp., Scopulariopsis sp., Tn. Fusarium sp., Verticillium sp., Pseudoplectania nigrella, or Sources for Polypeptides Having Cellobiohydrolase I Activ Phytophthora infestans polypeptide. ity 5 In a more preferred embodiment, the polypeptide is a Acre A polypeptide of the present invention may be obtained monium thermophilum, Chaetomium thermophilum, Scyta from microorganisms of any genus. For purposes of the lidium sp., Scytalidium thermophilum, Thermoascus auran present invention, the term “obtained from as used herein tiacus, Thielavia australiensis, Verticillium tenerum, shall mean that the polypeptide encoded by the nucleotide Neotermes castaneus, Melanocarpus albomyces, Poitrasia sequence is produced by a cell in which the nucleotide 10 circinans, or Coprinus cinereus polypeptide, e.g., the sequence is naturally present or into which the nucleotide polypeptide consisting of an amino acid sequence selected sequence has been inserted. In a preferred embodiment, the from the group consisting of SEQ ID NO:2, SEQ ID NO:4, polypeptide is secreted extracellularly. SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID A polypeptide of the present invention may be a bacterial NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, polypeptide. For example, the polypeptide may be a gram 15 SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID positive bacterial polypeptide such as a Bacillus polypeptide, NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, e.g., a Bacillus alkalophilus, Bacillus amyloliquefaciens, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID Bacillus brevis, Bacillus circulans, Bacillus coagulans, NO:60, and SEQID NO:66. Bacillus lautus, Bacillus lentus, Bacillus licheniformis, It will be understood that for the aforementioned species, Bacillus megaterium, Bacillus Stearothermophilus, Bacillus the invention encompasses both the perfect and imperfect subtilis, or Bacillus thuringiensis polypeptide; or a Strepto states, and other taxonomic equivalents, e.g., anamorphs, myces polypeptide, e.g., a Streptomyces lividans or Strepto regardless of the species name by which they are known. myces murinus polypeptide; or a gram negative bacterial Those skilled in the art will readily recognize the identity of polypeptide, e.g., an E. coli or a Pseudomonas sp. polypep appropriate equivalents. tide. 25 Strains of these species are readily accessible to the public A polypeptide of the present invention may be a fungal in a number of culture collections, such as the AmericanType polypeptide, and more preferably a yeast polypeptide such as Culture Collection (ATCC), Deutsche Sammlung von Mik a Candida, Kluyveromyces, Neocallinastix, Pichia, Piromy roorganismen and Zellkulturen GmbH (DSMZ), China Gen ces, Saccharomyces, Schizosaccharomyces, or Yarrowia eral Microbiological Culture Collection Center (CGMCC), polypeptide; or more preferably a filamentous fungal 30 Centraalbureau Voor Schimmelcultures (CBS), and Agricul polypeptide Such as an Acremonium, Aspergillus, Aureoba tural Research Service Patent Culture Collection, Northern sidium, Cryptococcus, Filibasidium, Fusarium, Humicola, Regional Research Center (NRRL). Magnaporthe, Mucor, Myceliophthora, Neurospora, Paecilo Furthermore, such polypeptides may be identified and myces, Penicillium, Schizophyllum, Talaromyces, Thermoas obtained from other sources including microorganisms iso cus, Thielavia, Tolypocladium, or Trichoderma polypeptide. 35 lated from nature (e.g., Soil, water, plants, animals, etc.) using In an interesting embodiment, the polypeptide is a Saccha the above-mentioned probes. Techniques for isolating micro romyces Carlsbergensis, Saccharomyces cerevisiae, Saccha organisms from natural habitats are well known in the art. The romyces diastaticus, Saccharomyces douglasii, Saccharomy nucleotide sequence may then be derived by similarly screen ces kluyveri, Saccharomyces norbensis or Saccharomyces ing a genomic or cDNA library of another microorganism. Oviformis polypeptide. 40 Once a nucleotide sequence encoding a polypeptide has been In another interesting embodiment, the polypeptide is an detected with the probe(s), the sequence may be isolated or Aspergillus aculeatus, Aspergillus awamori, Aspergillus foe cloned by utilizing techniques which are known to those of tidus, Aspergillus japonicus, Aspergillus nidulans, Aspergil ordinary skill in the art (see, e.g., Sambrook et al., 1989, lus niger; Aspergillus Oryzae, Fusarium bactridioides, Supra). Fusarium cerealis, Fusarium crookwellense, Fusarium cul 45 Polypeptides encoded by nucleotide sequences of the morum, Fusarium graminearum, Fusarium graminum, present invention also include fused polypeptides or cleav Fusarium heterosporum, Fusarium negundi, Fusarium able fusion polypeptides in which another polypeptide is Oxysporum, Fusarium reticulatum, Fusarium roseum, fused at the N-terminus or the C-terminus of the polypeptide Fusarium Sambucinum, Fusarium sarcochroum, Fusarium or fragment thereof. A fused polypeptide is produced by sporotrichioides, Fusarium sulphureum, Fusarium torulo 50 fusing a nucleotide sequence (or a portion thereof) encoding sum, Fusarium trichothecioides, Fusarium venenatum, another polypeptide to a nucleotide sequence (or a portion Humicola insolens, Humicola lanuginosa, Mucor miehei, thereof) of the present invention. Techniques for producing Myceliophthora thermophila, Neurospora crassa, Penicil fusion polypeptides are known in the art, and include ligating lium purpurogenium, Trichoderma harzianum, Trichoderma the coding sequences encoding the polypeptides so that they koningii, Trichoderma longibrachiatum, Trichoderma reesei, 55 are in frame and that expression of the fused polypeptide is or Trichoderma viride polypeptide. under control of the same promoter(s) and terminator. In a preferred embodiment, the polypeptide is a Acremo Polynucleotides and Nucleotide Sequences nium thermophilum, Chaetomium thermophilum, Scyta The present invention also relates to polynucleotides hav lidium sp., Scytalidium thermophilum, Thermoascus auran ing a nucleotide sequence which encodes for a polypeptide of tiacus, Thielavia australiensis, Verticillium tenerum, 60 the invention. In particular, the present invention relates to Neotermes castaneus, Melanocarpus albomyces, Poitrasia polynucleotides consisting of a nucleotide sequence which circinans, Coprinus cinereus, Trichothecium roseum, Humi encodes for a polypeptide of the invention. In a preferred cola nigrescens, Cladorrhinum foecundissimum, Diplodia embodiment, the nucleotide sequence is selected from the gossypina, Myceliophthora thermophila, Rhizomucor pusil group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID lus, Meripilus giganteus, Exidia glandulosa, Xylaria hypoxy 65 NO:5, SEQIDNO:7, SEQIDNO:9, SEQID NO:11, SEQID lon, Trichophaea saccata, Acremonium sp., Chaetomium sp., NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, Chaetomidium pingtungium, Myceliophthora thermophila, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID US 9, 187,739 B2 25 26 NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID SEQID NO:55, SEQIDNO:57, SEQID NO:59, and SEQID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, NO:65. In a more preferred embodiment, the nucleotide SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID sequence is the mature polypeptide coding region contained NO:55, SEQID NO:57, SEQID NO:59, and SEQID NO:65, in a plasmid which is contained in a deposited microorganism 5 and where the modified nucleotide sequence encodes a selected from the group consisting of CGMCC No. 0584, polypeptide which consists of an amino acid sequence CGMCC No. 0581, CGMCC No. 0585, CGMCC No. 0582, selected from the group consisting of SEQID NO:2, SEQID CGMCC No. 0583, CGMCC No. 0580, CBS 109513, DSM NO:4, SEQIDNO:6, SEQIDNO:8, SEQID NO:10, SEQID 14348, DSM 15064, DSM 15065, DSM 15066, DSM 15067, NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749, 10 SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID and CGMCC No. 0750. The present invention also encom NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, passes polynucleotides comprising, preferably consisting of SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID nucleotide sequences which encode a polypeptide consisting NO:60, and SEQID NO:66. of an amino acid sequence selected from the group consisting The techniques used to isolate or clone a nucleotide of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID 15 sequence encoding a polypeptide are known in the art and NO:8, SEQID NO:10, SEQID NO:12, SEQID NO:14, SEQ include isolation from genomic DNA, preparation from ID NO:16, SEQID NO:38, SEQID NO:40, SEQID NO:42, cDNA, or a combination thereof. The cloning of the nucle SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID otide sequences of the present invention from Such genomic NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, DNA can be effected, e.g., by using the well known poly SEQID NO:58, SEQID NO:60, and SEQID NO:66, which merase chain reaction (PCR) or antibody Screening of expres differ from a nucleotide sequence selected from the group sion libraries to detect cloned DNA fragments with shared consisting of SEQ ID NO:1, SEQID NO:3, SEQ ID NO:5, structural features. See, e.g., Innis et al., 1990, PCR: A Guide SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID to Methods and Application, Academic Press, New York. NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, Other amplification procedures Such as ligase chain reaction SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID 25 (LCR), ligated activated transcription (LAT) and nucleotide NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, sequence-based amplification (NASBA) may be used. The SEQID NO:55, SEQIDNO:57, SEQID NO:59, and SEQID nucleotide sequence may be cloned from a strain selected NO:65 by virtue of the degeneracy of the genetic code. from the group consisting of Acremonium, Scytalidium, Ther The present invention also relates to polynucleotides com moascus, Thielavia, Verticillium, Neotermes, Melanocarpus, prising, preferably consisting of a Subsequence of a nucle 30 Poitrasia, Coprinus, Trichothecium, Humicola, Cladorrhi otide sequence selected from the group consisting of SEQID num, Diplodia, Myceliophthora, Rhizomucor, Meripilus, NO:1, SEQID NO:3, SEQID NO:5, SEQID NO:7, SEQID Exidia, Xvlaria, Trichophaea, Chaetomium, Chaetomidium, NO:9, SEQID NO:11, SEQID NO:13, SEQID NO:15, SEQ Sporotrichum, Thielavia, Aspergillus, Scopulariopsis, ID NO:37, SEQID NO:39, SEQID NO:41, SEQID NO:43, Fusarium, Pseudoplectania, and Phytophthora, or another or SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID 35 related organism and thus, for example, may be an allelic or NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, species variant of the polypeptide encoding region of the SEQID NO:59, and SEQID NO:65 which encode fragments nucleotide sequence. of an amino acid sequence selected from the group consisting The nucleotide sequence may be obtained by standard of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID cloning procedures used in genetic engineering to relocate the NO:8, SEQID NO:10, SEQID NO:12, SEQID NO:14, SEQ 40 nucleotide sequence from its natural location to a different ID NO:16, SEQID NO:38, SEQID NO:40, SEQID NO:42, site where it will be reproduced. The cloning procedures may SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID involve excision and isolation of a desired fragment compris NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, ing the nucleotide sequence encoding the polypeptide, inser SEQID NO:58, SEQID NO:60, and SEQID NO:66 that have tion of the fragment into a vector molecule, and incorporation cellobiohydrolase I activity. A subsequence of a nucleotide 45 of the recombinant vector into a host cell where multiple sequence selected from the group consisting of SEQ ID copies or clones of the nucleotide sequence will be replicated. NO:1, SEQID NO:3, SEQID NO:5, SEQID NO:7, SEQID The nucleotide sequence may be of genomic, cDNA, RNA, NO:9, SEQID NO:11, SEQID NO:13, SEQID NO:15, SEQ semisynthetic, synthetic origin, or any combinations thereof. ID NO:37, SEQID NO:39, SEQID NO:41, SEQID NO:43, The present invention also relates to a polynucleotide com SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID 50 prising, preferably consisting of a nucleotide sequence NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, which has a degree of identity with a nucleotide sequence SEQID NO:59, and SEQID NO:65 is a nucleotide sequence selected from the group consisting of encompassed by a sequence selected from the group consist nucleotides 1 to 1578 of SEQID NO:1, ing of SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQID nucleotides 1 to 1587 of SEQID NO:3, NO:7, SEQID NO:9, SEQID NO:11, SEQID NO:13, SEQ 55 nucleotides 1 to 1353 of SEQID NO:5, ID NO:15, SEQID NO:37, SEQID NO:39, SEQID NO:41, nucleotides 1 to 1371 of SEQID NO:7, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID nucleotides 1 to 1614 of SEQID NO:9, NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, nucleotides 1 to 1245 of SEQID NO:11, SEQID NO:57, SEQID NO:59, and SEQID NO:65 except nucleotides 1 to 1341 of SEQID NO:13, that one or more nucleotides from the 5' and/or 3' end have 60 nucleotides 1 to 1356 of SEQID NO:15, been deleted. nucleotides 1 to 1365 of SEQID NO:37, The present invention also relates to polynucleotides hav nucleotides 1 to 1377 of SEQID NO:39, ing, preferably consisting of a modified nucleotide sequence nucleotides 1 to 1353 of SEQID NO:41, which comprises at least one modification in the mature nucleotides 1 to 1341 of SEQID NO:43, polypeptide coding sequence selected from the group con 65 nucleotides 1 to 1584 of SEQID NO:45, sisting of SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQ nucleotides 1 to 1368 of SEQID NO:47, ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, nucleotides 1 to 1395 of SEQID NO:49, US 9, 187,739 B2 27 28 nucleotides 1 to 1383 of SEQID NO:51, selected from the group consisting of CGMCC No. 0584, nucleotides 1 to 1353 of SEQID NO:53, CGMCC No. 0581, CGMCC No. 0585, CGMCC No. 0582, nucleotides 1 to 1599 of SEQID NO:55, CGMCC No. 0583, CGMCC No. 0580, CBS 109513, DSM nucleotides 1 to 1383 of SEQID NO:57, 14348, DSM 15064, DSM 15065, DSM 15066, DSM 15067, nucleotides 1 to 1578 of SEQID NO:59, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749, nucleotides 1 to 1371 of SEQID NO:65, and CGMCC No. 0750. In a preferred embodiment, the nucleotides 1 to 500 of SEQID NO:1, degree of identity with the cellobiohydrolase I encoding part nucleotides 1 to 500 of SEQID NO:3, of the nucleotide sequence inserted into a plasmid present in nucleotides 1 to 500 of SEQID NO:5, a deposited microorganism selected from the group consist nucleotides 1 to 500 of SEQID NO:7, 10 ing of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. nucleotides 1 to 500 of SEQID NO:9, 0585, CGMCC No. 0582, CGMCC No. 0583, CGMCC No. nucleotides 1 to 500 of SEQID NO:11, 0580, CBS 109513, DSM 14348, DSM 15064, DSM 15065, nucleotides 1 to 500 of SEQID NO:13, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. nucleotides 1 to 500 of SEQID NO:15, 0748, CGMCC No. 0749, and CGMCC No. 0750 is at least nucleotides 1 to 500 of SEQID NO:37, 15 70%, e.g., at least 80%, such as at least 90%, more preferably nucleotides 1 to 500 of SEQID NO:39, at least 95%, such as at least 96%, e.g., at least 97%, even nucleotides 1 to 500 of SEQID NO:41, more preferably at least 98%, such as at least 99%. Preferably, nucleotides 1 to 500 of SEQID NO:43, the nucleotide sequence comprises the cellobiohydrolase I nucleotides 1 to 500 of SEQID NO:45, encoding part of the nucleotide sequence inserted into a plas nucleotides 1 to 500 of SEQID NO:47, mid present in a deposited microorganism selected from the nucleotides 1 to 500 of SEQID NO:49, group consisting of CGMCC No. 0584, CGMCC No. 0581, nucleotides 1 to 500 of SEQID NO:51, CGMCC No. 0585, CGMCC No. 0582, CGMCC No. 0583, nucleotides 1 to 500 of SEQID NO:53, CGMCC No. 0580, CBS 109513, DSM 14348, DSM 15064, nucleotides 1 to 500 of SEQID NO:55, DSM 15065, DSM 15066, DSM 15067, CGMCC No. 0747, nucleotides 1 to 500 of SEQID NO:57, 25 CGMCC No. 0748, CGMCC No. 0749, and CGMCC No. nucleotides 1 to 500 of SEQID NO:59, 0750. In an even more preferred embodiment, the nucleotide nucleotides 1 to 500 of SEQID NO:65, sequence consists of the cellobiohydrolase I encoding part of nucleotides 1 to 221 of SEQID NO:17, the nucleotide sequence inserted into a plasmid present in a nucleotides 1 to 239 of SEQID NO:18, deposited microorganism selected from the group consisting nucleotides 1 to 199 of SEQID NO:19, 30 of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. nucleotides 1 to 191 of SEQID NO:20, 0585, CGMCC No. 0582, CGMCC No. 0583, CGMCC No. nucleotides 1 to 232 of SEQID NO:21, 0580, CBS 109513, DSM 14348, DSM 15064, DSM 15065, nucleotides 1 to 467 of SEQID NO:22, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. nucleotides 1 to 534 of SEQID NO:23, 0748, CGMCC No. 0749, and CGMCC No. 0750. nucleotides 1 to 563 of SEQID NO:24, 35 Modification of a nucleotide sequence encoding a polypep nucleotides 1 to 218 of SEQID NO:25, tide of the present invention may be necessary for the synthe nucleotides 1 to 492 of SEQID NO:26, sis of a polypeptide, which comprises anamino acid sequence nucleotides 1 to 481 of SEQID NO:27, that has at least one substitution, deletion and/or insertion as nucleotides 1 to 463 of SEQID NO:28, compared to an amino acid sequence selected from the group nucleotides 1 to 513 of SEQID NO:29, 40 consisting of SEQ ID NO:2, SEQ ID NO:4, SEQID NO:6, nucleotides 1 to 579 of SEQID NO:30, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID nucleotides 1 to 514 of SEQID NO:31, NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, nucleotides 1 to 477 of SEQID NO:32, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID nucleotides 1 to 500 of SEQID NO:33, NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, nucleotides 1 to 470 of SEQID NO:34, 45 SEQID NO:56, SEQID NO:58, SEQID NO:60, and SEQID nucleotides 1 to 491 of SEQID NO:35, NO:66. These artificial variants may differ in some engi nucleotides 1 to 221 of SEQID NO:36, neered way from the polypeptide isolated from its native nucleotides 1 to 519 of SEQID NO:61, Source, e.g., variants that differ in specific activity, thermo nucleotides 1 to 497 of SEQID NO:62, stability, pH optimum, or the like. nucleotides 1 to 498 of SEQID NO:63, 50 It will be apparent to those skilled in the art that such nucleotides 1 to 525 of SEQID NO:64, and modifications can be made outside the regions critical to the nucleotides 1 to 951 of SEQID NO:67 function of the molecule and still result in an active polypep of at least 70% identity, such as at least 75% identity; prefer tide. Amino acid residues essential to the activity of the ably, the nucleotide sequence has at least 80% identity, e.g., at polypeptide encoded by the nucleotide sequence of the inven least 85% identity, such as at least 90% identity, more pref 55 tion, and therefore preferably not subject to modification, erably at least 95% identity, such as at least 96% identity, e.g., Such as Substitution, may be identified according to proce at least 97% identity, even more preferably at least 98% dures known in the art, Such as site-directed mutagenesis or identity, such as at least 99%. Preferably, the nucleotide alanine-scanning mutagenesis (see, e.g., Cunningham and sequence encodes a polypeptide having cellobiohydrolase I Wells, 1989, Science 244: 1081-1085). In the latter technique, activity. The degree of identity between two nucleotide 60 mutations are introduced at every positively charged residue sequences is determined as described previously (see the in the molecule, and the resultant mutant molecules are tested section entitled “Definitions'). for cellobiohydrolase I activity to identify amino acid resi In another interesting aspect, the present invention relates dues that are critical to the activity of the molecule. Sites of to a polynucleotide having, preferably consisting of a nucle Substrate-enzyme interaction can also be determined by otide sequence which has at least 65% identity with the cel 65 analysis of the three-dimensional structure as determined by lobiohydrolase I encoding part of the nucleotide sequence Such techniques as nuclear magnetic resonance analysis, inserted into a plasmid present in a deposited microorganism crystallography or photoaffinity labelling (see, e.g., de Vos et US 9, 187,739 B2 29 30 al., 1992, Science 255:306–312; Smith et al., 1992, Journal of nucleotides 1 to 1395 of SEQID NO:49, Molecular Biology 224: 899-904: Wlodaver et al., 1992, nucleotides 1 to 1302 of SEQID NO:49, FEBS Letters 309: 59-64). nucleotides 1 to 1383 of SEQID NO:51, Moreover, a nucleotide sequence encoding a polypeptide nucleotides 1 to 1302 of SEQID NO:51, of the present invention may be modified by introduction of 5 nucleotides 1 to 1353 of SEQID NO:53, nucleotide substitutions which do not give rise to another nucleotides 1 to 1302 of SEQID NO:53, amino acid sequence of the polypeptide encoded by the nucleotides 1 to 1599 of SEQID NO:55, nucleotide sequence, but which correspond to the codon nucleotides 1 to 1302 of SEQID NO:55, usage of the host organism intended for production of the nucleotides 1 to 1383 of SEQID NO:57, enzyme. 10 nucleotides 1 to 1302 of SEQID NO:57, The introduction of a mutation into the nucleotide nucleotides 1 to 1578 of SEQID NO:59, sequence to exchange one nucleotide for another nucleotide nucleotides 1 to 1302 of SEQID NO:59, may be accomplished by site-directed mutagenesis using any nucleotides 1 to 1371 of SEQID NO:65, and of the methods known in the art. Particularly useful is the nucleotides 1 to 1302 of SEQID NO:65; procedure, which utilizes a supercoiled, double stranded 15 (ii) the complementary strand of the nucleotides selected DNA vector with an insert of interest and two synthetic prim from the group consisting of ers containing the desired mutation. The oligonucleotide nucleotides 1 to 500 of SEQID NO:1, primers, each complementary to opposite strands of the vec nucleotides 1 to 500 of SEQID NO:3, tor, extend during temperature cycling by means of Pfu DNA nucleotides 1 to 500 of SEQID NO:5, polymerase. On incorporation of the primers, a mutated plas nucleotides 1 to 500 of SEQID NO:7, mid containing staggered nicks is generated. Following tem nucleotides 1 to 500 of SEQID NO:9, perature cycling, the product is treated with DpnI which is nucleotides 1 to 500 of SEQID NO:11, specific for methylated and hemimethylated DNA to digest nucleotides 1 to 500 of SEQID NO:13, the parental DNA template and to select for mutation-con nucleotides 1 to 500 of SEQID NO:15, taining synthesized DNA. Other procedures known in the art 25 nucleotides 1 to 500 of SEQID NO:37, may also be used. For a general description of nucleotide nucleotides 1 to 500 of SEQID NO:39, substitution, see, e.g., Ford et al., 1991, Protein Expression nucleotides 1 to 500 of SEQID NO:41, and Purification 2: 95-107. nucleotides 1 to 500 of SEQID NO:43, The present invention also relates to a polynucleotide com nucleotides 1 to 500 of SEQID NO:45, prising, preferably consisting of a nucleotide sequence 30 nucleotides 1 to 500 of SEQID NO:47, which encodes a polypeptide having cellobiohydrolase I nucleotides 1 to 500 of SEQID NO:49, activity, and which hybridizes under very low stringency nucleotides 1 to 500 of SEQID NO:51, conditions, preferably under low stringency conditions, more nucleotides 1 to 500 of SEQID NO:53, preferably under medium stringency conditions, more pref nucleotides 1 to 500 of SEQID NO:55, erably under medium-high Stringency conditions, even more 35 nucleotides 1 to 500 of SEQID NO:57, preferably under high Stringency conditions, and most pref nucleotides 1 to 500 of SEQID NO:59, erably under very high Stringency conditions with a poly nucleotides 1 to 500 of SEQID NO:65, nucleotide probe selected from the group consisting of nucleotides 1 to 221 of SEQID NO:17, (i) the complementary strand of the nucleotides selected nucleotides 1 to 239 of SEQID NO:18, from the group consisting of 40 nucleotides 1 to 199 of SEQID NO:19, nucleotides 1 to 1578 of SEQID NO:1, nucleotides 1 to 191 of SEQID NO:20, nucleotides 1 to 1302 of SEQID NO:1, nucleotides 1 to 232 of SEQID NO:21, nucleotides 1 to 1587 of SEQID NO:3, nucleotides 1 to 467 of SEQID NO:22, nucleotides 1 to 1302 of SEQID NO:3, nucleotides 1 to 534 of SEQID NO:23, nucleotides 1 to 1353 of SEQID NO:5, 45 nucleotides 1 to 563 of SEQID NO:24, nucleotides 1 to 1302 of SEQID NO:5, nucleotides 1 to 218 of SEQID NO:25, nucleotides 1 to 1371 of SEQID NO:7, nucleotides 1 to 492 of SEQID NO:26, nucleotides 1 to 1302 of SEQID NO:7, nucleotides 1 to 481 of SEQID NO:27, nucleotides 1 to 1614 of SEQID NO:9, nucleotides 1 to 463 of SEQID NO:28, nucleotides 1 to 1302 of SEQID NO:9, 50 nucleotides 1 to 513 of SEQID NO:29, nucleotides 1 to 1245 of SEQID NO:11, nucleotides 1 to 579 of SEQID NO:30, nucleotides 1 to 1341 of SEQID NO:13, nucleotides 1 to 514 of SEQID NO:31, nucleotides 1 to 1302 of SEQID NO:13, nucleotides 1 to 477 of SEQID NO:32, nucleotides 1 to 1356 of SEQID NO:15, nucleotides 1 to 500 of SEQID NO:33, nucleotides 1 to 1302 of SEQID NO:15, 55 nucleotides 1 to 470 of SEQID NO:34, nucleotides 1 to 1365 of SEQID NO:37, nucleotides 1 to 491 of SEQID NO:35, nucleotides 1 to 1302 of SEQID NO:37, nucleotides 1 to 221 of SEQID NO:36, nucleotides 1 to 1377 of SEQID NO:39, nucleotides 1 to 519 of SEQID NO:61, nucleotides 1 to 1302 of SEQID NO:39, nucleotides 1 to 497 of SEQID NO:62, nucleotides 1 to 1353 of SEQID NO:41, 60 nucleotides 1 to 498 of SEQID NO:63, nucleotides 1 to 1302 of SEQID NO:41, nucleotides 1 to 525 of SEQID NO:64, and nucleotides 1 to 1341 of SEQID NO:43, nucleotides 1 to 951 of SEQID NO:67; and nucleotides 1 to 1302 of SEQID NO:43, (iii) the complementary strand of the nucleotides selected nucleotides 1 to 1584 of SEQID NO:45, from the group consisting of nucleotides 1 to 1302 of SEQID NO:45, 65 nucleotides 1 to 200 of SEQID NO:1, nucleotides 1 to 1368 of SEQID NO:47, nucleotides 1 to 200 of SEQID NO:3, nucleotides 1 to 1302 of SEQID NO:47, nucleotides 1 to 200 of SEQID NO:5, US 9, 187,739 B2 31 32 nucleotides 1 to 200 of SEQID NO:7, from the E. colilac operon, Streptomyces coelicolor agarase nucleotides 1 to 200 of SEQID NO:9, gene (dagA), Bacillus subtilis levanSucrase gene (sacB), nucleotides 1 to 200 of SEQID NO:11, Bacillus licheniformis alpha-amylase gene (amy), Bacillus nucleotides 1 to 200 of SEQID NO:13, Stearothermophilus maltogenic amylase gene (amyM), nucleotides 1 to 200 of SEQID NO:15, 5 Bacillus amyloliquefaciens alpha-amylase gene (amyO), nucleotides 1 to 200 of SEQID NO:37, Bacillus licheniformis penicillinase gene (penP), Bacillus nucleotides 1 to 200 of SEQID NO:39, subtilis XylA and XylB genes, and prokaryotic beta-lactamase nucleotides 1 to 200 of SEQID NO:41, gene (Villa-Kamaroff et al., 1978, Proceedings of the nucleotides 1 to 200 of SEQID NO:43, National Academy of Sciences USA 75: 3727-3731), as well nucleotides 1 to 200 of SEQID NO:45, 10 as the tac promoter (DeBoer et al., 1983, Proceedings of the nucleotides 1 to 200 of SEQID NO:47, National Academy of Sciences USA 80: 21-25). Further pro nucleotides 1 to 200 of SEQID NO:49, moters are described in “Useful proteins from recombinant nucleotides 1 to 200 of SEQID NO:51, bacteria” in Scientific American, 1980, 242: 74-94; and in nucleotides 1 to 200 of SEQID NO:53, Sambrook et al., 1989, supra. nucleotides 1 to 200 of SEQID NO:55, 15 Examples of Suitable promoters for directing the transcrip nucleotides 1 to 200 of SEQID NO:57, tion of the nucleic acid constructs of the present invention in nucleotides 1 to 200 of SEQID NO:59, a filamentous fungal host cell are promoters obtained from nucleotides 1 to 200 of SEQID NO:65, the genes for Aspergillus Oryzae TAKA amylase, Rhizomucor nucleotides 1 to 200 of SEQID NO:22, miehei aspartic proteinase, Aspergillus niger neutral alpha nucleotides 1 to 200 of SEQID NO:23, amylase, Aspergillus niger acid stable alpha-amylase, nucleotides 1 to 200 of SEQID NO:24, Aspergillus niger or Aspergillus awamori glucoamylase nucleotides 1 to 200 of SEQID NO:26, (glaA), Rhizomucor miehei lipase, Aspergillus Oryzae alka nucleotides 1 to 200 of SEQID NO:27, line protease, Aspergillus Oryzae triose phosphate isomerase, nucleotides 1 to 200 of SEQID NO:28, Aspergillus nidulans acetamidase, and Fusarium oxysporum nucleotides 1 to 200 of SEQID NO:29, 25 trypsin-like protease (WO 96/00787), as well as the NA2-tpi nucleotides 1 to 200 of SEQID NO:30, promoter (a hybrid of the promoters from the genes for nucleotides 1 to 200 of SEQID NO:31, Aspergillus niger neutral alpha-amylase and Aspergillus nucleotides 1 to 200 of SEQID NO:32, Oryzae triose phosphate isomerase), and mutant, truncated, nucleotides 1 to 200 of SEQID NO:33, and hybrid promoters thereof. nucleotides 1 to 200 of SEQID NO:34, 30 In a yeast host, useful promoters are obtained from the nucleotides 1 to 200 of SEQID NO:35, genes for Saccharomyces cerevisiae enolase (ENO-1), Sac nucleotides 1 to 200 of SEQID NO:61, charomyces cerevisiae galactokinase (GAL1), Saccharomy nucleotides 1 to 200 of SEQID NO:62, ces cerevisiae alcohol dehydrogenase/glyceraldehyde-3- nucleotides 1 to 200 of SEQID NO:63, phosphate dehydrogenase (ADH2/GAP), and nucleotides 1 to 200 of SEQID NO:64, and 35 Saccharomyces cerevisiae 3-phosphoglycerate kinase. Other nucleotides 1 to 200 of SEQID NO:67. useful promoters for yeast host cells are described by As will be understood, details and particulars concerning Romanos et al., 1992, Yeast 8: 423-488. hybridization of the nucleotide sequences will be the same or The control sequence may also be a suitable transcription analogous to the hybridization aspects discussed in the sec terminator sequence, a sequence recognized by a host cell to tion entitled “Polypeptides Having Cellobiohydrolase I 40 terminate transcription. The terminator sequence is operably Activity” herein. linked to the 3' terminus of the nucleotide sequence encoding Nucleic Acid Constructs the polypeptide. Any terminator which is functional in the The present invention also relates to nucleic acid constructs host cell of choice may be used in the present invention. comprising a nucleotide sequence of the present invention Preferred terminators for filamentous fungal host cells are operably linked to one or more control sequences that direct 45 obtained from the genes for Aspergillus Oryzae TAKA amy the expression of the coding sequence in a Suitable host cell lase, Aspergillus niger glucoamylase, Aspergillus nidulans under conditions compatible with the control sequences. anthranilate synthase, Aspergillus niger alpha-glucosidase, A nucleotide sequence encoding a polypeptide of the and Fusarium oxysporum trypsin-like protease. present invention may be manipulated in a variety of ways to Preferred terminators for yeast host cells are obtained from provide for expression of the polypeptide. Manipulation of 50 the genes for Saccharomyces cerevisiae enolase, Saccharo the nucleotide sequence prior to its insertion into a vector may myces cerevisiae cytochrome C (CYC1), and Saccharomyces be desirable or necessary depending on the expression vector. cerevisiae glyceraldehyde-3-phosphate dehydrogenase. The techniques for modifying nucleotide sequences utilizing Other useful terminators for yeast host cells are described by recombinant DNA methods are well known in the art. Romanos et al., 1992, supra. The control sequence may be an appropriate promoter 55 The control sequence may also be a suitable leader sequence, a nucleotide sequence which is recognized by a sequence, a nontranslated region of an mRNA which is host cell for expression of the nucleotide sequence. The pro important for translation by the host cell. The leader sequence moter sequence contains transcriptional control sequences, is operably linked to the 5' terminus of the nucleotide which mediate the expression of the polypeptide. The pro sequence encoding the polypeptide. Any leader sequence that moter may be any nucleotide sequence which shows tran- 60 is functional in the host cell of choice may be used in the Scriptional activity in the host cell of choice including mutant, present invention. truncated, and hybrid promoters, and may be obtained from Preferred leaders for filamentous fungal host cells are genes encoding extracellular or intracellular polypeptides obtained from the genes for Aspergillus Oryzae TAKA amy either homologous or heterologous to the host cell. lase and Aspergillus nidulans triose phosphate isomerase. Examples of suitable promoters for directing the transcrip- 65 Suitable leaders for yeast host cells are obtained from the tion of the nucleic acid constructs of the present invention, genes for Saccharomyces cerevisiae enolase (ENO-1), Sac especially in a bacterial host cell, are the promoters obtained charomyces cerevisiae 3-phosphoglycerate kinase, Saccha US 9, 187,739 B2 33 34 romyces cerevisiae alpha-factor, and Saccharomyces cerevi subtilis neutral protease (nprT), Saccharomyces cerevisiae siae alcohol dehydrogenase/glyceraldehyde-3-phosphate alpha-factor, Rhizomucor miehei aspartic proteinase, and dehydrogenase (ADH2/GAP). Myceliophthora thermophila laccase (WO95/33836). The control sequence may also be a polyadenylation Where both signal peptide and propeptide regions are sequence, a sequence operably linked to the 3' terminus of the present at the amino terminus of a polypeptide, the propeptide nucleotide sequence and which, when transcribed, is recog region is positioned next to the amino terminus of a polypep nized by the host cell as a signal to add polyadenosine resi tide and the signal peptide region is positioned next to the dues to transcribed mRNA. Any polyadenylation sequence amino terminus of the propeptide region. which is functional in the host cell of choice may be used in It may also be desirable to add regulatory sequences which the present invention. 10 allow the regulation of the expression of the polypeptide Preferred polyadenylation sequences for filamentous fun relative to the growth of the host cell. Examples of regulatory gal host cells are obtained from the genes for Aspergillus systems are those which cause the expression of the gene to be Oryzae TAKA amylase, Aspergillus niger glucoamylase, turned on or offin response to a chemical or physical stimu Aspergillus nidulans anthranilate synthase, Fusarium lus, including the presence of a regulatory compound. Regu Oxysporum trypsin-like protease, and Aspergillus niger 15 latory systems in prokaryotic systems include the lac, tac, and alpha-glucosidase. trp operator systems. In yeast, the ADH2 system or GAL1 Useful polyadenylation sequences for yeast host cells are system may be used. In filamentous fungi, the TAKA alpha described by Guo and Sherman, 1995, Molecular Cellular amylase promoter, Aspergillus niger glucoamylase promoter, Biology 15:5983-5990. and Aspergillus Oryzae glucoamylase promoter may be used The control sequence may also be a signal peptide coding as regulatory sequences. Other examples of regulatory region that codes for an amino acid sequence linked to the sequences are those which allow for gene amplification. In amino terminus of a polypeptide and directs the encoded eukaryotic systems, these include the dihydrofolate reductase polypeptide into the cells secretory pathway. The 5' end of gene which is amplified in the presence of methotrexate, and the coding sequence of the nucleotide sequence may inher the metallothionein genes which are amplified with heavy ently contain a signal peptide coding region naturally linked 25 metals. In these cases, the nucleotide sequence encoding the in translation reading frame with the segment of the coding polypeptide would be operably linked with the regulatory region which encodes the secreted polypeptide. Alternatively, Sequence. the 5' end of the coding sequence may contain a signal peptide Expression Vectors coding region which is foreign to the coding sequence. The The present invention also relates to recombinant expres foreign signal peptide coding region may be required where 30 sion vectors comprising the nucleic acid construct of the the coding sequence does not naturally contain a signal pep invention. The various nucleotide and control sequences tide coding region. Alternatively, the foreign signal peptide described above may be joined together to produce a recom coding region may simply replace the natural signal peptide binant expression vector which may include one or more coding region in order to enhance secretion of the polypep convenient restriction sites to allow for insertion or substitu tide. However, any signal peptide coding region which directs 35 tion of the nucleotide sequence encoding the polypeptide at the expressed polypeptide into the secretory pathway of a host Such sites. Alternatively, the nucleotide sequence of the cell of choice may be used in the present invention. present invention may be expressed by inserting the nucle Effective signal peptide coding regions for bacterial host otide sequence or a nucleic acid construct comprising the cells are the signal peptide coding regions obtained from the sequence into an appropriate vector for expression. In creat genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus 40 ing the expression vector, the coding sequence is located in Stearothermophilus alpha-amylase, Bacillus licheniformis the vector So that the coding sequence is operably linked with subtilisin, Bacillus licheniformis beta-lactamase, Bacillus the appropriate control sequences for expression. Stearothermophilus neutral proteases (nprT, nprS, nprM), and The recombinant expression vector may be any vector Bacillus subtilis prSA. Further signal peptides are described (e.g., a plasmidor virus) which can be conveniently Subjected by Simonen and Palva, 1993, Microbiological Reviews 57: 45 to recombinant DNA procedures and can bring about the 109-137. expression of the nucleotide sequence. The choice of the Effective signal peptide coding regions for filamentous vector will typically depend on the compatibility of the vector fungal host cells are the signal peptide coding regions with the host cell into which the vector is to be introduced. obtained from the genes for Aspergillus Oryzae TAKA amy The vectors may be linear or closed circular plasmids. lase, Aspergillus niger neutral amylase, Aspergillus niger 50 The vector may be an autonomously replicating vector, i.e., glucoamylase, Rhizomucor miehei aspartic proteinase, a vector which exists as an extrachromosomal entity, the Humicola insolens cellulase, and Humicola lanuginosa replication of which is independent of chromosomal replica lipase. tion, e.g., a plasmid, an extrachromosomal element, a min Useful signal peptides for yeast host cells are obtained ichromosome, or an artificial chromosome. from the genes for Saccharomyces cerevisiae alpha-factor 55 The vector may contain any means for assuring self-repli and Saccharomyces cerevisiae invertase. Other useful signal cation. Alternatively, the vector may be one which, when peptide coding regions are described by Romanos et al., 1992, introduced into the host cell, is integrated into the genome and Supra. replicated together with the chromosome(s) into which it has The control sequence may also be a propeptide coding been integrated. Furthermore, a single vector or plasmid or region that codes for an amino acid sequence positioned at the 60 two or more vectors or plasmids which together contain the amino terminus of a polypeptide. The resultant polypeptide is total DNA to be introduced into the genome of the host cell, known as a proenzyme or propolypeptide (or a Zymogen in or a transposon may be used. Some cases). A propolypeptide is generally inactive and can The vectors of the present invention preferably contain one be converted to a mature active polypeptide by catalytic or or more selectable markers which permit easy selection of autocatalytic cleavage of the propeptide from the propolypep 65 transformed cells. A selectable marker is a gene the product of tide. The propeptide coding region may be obtained from the which provides for biocide or viral resistance, resistance to genes for Bacillus subtilis alkaline protease (aprE), Bacillus heavy metals, prototrophy to auxotrophs, and the like. US 9, 187,739 B2 35 36 Examples of bacterial selectable markers are the dal genes The procedures used to ligate the elements described above from Bacillus subtilis or Bacillus licheniformis, or markers to construct the recombinant expression vectors of the present which confer antibiotic resistance Such as amplicillin, kana invention are well known to one skilled in the art (see, e.g., mycin, chloramphenicol or tetracycline resistance. Suitable Sambrook et al., 1989, supra). markers for yeast host cells are ADE2, HIS3, LEU2, LYS2, 5 Host Cells MET3, TRP1, and URA3. Selectable markers for use in a The present invention also relates to recombinant a host filamentous fungal host cell include, but are not limited to, cell comprising the nucleic acid construct of the invention, amdS (acetamidase), argB (ornithine carbamoyltransferase), which are advantageously used in the recombinant produc bar (phosphinothricin acetyltransferase), hygB (hygromycin tion of the polypeptides. A vector comprising a nucleotide phosphotransferase), nial) (nitrate reductase), pyrC (oroti 10 sequence of the present invention is introduced into a host cell dine-5'-phosphate decarboxylase), SC (Sulfate adenyltrans so that the vector is maintained as a chromosomal integrant or ferase), trpC (anthranilate synthase), as well as equivalents as a self-replicating extra-chromosomal vector as described thereof. earlier. Preferred for use in an Aspergillus cell are the amdS and The host cell may be a unicellular microorganism, e.g., a pyrC genes of Aspergillus nidulans or Aspergillus Oryzae and 15 prokaryote, or a non-unicellular microorganism, e.g., a the bar gene of Streptomyces hygroscopicus. eukaryote. The vectors of the present invention preferably contain an Useful unicellular cells are bacterial cells such as gram element(s) that permits stable integration of the vector into positive bacteria including, but not limited to, a Bacillus cell, the host cells genome or autonomous replication of the vec e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, tor in the cell independent of the genome. 20 Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus For integration into the host cell genome, the vector may coagulans, Bacillus lautus, Bacillus lentus, Bacillus licheni rely on the nucleotide sequence encoding the polypeptide or formis, Bacillus megaterium, Bacillus Stearothermophilus, any other element of the vector for stable integration of the Bacillus subtilis, and Bacillus thuringiensis; or a Streptomy vector into the genome by homologous or nonhomologous ces cell, e.g., Streptomyces lividans or Streptomyces murinus, recombination. Alternatively, the vector may contain addi 25 or gram negative bacteria Such as E. coli and Pseudomonas tional nucleotide sequences for directing integration by sp. In a preferred embodiment, the bacterial host cell is a homologous recombination into the genome of the host cell. Bacillus lentus, Bacillus licheniformis, Bacillus stearother The additional nucleotide sequences enable the vector to be mophilus, or Bacillus subtilis cell. In another preferred integrated into the host cell genome at a precise location(s) in embodiment, the Bacillus cell is an alkalophilic Bacillus. the chromosome(s). To increase the likelihood of integration 30 The introduction of a vector into a bacterial host cell may, at a precise location, the integrational elements should pref for instance, be effected by protoplast transformation (see, erably contain a sufficient number of nucleotides, such as 100 e.g., Chang and Cohen, 1979, Molecular General Genetics to 1,500 base pairs, preferably 400 to 1,500 base pairs, and 168: 111-115), using competent cells (see, e.g., Young and most preferably 800 to 1,500 base pairs, which are highly Spizizin, 1961, Journal of Bacteriology 81:823-829, or Dub homologous with the corresponding target sequence to 35 nau and Davidoff-Abelson, 1971, Journal of Molecular Biol enhance the probability of homologous recombination. The ogy 56: 209-221), electroporation (see, e.g., Shigekawa and integrational elements may be any sequence that is homolo Dower, 1988, Biotechniques 6: 742-751), or conjugation (see, gous with the target sequence in the genome of the host cell. e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169: Furthermore, the integrational elements may be non-encod 5771-5278). ing or encoding nucleotide sequences. On the other hand, the 40 The host cell may be a eukaryote. Such as a mammalian, vector may be integrated into the genome of the host cell by insect, plant, or fungal cell. non-homologous recombination. In a preferred embodiment, the host cell is a fungal cell. For autonomous replication, the vector may further com "Fungi as used herein includes the phyla Ascomycota, prise an origin of replication enabling the vector to replicate Basidiomycota, Chytridiomycota, and Zygomycota (as autonomously in the host cell in question. Examples of bac 45 defined by Hawksworth et al., In, Ainsworth and Bisby's terial origins of replication are the origins of replication of Dictionary of The Fungi, 8th edition, 1995, CAB Interna plasmidspbR322, p OC19, p.ACYC177, and pACYC184 per tional, University Press, Cambridge, UK) as well as the mitting replication in E. coli, and puB110, pE 194, pTA1060, Oomycota (as cited in Hawksworth et al., 1995, supra, page and pAME31 permitting replication in Bacillus. Examples of 171) and all mitosporic fungi (Hawksworth et al., 1995, origins of replication for use in a yeast host cell are the 2 50 Supra). micron origin of replication, ARS1, ARS4, the combination In a more preferred embodiment, the fungal host cell is a of ARS1 and CEN3, and the combination of ARS4 and CEN6. yeast cell. "Yeast as used herein includes ascosporogenous The origin of replication may be one having a mutation which yeast (Endomycetales), basidiosporogenous yeast, and yeast makes its functioning temperature-sensitive in the host cell belonging to the Fungi Imperfecti (Blastomycetes). Since the (see, e.g., Ehrlich, 1978, Proceedings of the National Acad 55 classification of yeast may change in the future, for the pur emy of Sciences USA 75: 1433). poses of this invention, yeast shall be defined as described in More than one copy of a nucleotide sequence of the present Biology and Activities of Yeast (Skinner, F. A., Passmore, S. invention may be inserted into the host cell to increase pro M., and Davenport, R. R., eds, Soc. App. Bacteriol. Sympo duction of the gene product. An increase in the copy number sium Series No. 9, 1980). of the nucleotide sequence can be obtained by integrating at 60 In an even more preferred embodiment, the yeast host cell least one additional copy of the sequence into the host cell is a Candida, Aschbvii, Hansenula, Kluyveromyces, Pichia, genome or by including an amplifiable selectable marker Saccharomyces, Schizosaccharomyces, or Yarrowia cell. gene with the nucleotide sequence where cells containing In a most preferred embodiment, the yeast host cell is a amplified copies of the selectable marker gene, and thereby Saccharomyces Carlsbergensis, Saccharomyces cerevisiae, additional copies of the nucleotide sequence, can be selected 65 Saccharomyces diastaticus, Saccharomyces douglasii, Sac for by cultivating the cells in the presence of the appropriate charomyces kluyveri, Saccharomyces norbensis or Saccharo selectable agent. myces oviformis cell. In another most preferred embodiment, US 9, 187,739 B2 37 38 the yeast host cell is a Kluyveromyces lactis cell. In another Pseudoplectania, and Phytophthora; more preferably the most preferred embodiment, the yeast host cell is a Yarrowia strain is selected from the group consisting of Acremonium lipolytica cell. thermophilum, Chaetonium thermophilum, Scytalidium In another more preferred embodiment, the fungal host cell thermophilum, Thermoascus aurantiacus, Thielavia aus is a filamentous fungal cell. "Filamentous fungi' include all traliensis, Verticillium tenerum, Neotermes castaneus, Mel filamentous forms of the subdivision Eumycota and Oomy anocarpus albomyces, Poitrasia circinans, Coprinus cota (as defined by Hawksworth et al., 1995, supra). The cinereus, Trichothecium roseum, Humicola nigrescens, Cla filamentous fungi are characterized by a mycelial wall com dorrhinum foecundissimum, Diplodia gossypina, Mycelioph posed of chitin, cellulose, glucan, chitosan, mannan, and thora thermophila, Rhizomucor pusillus, Meripilus gigan other complex polysaccharides. Vegetative growth is by 10 hyphal elongation and carbon catabolism is obligately aero teus, Exidia glandulosa, Xylaria hypoxylon, Trichophaea bic. In contrast, vegetative growth by yeasts such as Saccha saccata, Chaetomidium pingtungium, Myceliophthora ther romyces cerevisiae is by budding of a unicellular thallus and mophila, Myceliophthora hinnulea, Sporotrichum pruino carbon catabolism may be fermentative. sum, Thielavia cf. microspora, Pseudoplectania nigrella, and In an even more preferred embodiment, the filamentous 15 Phytophthora infestans. fungal host cell is a cell of a species of, but not limited to, The present invention also relates to methods for producing Acremonium, Aspergillus, Fusarium, Humicola, Mucor, a polypeptide of the present invention comprising (a) culti Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypo Vating a host cell under conditions conducive for production cladium, or Trichoderma. of the polypeptide; and (b) recovering the polypeptide. In a most preferred embodiment, the filamentous fungal The present invention also relates to methods for in-situ host cell is an Aspergillus awamori, Aspergillus foetidus, production of a polypeptide of the present invention compris Aspergillus japonicus, Aspergillus nidulans, Aspergillus ing (a) cultivating a host cell under conditions conducive for niger or Aspergillus Oryzae cell. In another most preferred production of the polypeptide; and (b) contacting the embodiment, the filamentous fungal host cell is a Fusarium polypeptide with a desired Substrate, such as a cellulosic bactridioides, Fusarium cerealis, Fusarium crookwellense, 25 substrate, without prior recovery of the polypeptide. The term Fusarium culmorum, Fusarium graminearum, Fusarium “in-situ production' is intended to mean that the polypeptide graminum, Fusarium heterosporum, Fusarium negundi, is produced directly in the locus in which it is intended to be Fusarium oxysporum, Fusarium reticulatum, Fusarium used. Such as in a fermentation process for production of roseum, Fusarium Sambucinum, Fusarium sarcochroum, ethanol. Fusarium sporotrichioides, Fusarium sulphureum, Fusarium 30 In the production methods of the present invention, the torulosum, Fusarium trichothecioides, or Fusarium venena cells are cultivated in a nutrient medium suitable for produc tum cell. In an even most preferred embodiment, the filamen tion of the polypeptide using methods known in the art. For tous fungal parent cell is a Fusarium venenatum (Nirenberg example, the cell may be cultivated by shake flask cultivation, sp. nov.) cell. In another most preferred embodiment, the Small-scale or large-scale fermentation (including continu filamentous fungal host cell is a Humicola insolens, Humi 35 ous, batch, fed-batch, or solid state fermentations) in labora cola lanuginosa, Mucor miehei, Myceliophthora thermo tory or industrial fermentors performed in a suitable medium phila, Neurospora crassa, Penicillium purpurogenium, and under conditions allowing the polypeptide to be Thielavia terrestris, Trichoderma harzianum, Trichoderma expressed and/or isolated. The cultivation takes place in a koningii, Trichoderma longibrachiatum, Trichoderma reesei, Suitable nutrient medium comprising carbon and nitrogen or Trichoderma viride cell. 40 Sources and inorganic salts, using procedures known in the Fungal cells may be transformed by a process involving art. Suitable media are available from commercial suppliers protoplast formation, transformation of the protoplasts, and or may be prepared according to published compositions regeneration of the cell wall in a manner known perse. Suit (e.g., in catalogues of the AmericanType Culture Collection). able procedures for transformation of Aspergillus host cells If the polypeptide is secreted into the nutrient medium, the are described in EP238 023 and Yelton et al., 1984, Proceed 45 polypeptide can be recovered directly from the medium. If the ings of the National Academy of Sciences USA 81: 1470 polypeptide is not secreted, it can be recovered from cell 1474. Suitable methods for transforming Fusarium species lysates. are described by Malardier et al., 1989, Gene 78: 147-156 and The polypeptides may be detected using methods known in WO 96/00787. Yeast may be transformed using the proce the art that are specific for the polypeptides. These detection dures described by Becker and Guarente. In Abelson, J. N. 50 methods may include use of specific antibodies, formation of and Simon, M. I., editors, Guide to Yeast Genetics and an enzyme product, or disappearance of an enzyme Substrate. Molecular Biology, Methods in Enzymology, 194: 182-187, For example, an enzyme assay may be used to determine the Academic Press, Inc., New York; Ito et al., 1983, Journal of activity of the polypeptide as described herein. Bacteriology 153: 163; and Hinnen et al., 1978, Proceedings The resulting polypeptide may be recovered by methods of the National Academy of Sciences USA 75: 1920. 55 known in the art. For example, the polypeptide may be recov Methods of Production ered from the nutrient medium by conventional procedures The present invention also relates to methods for producing including, but not limited to, centrifugation, filtration, extrac a polypeptide of the present invention comprising (a) culti tion, spray-drying, evaporation, or precipitation. Vating a strain, which in its wild-type form is capable of The polypeptides of the present invention may be purified producing the polypeptide; and (b) recovering the polypep 60 by a variety of procedures known in the art including, but not tide. Preferably, the strain is selected from the group consist limited to, chromatography (e.g., ion exchange, affinity, ing of Acremonium, Scytalidium, Thermoascus, Thielavia, hydrophobic, chromatofocusing, and size exclusion), electro Verticillium, Neotermes, Melanocarpus, Poitrasia, Coprinus, phoretic procedures (e.g., preparative isoelectric focusing), Trichothecium, Humicola, Cladorrhinum, Diplodia, Myce differential solubility (e.g., ammonium Sulfate precipitation), liophthora, Rhizomucor, Meripilus, Exidia, Xylaria, Tri 65 SDS-PAGE, or extraction (see, e.g., Protein Purification, chophaea, Chaetomium, Chaetomidium, Sporotrichum, J.-C. Janson and Lars Ryden, editors, VCH Publishers, New Thielavia, Aspergillus, Scopulariopsis, Fusarium, York, 1989). US 9, 187,739 B2 39 40 Plants Plant Mol. Biol. 24: 863-878), a seed specific promoter such The present invention also relates to a transgenic plant, as the glutelin, prolamin, globulin, or albumin promoter from plant part, or plant cell which has been transformed with a rice (Wu et al., 1998, Plant and Cell Physiology39: 885-889), nucleotide sequence encoding a polypeptide having cellobio a Vicia faba promoter from the legumin B4 and the unknown hydrolase I activity of the present invention so as to express 5 seed protein gene from Vicia faba (Conrad et al., 1998, Jour and produce the polypeptide in recoverable quantities. The nal of Plant Physiology 152: 708–711), a promoter from a polypeptide may be recovered from the plant or plant part. seed oil body protein (Chen et al., 1998, Plant and Cell Alternatively, the plant or plant part containing the recombi Physiology 39: 935-941), the storage protein nap.A promoter nant polypeptide may be used as Such for improving the from Brassica napus, or any other seed specific promoter quality of a food or feed, e.g., improving nutritional value, 10 known in the art, e.g., as described in WO 91/14772. Further palatability, and rheological properties, or to destroy an anti more, the promoter may be a leaf specific promoter Such as nutritive factor. the rbcs promoter from rice or tomato (Kyozuka et al., 1993, The transgenic plant can be dicotyledonous (a dicot) or Plant Physiology 102:991-1000, the chlorella virusadenine monocotyledonous (a monocot). Examples of monocot methyltransferase gene promoter (Mitra and Higgins, 1994, plants are grasses, such as meadow grass (blue grass, Poa), 15 Plant Molecular Biology 26: 85-93), or the aldP gene pro forage grass such as Festuca, Lolium, temperate grass, such as moter from rice (Kagaya et al., 1995, Molecular and General Agrostis, and cereals, e.g., wheat, oats, rye, barley, rice, Sor Genetics 248: 668-674), or a wound inducible promoter such ghum, millets, and maize (corn). as the potato pin2 promoter (Xu et al., 1993, Plant Molecular Examples of dicot plants are tobacco, lupins, potato, Sugar Biology 22:573-588). beet, legumes, such as pea, bean and Soybean, and cruciferous 20 A promoter enhancer element may also be used to achieve plants (family Brassicaceae), such as cauliflower, rape, higher expression of the enzyme in the plant. For instance, the canola, and the closely related model organism Arabidopsis promoter enhancer element may be an intron which is placed thaliana. between the promoter and the nucleotide sequence encoding Examples of plant parts are stem, callus, leaves, root, fruits, a polypeptide of the present invention. For instance, Xu et al., seeds, and tubers. Also specific plant tissues, such as chloro- 25 1993, supra disclose the use of the first intron of the rice actin plast, apoplast, mitochondria, vacuole, peroxisomes, and 1 gene to enhance expression. cytoplasm are considered to be a plant part. Furthermore, any The selectable marker gene and any other parts of the plant cell, whatever the tissue origin, is considered to be a expression construct may be chosen from those available in plant part. the art. Also included within the scope of the present invention are 30 The nucleic acid construct is incorporated into the plant the progeny (clonal or seed) of Such plants, plant parts and genome according to conventional techniques known in the plant cells. art, including Agrobacterium-mediated transformation, The transgenic plant or plant cell expressing a polypeptide virus-mediated transformation, microinjection, particle bom of the present invention may be constructed in accordance bardment, biolistic transformation, and electroporation (Gas with methods known in the art. Briefly, the plant or plant cell 35 ser et al., 1990, Science 244: 1293; Potrykus, 1990, Bio/ is constructed by incorporating one or more expression con Technology 8:535; Shimamoto et al., 1989, Nature 338:274). structs encoding a polypeptide of the present invention into Presently, Agrobacterium tumefaciens-mediated gene the plant host genome and propagating the resulting modified transfer is the method of choice for generating transgenic plant or plant cell into a transgenic plant or plant cell. dicots (for a review, see Hooykas and Schilperoort, 1992, Conveniently, the expression construct is a nucleic acid 40 Plant Molecular Biology 19: 15-38). However it can also be construct which comprises a nucleotide sequence encoding a used for transforming monocots, although other transforma polypeptide of the present invention operably linked with tion methods are generally preferred for these plants. Pres appropriate regulatory sequences required for expression of ently, the method of choice for generating transgenic mono the nucleotide sequence in the plant or plant part of choice. cots is particle bombardment (microscopic gold or tungsten Furthermore, the expression construct may comprise a select- 45 particles coated with the transforming DNA) of embryonic able marker useful for identifying host cells into which the callior developing embryos (Christou, 1992, Plant Journal 2: expression construct has been integrated and DNA sequences 275-281; Shimamoto, 1994, Current Opinion Biotechnology necessary for introduction of the construct into the plant in 5: 158-162; Vasil et al., 1992, Bio/Technology 10: 667-674). question (the latter depends on the DNA introduction method An alternative method for transformation of monocots is to be used). 50 based on protoplast transformation as described by Omirulleh The choice of regulatory sequences. Such as promoter and et al., 1993, Plant Molecular Biology 21: 415-428. terminator sequences and optionally signal or transit Following transformation, the transformants having incor sequences, is determined, for example, on the basis of when, porated therein the expression construct are selected and where, and how the polypeptide is desired to be expressed. regenerated into whole plants according to methods well For instance, the expression of the gene encoding a polypep- 55 known in the art. tide of the present invention may be constitutive or inducible, The present invention also relates to methods for producing or may be developmental, stage or tissue specific, and the a polypeptide of the present invention comprising (a) culti gene product may be targeted to a specific tissue or plant part Vating a transgenic plant or a plant cell comprising a nucle Such as seeds or leaves. Regulatory sequences are, for otide sequence encoding a polypeptidehaving cellobiohydro example, described by Tague et al., 1988, Plant Physiology 60 lase I activity of the present invention under conditions 86: 506. conducive for production of the polypeptide; and (b) recov For constitutive expression, the 35S-CaMV promoter may ering the polypeptide. be used (Franck et al., 1980, Cell 21: 285-294). Organ-spe The present invention also relates to methods for in-situ cific promoters may be, for example, a promoter from Storage production of a polypeptide of the present invention compris sink tissues such as seeds, potato tubers, and fruits (Edwards 65 ing (a) cultivating a transgenic plant or a plant cell comprising & Coruzzi, 1990, Ann. Rev. Genet. 24: 275-303), or from a nucleotide sequence encoding a polypeptide having cello metabolic sink tissues such as meristems (Ito et al., 1994, biohydrolase I activity of the present invention under condi US 9, 187,739 B2 41 42 tions conducive for production of the polypeptide; and (b) Examples of useful proteases are the variants described in contacting the polypeptide with a desired substrate, such as a WO 92/19729, WO 98/20115, WO 98/20116, and WO cellulosic substrate, without prior recovery of the polypep 98/34946, especially the variants with substitutions in one or tide. more of the following positions: 27, 36, 57, 76, 87.97, 101, Compositions 104,120,123, 167, 170, 194,206, 218, 222, 224, 235 and 274. In a still further aspect, the present invention relates to Lipases: compositions comprising a polypeptide of the present inven Suitable lipases include those of bacterial or fungal origin. tion. Chemically modified or protein engineered mutants are The composition may comprise a polypeptide of the inven included. Examples of useful lipases include lipases from tion as the major enzymatic component, e.g., a mono-com 10 Humicola (synonym Thermomyces), e.g., from H. lanugi ponent composition. Alternatively, the composition may nosa (T. lanuginosus) as described in EP 258 068 and EP305 comprise multiple enzymatic activities, such as an aminopep 216 or from H. insolens as described in WO 96/13580, a tidase, amylase, carbohydrase, carboxypeptidase, catalase, Pseudomonas lipase, e.g., from Palcaligenes or P. pseudoal cellulase, chitinase, cutinase, cyclodextrin glycosyltrans calligenes (EP218272), P. cepacia (EP331376), P Stutzeri ferase, deoxyribonuclease, esterase, alpha-galactosidase, 15 (GB 1.372,034), Pfluorescens, Pseudomonas sp. strain SD beta-galactosidase, glucoamylase, alpha-glucosidase, beta 705 (WO95/06720 and WO 96/27002), P wisconsinensis glucosidase, haloperoxidase, invertase, laccase, lipase, man (WO 96/12012), a Bacillus lipase, e.g., from B. subtilis (Dar nosidase, oxidase, pectinolytic enzyme, peptidoglutaminase, tois et al. (1993), Biochemica et Biophysica Acta, 1131,253 peroxidase, phytase, polyphenoloxidase, proteolytic enzyme, 360), B. Stearothermophilus (JP 64/744992) or B. pumilus ribonuclease, transglutaminase, or Xylanase. (WO 91/16422). The compositions may be prepared in accordance with Other examples are lipase variants such as those described methods known in the art and may be in the form of a liquid in WO92/05249, WO 94/01541, EP 407 225, EP 260 105, or a dry composition. For instance, the polypeptide compo WO95/35381, WO 96/00292, WO95/30744, WO 94/25578, sition may be in the form of a granulate or a microgranulate. WO95/14783, WO 95/22615, WO 97/04079 and WO The polypeptide to be included in the composition may be 25 97/072O2. stabilized in accordance with methods known in the art. Amylases: Examples are given below of preferred uses of the polypep Suitable amylases (alpha and/or beta) include those of tide compositions of the invention. The dosage of the bacterial or fungal origin. Chemically modified or protein polypeptide composition of the invention and other condi engineered mutants are included. Amylases include, for tions under which the composition is used may be determined 30 example, alpha-amylases obtained from Bacillus, e.g., a spe on the basis of methods known in the art. cial strain of B. licheniformis, described in more detail in GB Detergent Compositions 1296,839. The polypeptide of the invention may be added to and thus Examples of useful amylases are the variants described in become a component of a detergent composition. WO 94/02597, WO 94/18314, WO 96/23873, and WO The detergent composition of the invention may for 35 97/43424, especially the variants with substitutions in one or example be formulated as a hand or machine laundry deter more of the following positions: 15, 23, 105, 106, 124, 128, gent composition including a laundry additive composition 133, 154, 156, 181, 188, 190, 197,202, 208, 209, 243, 264, suitable for pre-treatment of stained fabrics and a rinse added 304,305, 391, 408, and 444. fabric Softener composition, or be formulated as a detergent Cellulases: composition for use in general household hard Surface clean 40 Suitable cellulases include those of bacterial or fungal ing operations, or be formulated for hand or machine dish origin. Chemically modified or protein engineered mutants washing operations. are included. Suitable cellulases include cellulases from the In a specific aspect, the invention provides a detergent genera Bacillus, Pseudomonas, Humicola, Fusarium, Thiella additive comprising the polypeptide of the invention. The via, Acremonium, e.g., the fungal cellulases produced from detergent additive as well as the detergent composition may 45 Humicola insolens, Myceliophthora thermophila and comprise one or more other enzymes Such as a protease, a Fusarium oxysporum disclosed in U.S. Pat. No. 4.435.307. lipase, a cutinase, an amylase, a carbohydrase, a cellulase, a U.S. Pat. No. 5,648,263, U.S. Pat. No. 5,691,178, U.S. Pat. pectinase, a mannanase, an arabinase, a galactanase, a Xyla No. 5,776,757 and WO 89/09259. nase, an oxidase, e.g., a laccase, and/or a peroxidase. Especially suitable cellulases are the alkaline or neutral In general the properties of the chosen enzyme(s) should be 50 cellulases having colour care benefits. Examples of such cel compatible with the selected detergent, (i.e., pH-optimum, lulases are cellulases described in EP 0 495 257, EPO 531 compatibility with other enzymatic and non-enzymatic ingre 372, WO 96/11262, WO 96/29397, WO 98/08940. Other dients, etc.), and the enzyme(s) should be present in effective examples are cellulase variants such as those described in WO amountS. 94/07998, EP 0531 315, U.S. Pat. No. 5,457,046, U.S. Pat. Proteases: 55 No. 5,686,593, U.S. Pat. No. 5,763,254, WO95/24471, WO Suitable proteases include those of animal, vegetable or 98/12307 and PCTFDK98/OO299. microbial origin. Microbial origin is preferred. Chemically Peroxidases/Oxidases: modified or protein engineered mutants are included. The Suitable peroxidases/oxidases include those of plant, bac protease may be a serine protease or a metallo protease, terial or fungal origin. Chemically modified or protein engi preferably an alkaline microbial protease or a trypsin-like 60 neered mutants are included. Examples of useful peroxidases protease. Examples of alkaline proteases are subtilisins, espe include peroxidases from Coprinus, e.g., from C. cinereus, cially those derived from Bacillus, e.g., subtilisin Novo, Sub and variants thereofas those described in WO93/24618, WO tilisin Carlsberg, subtilisin 309, subtilisin 147 and subtilisin 95/10602, and WO 98/15257. 168 (described in WO 89/06279). Examples of trypsin-like The detergent enzyme(s) may be included in a detergent proteases are trypsin (e.g., of porcine or bovine origin) and the 65 composition by adding separate additives containing one or Fusarium protease described in WO 89/06270 and WO more enzymes, or by adding a combined additive comprising 94/25583. all of these enzymes. A detergent additive of the invention, US 9, 187,739 B2 43 44 i.e., a separate additive or a combined additive, can beformu tive, e.g., an aromatic borate ester, or a phenylboronic acid lated e.g., as a granulate, a liquid, a slurry, etc. Preferred derivative such as 4-formylphenylboronic acid, and the com detergent additive formulations are granulates, in particular position may be formulated as described in e.g., WO non-dusting granulates, liquids, in particular stabilized liq 92/19709 and WO92/19708. uids, or slurries. 5 The detergent may also contain other conventional deter Non-dusting granulates may be produced, e.g., as dis gent ingredients such as e.g., fabric conditioners including closed in U.S. Pat. Nos. 4,106,991 and 4,661,452 and may clays, foamboosters, Suds suppressors, anti-corrosion agents, optionally be coated by methods known in the art. Examples soil-suspending agents, anti-Soil redeposition agents, dyes, of waxy coating materials are poly(ethylene oxide) products (polyethyleneglycol, PEG) with mean molar weights of 1000 10 bactericides, optical brighteners, hydrotropes, tarnish inhibi to 20000; ethoxylated nonylphenols having from 16 to 50 tors, or perfumes. ethylene oxide units; ethoxylated fatty alcohols in which the It is at present contemplated that in the detergent compo alcohol contains from 12 to 20 carbon atoms and in which sitions any enzyme, in particular the polypeptide of the inven there are 15 to 80 ethylene oxide units; fatty alcohols; fatty tion, may be added in an amount corresponding to 0.01-100 acids; and mono- and di- and triglycerides of fatty acids. 15 mg of enzyme protein per liter of wash liquor, preferably Examples of film-forming coating materials Suitable for 0.05-5 mg of enzyme protein per liter of wash liquor, in application by fluid bed techniques are given in GB 1483591. particular 0.1-1 mg of enzyme protein per liter of wash liquor. Liquid enzyme preparations may, for instance, be stabilized The polypeptide of the invention may additionally be by adding a polyol Such as propylene glycol, a Sugar or Sugar incorporated in the detergent formulations disclosed in WO alcohol, lactic acid or boric acid according to established 20 97/07202 which is hereby incorporated as reference. methods. Protected enzymes may be prepared according to DNA Recombination (Shuffling) the method disclosed in EP 238 216. The nucleotide sequences of SEQID NO:1, SEQID NO:3, The detergent composition of the invention may be in any SEQID NO:5, SEQID NO:7, SEQIDNO:9, SEQID NO:11, convenient form, e.g., a bar, a tablet, a powder, a granule, a SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID paste or a liquid. A liquid detergent may be acqueous, typically 25 NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, containing up to 70% water and 0-30% organic solvent, or SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID non-aqueous. NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, The detergent composition comprises one or more Surfac tants, which may be non-ionic including semi-polar and/or SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID anionic and/or cationic and/or Zwitterionic. The Surfactants 30 NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, are typically presentata level of from 0.1% to 60% by weight. SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID When included therein the detergent will usually contain NO:61, SEQ ID NO:62, SEQID NO:63, SEQ ID NO:64, from about 1% to about 40% of an anionic surfactant such as SEQ ID NO:65, SEQ ID NO:67 may be used in a DNA linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl recombination (or shuffling) process. The new polynucle sulfate (fatty alcohol sulfate), alcohol ethoxysulfate, second- 35 otide sequences obtained in Such a process may encode new ary alkanesulfonate, alpha-sulfo fatty acid methyl ester, polypeptides having cellobiase activity with improved prop alkyl- or alkenylsuccinic acid or soap. erties. Such as improved stability (storage stability, thermo When included therein the detergent will usually contain stability), improved specific activity, improved pH-optimum, from about 0.2% to about 40% of a non-ionic surfactant such and/or improved tolerance towards specific compounds. as alcohol ethoxylate, nonylphenol ethoxylate, alkylpolygly- 40 Shuffling between two or more homologous input poly coside, alkyldimethylamineoxide, ethoxylated fatty acid nucleotides (starting-point polynucleotides) involves frag monoethanolamide, fatty acid monoethanolamide, polyhy menting the polynucleotides and recombining the fragments, droxyalkylfatty acid amide, or N-acyl N-alkyl derivatives of to obtain output polynucleotides (i.e., polynucleotides that glucosamine (glucamides’). have been subjected to a shuffling cycle) wherein a number of The detergent may contain 0-65% of a detergent builder or 45 nucleotide fragments are exchanged in comparison to the complexing agent such as Zeolite, diphosphate, triphosphate, input polynucleotides. phosphonate, carbonate, citrate, nitrilotriacetic acid, ethyl DNA recombination or shuffling may be a (partially) ran enediaminetetraacetic acid, diethylenetriaminepentaacetic dom process in which a library of chimeric genes is generated acid, alkyl- or alkenylsuccinic acid, Soluble silicates or lay from two or more starting genes. A number of known formats ered silicates (e.g., SKS-6 from Hoechst). 50 can be used to carry out this shuffling or recombination pro The detergent may comprise one or more polymers. CCSS, Examples are carboxymethylcellulose, poly(vinylpyrroli The process may involve random fragmentation of parental done), poly (ethylene glycol), poly(Vinyl alcohol), poly(vi DNA followed by reassembly by PCR to new full-length nylpyridine-N-oxide), poly(vinylimidazole), polycarboxy genes, e.g., as presented in U.S. Pat. Nos. 5,605,793, 5,811, lates Such as polyacrylates, maleic/acrylic acid copolymers 55 238, 5.830,721, 6,117,679. In-vitro recombination of genes and lauryl methacrylate/acrylic acid copolymers. may be carried out, e.g., as described in U.S. Pat. Nos. 6,159, The detergent may contain a bleaching system which may 687, 6,159,688, 5,965,408, 6,153,510, and WO 98/41623. comprise a HO Source Such as perborate or percarbonate The recombination process may take place in Vivo in a living which may be combined with a peracid-forming bleach acti cell, e.g., as described in WO97/07205 and WO 98/28416. vator Such as tetraacetylethylenediamine or nonanoyloxy- 60 The parental DNA may be fragmented by DNAse I treat benzenesulfonate. Alternatively, the bleaching system may ment or by restriction endonuclease digests as described by comprise peroxyacids of e.g., the amide, imide, or Sulfone Kikuchi et al (2000a, Gene 236:159-167). Shuffling of two type. parents may be done by shuffling single stranded parental The enzyme(s) of the detergent composition of the inven DNA of the two parents as described in Kikuchietal (2000b, tion may be stabilized using conventional stabilizing agents, 65 Gene 243:133-137). e.g., a polyol Such as propylene glycol or glycerol, a Sugar or A particular method of shuffling is to follow the methods Sugar alcohol, lactic acid, boric acid, or a boric acid deriva described in Crameri et al., 1998, Nature 391: 288-291 and US 9, 187,739 B2 45 46 Ness et al., Nature Biotechnology 17: 893-896. Another for primers were constructed, based on sequence alignment and mat would be the methods described in U.S. Pat. No. 6,159, identification of conserved regions among CBH1 proteins. 687: Examples 1 and 2. The PCR band from a gel electrophoresis was used to obtain Production of Ethanol from Biomass a partial sequence of the CBH1 gene from Diplodia gos The present invention also relates to methods for producing sypina. Homology search confirmed that the partial sequence ethanol from biomass, such as cellulosic materials, compris ing contacting the biomass with the polypeptides of the inven was a partial sequence of the CBH1 gene (EC 3.2.1.91). tion. Ethanol may Subsequently be recovered. The polypep The full-length CBH1 gene of Diplodia gossypina is tides of the invention may be produced “in-situ', i.e., as part obtained by accessing the patent deposit CBS 247.96, make a of or directly in an ethanol production process, by cultivating DNA or cDNA preparation, use the partial sequence as basis a host cellor a strain, which in its wild-type form is capable of 10 for construction of specific primers, and use standard PCR producing the polypeptides, under conditions conducive for cloning techniques to step by step getting the entire gene. production of the polypeptides. Several other approaches can be taken: Ethanol can be produced by enzymatic degradation of bio PCR screening of the cDNA library or the cDNAs that were mass and conversion of the released polysaccharides to etha nol. This kind of ethanol is often referred to as bioethanol or 15 used for the construction of the library, could be performed. biofuel. It can be used as a fuel additive or extender in blends To do so, Gene Specific Primers (GSP) and vector/adaptor of from less than 1% and up to 100% (a fuel substitute). In primers are constructed from the partial cDNA sequence of Some countries, such as Brazil, ethanol is Substituting gaso the CBH1 gene and from vector/adaptor sequence respec line to a very large extent. tively; both sets of primers designed to go outward into the The predominant polysaccharide in the primary cell wall of missing 5' and 3' regions of the CBH1 cDNA. The longest biomass is cellulose, the second most abundant is hemi-cel PCR products obtained using combinations of GSP and vec lulose, and the third is pectin. The secondary cell wall, pro tor/adaptor primer represent the full-length 5' and 3' end duced after the cell has stopped growing, also contains regions of the CBH 1 clNA from Diplodia gossypina. polysaccharides and is strengthened through polymeric lig Homology search and comparison with the partial cDNA nin covalently cross-linked to hemicellulose. Cellulose is a homopolymer of anhydrocellobiose and thus a linear beta-(1- 25 sequence confirm that the 5' and 3' PCR products belong to 4)-D-glucan, while hemicelluloses include a variety of com the same CBH1 cDNA from Diplodia gossypina. The full pounds, such as Xylans, Xyloglucans, arabinoxylans, and length cDNA can then be obtained by PCR using a set of mannans in complex branched structures with a spectrum of primers constructed from both the 5' and 3' ends. Substituents. Although generally polymorphous, cellulose is Alternatively, the cDNA library could be screened for the found in plant tissue primarily as an insoluble crystalline 30 full-length cDNA using standard hybridization techniques matrix of parallel glucan chains. Hemicelluloses usually and the partial cDNA sequence as a probe. The clones giving hydrogen bond to cellulose, as well as to other hemicellulo a positive hybridization signal with the probe are then purified ses, which helps stabilize the cell wall matrix. and sequenced to determine the longest cDNA sequence. Three major classes of cellulase enzymes are used to breakdown biomass: 35 Homology search and comparison confirms that the full The “endo-1,4-beta-glucanases” or 1,4-beta-D-glucan-4- length cDNA correspond to the partial CBH1 cDNA glucanohydrolases (EC 3.2.1.4), which act randomly on sequence that was originally used as a probe. soluble and insoluble 1,4-beta-glucan Substrates. The two approaches described above rely on the presence The “exo-1,4-beta-D-glucanases” including both the 1,4- of the full-length CBH1 cDNA in the cDNA library or in the beta-D-glucan glucohydrolases (EC 3.2.1.74), which liberate 40 cDNAs used for its construction. Alternatively, the 5' and 3' D-glucose from 1,4-beta-D-glucans and hydrolyze D-cello RACE (Rapid Amplification of cDNA Ends) techniques or biose slowly, and 1,4-beta-D-glucan cellobiohydrolase (EC derived techniques could be used to identify the missing 5' 3.2.1.91), also referred to as cellobiohydrolase I, which lib and 3' regions. For this purpose, preferably mRNAs from erates D-cellobiose from 1,4-beta-glucans. Diplodia gossipina are isolated and utilized to synthesize first The “beta-D-glucosidases” or beta-D-glucoside glucohy 45 Strand cDNAS using oligo(dT)-containing Adapter Primer or drolases (EC 3.2.1.21), which act to release D-glucose units a 5'-Gene Specific Primer (GSP). from cellobiose and soluble cellodextrins, as well as an array of glycosides. The full-length cDNA of the CBH1 gene from Diplodia These three classes of enzymes work together synergisti gossypina can also be obtained by using genomic DNA from cally in a complex interplay that results in efficient decrystal Diplodia gossypina. The CBH1 gene can be identified by lization and hydrolysis of native cellulose from biomass to 50 PCR techniques such as the one describe above or by standard yield the reducing Sugars which are converted to ethanol by genomic library Screening using hybridization techniques fermentation. and the partial CBH 1 clNA as a probe. Homology search and The present invention is further described by the following comparison with the partial CBH 1 clNA confirms that the examples which should not be construed as limiting the scope genomic sequence correspond to the CBH1 gene from Diplo of the invention. 55 diagossypina. Identification of consensus sequences such as initiation site of transcription, start and stop codons or polyA EXAMPLES sites could be used to define the region comprising the full Chemicals used as buffers and substrates were commercial length cDNA. Primers constructed from both the 5' and 3' products of at least reagent grade. ends of this region could then be used to amplify the full 60 length cDNA from mRNA or cDNA library from Diplodia Example 1 gossypina (see above). By expression of the full-length gene in a suitable expres Cloning of a Partial and a Full-Length sion host construct the CBH 1 enzyme is harvested as an intra Cellobiohydrolase I (CBH1) DNA Sequence cellular or extra cellular enzyme from the culture broth. 65 The methods described above apply to the cloning of cel A cDNA library of Diplodia gossypina was PCR screened lobiohydrolase I DNA sequences from all organisms and not for presence of the CBH1 gene. For this purpose sets of only Diplodia gossypina. US 9, 187,739 B2 47 48 Example 2 2 L Suction flask: Ultra Turrax Homogenizer. Cellobiohydrolase I (CBH I) Activity Acetone and ortho-phosphoric-acid is cooled on ice. Avicel(R) is moisted with water, and then the 150 mL icecold A cellobiohydrolase I is characterized by the ability to 85% Ortho-phosphoric-acid is added. The mixture is placed hydrolyze highly crystalline cellulose very efficiently com on an icebath with weak stirring for one hour. pared to other cellulases. Cellobiohydrolase I may have a Add 500 mL ice-cold acetone with stirring, and transfer the higher catalytic activity using PASO (phosphoric acid swol mixture to a glass filter funnel and wash with 3x100 mL len cellulose) as substrate than using CMC as substrate. For ice-cold acetone. Suck as dry as possible in each wash. Wash the purposes of the present invention, any of the following 10 with 2x500 mL water (or until there is no odor of acetone), assays can be used to identify a cellobiohydrolase I: Suck as dry as possible in each wash. Activity on AZO-Avicel Re-suspend the solids in water to a total volume of 500 mL. AZO-Avicel (MegaZyme, Bray Business Park, Bray, Wick and blend to homogeneity using an Ultra Turrax Homog low, Ireland) was used according to the manufacturers enizer. Store wet in refrigerator and equilibrate with buffer by instructions. 15 centrifugation and re-suspension before use. Activity on PNP-Beta-Cellobiose CMC: Substrate solution: 5 mM PNP beta-D-Cellobiose (p-Nitro Bacterial cellulose microfibrils in an impure form were phenyl B-d-Cellobioside Sigma N-5759) in 0.1 M Na-ac obtained from the Japanese foodstuff “nata de coco' (Fujico etate buffer, pH 5.0; Company, Japan). The cellulose in 350 g of this product was Stop reagent: 0.1 M Na-carbonate, pH 11.5. purified by suspension of the product in about 4 L of tap water. 50 microliters CBH I solution was mixed with 1 mL Sub This water was replaced by freshwater twice a day for 4 days. strate solution and incubated 20 minutes at 40°C. The reac Then 1% (w/v) NaOH was used instead of water and the tion was stopped by addition of 5 mL stop reagent. Absor product was re-suspended in the alkali solution twice a day bance was measured at 404 nm. for 4 days. Neutralisation was done by rinsing the purified Activity on PASC and CMC 25 cellulose with distilled water until the pH at the surface of the The substrate is degraded with cellobiohydrolase I (CBHI) product was neutral (pH 7). to form reducing Sugars. A Microdochium nivale carbohy The cellulose was microfibrillated and a suspension of drate oxidase (rMnO) or another equivalent oxidase acts on individual bacterial cellulose microfibrils was obtained by the reducing Sugars to form H2O in the presence of O. The homogenisation of the purified cellulose microfibrils in a formed HO activates in the presence of excess peroxidase 30 Waring blender for 30 min. The cellulose microfibrils were the oxidative condensation of 4-aminoantipyrine (AA) and further purified by dialysing this Suspension through a pore N-ethyl-N-sulfopropyl-m-toluidine (TOPS) to form a purple membrane against distilled water and the isolated and puri product which can be quantified by its absorbance at 550 nm. fied cellulose microfibrils were stored in a water suspension at When all components except CBH I are in surplus, the rate 40 C. of increase in absorbance is proportional to the CBH I activ 35 Deposit of Biological Material ity. The reaction is a one-kinetic-step reaction and may be China General Microbiological Culture Collection Center carried out automatically in a Cobas Fara centrifugal analyzer (CGMCC) (Hoffmann La Roche) or another equivalent spectrophotom The following biological material has been deposited eter which can measure steady state kinetics. under the terms of the Budapest Treaty with the China Gen Buffer: 50 mM Na-acetate buffer (pH 5.0); 40 eral Microbiological Culture Collection Center (CGMCC), Reagents: rMnO oxidase, purified Microdochium nivale car Institute of Microbiology, Chinese Academy of Sciences, bohydrate oxidase, 2 mg/L (final concentration); Haidian, Beijing 100080, China: Peroxidase, SIGMA P-8125 (96 U/mg), 25 mg/L (final Accession Number: CGMCC No. 0584 concentration); Applicants reference: ND000575 4-aminoantipyrine, SIGMA A-4382, 200 mg/L (final con 45 Date of Deposit: 2001-05-29 centration); Description: Acremonium thermophilum CBH I gene on plas TOPS, SIGMA E-8506, 600 mg/L (final concentration): mid PASC or CMC (see below), 5 g/L (final concentration). Classification: Ascomycota; Sordariomycetes; Hypocrales; All reagents were added to the buffer in the concentrations Hypocreaceae indicated above and this reagent solution was mixed thor 50 Origin: China, 1999 oughly. Related sequence(s): SEQID NO:1 and SEQID NO:2 (DNA 50 microliters cellobiohydrolase I sample (in a suitable sequence encoding a cellobiohydrolase I from Acremo dilution) was mixed with 300 uL reagent solution and incu nium thermophilum and the corresponding protein bated 20 minutes at 40° C. Purple color formation was sequence) detected and measured as absorbance at 550 nm. 55 Accession Number: CGMCC No. 0581 The AA/TOPS-condensate absorption coefficient is Applicants reference: ND000548 0.01935 Asso/(microM cm). The rate is calculated as micro Date of Deposit: 2001-05-29 moles reducing Sugar produced per minute from ODsso/ Description: Chaetomium thermophilum CBH Igene on plas minute and the absorption coefficient. mid PASC: 60 Classification: Ascomycota; Sordariomycetes; Sordariales; Materials: Chaetomiaceae 5g Avicel(R) (Art. 2331 Merck); Origin: China, 1999 150 mL 85% Ortho-phosphoric-acid (Art. 573 Merck); Related sequence(s): SEQID NO:3 and SEQID NO.4 (DNA 800 mL Acetone (Art. 14 Merck); sequence encoding a cellobiohydrolase I from Chaeto Approx. 2 liter deionized water (Milli-Q); 65 mium thermophilum and the corresponding protein 1 L. glass beaker, sequence) 1 L. glass filter funnel; Accession Number: CGMCC No. 0585 US 9, 187,739 B2 49 50 Applicants reference: ND001223 Description: Scytalidium thermophilum CBH I gene on plas Date of Deposit: 2001-05-29 mid Description: Scytalidium sp. CBH I gene on plasmid Classification: Ascomycota; Mitosporic Classification: Ascomycota; Mitosporic Origin: China, 2000 Origin: China, 1999 Related sequence(s): SEQID NO:59 and SEQID NO:60 Related sequence(s): SEQID NO:5 and SEQID NO:6 (DNA Centraalbureau Voor Schimmelcultures (CBS) sequence encoding a cellobiohydrolase I from Scytalidium The following biological material has been deposited sp. and the corresponding protein sequence) under the terms of the Budapest Treaty with the Centraalbu Accession Number: CGMCC No. 0582 reau Voor Schimmelcultures (CBS), Uppsalalaan 8, 3584 CT Applicants reference: ND000549 10 Utrecht, The Netherlands (alternatively P.O. Box 85167, 3508 AD Utrecht, The Netherlands): Date of Deposit: 2001-05-29 Accession Number: CBS 109513 Description: Thermoascus aurantiacus CBH I gene on plas Applicants reference: ND000538 mid Date of Deposit: 2001-06-01 Classification: Eurotiomycetes; Eurotiales; Trichocomaceae 15 Description: Verticillium tenerum Origin: China Classification: Ascomycota, Hypocreales, Pyrenomycetes Related sequence(s): SEQID NO:7 and SEQID NO:8 (DNA (mitosporic) sequence encoding a cellobiohydrolase I from Thermoas Origin: - cus aurantiacus and the corresponding protein sequence) Related sequence(s): SEQ ID NO:11 and SEQ ID NO:12 Accession Number: CGMCC No. 0583 (DNA sequence encoding a cellobiohydrolase I from Ver Applicants reference: ND001182 ticillium tenerum and the corresponding protein sequence) Date of Deposit: 2001-05-29 Accession Number: CBS 819.73 Description: Thielavia australiensis CBH I gene on plasmid Applicants reference: ND000533 Classification: Ascomycota; Sordariomycetes; Sordariales; Date of Deposit: Publicly available (not deposited by appli Chaetomiaceae 25 cant) Origin: China, 1998 Description: Humicola nigrescens Related sequence(s): SEQ ID NO:9 and SEQ ID NO:10 Classification: Sordariaceae, Sordariales, Sordariomycetes: (DNA sequence encoding a cellobiohydrolase I from Ascomycota Thielavia australiensis and the corresponding protein Origin: - sequence) 30 Related sequence(s): SEQID NO:18 (partial DNA sequence Accession Number: CGMCC No. 0580 encoding a cellobiohydrolase I from Humicola nigrescens) Applicants reference: ND000562 Accession Number: CBS 427.97 Date of Deposit: 2001-05-29 Applicants reference: ND000530 Description: Melanocarpus albomyces CBH I gene on plas Date of Deposit: 1997-01-23 mid 35 Description: Cladorrhinum foecundissimum Classification: Ascomycota, Sordariomycetes; Sordariales Classification: Sordariaceae, Sordariales, Sordariomycetes: Origin: China, 1999 Ascomycota Related sequence(s): SEQ ID NO:15 and SEQ ID NO:16 Origin: Jamaica (DNA sequence encoding a cellobiohydrolase I from Mel Related sequence(s): SEQID NO:19 (partial DNA sequence anocarpus albomyces and the corresponding protein 40 encoding a cellobiohydrolase I from Cladorrhinum sequence) foecundissimum) Accession Number: CGMCC No. 0748 Accession Number: CBS 247.96 Applicants reference: ND001181 Applicants reference: ND000534 and ND001231 Date of Deposit: 2002-06-07 Date of Deposit: 1996-03-12 Description: Acremonium sp. CBH I gene on plasmid 45 Description: Diplodia gossypina Classification: mitosporic Ascomycetes Classification: Dothideaceae, Dothideales, Dothidemycetes: Origin: China, 2000 Ascomycota Related sequence(s): SEQID NO:53 and SEQID NO:54 Origin: Indonesia, 1992 Accession Number: CGMCC No. 0749 Related sequence(s): SEQID NO:20 (partial DNA sequence Applicants reference: ND000577 50 encoding a cellobiohydrolase I from Diplodia gossypina), Date of Deposit: 2002-06-07 SEQ ID NO:37 (full DNA sequence encoding a cellobio Description: Aspergillus filmigatus CBH I gene on plasmid hydrolase I from Diplodia gossypina) and SEQID NO:38 Classification: Trichocomaceae, Eurotiales, Ascomycota (full cellobiohydrolase I protein sequence from Diplodia (Teleomorph: Neosartorya fumigata) gossypina) Origin: China, 2000 55 Accession Number: CBS 117.65 Related sequence(s): SEQID NO:55 and SEQID NO:56 Applicants reference: ND000536 Accession Number: CGMCC No. 0747 Date of Deposit: Publicly available Applicants reference: ND001175 Description: Myceliophthora thermophila Date of Deposit: 2002-06-07 Classification: Sordariaceae, Sordariales, Sordariomycetes: Description: Sporotrichum pruinosum CBH I gene on plas 60 Ascomycota mid Origin: - Classification: Meruliaceae, Stereales, Basidiomycota Related sequence(s): SEQID NO:21 (partial DNA sequence Origin: China, 2000 encoding a cellobiohydrolase I from Myceliophthora ther Related sequence(s): SEQID NO:57 and SEQID NO:58 mophila) Accession Number: CGMCC No. 0750 65 Accession Number: CBS 109471 Applicants reference: ND000571 Applicants reference: ND000537 Date of Deposit: 2002-06-07 Date of Deposit: 2001-05-29 US 9, 187,739 B2 51 52 Description: Rhizomucor pusillus Related sequence(s): SEQ ID NO:13 and SEQ ID NO:14 Classification: Mucoraceae. Mucorales, Zygomycota (DNA sequence encoding a cellobiohydrolase I from gut Origin: Denmark cells or microbes from the gut of Neotermes castaneus and Related sequence(s): SEQID NO:22 (partial DNA sequence the corresponding protein sequence) encoding a cellobiohydrolase I from Rhizomucorpusillus) 5 Accession Number: DSM 15066 Accession Number: CBS 521.95 Applicants reference: ND001349 Applicants reference: ND000542 Date of Deposit: 2002-06-21 Date of Deposit: 1995-07-04 Description: Poitrasia circinans CBH I gene on plasmid Description: Meripilus giganteus Classification: Choanephoraceae, Zygomycota, Mucorales Classification: Rigidiporaceae, Hymenomycetales, Basidi 10 omycota Origin: - Origin: Denmark, 1993 Related sequence(s): SEQID NO:49 (DNA sequence encod Related sequence(s): SEQID NO:23 (partial DNA sequence ing a cellobiohydrolase I from Poitrasia circinans) and encoding a cellobiohydrolase I from Meripilus giganteus) SEQID NO:50 (cellobiohydrolase I protein sequence from Accession Number: CBS 277.96 15 Poitrasia circinans) Applicants reference: ND000543, NDOO1346 and Accession Number: DSM 15065 NDOO1243 Applicants reference: ND001339 Date of Deposit: 1996-03-12 Date of Deposit: 2002-06-21 Description: Exidia glandulosa Description: Coprinus cinereus CBH I gene on plasmid Classification: Exidiaceae, Auriculariales, Hymenomycetes, Classification: Basidiomycota, Hymenomycetes; Agaricales, Basidiomycota Agaricaceae Origin: Denmark, 1993 Origin: Denmark Related sequence(s): SEQID NO:24 (partial DNA sequence Related sequence(s): SEQID NO:51 (DNA sequence encod encoding a cellobiohydrolase I from Exidia glandulosa), ing a cellobiohydrolase I from Coprinus cinereus) and SEQID NO:45 (full DNA sequence encoding a cellobio 25 SEQID NO:52 (cellobiohydrolase I protein sequence from hydrolase I with CBD from Exidia glandulosa), SEQ ID Coprinus cinereus) NO:46 (full cellobiohydrolase I protein sequence with Accession Number: DSM 15064 CBD from Exidia glandulosa), SEQID NO.47 (full DNA Applicants reference: ND001264 sequence encoding a cellobiohydrolase I from Exidia glan Date of Deposit: 2002-06-21 dulosa) and SEQID NO:48 (full cellobiohydrolase I pro 30 Description: Trichophaea saccata CBH I gene on plasmid tein sequence from Exidia glandulosa) Classification: Ascomycota, Pezizomycetes; Pezizales; Accession Number: CBS 284.96 Pyronemataceae Applicants reference: ND000544 and ND001235 Origin: - Date of Deposit: 1996-03-12 Related sequence(s): SEQID NO:39 (DNA sequence encod Description: Xylaria hypoxylon 35 ing a cellobiohydrolase I from Trichophaea saccata) and Classification: Sordariaceae, Sordariales, Sordariomycetes, SEQID NO:40 (cellobiohydrolase I protein sequence from Ascomycota Trichophaea saccata) Origin: Denmark, 1993 Accession Number: DSM 15067 Related sequence(s): SEQID NO:25 (partial DNA sequence Applicants reference: ND001232 encoding a cellobiohydrolase I from Xylaria hypoxylon), 40 Date of Deposit: 2002-06-21 SEQID NO:43 (full DNA sequence encoding a cellobio Description: Myceliophthora thermophila CBH I gene on hydrolase I from Xylaria hypoxylon) and SEQ ID NO:44 plasmid (full cellobiohydrolase I protein sequence from Xylaria Classification: Sordariaceae, Sordariales, Sordariomycetes: hypoxylon) Ascomycota Accession Number: CBS 804.70 45 Origin: - Applicants reference: ND001227 Related sequence(s): SEQID NO:41 (DNA sequence encod Date of Deposit: Publicly available ing a cellobiohydrolase I from Myceliophthora thermo Description: Trichophaea saccata phila) and SEQ ID NO:42 (cellobiohydrolase I protein Classification: Ascomycota, Pezizomycetes; Pezizales; sequence from Myceliophthora thermophila) Pyronemataceae 50 Institute for Fermentation, Osaka (IFO) Related sequence(s): SEQID NO:36 (partial DNA sequence The following biological material has been deposited encoding a cellobiohydrolase I from Trichophaea saccata) under the terms of the Budapest Treaty with the Institute for Deutsche Sammlund Von Mikroorganismen Und Zellkul Fermentation, Osaka (IFO), 17-85, Juso-honmachi 2-chome, turen GmbH (DSMZ) Yodogawa-ku, Osaka 532-8686, Japan: The following biological material has been deposited 55 Accession Number: IFO 5372 under the terms of the Budapest Treaty with the Deutsche Applicants reference: ND000531 Sammlung von Mikroorganismen und Zellkulturen GmbH Date of Deposit: Publicly available (not deposited by appli (DSMZ), Mascheroder Weg 1 b, 3.8124 Braunschweig, Ger cant) many: Description: Trichothecium roseum Accession Number: DSM 14348 60 Classification: mitosporic Ascomycetes Applicants reference: ND000551 Origin: - Date of Deposit: 2001-06-13 Related sequence(s): SEQID NO:17 (partial DNA sequence Description: Neotermes Castaneus, termite CBH I gene on encoding a cellobiohydrolase I from Trichothecium plasmid roseum) Classification: - 65 The deposit of CBS 427.97, CBS 247.96, CBS 521.95, Origin: Cultures of termite larvae bought from BAM, Ger CBS 284.96, CBS 274.96 were made by Novo Nordisk NS many, 1999 and were later assigned to Novozymes A/S.

US 9, 187,739 B2 57 - Continued Ala Glin Glin Ala Cys Thr Lieu. Thir Ala Glu Asn His Pro Thr Lieu. Ser 2O 25 3O Trp Ser Lys Cys Thr Ser Gly Gly Ser Cys Thr Ser Val Ser Gly Ser 35 4 O 45 Val Thir Ile Asp Ala Asn Trp Arg Trp Thr His Glin Val Ser Ser Ser SO 55 6 O Thr Asn Cys Tyr Thr Gly Asn Glu Trp Asp Thr Ser Ile Cys Thr Asp 65 70 7s 8O Gly Ala Ser Cys Ala Ala Ala Cys Cys Lieu. Asp Gly Ala Asp Tyr Ser 85 90 95 Gly Thr Tyr Gly Ile Thr Thr Ser Gly Asn Ala Leu Ser Leu Glin Phe 1OO 105 11 O Val Thr Glin Gly Pro Tyr Ser Thr Asn Ile Gly Ser Arg Thr Tyr Lieu. 115 12 O 125 Met Ala Ser Asp Thr Lys Tyr Gln Met Phe Thr Lieu. Leu Gly Asin Glu 13 O 135 14 O Phe Thr Phe Asp Wall Asp Val Thr Gly Lieu. Gly Cys Gly Lieu. Asn Gly 145 150 155 160 Ala Lieu. Tyr Phe Val Ser Met Asp Glu Asp Gly Gly Lieu. Ser Lys Tyr 1.65 17O 17s Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser 18O 185 19 O Glin Cys Pro Arg Asp Lieu Lys Phe Ile Asin Gly Glu Ala Asn. Asn. Wall 195 2OO 2O5 Gly Trp Thr Pro Ser Ser ASn Asp Lys Asn Ala Gly Lieu. Gly ASn Tyr 21 O 215 22O Gly Ser Cys Cys Ser Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser 225 23 O 235 24 O Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Glin Thr Arg Cys 245 250 255 Glu Gly Asp Asp Cys Gly Gly Thr Tyr Ser Thr Asp Arg Tyr Ala Gly 26 O 265 27 O Glu Cys Asp Pro Asp Gly Cys Asp Phe Asn. Ser Tyr Arg Met Gly Asn 27s 28O 285 Thir Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Ser Lys Llys Phe 29 O 295 3 OO Thr Val Val Thr Glin Phe Lieu. Thir Asp Ser Ser Gly Asn Lieu. Ser Glu 3. OS 310 315 32O Ile Lys Arg Phe Tyr Val Glin Asn Gly Val Val Ile Pro Asn Ser Asn 3.25 330 335 Ser Asn Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Glin Ala Phe Cys 34 O 345 35. O Asp Ala Glin Llys Thr Ala Phe Gly Asp Thr Asn Val Phe Asp Gln Lys 355 360 365

Gly Gly Lieu Ala Glin Met Gly Lys Ala Lieu Ala Glin Pro Met Val Lieu. 37 O 375 38O Val Met Ser Lieu. Trp Asp Asp His Ala Val Asn Met Lieu. Trp Lieu. Asp 385 390 395 4 OO

Ser Thr Tyr Pro Thr Asn Ala Ala Gly Llys Pro Gly Ala Ala Arg Gly 4 OS 41O 415

Thr Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu Ser Glin Ala 42O 425 43 O

Pro Asn Ser Llys Val Ile Tyr Ser Asn Ile Arg Phe Gly Pro Ile Gly US 9, 187,739 B2 59 60 - Continued

435 44 O 445

Ser Thir Wall Ser Gly Lell Pro Gly Gly Gly Ser Asn Pro Gly Gly Gly 450 45.5 460

Ser Ser Ser Thir Thir Thir Thir Thir Arg Pro Ala Thir Ser Thir Thir Ser 465 470 47s 48O

Ser Ala Ser Ser Gly Pro Thir Gly Gly Gly Thir Ala Ala His Trp Gly 485 490 495

Glin Gly Gly Ile Gly Trp Thir Gly Pro Thir Wall Ala Ser Pro SOO 505

Thir Cys Glin Lys Lell Asn Asp Trp Glin Cys Luell 515 52O 525

SEQ ID NO 3 LENGTH: 1590 TYPE: DNA ORGANISM: Chaetomium thermophilum FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1590)

< 4 OOs SEQUENCE: 3 atg atg tac aag aag tto gcc gct citc. gcc gcc citc. gtg gct ggc gcc 48 Met Met Tyr Lys Lys Phe Ala Ala Luell Ala Ala Lell Wall Ala Gly Ala 1. 5 1O 15 gcc gcc cag cag gct tgc t cc citc. acc act gag a CC CaC cc c aga citc. 96 Ala Ala Glin Glin Ala Cys Ser Luell Thir Thir Glu Thir His Pro Arg Luell 2O 25 3O act tgg aag cgc tgc acc tect 93C 93C c tgc tcg acc gtg c 93C 144 Thir Trp Lys Arg Cys Thir Ser Gly Gly Asn Cys Ser Thir Wall Asn Gly 35 4 O 45 gcc gt C acc at C gat gcc aac tgg cgc tgg act CaC a CC gtt to c ggc 192 Ala Wall Thir Ile Asp Ala Asn Trp Arg Trp Thir His Thir Wall Ser Gly SO 55 6 O tcg acc aac tgc tac a CC ggc aac gag tgg gat a CC t cc at C tgc tot 24 O Ser Thir Asn Cys Tyr Thir Gly Asn Glu Trp Asp Thir Ser Ile Cys Ser 65 70 7s 8O gat ggc aag agc tgc gcc Cag acc tgc tgc gtc gac ggc gct gac tac 288 Asp Gly Lys Ser Cys Ala Glin Thir Cys Cys Wall Asp Gly Ala Asp 85 90 95 tot tog acc tat ggit atc. a CC acc agc ggt gac t cc Ctg aac citc. aag 336 Ser Ser Thir Tyr Gly Ile Thir Thir Ser Gly Asp Ser Lell Asn Luell Lys 1OO 105 11 O tto gt C acc aag CaC Cag tac ggc acc aat gtc ggc tot cgt. gt C tac 384 Phe Wall Thir Lys His Glin Gly Thir Asn Wall Gly Ser Arg Wall 115 12 O 125

Ctg atg gag aac gac a CC aag tac cag atg titc gag citc. citc. ggc aac 432 Lell Met Glu Asn Asp Thir Lys Glin Met Phe Glu Lell Luell Gly Asn 13 O 135 14 O gag ttic acc ttic gat gtc gat gt C tot aac Ctg ggc tgc ggt citc. aac Glu Phe Thir Phe Asp Wall Asp Wall Ser Asn Luell Gly Gly Luell Asn 145 150 155 160 ggit gcc citc. tac tto gtc t cc atg gac gct gat ggit ggit atg agc aag 528 Gly Ala Luell Phe Wall Ser Met Asp Ala Asp Gly Gly Met Ser 1.65 17O 17s tac tot ggc aac aag gct ggc gcc aag tac 999 acg 999 tac gat 576 Ser Gly Asn Lys Ala Gly Ala Lys Gly Thir Gly Tyr Asp 18O 185 19 O gct cag tgc cc.g cgc gac citt aag ttic at C aac ggc gag gcc aac att 624 Ala Glin Cys Pro Arg Asp Lell Lys Phe Ile ASn Gly Glu Ala Asn Ile 195 2OO 2O5

US 9, 187,739 B2 63 - Continued

515 52O 525 citg taa 1590 Lell

<210s, SEQ ID NO 4 &211s LENGTH: 529 212. TYPE: PRT <213> ORGANISM: Chaetomium thermophilum

<4 OOs, SEQUENCE: 4 Met Met Tyr Lys Llys Phe Ala Ala Lieu Ala Ala Lieu Val Ala Gly Ala 1. 5 1O 15 Ala Ala Glin Glin Ala Cys Ser Lieu. Thir Thr Glu Thr His Pro Arg Lieu. 2O 25 3O Thir Trp Lys Arg Cys Thir Ser Gly Gly Asn Cys Ser Thr Val Asn Gly 35 4 O 45 Ala Val Thir Ile Asp Ala Asn Trp Arg Trp Thr His Thr Val Ser Gly SO 55 6 O Ser Thr Asn Cys Tyr Thr Gly Asn Glu Trp Asp Thr Ser Ile Cys Ser 65 70 7s 8O Asp Gly Lys Ser Cys Ala Glin Thr Cys Cys Val Asp Gly Ala Asp Tyr 85 90 95 Ser Ser Thr Tyr Gly Ile Thr Thr Ser Gly Asp Ser Lieu. Asn Lieu Lys 1OO 105 11 O Phe Val Thr Lys His Glin Tyr Gly Thr Asn Val Gly Ser Arg Val Tyr 115 12 O 125 Lieu Met Glu Asn Asp Thir Lys Tyr Glin Met Phe Glu Lieu. Lieu. Gly Asn 13 O 135 14 O Glu Phe Thr Phe Asp Wall Asp Val Ser Asn Lieu. Gly Cys Gly Lieu. Asn 145 150 155 160 Gly Ala Lieu. Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Met Ser Lys 1.65 17O 17s Tyr Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp 18O 185 19 O Ala Glin Cys Pro Arg Asp Lieu Lys Phe Ile Asn Gly Glu Ala Asn. Ile 195 2OO 2O5 Glu Asn Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Phe Gly Arg 21 O 215 22O Tyr Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Met 225 23 O 235 24 O Ala Thr Ala Phe Thr Pro His Pro Cys Thr Ile Ile Gly Glin Ser Arg 245 250 255 Cys Glu Gly Asn Ser Cys Gly Gly Thr Tyr Ser Ser Glu Arg Tyr Ala 26 O 265 27 O Gly Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ala Tyr Arg Glin Gly 27s 28O 285 Asp Llys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys 29 O 295 3 OO Met Thr Val Val Thr Glin Phe His Lys Asn Ser Ala Gly Val Leu Ser 3. OS 310 315 32O

Glu Ile Lys Arg Phe Tyr Val Glin Asp Gly Llys Val Ile Ala Asn Ala 3.25 330 335

Glu Ser Lys Ile Pro Gly Asn Pro Gly Asn Ser Ile Thr Glin Glu Trp 34 O 345 35. O US 9, 187,739 B2 65 66 - Continued

Asp Ala Glin Lys Wall Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg 355 360 365

Gly Gly Met Ala Glin Met Ser Ala Luell Glu Gly Pro Met Wall 37 O 375

Lell Wall Met Ser Wall Trp Asp Asp His Ala Asn Met Luell Trp Luell 385 390 395 4 OO

Asp Ser Thir Pro Ile Asp Ala Gly Thir Pro Gly Ala Glu Arg 4 OS 41O 415

Gly Ala Pro Thir Thir Ser Gly Wall Pro Ala Glu Ile Glu Ala Glin 42O 425 43 O

Wall Pro Asn Ser Asn Wall Ile Phe Ser Asn Ile Arg Phe Gly Pro Ile 435 44 O 445

Gly Ser Thir Wall Pro Gly Lell Asp Gly Ser Thir Pro Ser Asn Pro Thir 450 45.5 460

Ala Thir Wall Ala Pro Pro Thir Ser Thir Thir Ser Wall Arg Ser Ser Thir 465 470

Thir Glin Ile Ser Thir Pro Thir Ser Glin Pro Gly Gly Thir Thir Glin 485 490 495

Trp Gly Glin Cys Gly Gly Ile Gly Thir Gly Thir Asn Cys SOO 505 51O

Wall Ala Gly Thir Thir Thir Glu Luell Asn Pro Trp Tyr Ser Glin Cys 515 52O 525

Lell

SEQ ID NO 5 LENGTH: 1356 TYPE: DNA ORGANISM: Scytalidium Sp. FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1356)

< 4 OOs SEQUENCE: 5 atg cag at C aag agc tac atc. cag tac Ctg gcc gcg gct Ctg cc.g citc. 48 Met Glin Ile Lys Ser Ile Glin Luell Ala Ala Ala Luell Pro Luell 1. 1O 15

Ctg agc agc gt C gct gcc Cag cag gcc ggc acc atc. a CC gcc gag aac 96 Lell Ser Ser Wall Ala Ala Glin Glin Ala Gly Thir Ile Thir Ala Glu Asn 2O 25 3O

CaC cc c agg atg a CC tgg aag agg tog ggc c cc ggc aac tgc cag 144 His Pro Arg Met Thir Trp Lys Arg Cys Ser Gly Pro Gly Asn Cys Glin 35 4 O 45 a CC gtg cag ggc gag gtc gtc at C gac gcc aac tgg tgg Ctg CaC 192 Thir Wall Glin Gly Glu Wall Wall Ile Asp Ala ASn Trp Arg Trp Luell His SO 55 6 O aac aac ggc cag aac tgc tat gag ggc aac aag tgg a CC agc cag 24 O Asn Asn Gly Glin Asn Cys Glu Gly Asn Lys Trp Thir Ser Glin 65 70 7s agc tog gcc acc gac gcg cag agg tgc gcc citc. gac ggt gcc aac 288 Ser Ser Ala Thir Asp Ala Glin Arg Cys Ala Lell Asp Gly Ala Asn 85 90 95

cag tog acc tac ggc gcc tog acc agc ggc gac t cc Ctg acg citc. 336 Glin Ser Thir Tyr Gly Ala Ser Thir Ser Gly Asp Ser Luell Thir Luell 1OO 105 11 O

ttic gt C acc aag CaC gag tac ggc acc aac atc. ggc tog cgc ttic 384 Phe Wall Thir Lys His Glu Tyr Gly Thir ASn Ile Gly Ser Arg Phe 115 12 O 125 tac citc. atg gcc aac Cag aac aag tac cag atg tto a CC Ctg atg aac 432

US 9, 187,739 B2 69 70 - Continued gtc. aac git c taa 1356 Wall Asn. Wall 450

<210s, SEQ ID NO 6 &211s LENGTH: 451 212. TYPE : PRT &213s ORGANISM: Scytalidium Sp.

<4 OOs, SEQUENCE: 6

Met Glin Ile Llys Ser Ile Glin Tyr Luell Ala Ala Ala Luell Pro Luell 1. 5 15

Lell Ser Ser Wall Ala Ala Glin Glin Ala Gly Thir Ile Thir Ala Glu Asn 25

His Pro Arg Met Thir Trp Arg Ser Gly Pro Gly Asn Glin 35 4 O 45

Thir Wall Glin Gly Glu Wall Ile Asp Ala ASn Trp Arg Trp Luell His SO 6 O

Asn Asn Gly Glin Asn Cys Glu Gly Asn Lys Trp Thir Ser Glin Cys 65 70

Ser Ser Ala Thir Asp Ala Glin Arg Cys Ala Lell Asp Gly Ala Asn 85 90 95

Glin Ser Thir Tyr Gly Ala Ser Thir Ser Gly Asp Ser Luell Thir Luell 105 11 O

Phe Wall Thir His Glu Tyr Gly Thir ASn Ile Gly Ser Arg Phe 115 12 O 125

Lieu Met Ala Asn Gln Asn Gln Met Phe Thr Luell Met Asn 13 O 135 14 O

Asn Glu Phe Ala Phe Asp Wall Asp Luell Ser Lys Wall Glu Gly Ile 145 150 155 160

Asn Ser Ala Luell Tyr Phe Wall Ala Met Glu Glu Asp Gly Gly Met Ala 1.65 17O

Ser Pro Ser Asn Arg Ala Gly Ala Lys Gly Thir Gly Tyr 18O 185 19 O

Asp Ala Glin Ala Arg Asp Luell Phe Ile Gly Gly Ala Asn 195

Ile Glu Gly Trp Arg Pro Ser Thir Asn Asp Pro Asn Ala Gly Wall Gly 21 O 215 22O

Pro Met Gly Ala Cys Ala Glu Ile Asp Wall Trp Glu Ser Asn Ala 225 23 O 235 24 O

Ala Ala Phe Thir Pro His Ala Cys Gly Ser Asn Arg Tyr 245 250 255

His Ile Glu Thir Asn Asn Gly Gly Thir Ser Asp Asp Arg 26 O 265 27 O

Phe Ala Gly Tyr Cys Asp Ala Asn Gly Asp Asn Pro 28O 285

Met Gly Asn Asp Phe Tyr Gly Gly Lys Thir Wall Asp Thir Asn 29 O 295 3 OO

Arg Phe Thir Wall Wall Ser Arg Phe Glu Arg Asn Arg Luell Ser Glin 3. OS 310 315 32O

Phe Phe Wall Glin Asp Gly Arg Ile Glu Wall Pro Pro Pro Thir Trp 3.25 330 335

Pro Gly Luell Pro Asn Ser Ala Asp Ile Thir Pro Glu Lell Cys Asp Ala 34 O 345 35. O

Glin Phe Arg Wall Phe Asp Asp Arg Asn Arg Phe Ala Glu Thir Gly Gly US 9, 187,739 B2 71 72 - Continued

355 360 365

Phe Asp Ala Luell Asn Glu Ala Luell Thir Ile Pro Met Wall Luell Wall Met 37 O 375 38O

Ser Ile Trp Asp Asp His His Ser Asn Met Luell Trp Lell Asp Ser Ser 385 390 395 4 OO

Pro Pro Glu Lys Ala Gly Luell Pro Gly Gly Asp Arg Gly Pro 4 OS 415

Pro Thir Thir Ser Gly Wall Pro Ala Glu Wall Glu Ala Glin Tyr Pro Asp 425 43 O

Ala Glin Wall Wall Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir 435 44 O 445

Wall Asn Wall 450

SEO ID NO 7 LENGTH: 1374 TYPE: DNA ORGANISM: Thermoascus aurantiacus FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1374)

<4 OOs, SEQUENCE: 7 atg tat cag cgc gct citt citc. ttic tot ttic titc citc. t cc gcc gcc cgc 48 Met Tyr Glin Arg Ala Lell Lell Phe Ser Phe Phe Lell Ser Ala Ala Arg 1. 5 1O 15 gcg cag cag gcc ggit a CC Cta acc gca gag aat CaC cott to c Ctg acc 96 Ala Gln Gln Ala Gly Thr Lieu Thr Ala Glu Asn His Pro Ser Lieu Thr 2O 25 3O tgg cag Cala tgc t cc agc ggc ggt agt tgt acc acg Cag aat gga a.a.a. 144 Trp Glin Glin Cys Ser Ser Gly Gly Ser Cys Thir Thir Glin Asn Gly Lys 35 4 O 45 gtc gtt at C gat gcg aac tgg cgt tgg gt C Cat a CC a CC tot gga tac 192 Wall Wall Ile Asp Ala Asn Trp Arg Trp Wall His Thir Thir Ser Gly Tyr SO 55 6 O a CC aac tgc tac acg ggc aat acg tgg gac acc agt atc. tgt cc c gac 24 O Thir Asn Cys Thir Gly Asn Thir Trp Asp Thir Ser Ile Cys Pro Asp 65 70 7s 8O gac gtg acc tgc gct Cag aat tgt gcc ttg gat gga gcg gat tac agt 288 Asp Wall Thir Cys Ala Glin Asn Cys Ala Luell Asp Gly Ala Asp Tyr Ser 85 90 95 ggc acc tat ggt gtt acg a CC agt ggc aac gcc Ctg aga Ctg aac titt 336 Gly Thir Gly Wall Thir Thir Ser Gly Asn Ala Lell Arg Luell Asn Phe 105 11 O gtc acc Cala agc toa 999 aag aac att ggc tog cgc Ctg tac Ctg Ctg 384 Wall Thir Glin Ser Ser Gly Lys Asn Ile Gly Ser Arg Lell Tyr Luell Luell 115 12 O 125

Cag gac gac acc act tat Cag at C ttic aag Ctg Ctg ggit cag gag titt 432 Glin Asp Asp Thir Thir Glin Ile Phe Lys Luell Lell Gly Glin Glu Phe 13 O 135 14 O a CC ttic gat gt C gac gtc t cc aat citc. cott tgc 999 Ctg aac ggc gcc Thir Phe Asp Wall Asp Wall Ser Asn Luell Pro Cys Gly Lell Asn Gly Ala 145 150 155 160 citc. tac titt gtg gcc atg gac gcc gac ggc gga ttg t cc a.a.a. tac cott 528 Lell Phe Wall Ala Met Asp Ala Asp Gly Gly Lell Ser Tyr Pro 1.65 17O 17s ggc aac aag gca ggc gct aag tat ggc act ggt tac gac tot cag 576 Gly Asn Ala Gly Ala Gly Thir Gly Asp Ser Glin 18O 185 19 O

US 9, 187,739 B2 75 76 - Continued

Ala Glin Glin Ala Gly Thir Lell Thir Ala Glu ASn His Pro Ser Luell Thir 2O 25 3O

Trp Glin Glin Ser Ser Gly Gly Ser Thir Thir Glin Asn Gly 35 4 O 45

Wall Wall Ile Ala Asn Trp Arg Trp Wall His Thir Thir Ser Gly SO 55 6 O

Thir Asn Thir Gly Asn Thir Trp Asp Thir Ser Ile Pro Asp 65 70

Asp Wall Thir Ala Glin Asn Ala Luell Asp Gly Ala Asp Tyr Ser 85 90 95

Gly Thir Gly Wall Thir Thir Ser Gly Asn Ala Lell Arg Luell Asn Phe 105 11 O

Wall Thir Glin Ser Ser Gly Asn Ile Gly Ser Arg Lell Tyr Luell Luell 115 12 O 125

Glin Asp Asp Thir Thir Glin Ile Phe Luell Lell Gly Glin Glu Phe 13 O 135 14 O

Thir Phe Asp Wall Asp Wall Ser Asn Luell Pro Cys Gly Lell Asn Gly Ala 145 150 155 160

Lell Tyr Phe Wall Ala Met Asp Ala Asp Gly Lell Ser Tyr Pro 1.65 17O 17s

Gly Asn Ala Gly Ala Gly Thir Asp Ser Glin 18O 185 19 O

Pro Arg Asp Lell Phe Ile Asn Gly Ala Asn Wall Glu Gly 195 2OO 2O5

Trp Pro Ser Ala Asn Asp Pro Asn Ala Wall Gly Asn His Gly 21 O 215 22O

Ser Ala Glu Met Asp Wall Trp Glu Asn Ser Ile Ser Thir 225 23 O 24 O

Ala Wall Thir Pro His Pro Asp Thir Pro Glin Thir Met Cys Glin 245 250 255

Gly Asp Asp Cys Gly Gly Thir Ser Ser Thir Arg Ala Gly Thir 26 O 265 27 O

Asp Pro Asp Gly Asp Phe Asn Pro Arg Glin Gly Asn His 27s 28O 285

Ser Phe Gly Pro Gly Lys Ile Wall Asp Thir Ser Ser Phe Thir 29 O 295 3 OO

Wall Wall Thir Glin Phe Ile Thir Asp Asp Gly Thir Pro Ser Gly Thir Luell 3. OS 310 315

Thir Glu Ile Arg Phe Wall Glin Asn Gly Wall Ile Pro Glin 3.25 330 335

Ser Glu Ser Thir Ile Ser Gly Wall Thir Gly ASn Ser Ile Thir Thir Glu 34 O 345 35. O

Thir Ala Glin Ala Ala Phe Gly Asp Asn Thir Gly Phe Phe 355 360 365

Thir His Gly Gly Lell Glin Lys Ile Ser Glin Ala Lell Ala Glin Gly Met 37 O 375

Wall Luell Wall Met Ser Lell Trp Asp Asp His Ala Ala Asn Met Luell Trp 385 390 395 4 OO

Lell Asp Ser Thir Tyr Pro Thir Asp Ala Asp Pro Asp Thir Pro Gly Wall 4 OS 41O 415

Ala Arg Gly Thir Pro Thir Thir Ser Gly Wall Pro Ala Asp Wall Glu 425 43 O

Ser Glin Asn Pro Asn Ser Wall Ile Ser Asn Ile Wall Gly

US 9, 187,739 B2 81 82 - Continued

<4 OOs, SEQUENCE: 10

Met Tyr Ala Lys Phe Ala Thir Luell Ala Ala Luell Wall Ala Gly Ala Ser 1. 1O 15

Ala Glin Ala Wall Cys Ser Lell Thir Ala Glu Thir His Pro Ser Luell Thir 2O 25 3O

Trp Glin Lys Thir Ala Pro Gly Ser Thir Asn Wall Ala Gly Ser 35 4 O 45

Ile Thir Ile Ala Asn Trp Arg Trp Thir His Glin Thir Ser Ser Ala SO 55 6 O

Thir Asn Ser Gly Ser Trp Asp Ser Ser Ile Thir Thir 65 70

Gly Thir Ala Ser Ile Asp Gly Ala Glu Tyr Ser 85 90 95

Ser Thir Gly Ile Thir Thir Ser Gly Asn Ala Lell Asn Luell Phe 105 11 O

Wall Thir Lys Gly Glin Tyr Ser Thir Asn Ile Gly Ser Arg Thir Luell 115 12 O 125

Met Glu Ser Asp Thir Tyr Glin Met Phe Lell Lell Gly Asn Glu 13 O 135 14 O

Phe Thir Phe Asp Wall Asp Wall Ser Asn Luell Gly Cys Gly Luell Asn Gly 145 150 155 160

Ala Luell Tyr Phe Wall Ser Met Asp Ala Asp Gly Met Ser Lys 1.65 17O 17s

Ser Gly Asn Lys Ala Gly Ala Tyr Gly Thir Cys Asp Ala 18O 185 19 O

Glin Pro Arg Asp Lell Phe Ile Asn Gly Ala Asn Wall Glu 195 2OO

Gly Trp Glu Ser Ser Thir Asn Asp Ala Asn Ala Ser Gly 21 O 215

Gly Ser Thir Glu Met Asp Wall Trp Glu Asn Asn Met Ala 225 23 O 235 24 O

Thir Ala Phe Thir Pro His Pro Thir Thir Ile Glin Thir Arg 245 250 255

Glu Gly Asp Thir Gly Gly Thir Tyr Ser Ser Arg Tyr Ala Gly 26 O 265 27 O

Wall Asp Pro Asp Gly Asp Phe Asn Ser Arg Glin Gly Asn 28O 285

Thir Phe Gly Gly Met Thir Wall Asp Thir Thir Ile 29 O 295 3 OO

Thir Wall Wall Thir Glin Phe Lell Asn Ser Ala Gly Glu Luell Ser Glu 3. OS 310 315

Ile Arg Phe Tyr Ala Glin Asp Gly Lys Wall Ile Pro Asn Ser Glu 3.25 330 335

Ser Thir Ile Ala Gly Ile Pro Gly Asn Ser Ile Thir Ala 34 O 345 35. O

Asp Ala Glin Thir Wall Phe Glin Asn Thir Asp Asp Phe Thir Ala 355 360 365

Gly Gly Luell Wall Glin Met Gly Ala Luell Ala Gly Asp Met Wall Luell 37 O 375

Wall Met Ser Wall Trp Asp Asp His Ala Wall ASn Met Lell Trp Luell Asp 385 390 395 4 OO

Ser Thir Pro Thir Asp Glin Wall Gly Wall Ala Gly Ala Glu Arg Gly 4 OS 41O 415

US 9, 187,739 B2 87 88 - Continued

SO 55 6 O

Asn Phe Glu Gly Asn Thir Trp Thir Gly Thir Ser Gly Ala Asp 65 70

Gly Ala Asn Ala Wall Glu Gly Ala Asn Tyr Glin Ser Thir 85 90 95

Gly Wall Ser Thir Ser Gly Asn Ala Luell Ser Lell Arg Phe Wall Thir 1OO 105 11 O

Glu His Glu His Gly Wall Asn Thir Gly Ser Arg Thir Tyr Luell Met Glu 115 12 O 125

Ser Ala Thir Tyr Glin Met Phe Thir Luell Met Asn Asn Glu Luell Ala 13 O 135 14 O

Phe Asp Wall Asp Lell Ser Wall Ala Gly Met Asn Ser Ala Luell 145 150 155 160

Luell Wall Pro Met Ala Asp Gly Gly Luell Ser Ser Glu Thir Asn 1.65 17O 17s

Asn Asn Ala Gly Ala Gly Thir Gly Tyr Asp Ala Glin 18O 185 19 O

Ala Arg Asp Luell Phe Wall Asn Gly Ala Asn Ile Glu Gly Trp 195

Glin Ala Ser Thir Asp Glu Asn Ser Gly Wall Gly Asn Met Gly Ser 21 O 215 22O

Cys Ala Glu Ile Asp Wall Trp Glu Ser ASn Arg Glu Ser Phe Ala 225 23 O 235 24 O

Phe Thir Pro His Ala Ser Glin Asn Glu Tyr His Wall Thir Gly 245 250 255

Ala Asn Gly Gly Thir Ser Asp Asp Arg Phe Ala Gly Lys 26 O 265 27 O

Asp Ala Asn Gly Asp Asn Pro Phe Arg Wall Gly Asn Glin Asn 27s 285

Phe Tyr Gly Pro Met Thir Wall Asn Thir ASn Ser Phe Thir Wall 29 O 295 3 OO

Ile Ser Arg Phe Glu Asn Glu Ala Glin Wall Phe Ile Glin Asn 3. OS 310 315

Gly Arg Thir Ile Glu Wall Pro Arg Pro Thir Luell Ser Gly Ile Thir Glin 3.25 330 335

Phe Glu Ala Lys Ile Thir Pro Glu Phe Ser Thir Pro Thir Wall 34 O 345 35. O

Phe Gly Asp Arg Asp Arg His Gly Glu Ile Gly Gly His Thir Ala Luell 355 360 365

Asn Ala Ala Luell Arg Met Pro Met Wall Luell Wall Met Ser Ile Trp Ala 37 O 375

Asp His Ala Asn Met Lell Trp Luell Asp Ser Ile Pro Pro Glu 385 390 395 4 OO

Arg Gly Glin Pro Gly Ala His Arg Gly Arg Arg Ser Arg Gly 4 OS 41O 415

SEQ ID NO 13 LENGTH: 1341 TYPE: DNA ORGANISM: Neotermes cast aneus FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1341)

< 4 OOs SEQUENCE: 13

US 9, 187,739 B2 91 92 - Continued

Cag aac gga aag gtg att gag aac tog aag aac aag att tog gga att OO8 Glin Asn Gly Lys Wall Ile Glu Asn Ser Lys ASn Lys Ile Ser Gly Ile 3.25 330 335 gac gag acg aac gca gtg agt gat act ttic tgc gat Cag Cala aag aag Asp Glu Thir Asn Ala Wall Ser Asp Thir Phe Cys Asp Glin Glin Lys Lys 34 O 345 35. O gcc ttic ggt gat acg aac gat ttic aag aac aag ggc ggit ttic gct aag 104 Ala Phe Gly Asp Thir Asn Asp Phe Lys Asn Lys Gly Gly Phe Ala Lys 355 360 365 ttg ggt cag gtg tto gag act ggt cag gtt citc. gtg Ctg tog Ctg tgg 152 Lell Gly Glin Wall Phe Glu Thir Gly Glin Wall Luell Wall Lell Ser Luell Trp 37 O 375 38O gat gac CaC tog gtt gca atg Ctg tgg ttg gac tcg gcc tac CC a acg 2OO Asp Asp His Ser Wall Ala Met Luell Trp Luell Asp Ser Ala Pro Thir 385 390 395 4 OO aac aag gat aag agc agc C Ca ggt gtt gac cgt. 999 cott tgc cc.g acg 248 Asn Lys Asp Lys Ser Ser Pro Gly Wall Asp Arg Gly Pro Cys Pro Thir 4 OS 41O 415 act to c 999 aag cc.g gat gat gtt gala agc Cala tot c cc gat gca acc 296 Thir Ser Gly Lys Pro Asp Asp Wall Glu Ser Glin Ser Pro Asp Ala Thir 42O 425 43 O gtc att tat ggc aac atc. aag ttic ggt gca Ctg gac t cc act tac 341 Wall Ile Tyr Gly Asn Ile Phe Gly Ala Luell Asp Ser Thir 435 44 O 445

<210s, SEQ ID NO 14 &211s LENGTH: 447 212. TYPE PRT &213s ORGANISM: Nedtermes cast aneus

<4 OOs, SEQUENCE: 14

Ala Arg Gly Lieu. Ala Ala Ala Luell Phe Thir Phe Ala Ser Wall Gly 1. 5 1O 15

Ile Gly Thir Lys Thir Ala Glu Asn His Pro Lell Asn Trp Glin Asn 2O 25 3O

Ala Ser Lys Gly Ser Ser Glin Wall Ser Gly Glu Wall Thir Met 35 4 O 45

Asp Ser Asn Trp Arg Trp Thir His Asp Gly ASn Gly Asn Tyr SO 55 6 O

Asp Gly Asn Thir Trp Ile Ser Ser Luell Pro Asp Gly Thir Cys 65 70 7s

Ser Asp Wall Lell Asp Gly Ala Glu Glin Ala Thir Tyr Gly 85 90 95

Ile Thir Ser Asn Gly Thir Ala Wall Thir Luell Phe Wall Thir His Gly 1OO 105 11 O

Ser Ser Thir Asn Ile Gly Ser Arg Luell Lell Lell Asp Glu 115 12 O 125

Asn Thir Ile Phe Lys Wall Asn Asn Glu Phe Thir Phe Ser 13 O 135 14 O

Wall Asp Wall Ser Lys Lell Pro Gly Luell ASn Gly Ala Luell Phe 145 150 155 160

Wall Ser Met Asp Ala Asp Gly Gly Ala Gly Ser Gly Ala Lys 1.65 17O 17s

Pro Gly Ala Lys Tyr Gly Lell Gly Tyr Asp Ala Glin Cys Pro Ser 18O 185 19 O

Asp Luell Lys Phe Ile Asn Gly Glu Ala Asn Ser Asp Gly Trp Pro 195 2OO 2O5 US 9, 187,739 B2 93 94 - Continued

Glin Ala Asn Asp Lys Asn Ala Gly Asn Gly Tyr Gly Ser 21 O 215 22O

Ser Glu Met Asp Wall Trp Glu Ala Asn Ser Glin Ala Thir Ala Thir 225 23 O 235 24 O

Pro His Wall Lys Thir Thir Gly Glin Glin Arg Cys Ser Gly Thir Ser 245 250 255

Glu Gly Gly Glin Asp Gly Ala Ala Arg Phe Glin Gly Luell Asp 26 O 265 27 O

Glu Asp Gly Asp Phe Asn Ser Trp Arg Glin Gly Asp Thir Phe 28O 285

Gly Pro Gly Lell Thir Wall Asp Thir Ser Pro Phe Thir Wall Wall 29 O 295 3 OO

Thir Glin Phe Wall Gly Ser Pro Wall Glu Ile Arg Arg Wall 3. OS 310 315 32O

Glin Asn Gly Wall Ile Glu Asn Ser Lys ASn Ile Ser Gly Ile 3.25 330 335

Asp Glu Thir Asn Ala Wall Ser Asp Thir Phe Asp Glin Glin 34 O 345 35. O

Ala Phe Gly Asp Thir Asn Asp Phe Asn Gly Gly Phe Ala 355 360 365

Lell Gly Glin Wall Phe Glu Thir Gly Glin Wall Luell Wall Lell Ser Luell Trp 37 O 375

Asp Asp His Ser Wall Ala Met Luell Trp Luell Asp Ser Ala Pro Thir 385 390 395 4 OO

Asn Asp Ser Ser Pro Gly Wall Asp Arg Gly Pro Pro Thir 4 OS 41O 415

Thir Ser Gly Lys Pro Asp Asp Wall Glu Ser Glin Ser Pro Asp Ala Thir 42O 425 43 O

Wall Ile Tyr Gly Asn Ile Phe Gly Ala Luell Asp Ser Thir 435 44 O 445

SEO ID NO 15 LENGTH: 1359 TYPE: DNA ORGANISM: Melanocarpus albomyces FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1359)

<4 OOs, SEQUENCE: 15 atg atg atg aag Cag tac citc. cag tac citc. gC9 gcc gcg Ctg cc.g citc. 48 Met Met Met Lys Glin Lell Glin Luell Ala Ala Ala Luell Pro Luell 1. 5 1O 15 gtc ggc citc. gcc gcc ggc Cag cgc gct ggt aac gag acg cc c gag agc 96 Wall Gly Luell Ala Ala Gly Glin Arg Ala Gly ASn Glu Thir Pro Glu Ser 2O 25 3O

CaC cc c cc.g citc. a CC tgg Cag agg tgc acg gcc cc.g ggc aac tgc cag 144 His Pro Pro Luell Thir Trp Glin Arg Thir Ala Pro Gly Asn Glin 35 4 O 45 a CC gtg aac gcc gag gtc att gac gcc aac tgg tgg Ctg CaC 192 Thir Wall Asn Ala Glu Wall Ile Asp Ala ASn Trp Arg Trp Luell His SO 6 O gac gac aac atg Cag aac tac gac ggc aac Cag tgg acc aac gcc 24 O Asp Asp Asn Met Glin Asn Asp Gly ASn Glin Trp Thir Asn Ala 65 70 7s 8O tgc agc acc gcc a CC gac gct gag aag atg atc. gag ggt gcc 288 Ser Thir Ala Thir Asp Ala Glu Lys Met Ile Glu Gly Ala

US 9, 187,739 B2 97 98 - Continued

Ile Pro Pro Glu Lys Glu Gly Glin Pro Gly Ala Ala Arg Gly Asp 4 OS 41O 415 tgc cc c acg gac tcg ggit gtc cc c gcc gag gtc gag gct cag ttic cc c 1296 Cys Pro Thir Asp Ser Gly Wall Pro Ala Glu Wall Glu Ala Glin Phe Pro 42O 425 43 O gac gcc cag gt C gtc tgg t cc aac at C cgc titc ggc c cc at C ggc tog 1344 Asp Ala Glin Wall Wall Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser 435 44 O 445 a CC tac gac ttic taa 1359 Thir Tyr Asp Phe 450

<210s, SEQ ID NO 16 &211s LENGTH: 452 212. TYPE : PRT &213s ORGANISM: Melanocarpus albomyces

<4 OOs, SEQUENCE: 16

Met Met Met Lys Glin Tyr Lell Glin Tyr Luell Ala Ala Ala Luell Pro Luell 1. 5 15

Wall Gly Luell Ala Ala Gly Glin Arg Ala Gly ASn Glu Thir Pro Glu Ser 25

His Pro Pro Luell Thir Trp Glin Arg Thir Ala Pro Gly Asn Glin 35 4 O 45

Thir Wall Asn Ala Glu Wall Wall Ile Asp Ala ASn Trp Arg Trp Luell His SO 55 6 O

Asp Asp Asn Met Glin Asn Asp Gly ASn Glin Trp Thir Asn Ala 65 70 75

Ser Thir Ala Thir Asp Ala Glu Lys Met Ile Glu Gly Ala 85 90 95

Gly Asp Luell Gly Thir Gly Ala Ser Thir Ser Gly Asp Ala Luell 105 11 O

Thir Luell Lys Phe Wall Thir His Glu Tyr Gly Thir Asn Wall Gly Ser 115 12 O 125

Arg Phe Luell Met Asn Gly Pro Asp Glin Met Phe Asp Luell 13 O 135 14 O

Lell Gly Asn Glu Lell Ala Phe Asp Wall Asp Luell Ser Thir Wall Glu Cys 145 150 155 160

Gly Ile Asn Ser Ala Lell Tyr Phe Wall Ala Met Glu Glu Asp Gly Gly 1.65 17s

Met Ala Ser Tyr Pro Ser Asn Glin Ala Gly Ala Arg Gly Thir Gly 18O 185 19 O

Asp Ala Glin Ala Arg Asp Luell Lys Phe Wall Gly Gly 195 2O5

Ala Asn Ile Glu Gly Trp Lys Pro Ser Thir ASn Asp Pro Asn Ala Gly 21 O 215 22O

Wall Gly Pro Gly Gly Cys Ala Glu Ile Asp Wall Trp Glu Ser 225 23 O 235 24 O

Asn Ala Ala Phe Ala Phe Thir Pro His Ala Thir Thir Asn Glu 245 250 255

His Wall Cys Glu Thir Thir Asn Cys Gly Gly Thir Ser Glu Asp 26 O 265 27 O

Arg Phe Ala Gly Asp Ala Asn Gly Cys Asp Tyr Asn Pro 27s 28O 285

Arg Met Gly Asn Pro Asp Phe Gly Gly Lys Thir Luell Asp Thir 29 O 295 3 OO US 9, 187,739 B2 99 100 - Continued

Ser Arg Llys Phe Thr Val Val Ser Arg Phe Glu Glu Asn Lys Luell Ser 3. OS 310 315 32O

Glin Tyr Phe Ile Glin Asp Gly Arg Lys Ile Glu Ile Pro Pro Pro Thir 3.25 330 335

Trp. Glu Gly Met Pro Asn Ser Ser Glu Ile Thr Pro Glu Luell Cys Ser 34 O 345 35. O

Thr Met Phe Asp Val Phe Asn Asp Arg Asn Arg Phe Glu Glu Val Gly 355 360 365

Gly Phe Glu Glin Lieu. Asn. Asn Ala Lieu. Arg Val Pro Met Wall Lieu Wall 37 O 375

Met Ser Ile Trp Asp Asp His Tyr Ala Asn Met Lell Trp Luell Asp Ser 385 390 395 4 OO

Ile Tyr Pro Pro Glu Lys Glu Gly Gln Pro Gly Ala Ala Arg Gly Asp 4 OS 41O 415

Cys Pro Thr Asp Ser Gly Val Pro Ala Glu Wall Glu Ala Glin Phe Pro 425 43 O

Asp Ala Glin Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser 435 44 O 445 Thr Tyr Asp Phe 450

<210s, SEQ ID NO 17 &211s LENGTH: 221 &212s. TYPE: DNA <213> ORGANISM: Trichothecium roseum & 22 O FEATURE; <221 > NAMEAKEY: misc feature <222s. LOCATION: (1) ... (221) 223s OTHER INFORMATION: Partial CBH1 encoding sequence

<4 OOs, SEQUENCE: 17 tacgcc cagt gcgc.ccgtga cct caagttc Ctcggcggca Ctt coaact a cacggctgg 6 O aagc cct cqg acactgacga Cagcgc.cggit gtcggcaacc gcggat.cctg. Ctgcgc.cgag 12 O attgacat ct gggagt ccaa citcgcacgcc titcgcct tca CCCCC cacgc ctg.cgagaac 18O aacgagtacc acatctgcga gaccaccgac tgcgg.cggca c 221

<210s, SEQ ID NO 18 &211s LENGTH: 239 &212s. TYPE: DNA <213> ORGANISM: Humicola nigrescens 22 Os. FEATURE: <221 > NAMEAKEY: misc feature <222s. LOCATION: (1) ... (239) 223s OTHER INFORMATION: Partial CBH1 encoding sequence

<4 OOs, SEQUENCE: 18 tacggcacgg ggtactg.cga cgc.ccaatgc gcc.cgcgatc t caagttcgt. 6 O gccalatgttg agggctggaa acagt ccacc aacgatgc.ca atgc.cggcgt. gggtc.cgatg 12 O ggcggttgct gcgc.cgaaat tgacgtctgg gaatcgaacg cc catgc citt cgc ctitcacg 18O cc.gcacgc.gt gcgagaacaa caagtaccac atctg.cgaga Ctgacggatg cgg.cggcac 239

<210s, SEQ ID NO 19 &211s LENGTH: 199 &212s. TYPE: DNA <213> ORGANISM: Cladorrhinum foecundissimum 22 Os. FEATURE: <221 > NAMEAKEY: misc feature <222s. LOCATION: (1) . . (199)

US 9, 187,739 B2 117 118 - Continued

Cys Thir Gly Asn Thir Trp Asn Ala Asp Tyr Cys Thir Asp Asn Thir 65 70

Glu Ala Ser Asn Ala Luell Asp Gly Ala Asp Ser Gly Thir 85 90 95

Gly Ala Thir Thir Ser Gly Asp Ser Luell Arg Lell Asn Phe Ile Thir 105 11 O

Asn Gly Glin Glin Asn Ile Gly Ser Arg Met Lell Met Glin Asp 115 12 O 125

Asp Glu Thir Ala Wall His Luell Luell ASn Lys Glu Phe Thir Phe 13 O 135 14 O

Asp Wall Asp Thir Ser Lys Lell Pro Gly Luell Asn Gly Ala Wall Tyr 145 150 155 160

Phe Wall Ser Met Asp Ala Asp Gly Gly Met Ala Phe Pro Asp Asn 1.65 17s

Ala Gly Ala Lys Gly Thir Gly Tyr Cys Asp Ser Glin Pro 18O 185 19 O

Arg Asp Luell Phe Ile Asp Gly Lys Ala ASn Wall Glu Gly Trp Wall 195

Pro Ser Glu Asn Asp Ser Asn Ala Gly Wall Gly Asn Lell Gly Ser 21 O 215 22O

Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile Ser Thir Ala Tyr 225 23 O 235 24 O

Thir Pro His Ser Cys Thir Wall Ala Glin His Ser Thir Gly Asp 245 250 255

Asp Gly Gly Thir Ser Ala Thir Arg Tyr Ala Gly Asp Asp 26 O 265 27 O

Pro Asp Gly Cys Asp Phe Asn Ser Arg Glin Gly Wall Asp Phe 28O 285

Gly Pro Gly Met Thir Wall Asp Ser Asn Ser Wall Wall Thir Wall Wall 29 O 295 3 OO

Thir Glin Phe Ile Thir Asn Asp Gly Thir Ala Ser Gly Thir Luell Ser Glu 3. OS 310 315

Ile Arg Phe Tyr Wall Glin Asn Gly Lys Wall Ile Pro Asn Ser Glu 3.25 330 335

Ser Thir Ile Ala Gly Wall Ser Gly Asn Ser Ile Thir Ser Ala 34 O 345 35. O

Asp Ala Glin Glu Wall Phe Gly Asp Asn Thir Ser Phe Glin Asp Glin 355 360 365

Gly Gly Luell Ala Ser Met Ser Glin Ala Luell ASn Ala Gly Met Wall Luell 37 O 375

Wall Met Ser Ile Trp Asp Asp His His Ser ASn Met Lell Trp Luell Asp 385 390 395 4 OO

Ser Asp Pro Wall Asp Ala Asp Pro Ser Glin Pro Gly Ile Ser Arg 4 OS 41O 415

Gly Thir Pro Thir Thir Ser Gly Wall Pro Ser Glu Wall Glu Glu Ser 42O 425 43 O

Ala Ala Ser Ala Tyr Wall Wall Tyr Ser Asn Ile Wall Gly Asp Luell 435 44 O 445

Asn Ser Thir Phe Ser Ala 450

<210s, SEQ ID NO 39 &211s LENGTH: 1377

US 9, 187,739 B2 121 122 - Continued

acg acc ttic tac ggc aag gga aag acg gt C gat a CC agc to c aag ttic 912 Thir Thir Phe Gly Lys Gly Lys Thir Wall Asp Thir Ser Ser Lys Phe 29 O 295 3 OO acg gt C gtg acc Cag tto atc. acc gac act gga a CC gcc to c ggc tog 96.O Thir Wall Wall Thir Glin Phe Ile Thir Asp Thir Gly Thir Ala Ser Gly Ser 3. OS 310 315 32O citc. acg gag at C cgc cgc tto tac gt C cag aac gga aag ttg at C cc c OO8 Lell Thir Glu Ile Arg Arg Phe Wall Glin ASn Gly Lys Luell Ile Pro 3.25 330 335 aac to c cag tog aag atc. tcg ggc gt C act ggc aac t cc at C acc tot Asn Ser Glin Ser Lys Ile Ser Gly Wall Thir Gly Asn Ser Ile Thir Ser 34 O 345 35. O gct ttic tgc gac gct Cag aag gcg gct ttic ggc gat aac tac acg ttic 104 Ala Phe Cys Asp Ala Glin Lys Ala Ala Phe Gly Asp Asn Thir Phe 355 360 365 aag gac aag ggc ggc tto gca to c atg act act gct atg aag aac gga 152 Lys Asp Lys Gly Gly Phe Ala Ser Met Thir Thir Ala Met Lys Asn Gly 37 O 375 38O atg gt C Ctg gtt atg citt tgg gat gac CaC tac gcc aat atg citc. 2OO Met Wall Luell Wall Met Lell Trp Asp Asp His Ala Asn Met Luell 385 390 395 4 OO tgg citt gat agc gac c cc act aac gcg gac t cc t cc aag cc.g ggt 248 Trp Luell Asp Ser Asp Pro Thir Asn Ala Asp Ser Ser Lys Pro Gly 4 OS 41O 415 gtt gct cgt ggc a CC cc.g act tot to c ggc gtg c cc tog gat gt C 296 Wall Ala Arg Gly Thir Pro Thir Ser Ser Gly Wall Pro Ser Asp Wall 42O 425 43 O gag act aac aat gca agc gct tog gt C acg tac t cc aac att aga titt 344 Glu Thir Asn Asn Ala Ser Ala Ser Wall Thir Ser Asn Ile Arg Phe 435 44 O 445 gga gat citc. aat t cc act tac acc gcc cag taa 377 Gly Asp Luell Asn Ser Thir Tyr Thir Ala Glin 450 45.5

<210s, SEQ ID NO 4 O &211s LENGTH: 458 212. TYPE : PRT &213s ORGANISM: Trichophaea Saccata

<4 OOs, SEQUENCE: 4 O

Met Glin Arg Lieu Lell Wall Lell Luell Thir Ser Luell Lell Ala Phe Thir Tyr 1. 5 1O 15

Gly Glin Glin Wall Gly Thir Glin Glin Ala Glu Wall His Pro Ser Met Thir 2O 25 3O

Trp Glin Glin Thir Ser Gly Gly Thir Thir Lys Asn Gly 35 4 O 45

Wall Wall Ile Ala Asn Trp Arg Trp Wall His Asn Wall Gly Gly Tyr SO 55 6 O

Thir Asn Thir Gly Asn Thir Trp Asp Ser Ser Lell Pro Asp 65 70

Asp Wall Thir Ala Asn Ala Luell Asp Gly Ala Asp Tyr Ser 85 90 95

Gly Thir Gly Wall Thir Ala Gly Gly Asn Ser Lell Luell Thir Phe 1OO 105 11 O

Wall Thir Lys Gly Glin Tyr Ser Thir Asn Wall Gly Ser Arg Luell Met 115 12 O 125

Lell Ala Asp Asp Ser Thir Tyr Glin Met Tyr ASn Lell Lell Asn Glin Glu 13 O 135 14 O US 9, 187,739 B2 123 124 - Continued

Phe Thir Phe Asp Wall Asp Wall Ser Asn Luell Pro Cys Gly Lieu. Asn Gly 145 150 155 160

Ala Luell Tyr Phe Wall Ser Met Asp Asp Gly Gly Met Ser Lys 1.65 17O 17s

Ser Gly Asn Lys Ala Gly Ala Tyr Gly Thir Gly Tyr Cys Asp Ser 18O 185 19 O

Glin Pro Arg Asp Lell Phe Ile Asn Gly Glin Gly Asn Wall Glu 195 2OO

Gly Trp Pro Ser Ser Asn Asp Ala Asn Ala Gly Wall Gly Gly His 21 O 215 22O

Gly Ser Ala Glu Met Asp Wall Trp Glu Ala Asn Ser Ile Ser 225 23 O 235 24 O

Ala Ala Wall Thir Pro His Ser Ser Thir Thir Ser Glin Thir Met 245 250 255

Asn Gly Asp Ser Gly Gly Thir Tyr Ser Ala Thir Arg Tyr Ala Gly 26 O 265 27 O

Wall Asp Pro Asp Gly Asp Phe Asn Ser Arg Met Gly Asp 28O 285

Thir Thir Phe Gly Gly Lys Thir Wall Asp Thir Ser Ser Phe 29 O 295 3 OO

Thir Wall Wall Thir Glin Phe Ile Thir Asp Thir Gly Thir Ala Ser Gly Ser 3. OS 310 315

Lell Thir Glu Ile Arg Arg Phe Wall Glin ASn Gly Luell Ile Pro 3.25 330 335

Asn Ser Glin Ser Ile Ser Gly Wall Thir Gly Asn Ser Ile Thir Ser 34 O 345 35. O

Ala Phe Cys Asp Ala Glin Ala Ala Phe Gly Asp Asn Tyr Thir Phe 355 360 365

Asp Gly Gly Phe Ala Ser Met Thir Thir Ala Met Asn Gly 37 O 375

Met Wall Luell Wall Met Ser Lell Trp Asp Asp His Ala Asn Met Luell 385 390 395 4 OO

Trp Luell Asp Ser Asp Pro Thir Asn Ala Asp Ser Ser Pro Gly 4 OS 415

Wall Ala Arg Gly Thir Pro Thir Ser Ser Gly Wall Pro Ser Asp Wall 425 43 O

Glu Thir Asn Asn Ala Ser Ala Ser Wall Thir Ser Asn Ile Arg Phe 435 44 O 445

Gly Asp Luell Asn Ser Thir Tyr Thir Ala Glin 450 45.5

SEQ ID NO 41 LENGTH: 1353 TYPE: DNA ORGANISM: Myceliophthora thermophila FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1353)

< 4 OOs SEQUENCE: 41 atg aag cag tac citc. Cag tac citc. gcg gcg acc Ctg c cc Ctg gtg ggc 48 Met Lys Glin Tyr Lell Glin Tyr Luell Ala Ala Thir Lell Pro Luell Wall Gly 1. 1O 15

Ctg gcc acg gcc Cag Cag gcg ggt aac Ctg cag a CC gag act CaC cc c 96 Lell Ala Thir Ala Glin Glin Ala Gly Asn Luell Glin Thir Glu Thir His Pro 2O 25 3O

US 9, 187,739 B2 127 128 - Continued

34 O 345 35. O tto aag gcc ttic gat gac cgt gac cgc ttic tog gag gtt ggc ggc ttic 104 Phe Lys Ala Phe Asp Asp Arg Asp Arg Phe Ser Glu Wall Gly Gly Phe 355 360 365 gat gcc at C aac acg gcc citc. agc act cc c atg gtc citc. gtC atg to c 152 Asp Ala Ile Asn Thir Ala Lell Ser Thir Pro Met Wall Lell Wall Met Ser 37 O 375 38O atc. tgg gat gat CaC tac gcc aat atg citc. tgg citc. gac tog agc tac 2OO Ile Trp Asp Asp His Tyr Ala Asn Met Luell Trp Lell Asp Ser Ser Tyr 385 390 395 4 OO c cc cott gag aag gct ggc Cag cott ggc ggt gac cgt ggc cc.g tgt cott 248 Pro Pro Glu Lys Ala Gly Glin Pro Gly Gly Asp Arg Gly Pro Cys Pro 4 OS 41O 415

Cag gac tot gtc cc.g gcc gac gtt gag gct Cag tac cott aat gcc 296 Glin Asp Ser Wall Pro Ala Asp Wall Glu Ala Glin Pro Asn Ala 425 43 O aag gt C at C tgg t cc aac atc. cgc ttic ggc coc atc. ggc tog act gt C 344 Wall Ile Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir Wall 435 44 O 445 aac gt C taa 353 Asn Wall 450

SEQ ID NO 42 LENGTH: 450 TYPE : PRT ORGANISM: Myceliophthora thermophila

< 4 OO SEQUENCE: 42

Met Lys Glin Tyr Lell Glin Luell Ala Ala Thir Lell Pro Luell Wall Gly 1. 5 15

Lell Ala Thir Ala Glin Glin Ala Gly Asn Luell Glin Thir Glu Thir His Pro 25

Arg Luell Thir Trp Ser Thir Ala Pro Gly Ser Cys Glin Glin Wall 35 4 O 45

Asn Gly Glu Wall Wall Ile Asp Ser Asn Trp Arg Trp Wall His Asp Glu SO 55 6 O

Asn Ala Glin Asn Cys Tyr Asp Gly Asn Glin Trp Thir Asn Ala Ser 65 70 8O

Ser Ala Thir Asp Cys Ala Glu Asn Ala Luell Glu Gly Ala Asp 85 90 95

Glin Gly Thir Tyr Gly Ala Ser Thir Ser Gly ASn Ala Lell Thir Luell Thir 105 11 O

Phe Wall Thir His Glu Gly Thir Asn Ile Gly Ser Arg Luell 115 12 O 125

Lell Met Asn Gly Ala Asn Lys Glin Met Phe Thir Lell Gly Asn 13 O 135 14 O

Glu Luell Ala Phe Asp Wall Asp Luell Ser Ala Wall Glu Gly Luell Asn 145 150 155 160

Ser Ala Luell Phe Wall Ala Met Glu Glu Asp Gly Gly Wall Ser Ser 1.65 17O 17s

Pro Thir Asn Thir Ala Gly Ala Lys Phe Gly Thir Gly Tyr Asp 18O 185 19 O

Ala Glin Cys Ala Arg Asp Lell Lys Phe Wall Gly Gly Lys Gly Asn Ile 195

Glu Gly Trp Pro Ser Thir Asn Asp Ala ASn Ala Gly Wall Gly Pro 21 O 215 22O US 9, 187,739 B2 129 130 - Continued

Tyr Gly Gly Cys Ala Glu Ile Asp Wall Trp Glu Ser Asn Lys Tyr 225 23 O 235 24 O

Ala Phe Ala Phe Thir Pro His Gly Glu ASn Pro Tyr His Wall 245 250 255

Glu Thir Thir Asn Gly Gly Thir Tyr Ser Glu Asp Arg Phe Ala 26 O 265 27 O

Gly Asp Cys Asp Ala Asn Gly Cys Asp Tyr ASn Pro Tyr Arg Met Gly 27s 285

Asn Glin Asp Phe Tyr Gly Pro Gly Luell Thir Wall Asp Thir Ser 29 O 295 3 OO

Phe Thir Wall Wall Ser Glin Phe Glu Glu Asn Lys Lell Thir Glin Phe Phe 3. OS 310 315 32O

Wall Glin Asp Gly Lys Ile Glu Ile Pro Gly Pro Wall Glu Gly 3.25 330 335

Ile Asp Ala Asp Ser Ala Ala Ile Thir Pro Glu Lell Ser Ala Luell 34 O 345 35. O

Phe Ala Phe Asp Asp Arg Asp Arg Phe Ser Glu Wall Gly Gly Phe 355 360 365

Asp Ala Ile Asn Thir Ala Lell Ser Thir Pro Met Wall Lell Wall Met Ser 37 O 375

Ile Trp Asp Asp His Tyr Ala Asn Met Luell Trp Lell Asp Ser Ser Tyr 385 390 395 4 OO

Pro Pro Glu Ala Gly Glin Pro Gly Gly Asp Arg Gly Pro Cys Pro 4 OS 41O 415

Glin Asp Ser Gly Wall Pro Ala Asp Wall Glu Ala Glin Tyr Pro Asn Ala 42O 425 43 O

Wall Ile Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir Wall 435 44 O 445

Asn Wall 450

SEQ ID NO 43 LENGTH: 1341 TYPE: DNA ORGANISM: Xylaria hypoxylon FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1341)

<4 OOs, SEQUENCE: 43 atg ttg to c citc. gcc gtg tcg gcc gcc citt citc. 999 citc. gcg tot gcc 48 Met Luell Ser Luell Ala Wall Ser Ala Ala Luell Luell Gly Lell Ala Ser Ala 1. 5 1O 15

Cag cag gtt gga aag gag Cala tot gag act CaC cott aag Ctg tot tgg 96 Glin Glin Wall Gly Lys Glu Glin Ser Glu Thir His Pro Lys Luell Ser Trp 2O 25 3O aag aag tgc acc agc ggit ggit to c tgc acc cag a CC aac gct gag gtg 144 Cys Thir Ser Gly Gly Ser Thir Glin Thir Asn Ala Glu Wall 35 4 O 45 a CC at C gac tot aac tgg cga tgg citt CaC tot citc. gaa ggc act gag 192 Thir Ile Asp Ser Asn Trp Arg Trp Luell His Ser Lell Glu Gly Thir Glu SO 55 6 O aac tgc tac gat ggit aac aag tgg acc tog cag agc act ggc gag 24 O Asn Cys Asp Gly Asn Trp Thir Ser Glin Ser Thir Gly Glu 65 70 7s 8O gac gcc acc aag gcc at C gag ggt gcc gac tac agc aag acc 288 Asp Ala Thir Ala Ile Glu Gly Ala Asp Ser Thir

US 9, 187,739 B2 133 134 - Continued

Ala Gly Glin Pro Gly Gly Asp Arg Gly Asp Ala Pro Asp Ser Gly 4 OS 41O 415 gtc cc c to c gac gtc gag gcc agc at C cc c gat gcc aag gtC gt C tgg 1296 Wall Pro Ser Asp Wall Glu Ala Ser Ile Pro Asp Ala Lys Wall Wall Trp 42O 425 43 O t cc aac at C cgc tto ggit c cc at C ggc tot act gtc gag gtt taa 1341 Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir Wall Glu Wall 435 44 O 445

<210s, SEQ ID NO 44 &211s LENGTH: 446 212. TYPE : PRT &213s ORGANISM: Xylaria hypoxylon

<4 OOs, SEQUENCE: 44

Met Luell Ser Luell Wall Ser Ala Ala Luell Luell Gly Lell Ala Ser Ala 1. 15

Glin Glin Wall Gly Glu Glin Ser Glu Thir His Pro Luell Ser Trp 25

Cys Thir Ser Gly Gly Ser Thir Glin Thir Asn Ala Glu Wall 35 4 O 45

Thir Asp Ser Asn Trp Arg Trp Luell His Ser Lell Glu Gly Thir Glu 55 6 O

Asn Asp Gly Asn Trp Thir Ser Glin Ser Thir Gly Glu 65 70

Asp Ala Thir Lys Ala Ile Glu Gly Ala Asp Ser Lys Thir 85 90 95

Gly Ala Ser Thir Ser Gly Asp Ala Luell Thir Lell Phe Luell Thir 105 11 O

His Glu Tyr Gly Thir Asn Ile Gly Ser Arg Phe Tyr Luell Met Asn 115 12 O 125

Gly Ala Asp Tyr Glin Thir Phe Asp Luell Lys Gly Asn Glu Phe Thir 13 O 135 14 O

Phe Asp Wall Asp Lell Ser Thir Wall Asp Gly Lell Asn Ala Ala Luell 145 150 155 160

Phe Wall Ala Met Glu Glu Asp Gly Gly Met Ala Ser Pro Asn 1.65 17O 17s

Asn Ala Gly Ala Gly Thir Gly Tyr Asp Ala Glin 18O 185 19 O

Ala Arg Asp Luell Phe Wall Gly Gly Gly Asn Wall Glu Gly Trp 195

Glu Pro Ser Thir Asn Asp Asp Asn Ala Gly Wall Gly Pro Gly Ala 21 O 215 22O

Cys Ala Glu Ile Asp Wall Trp Glu Ser ASn Ser His Ser Phe Ala 225 23 O 235 24 O

Phe Thir Pro His Pro Thir Thir Asn Glu Tyr His Wall Glu Glin 245 250 255

Asp Glu Gly Gly Thir Ser Glu Asp Arg Phe Ala Gly Lys 26 O 265 27 O

Asp Ala Asn Gly Asp Asn Pro Tyr Arg Met Gly Asn Thir Asp 27s 28O 285

Phe Tyr Gly Glin Gly Thir Wall Asp Thir Ser Lys Phe Thir Wall 29 O 295 3 OO

Wall Thir Glin Phe Ala Glu Asn Luell Thir Glin Phe Phe Wall Glin Asp 3. OS 310 315 32O US 9, 187,739 B2 135 136 - Continued

Gly Ile Glu Ile Pro Gly Pro Lys Ile Asp Gly Phe Pro Thir 3.25 330 335

Asp Ser Ala Ile Thir Pro Glu Cys Thir Ala Glu Phe Asn Wall Luell 34 O 345 35. O

Gly Asp Arg Asp Arg Phe Ser Glu Wall Gly Gly Phe Asp Glin Luell Asn 355 360 365

Asn Ala Luell Asp Wall Pro Met Wall Luell Wall Met Ser Ile Trp Asp Asp 37 O 375 38O

His Ala Asn Met Lell Trp Luell Asp Ser Ser Pro Pro Glu Lys 385 390 395 4 OO

Ala Gly Glin Pro Gly Gly Asp Arg Gly Asp Ala Pro Asp Ser Gly 4 OS 41O 415

Wall Pro Ser Asp Wall Glu Ala Ser Ile Pro Asp Ala Wall Wall Trp 425 43 O

Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir Wall Glu Wall 435 44 O 445

SEO ID NO 45 LENGTH: 1584 TYPE: DNA ORGANISM: Exidia glandulosa FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1584)

<4 OOs, SEQUENCE: 45 atg tac gcc aag tto gct a CC citc. gct gcc citc. gtg gca gct gcc agc 48 Met Tyr Ala Lys Phe Ala Thr Lieu Ala Ala Lieu. Wall Ala Ala Ala Ser 1. 1O 15 gcc cag cag gca tgc a Ca citc. acc gcc gag aac Cat c cc to c atg act 96 Ala Glin Glin Ala Cys Thir Lell Thir Ala Glu ASn His Pro Ser Met Thir 2O 25 3O tgg tot aag gcc gcc gga ggt agc tgc act tcg gtt tot ggt to a 144 Trp Ser Lys Cys Ala Ala Gly Gly Ser Cys Thir Ser Wall Ser Gly Ser 35 4 O 45 gtc acc at C gat gcc aac tgg cga tgg citt CaC Cag citc. aac agc gcc 192 Wall Thir Ile Asp Ala Asn Trp Arg Trp Luell His Glin Lell Asn Ser Ala SO 55 6 O a CC aac tgc tac gac ggc aac aag tgg aac acc a CC tac tgc agc aca 24 O Thir Asn Cys Asp Gly Asn Lys Trp Asn Thir Thir Cys Ser Thir 65 70 7s 8O gat gct act tgc gct gct Cag tgc tgt gtt gat ggc toa gac tat gct 288 Asp Ala Thir Cys Ala Ala Glin Cys Cys Wall Asp Gly Ser Asp Tyr Ala 85 90 95 ggc acc tac ggt gcc a CC act agc ggt aac Ctg aac citc. aag ttic 336 Gly Thir Gly Ala Thir Thir Ser Gly Asn Lell Asn Luell Lys Phe 105 11 O gtc acc Cala 999 t cc tot aag aac at C t cc cgg ttg tac citc. 384 Wall Thir Glin Gly Ser Ser Lys Asn Ile Ser Arg Luell Luell 115 12 O 125 atg gag tog gat a CC tat cag atg titt Ctg citc. ggc cag gag 432 Met Glu Ser Asp Thir Tyr Glin Met Phe Lell Lell Gly Glin Glu 13 O 135 14 O tto act ttic gac gta gat gtc to c aac ttg tgc ggit citc. aac ggt Phe Thir Phe Asp Wall Asp Wall Ser Asn Luell Gly Luell Asn Gly 145 150 160 gcc citc. tac ttic gtc agc atg gac gct gac ggc acg to c aag tat 528 Ala Luell Phe Wall Ser Met Asp Ala Asp Gly Thir Ser Lys Tyr 1.65 17O 17s

US 9, 187,739 B2 139 140 - Continued

gcg Cala tgc ggc ggit atc. ggc tgg act ggt coc acg a CC aag agc 1536 Ala Glin Cys Gly Gly Ile Gly Trp Thir Gly Pro Thir Thir Lys Ser SOO 505 cc.g tac acg tgc aca gcc t cc aac cc.g tac tac tcg Cag tgc ttg taa 1584 Pro Thir Cys Thr Ala Ser Asn Pro Ser Glin Cys Luell 515 52O 525

<210s, SEQ ID NO 46 &211s LENGTH: 527 212. TYPE : PRT <213> ORGANISM: Exidia glandulosa

<4 OOs, SEQUENCE: 46

Met Tyr Ala Lys Phe Ala Thir Luell Ala Ala Luell Wall Ala Ala Ala Ser 1. 5 1O 15

Ala Glin Glin Ala Cys Thir Lell Thir Ala Glu ASn His Pro Ser Met Thir 2O 25 3O

Trp Ser Lys Cys Ala Ala Gly Gly Ser Thir Ser Wall Ser Gly Ser 35 4 O 45

Wall Thir Ile Asp Ala Asn Trp Arg Trp Luell His Glin Lell Asn Ser Ala SO 55 6 O

Thir Asn Tyr Asp Gly Asn Trp Asn Thir Thir Ser Thir 65 70

Asp Ala Thir Cys Ala Ala Glin Wall Asp Gly Ser Asp Tyr Ala 85 90 95

Gly Thir Gly Ala Thir Thir Ser Gly Asn Ala Lell Asn Luell Phe 1OO 105 11 O

Wall Thir Glin Gly Ser Ser Lys Asn Ile Gly Ser Arg Luell Luell 115 12 O 125

Met Glu Ser Asp Thr Tyr Glin Met Phe Glin Lell Lell Gly Glin Glu 13 O 135 14 O

Phe Thir Phe Asp Val Asp Wall Ser Asn Luell Gly Gly Luell Asn Gly 145 150 155 160

Ala Luell Phe Wall Ser Met Asp Ala Asp Gly Thir Ser Lys 1.65 17O 17s

Thir Gly Asn Lys Ala Gly Ala Tyr Gly Thir Cys Asp Ser 18O 185 19 O

Glin Pro Arg Asp Lell Phe Ile Asn Gly Ala Asn Wall Glu 195 2OO 2O5

Gly Trp Thir Pro Ser Thir Asn Asp Ala Asn Ala Ile Gly Thir His 21 O 215

Gly Ser Cys Ser Glu Met Asp Ile Trp Glu Asn Asn Wall Ala 225 23 O 235 24 O

Ala Ala Thir Pro His Pro Thir Thir Ile Glin Ser Ile 245 250 255

Ser Gly Asp Ser Cys Gly Gly Thir Tyr Ser Ser Arg Tyr Ala Gly 26 O 265 27 O

Wall Asp Pro Asp Gly Asp Phe Asn Ser Arg Met Gly Asp 285

Thir Gly Phe Gly Luell Thir Wall Asp Thir Ser Ser Phe 29 O 295 3 OO

Thir Wall Wall Thr Gin Phe Lell Thir Gly Ser Asp Gly Asn Luell Ser Glu 3. OS 310 315

Ile Arg Phe Tyr Wall Glin Asn Gly Lys Wall Ile Pro Asn Ser Glin 3.25 330 335 US 9, 187,739 B2 141 142 - Continued

Ser Ile Ala Gly Wall Ser Gly Asn Ser Ile Thir Thir Asp Phe 34 O 345 35. O

Ser Ala Glin Thir Ala Phe Gly Asp Thir ASn Wall Phe Ala Glin 355 360 365

Gly Gly Luell Ala Gly Met Gly Ala Ala Luell Ala Gly Met Wall Luell 37 O 375

Wall Met Ser Ile Trp Asp Asp His Ala Wall ASn Met Lell Trp Luell Asp 385 390 395 4 OO

Ser Thir Pro Thir Asp Ser Thir Pro Gly Ala Ala Arg Gly Thir 4 OS 41O 415

Pro Thir Thir Ser Gly Wall Pro Ala Asp Wall Glu Ala Glin Wall Pro 425 43 O

Asn Ser Asn Wall Ile Ser Asn Ile Wall Gly Pro Ile Asn Ser 435 44 O 445

Thir Phe Thir Gly Gly Thir Ser Gly Gly Gly Gly Ser Ser Ser Ser Ser 450 45.5 460

Thir Thir Ile Arg Thir Ser Thir Thir Ser Thir Arg Thir Thir Ser Thir Ser 465 470

Thir Ala Pro Gly Gly Gly Ser Thir Gly Ser Ala Gly Ala Asp His Trp 485 490 495

Ala Glin Gly Gly Ile Gly Trp Thir Gly Pro Thir Thir Cys Ser SOO 505 51O

Pro Thir Thir Ala Ser Asn Pro Ser Glin Luell 515 52O 525

SEO ID NO 47 LENGTH: 1368 TYPE: DNA ORGANISM: Exidia glandulosa FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1368)

<4 OOs, SEQUENCE: 47 atg tac gcc aag tto gct a CC citc. gct gcc citc. gtg gca gct gcc agc 48 Met Tyr Ala Lys Phe Ala Thir Luell Ala Ala Luell Wall Ala Ala Ala Ser 1. 1O 15 gcc cag cag gca tgc a Ca citc. acc gcc gag aac Cat c cc to c atg act 96 Ala Glin Glin Ala Cys Thir Lell Thir Ala Glu ASn His Pro Ser Met Thir 2O 25 3O tgg tot aag gcc gcc gga ggt agc tgc act tcg gtt tot ggt to a 144 Trp Ser Lys Cys Ala Ala Gly Gly Ser Cys Thir Ser Wall Ser Gly Ser 35 4 O 45 gtc acc at C gat gcc aac tgg cga tgg citt CaC Cag citc. aac agc gcc 192 Wall Thir Ile Asp Ala Asn Trp Arg Trp Luell His Glin Lell Asn Ser Ala SO 55 6 O a CC aac tac gac ggc aac aag tgg aac acc a CC tac agc aca 24 O Thir Asn Asp Gly Asn Trp Asn Thir Thir Ser Thir 65 70 7s 8O gat gct act gct gct Cag gtt gat ggc toa gac tat gct 288 Asp Ala Thir Ala Ala Glin Wall Asp Gly Ser Asp Ala 85 90 ggc acc tac ggt gcc a CC act agc ggt aac gct Ctg aac citc. ttic 336 Gly Thir Gly Ala Thir Thir Ser Gly Asn Ala Lell Asn Luell Phe 105 11 O gtc acc Cala 999 t cc tat tot aag aac at C ggt t cc cgg ttg citc. 384 Wall Thir Glin Gly Ser Ser Lys Asn Ile Gly Ser Arg Luell Luell 115 12 O 125

US 9, 187,739 B2 145 146 - Continued

435 44 O 445 aac tog acc titc aag gcg toc taa 1368 Asn Ser Thr Phe Lys Ala Ser 450 45.5

<210s, SEQ ID NO 48 &211s LENGTH: 45.5 212. TYPE: PRT <213> ORGANISM: Exidia glandulosa

<4 OOs, SEQUENCE: 48

Met Tyr Ala Lys Phe Ala Thr Lieu. Ala Ala Luell Wall Ala Ala Ala Ser 1. 5 1O 15

Ala Glin Glin Ala Cys Thr Lieu. Thr Ala Glu ASn His Pro Ser Met Thir 2O 25 3O

Trp Ser Lys Cys Ala Ala Gly Gly Ser Thir Ser Wall Ser Gly Ser 35 4 O 45

Val Thir Ile Asp Ala Asn Trp Arg Trp Luell His Glin Lell Asn Ser Ala SO 55 6 O

Thir Asn. Cys Tyr Asp Gly Asn Lys Trp Asn Thir Thir Ser Thir 65 70

Asp Ala Thir Cys Ala Ala Glin Cys Wall Asp Gly Ser Asp Tyr Ala 85 90 95

Gly Thr Tyr Gly Ala Thr Thr Ser Gly Asn Ala Lell Asn Luell Phe 1OO 105 11 O

Val Thr Glin Gly Ser Tyr Ser Lys Asn Ile Gly Ser Arg Luell Luell 115 120 125

Met Glu Ser Asp Thr Lys Tyr Glin Met Phe Glin Lell Lell Gly Glin Glu 13 O 135 14 O

Phe Thr Phe Asp Val Asp Val Ser Asn Luell Gly Gly Luell Asn Gly 145 150 155 160

Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Thir Ser Lys Tyr 1.65 17O 17s

Thr Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thir Cys Asp Ser 18O 185 19 O

Glin Cys Pro Arg Asp Lieu Lys Phe Ile Asn Gly Ala Asn Wall Glu 195 2OO 2O5

Gly Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Ile Gly Thir His 21 O 215

Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Asn Asn Wall Ala 225 23 O 235 24 O

Ala Ala Tyr Thr Pro His Pro Cys Thir Thir Ile Glin Ser Ile Cys 245 250 255

Ser Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ser Arg Tyr Ala Gly 26 O 265 27 O

Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Arg Met Gly Asp 27s 28O 285

Thr Gly Phe Tyr Gly Lys Gly Lieu. Thir Wall Asp Thir Ser Ser Phe 29 O 295 3 OO

Thir Wal Wall Thr Glin Phe Lieu. Thir Gly Ser Asp Gly Asn Luell Ser Glu 3. OS 310 315 32O

Ile Lys Arg Phe Tyr Val Glin Asn Gly Lys Wall Ile Pro Asn Ser Glin 3.25 330 335

Ser Lys Ile Ala Gly Val Ser Gly Asn Ser Ile Thir Thir Asp Phe Cys 34 O 345 35. O US 9, 187,739 B2 147 148 - Continued

Ser Ala Glin Lys Thir Ala Phe Gly Asp Thir ASn Wall Phe Ala Glin Lys 355 360 365

Gly Gly Luell Ala Gly Met Gly Ala Ala Luell Lys Ala Gly Met Wall Luell 37 O 375 38O

Wall Met Ser Ile Trp Asp Asp His Ala ASn Met Lell Trp Luell Asp 385 390 395 4 OO

Ser Thir Pro Thir Asp Ala Ser Pro Asp Glu Pro Gly Gly Arg 4 OS 41O 415

Gly Thir Asp Thir Ser Ser Gly Wall Pro Ala Asp Ile Glu Thir Ser 425 43 O

Glin Ala Ser Asn Ser Wall Ile Tyr Ser Asn Ile Phe Gly Pro Ile 435 44 O 445

Asn Ser Thir Phe Lys Ala Ser 450 45.5

SEQ ID NO 49 LENGTH: 1395 TYPE: DNA ORGANISM: Poitrasia circinians FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1395)

<4 OOs, SEQUENCE: 49 atg Cat cag act t cc gtt citt tot tog citc. tot ttg citc. citc. gca gcc 48 Met His Glin Thir Ser Wall Lell Ser Ser Luell Ser Lell Lell Luell Ala Ala 1. 5 1O 15 t cc ggt gcc cag Cag gtc ggc acc cag aat gct gag act CaC cc.g agt 96 Ser Gly Ala Glin Glin Wall Gly Thir Glin Asn Ala Glu Thir His Pro Ser 2O 25 3O

Ctg acc acc cag aag a CC acc gac ggc ggc tgc a CC gac cag to c 144 Lell Thir Thir Glin Lys Thir Thir Asp Gly Gly Thir Asp Glin Ser 35 4 O 45 act gcc at C gtg citt gac gcc aac tgg cgc tgg Ctg CaC acc acc gag 192 Thir Ala Ile Wall Lell Asp Ala Asn Trp Arg Trp Lell His Thir Thir Glu SO 55 6 O ggc tac acc aac tgc tac act ggc cag gala gac a CC gac at C 24 O Gly Thir Asn Cys Tyr Thir Gly Glin Glu Asp Thir Asp Ile 65 70 t cc to c cc.g gag gct gcc acc ggc tgc gct citt gac ggt gcc gac 288 Ser Ser Pro Glu Ala Cys Ala Thir Gly Cys Ala Lell Asp Gly Ala Asp 85 90 95 tac gag ggc act tac ggc att acg act gac ggc aac gct citt to c atg 336 Glu Gly Thir Tyr Gly Ile Thir Thir Asp Gly Asn Ala Luell Ser Met 1OO 105 11 O aag titt gt C acc Cag ggc tcg cag aag aac gtc ggc ggit cgt. gtt tac 384 Lys Phe Wall Thir Glin Gly Ser Glin Lys Asn Wall Gly Gly Arg Wall Tyr 115 12 O 125

Ctg citt gct cc c gac t cc gaa gat gcg tac gag citc. tto aag ttg aag 432 Lell Luell Ala Pro Asp Ser Glu Asp Ala Glu Lell Phe Luell Lys 13 O 135 14 O aac cag gag ttic act tto gac gtt gac gt C to c gac citc. cc c ggc Asn Glin Glu Phe Thir Phe Asp Wall Asp Wall Ser Asp Lell Pro Gly 145 150 155 160

Ctg aac ggc gcc Ctg tac tto to c gag atg gat gaa gat ggt ggc atg 528 Lell Asn Gly Ala Lell Phe Ser Glu Met Asp Glu Asp Gly Gly Met 1.65 17O 17s t cc aag tac gag aac aac aag gcc ggc gcc aag tac ggc act ggc tac 576 Ser Glu Asn Asn Ala Gly Ala Lys Gly Thir Gly Tyr

US 9, 187,739 B2 151 152 - Continued

Met His Glin Thir Ser Wall Lell Ser Ser Luell Ser Lell Lell Luell Ala Ala 15

Ser Gly Ala Glin Glin Wall Gly Thir Glin Asn Ala Glu Thir His Pro Ser 25

Lell Thir Thir Glin Thir Thir Asp Gly Gly Thir Asp Glin Ser 35 4 O 45

Thir Ala Ile Wall Lell Asp Ala Asn Trp Arg Trp Lell His Thir Thir Glu SO 55 6 O

Gly Tyr Thir Asn Tyr Thir Gly Glin Glu Trp Asp Thir Asp Ile Cys 65 70

Ser Ser Pro Glu Ala Ala Thir Gly Cys Ala Lell Asp Gly Ala Asp 85 90 95

Glu Gly Thir Tyr Gly Ile Thir Thir Asp Gly Asn Ala Luell Ser Met 105 11 O

Phe Wall Thir Glin Gly Ser Glin Asn Wall Gly Gly Arg Wall 115 12 O 125

Lell Luell Ala Pro Asp Ser Glu Asp Ala Glu Lell Phe Luell 13 O 135 14 O

Asn Glin Glu Phe Thir Phe Asp Wall Asp Wall Ser Asp Lell Pro Gly 145 150 155 160

Lell Asn Gly Ala Lell Phe Ser Glu Met Asp Glu Asp Gly Gly Met 1.65 17O 17s

Ser Glu Asn Asn Ala Gly Ala Gly Thir Gly 18O 185 19 O

Thr Gln Cys Pro His Asp Wall Phe Ile Asn Gly Glu Ala 195

Asn Luell Asn Trp Thir Lys Ser Glu Thir Asp Wall Asn Ala Gly Thir i 215 22O

Gly Gly Ser Cys Asn Glu Met Asp Ile Trp Glu Ala Asn 225 23 O 235 24 O

Ser Ala Thir Ala Wall Thir Pro His Wall Asn Ala Asp Wall Ile 245 250 255

Gly Wall Arg Cys Asn Gly Thir Asp Gly Asp Gly Asp Asn Arg 26 O 265 27 O

Gly Wall Cys Asp Asp Gly Asp Asn Pro Arg 285

Met Asn Glu Ser Phe Tyr Gly Ser Asn Gly Ser Thir Ile Asp Thir 295 3 OO

Thir Phe Thir Wall Ile Thir Glin Phe Ile Thir Ser Asp Asn Thir 3. OS 310 315

Ser Thir Gly Asp Lell Wall Glu Ile Arg Arg Wall Glin Asp Gly 3.25 330 335

Thir Wall Ile Glu Asn Ser Phe Ala Asp Asp Thir Lell Ala Thir Phe 34 O 345 35. O

Asn Ser Ile Ser Asp Asp Phe Cys Asp Ala Glin Thir Luell Phe Gly 355 360 365

Asp Glu Asn Asp Phe Thir Gly Gly Ile Ala Arg Met Gly Glu 37 O 375 38O

Ser Phe Glu Arg Gly Met Wall Luell Wall Met Ser Ile Trp Asp Asp His 385 390 395 4 OO

Ala Ala Asn Ala Lell Trp Lell Asp Ser Thir Pro Wall Asp Gly Asp 4 OS 415

Ala Thir Pro Gly Ile Arg Gly Pro Cys Gly Thir Asp Thir Gly

US 9, 187,739 B2 155 156 - Continued t cc aat gcc tac a CC c cc CaC cott tgc cga acc Cag aac gat ggt ggc 768 Ser Asn Ala Thir Pro His Pro Cys Arg Thir Glin Asn Asp Gly Gly 245 250 255 tac cag cgc tgc gag ggc cgc gac tgc aac cag cott cgc tat gag ggt 816 Glin Arg Cys Glu Gly Arg Asp Cys Asn Glin Pro Arg Tyr Glu Gly 26 O 265 27 O citt tgc gat cott gat ggc tgt gac tac aac coc tto cgc atg ggt aac 864 Lell Cys Asp Pro Asp Gly Cys Asp Asn Pro Phe Arg Met Gly Asn 27s 28O 285 aag gac ttic tac gga c cc gga aag acc gt C gac a CC aac agg aag atg 912 Lys Asp Phe Gly Pro Gly Lys Thir Wall Asp Thir Asn Arg Lys Met 29 O 295 3 OO a CC gt C gt C acc Cala tto atc. acc CaC gac aac a CC gac act ggc acc 96.O Thir Wall Wall Thir Glin Phe Ile Thir His Asp ASn Thir Asp Thir Gly Thir 3. OS 310 315 32O citc. gtt gac at C cgc cgc citc. tac gtt Cala gac ggc cgt gtC att gcc OO8 Lell Wall Asp Ile Arg Arg Lell Wall Glin Asp Gly Arg Wall Ile Ala 3.25 330 335 aac cott cc c acc aac tto c cc ggt citc. atg coc gcc CaC gac to c at C Asn Pro Pro Thir Asn Phe Pro Gly Luell Met Pro Ala His Asp Ser Ile 34 O 345 35. O a CC gag cag ttic tgc act gac cag aag aac citc. tto ggc gac tac agc 104 Thir Glu Glin Phe Cys Thir Asp Glin Lys Asn Luell Phe Gly Asp Ser 355 360 365 agc ttic gct cgt gac ggit ggit citc. gct CaC atg ggit cgc to c citc. gcc 152 Ser Phe Ala Arg Asp Gly Gly Luell Ala His Met Gly Arg Ser Luell Ala 37 O 375 aag ggit CaC gtC ctic gct ctic tecc at C tgg c gac CaC ggit gCC CaC 2OO Lys Gly His Wall Lell Ala Lell Ser Ile Trp ASn Asp His Gly Ala His 385 390 395 4 OO atg ttg tgg citc. gac t cc aac tac cc c acc gac gct gac cc c aac aag 248 Met Luell Trp Luell Asp Ser Asn Pro Thir Asp Ala Asp Pro Asn Lys 4 OS 41O 415 c cc ggt att gct cgt ggit a CC tgc cc.g acc act ggit ggc acc cc c cgt 296 Pro Gly Ile Ala Arg Gly Thir Cys Pro Thir Thir Gly Gly Thir Pro Arg 42O 425 43 O gaa acc gala Cala aac CaC cott gat gcc cag gtc atc. tto to c aac att 344 Glu Thir Glu Glin Asn His Pro Asp Ala Glin Wall Ile Phe Ser Asn Ile 435 44 O 445 a.a.a. ttic ggt gac atc. ggc tcg act ttic tot ggt tac taa 383 Phe Gly Asp Ile Gly Ser Thir Phe Ser Gly Tyr 450 45.5 460

<210s, SEQ ID NO 52 &211s LENGTH: 460 212. TYPE : PRT &213s ORGANISM: Coprinus cinereus

<4 OOs, SEQUENCE: 52

Met Phe Llys Lys Wall Ala Lell Thir Ala Luell Phe Lell Ala Wall Ala 1. 5 15

Glin Ala Glin Glin Wall Gly Arg Glu Wall Ala Glu Asn His Pro Arg Luell 2O 25

Pro Trp Glin Arg Cys Thir Arg Asn Gly Gly Glin Thir Wall Ser Asn 35 4 O 45

Gly Glin Wall Wall Lell Asp Ala Asn Trp Arg Trp Lell His Wall Thir Asp SO 55 6 O

Gly Tyr Thir Asn Cys Tyr Thir Gly Asn Ser Asn Ser Thir Wall 65 70 US 9, 187,739 B2 157 158 - Continued

Ser Asp Pro Th Thir Ala Glin Arg Cys Ala Lell Glu Gly Ala Asn 85 90 95

Glin Glin Thr Tyr Gly Ile Thir Thir Asn Gly Asp Ala Luell Thir Ile 1OO 105 11 O

Phe Luell Thr Arg Ser Glin Glin Thir Asn Wall Gly Ala Arg Wall 115 12 O 125

Lell Met Glu Asn. Glu Asn Arg Glin Met Phe Asn Lell Luell Asn 13 O 135 14 O

Glu Phe Thir Phe Asp Wall Asp Wall Ser Wall Pro Gly Ile Asn 145 150 155 160

Gly Ala Luell Tyr Phe Ile Glin Met Asp Ala Asp Gly Gly Met Ser 1.65 17O 17s

Glin Pro Asn Asn Arg Ala Gly Ala Lys Gly Thir Gly Tyr Asp 18O 185 19 O

Ser Glin Cys Pro Arg Asp Ile Lys Phe Ile Asp Gly Wall Ala Asn Ser 195

Ala Asp Trp Thir Pro Ser Glu Thir Asp Pro ASn Ala Gly Arg Gly Arg 21 O 215

Tyr Gly Ile Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile 225 23 O 235 24 O

Ser Asn Ala Tyr Thr Pro His Pro Arg Thir Glin Asn Asp Gly Gly 245 250 255

Glin Arg Cys Glu Gly Arg Asp Cys Asn Glin Pro Arg Tyr Glu Gly 26 O 265 27 O

Lieu Asp Pro Asp Gly Asp Asn Pro Phe Arg Met Gly Asn 285

Asp Phe Pro Gly Thir Wall Asp Thir Asn Arg Met 29 O 295 3 OO

Thir Wall Wall Thr Gin Phe Ile Thir His Asp ASn Thir Asp Thir Gly Thir 3. OS 310 315

Lell Wall Asp Ile Arg Arg Lell Wall Glin Asp Gly Arg Wall Ile Ala 3.25 330 335

Asn Pro Pro Thir Asn Phe Pro Gly Luell Met Pro Ala His Asp Ser Ile 34 O 345 35. O

Thir Glu Glin Phe Cys Thir Asp Glin Asn Luell Phe Gly Asp Ser 355 360 365

Ser Phe Ala Arg Asp Gly Gly Luell Ala His Met Gly Arg Ser Luell Ala 37 O 375

Lys Gly His Wall Lieu. Ala Lell Ser Ile Trp ASn Asp His Gly Ala His 385 390 395 4 OO

Met Luell Trp Lieu. Asp Ser Asn Pro Thir Asp Ala Asp Pro Asn 4 OS 415

Pro Gly Ile Ala Arg Gly Thir Pro Thir Thir Gly Gly Thir Pro Arg 42O 425 43 O

Glu Thir Glu Glin Asn His Pro Asp Ala Glin Wall Ile Phe Ser Asn Ile 435 44 O 445

Phe Gly Asp Ile Gly Ser Thir Phe Ser Gly Tyr 450 45.5 460

SEO ID NO 53 LENGTH: 1353 TYPE: DNA ORGANISM: Acremonium sp. FEATURE: NAME/KEY: CDS

US 9, 187,739 B2 161 162 - Continued

aag ttic acc gt C gtc a CC cgc ttic cag gag aac gac citc. tog cag tac 96.O Lys Phe Thir Wall Wall Thir Arg Phe Glin Glu ASn Asp Lell Ser Glin Tyr 3. OS 310 315 32O tto at C cag gac ggc aag at C gag at C cc.g c cc cc.g acc tgg gac OO8 Phe Ile Glin Asp Gly Arg Lys Ile Glu Ile Pro Pro Pro Thir Trp Asp 3.25 330 335 ggc citc. cc.g aag agc agc CaC at C acg cc c gag Ctg tgc gcg acc cag Gly Luell Pro Lys Ser Ser His Ile Thir Pro Glu Lell Cys Ala Thir Glin 34 O 345 35. O tto gac gt C ttic gac gac cgc aac cgc ttic gag gag gtc ggc ggc ttic 104 Phe Asp Wall Phe Asp Asp Arg Asn Arg Phe Glu Glu Wall Gly Gly Phe 355 360 365 c cc gcc citc. aac gcc gct citc. at C cc c atg gtc citt gtC atg to c 152 Pro Ala Luell Asn Ala Ala Lell Arg Ile Pro Met Wall Lell Wall Met Ser 37 O 375 38O atc. tgg gac gac CaC tac gcc aac atg citc. tgg citc. gac to c gt C tac 2OO Ile Trp Asp Asp His Tyr Ala Asn Met Luell Trp Lell Asp Ser Wall Tyr 385 390 395 4 OO cc.g cc c gag aag gag ggc a CC cc c ggc gcc gag cgt ggc cott tgc cc c 248 Pro Pro Glu Lys Glu Gly Thir Pro Gly Ala Glu Arg Gly Pro Cys Pro 4 OS 41O 415

Cag acc tot ggt gtc c cc gcc gala gt C gag gcc Cag tac cc c aac gcc 296 Glin Thir Ser Gly Wall Pro Ala Glu Wall Glu Ala Glin Pro Asn Ala 42O 425 43 O aag gt C gt C tgg t cc aac atc. cgc ttic ggc coc atc. ggc tog acc tac 344 Wall Wall Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir 435 44 O 445 aac atg taa 353 Asn Met 450

<210s, SEQ ID NO 54 &211s LENGTH: 450 212. TYPE : PRT <213> ORGANISM: Acremonium sp.

<4 OOs, SEQUENCE: 54

Met Met Lys Glin Tyr Lell Glin Luell Ala Ala Ala Lell Pro Luell Met 1. 5 15

Gly Luell Ala Ala Gly Glin Glin Ala Gly Arg Glu Thir Pro Glu Asn His 25

Pro Arg Luell Thir Trp Cys Ser Gly Glin Gly Ser Glin Thir 35 4 O 45

Wall Asn Gly Glu Wall Wall Asp Ala Asn Trp Arg Trp Luell His Asp SO 6 O

Ser Asn Met Glin Asn Cys Asp Gly Asn Glin Trp Thir Ser Ala Cys 65 70

Ser Ser Ala Thir Asp Ala Ser Cys Ile Glu Gly Ala Asp 85 90 95

Gly Arg Thir Tyr Gly Ala Ser Thir Ser Gly Asp Ser Luell Thir Luell 105 11 O

Phe Wall Thir Glin His Glu Tyr Gly Thir ASn Ile Gly Ser Arg Phe 115 12 O 125

Luell Met Ser Ser Pro Thir Arg Glin Met Phe Thir Luell Met Asn 13 O 135 14 O

Asn Glu Phe Ala Phe Asp Wall Asp Luell Ser Thir Wall Glu Gly Ile 145 150 155 160 US 9, 187,739 B2 163 164 - Continued

Asn Ser Ala Luell Tyr Phe Wall Ala Met Glu Glu Asp Gly Gly Met Ala 1.65 17O 17s

Ser Pro Thir Asn Ala Gly Ala Lys Gly Thir Gly Tyr 18O 185 19 O

Asp Ala Glin Ala Arg Asp Luell Phe Wall Gly Gly Lys Ala Asn 195

Ile Glu Gly Trp Arg Pro Ser Thir Asn Asp Ala Asn Ala Gly Wall Gly 21 O 215 22O

Pro Met Gly Gly Cys Ala Glu Ile Asp Wall Trp Glu Ser Asn Ala 225 23 O 235 24 O

His Ala Phe Ala Phe Thir Pro His Ala Cys Glu Asn Asn Asn Tyr His 245 250 255

Ile Glu Thir Ser Asn Gly Gly Thir Ser Asp Asp Arg Phe 26 O 265 27 O

Ala Gly Luell Asp Ala Asn Gly Cys Asp Asn Pro Arg Met 285

Gly Asn Pro Asp Phe Gly Lys Gly Thir Lell Asp Thir Ser Arg 29 O 295 3 OO

Lys Phe Thir Wall Wall Thir Arg Phe Glin Glu ASn Asp Lell Ser Glin Tyr 3. OS 310 315

Phe Ile Glin Asp Gly Arg Ile Glu Ile Pro Pro Pro Thir Trp Asp 3.25 330 335

Gly Luell Pro Lys Ser Ser His Ile Thir Pro Glu Lell Ala Thir Glin 34 O 345 35. O

Phe Asp Wall Phe Asp Asp Arg Asn Arg Phe Glu Glu Wall Gly Gly Phe 355 360 365

Pro Ala Luell Asn Ala Ala Lell Arg Ile Pro Met Wall Lell Wall Met Ser 37 O 375

Ile Trp Asp Asp His Tyr Ala Asn Met Luell Trp Lell Asp Ser Wall Tyr 385 390 395 4 OO

Pro Pro Glu Glu Gly Thir Pro Gly Ala Glu Arg Gly Pro Cys Pro 4 OS 415

Glin Thir Ser Gly Wall Pro Ala Glu Wall Glu Ala Glin Tyr Pro Asn Ala 42O 425 43 O

Wall Wall Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thir Tyr 435 44 O 445

Asn Met 450

SEO ID NO 55 LENGTH: 1599 TYPE: DNA ORGANISM: Chaetomidium pingtungium FEATURE: NAME/KEY: CDS LOCATION: (1) ... (1599)

<4 OOs, SEQUENCE: 55 atg Ctg gcc to c a CC tto t cc tac cgc atg tac aag a CC gcg citc. at C 48 Met Luell Ala Ser Thir Phe Ser Arg Met Tyr Thir Ala Luell Ile 1. 1O 15

Ctg gcc gcc citt Ctg ggc tot ggc cag gct cag Cag gtc ggt act to c 96 Lell Ala Ala Luell Lell Gly Ser Gly Glin Ala Glin Glin Wall Gly Thir Ser 2O 25

Cag gcg gala gtg Cat cc.g t cc atg acc tgg cag agc tgc acg gct ggc 144 Glin Ala Glu Wall His Pro Ser Met Thir Trp Glin Ser Cys Thir Ala Gly 35 4 O 45

US 9, 187,739 B2 167 168 - Continued

355 360 365

Ctg ttic cag gac Cag aac gtc ttic gala aag CaC ggc ggc citc. gag ggc 152 Lell Phe Glin Asp Glin Asn Wall Phe Glu Lys His Gly Gly Luell Glu Gly 37 O 375 38O atg ggt gct gcc citc. gcc Cag ggc atg gtt citc. gtc atg to c Ctg tgg 2OO Met Gly Ala Ala Lell Ala Glin Gly Met Wall Luell Wall Met Ser Luell Trp 385 390 395 4 OO gat gat CaC tog gcc aac atg citc. tgg citc. gac agc aac tac cc.g acc 248 Asp Asp His Ser Ala Asn Met Luell Trp Luell Asp Ser Asn Pro Thir 4 OS 41O 415 act gcc tot to c a CC act c cc ggc gt C gcc ggit a CC tgc gac at C 296 Thir Ala Ser Ser Thir Thir Pro Gly Wall Ala Arg Gly Thir Cys Asp Ile 42O 425 43 O t cc to c ggc gt C cott gcg gat gt C gag gcg aac CaC c cc gac gcc tac 344 Ser Ser Gly Wall Pro Ala Asp Wall Ala ASn His Pro Asp Ala 435 44 O 445 gtc gt C tac to c aac atc. aag gt C cc c atc. ggc tcg acc ttic aac 392 Wall Wall Ser Asn Ile Lys Wall Pro Ile Gly Ser Thir Phe Asn 450 45.5 460 agc ggt ggc tog aac c cc ggit ggc acc acc acg a Ca act acc acc 44 O Ser Gly Gly Ser Asn Pro Gly Gly Thir Thir Thir Thir Thir Thir Thir 465 470 47s 48O

Cag cott act acc a CC acg a CC acg gga aac cott ggc ggc acc gga 488 Glin Pro Thir Thir Thir Thir Thir Thir Gly ASn Pro Gly Gly Thir Gly 485 490 495 gtc gca cag CaC tat ggc Cag tgt gga atc. gga tgg acc gga cc c 536 Wall Ala Glin His Tyr Gly Glin Cys Gly Ile Gly Trp Thir Gly Pro 500 51O a Ca acc tgt gcc agc cott tat acc cag aag Ctg aat gat tat tac 584 Thir Thir Cys Ala Ser Pro Thir Glin Lys Lell Asn Asp 515 52O 525 tot cag tgc Ctg tag 5.99 Ser Glin Cys Luell 53 O

<210s, SEQ ID NO 56 &211s LENGTH: 532 212. TYPE : PRT <213> ORGANISM: Chaetomidium pingtungium

<4 OOs, SEQUENCE: 56

Met Lieu. Ala Ser Thir Phe Ser Arg Met Tyr Thir Ala Luell Ile 1. 5 15

Lell Ala Ala Luell Lell Gly Ser Gly Glin Ala Glin Glin Wall Gly Thir Ser 25 3O

Glin Ala Glu Wall His Pro Ser Met Thir Trp Glin Ser Cys Thir Ala Gly 35 4 O 45

Gly Ser Thir Thir Asn Asn Gly Wall Wall Ile Asp Ala Asn Trp SO 55 6 O

Arg Trp Wall His Lys Wall Gly Asp Thir ASn Thir Gly Asn 65 70

Thir Trp Asp Thir Thir Ile Pro Asp Asp Ala Thir Ala Ser Asn 85 90 95

Ala Luell Glu Gly Ala Asn Glu Ser Thir Gly Wall Thir Ala 1OO 105 11 O

Ser Gly Asn Ser Lell Arg Lell Asn Phe Wall Thir Thir Ser Glin Glin Lys 115 12 O 125

Asn Ile Gly Ser Arg Lell Met Met Lys Asp Asp Ser Thir Glu US 9, 187,739 B2 169 170 - Continued

13 O 135 14 O

Met Phe Luell Lell Asn Glin Glu Phe Thir Phe Asp Wall Asp Wall Ser 145 150 155 160

Asn Luell Pro Gly Lell Asn Gly Ala Luell Phe Wall Ala Met Asp 1.65 17O 17s

Ala Gly Gly Met Ser Pro Thir ASn Ala Gly Ala 185 19 O

Gly Thir Asp Ser Glin Pro Arg Asp Luell Phe 195 2OO

Ile Asn Gly Ala Asn Wall Glu Gly Trp Glin Pro Ser Ser Asn Asp 21 O 215 22O

Ala Asn Ala Thir Gly Asn His Gly Ser Cys Ala Glu Met Asp 225 23 O 235 24 O

Ile Trp Glu Asn Ser Ile Ser Thir Ala Phe Thir Pro His Pro 245 250 255

Asp Thir Pro Glin Wall Met Thir Gly Asp Ala Gly Gly Thir 265 27 O

Ser Ser Asp Arg Gly Gly Thir Asp Pro Asp Gly Asp 27s 285

Phe Asn Ser Phe Glin Gly Asn Thir Phe Tyr Gly Pro Gly Met 29 O 295 3 OO

Thir Wall Asp Thir Lys Ser Phe Thir Wall Wall Thir Glin Phe Ile Thir 3. OS 310 315

Asp Asp Gly Thir Ser Ser Gly Thir Luell Lys Glu Ile Arg Phe 325 330 335

Wall Glin Asn Gly Lys Wall Ile Pro Asn Ser Glu Ser Thir Trp Thir Gly 34 O 345 35. O

Wall Ser Gly Asn Ser Ile Thir Thir Glu Thir Ala Glin Ser 355 360 365

Lell Phe Glin Asp Glin Asn Wall Phe Glu His Gly Gly Luell Glu Gly 37 O 375

Met Gly Ala Ala Lell Ala Glin Gly Met Wall Luell Wall Met Ser Luell Trp 385 390 395 4 OO

Asp Asp His Ser Ala Asn Met Luell Trp Luell Asp Ser Asn Pro Thir 4 OS 415

Thir Ala Ser Ser Thir Thir Pro Gly Wall Ala Arg Gly Thir Cys Asp Ile 425 43 O

Ser Ser Gly Wall Pro Ala Asp Wall Glu Ala ASn His Pro Asp Ala 435 44 O 445

Wall Wall Ser Asn Ile Lys Wall Gly Pro Ile Gly Ser Thir Phe Asn 450 45.5 460

Ser Gly Gly Ser Asn Pro Gly Gly Gly Thir Thir Thir Thir Thir Thir Thir 465 470 48O

Glin Pro Thir Thir Thir Thir Thir Thir Ala Gly ASn Pro Gly Gly Thir Gly 485 490 495

Wall Ala Glin His Tyr Gly Glin Gly Gly Ile Gly Trp Thir Gly Pro SOO 505 51O

Thir Thir Cys Ala Ser Pro Thir Glin Lys Lell Asn Asp Tyr 515 52O 525

Ser Glin Luell 53 O

<210s, SEQ ID NO f

US 9, 187,739 B2 173 174 - Continued

28O 285 aag gac ttic tac gga c cc gga aag acc at C gac a CC aac agg aag atg 912 Asp Phe Gly Pro Gly Lys Thir Ile Asp Thir Asn Arg Lys Met 29 O 295 3 OO a CC gt C gt C acc Cala tto atc. acc CaC gac aac a CC gac act ggc acc 96.O Thir Wall Wall Thir Glin Phe Ile Thir His Asp ASn Thir Asp Thir Gly Thir 3. OS 310 315 32O citc. gtt gac at C cgc citc. tac gtt Cala gac ggc cgt gtC att gcc OO8 Lell Wall Asp Ile Arg Arg Lell Wall Glin Asp Gly Arg Wall Ile Ala 3.25 330 335 aac cott cc c acc aac tto c cc ggt citc. atg coc gcc CaC gac to c at C Asn Pro Pro Thir Asn Phe Pro Gly Luell Met Pro Ala His Asp Ser Ile 34 O 345 35. O a CC gag cag ttic tgc act gac cag aag aac citc. tto ggc gac tac agc 104 Thir Glu Glin Phe Cys Thir Asp Glin Lys Asn Luell Phe Gly Asp Ser 355 360 365 agc ttic gct cgt gac ggit ggit citc. gct CaC atg ggit cgc to c citc. gcc 152 Ser Phe Ala Arg Asp Gly Gly Luell Ala His Met Gly Arg Ser Luell Ala 37 O 375 38O aag ggt CaC gt C citc. gct citc. to c at C tgg aac gac CaC ggt gcc CaC 2OO Lys Gly His Wall Lell Ala Lell Ser Ile Trp ASn Asp His Gly Ala His 385 390 395 4 OO atg ttg tgg citc. gac t cc aac tac cc c acc gac gct gac cc c aac aag 248 Met Luell Trp Luell Asp Ser Asn Pro Thir Asp Ala Asp Pro Asn Lys 4 OS 41O 415 c cc ggt att gct cgt ggit a CC tgc cc.g acc act ggit ggc acc cc c cgt 296 Pro Gly Ile Ala Arg Gly Thir Cys Pro Thir Thir Gly Gly Thir Pro Arg 42O 425 43 O. gaa acc gala Cala aac CaC cott gat gcc cag gtc atc. tto to c aac att 344 Glu Thir Glu Glin Asn His Pro Asp Ala Glin Wall Ile Phe Ser Asn Ile 435 44 O 445 a.a.a. ttic ggt gac atc. ggc tcg act ttic tot ggt tac taa 383 Phe Gly Asp Ile Gly Ser Thir Phe Ser Gly Tyr 450 45.5 460

<210s, SEQ ID NO 58 &211s LENGTH: 460 212. TYPE : PRT <213> ORGANISM: Sporotrichum pruinosum

<4 OOs, SEQUENCE: 58

Met Phe Llys Lys Wall Ala Lell Thir Ala Luell Phe Lell Ala Wall Ala 1. 5 15

Glin Ala Glin Glin Wall Gly Arg Glu Wall Ala Glu Asn His Pro Arg Luell 2O 25 3O

Pro Trp Glin Arg Cys Thir Arg Asn Gly Gly Glin Thir Wall Ser Asn 35 4 O 45

Gly Glin Wall Wall Lell Asp Ala Asn Trp Arg Trp Lell His Wall Thir Asp SO 55 6 O

Gly Thir Asn Cys Tyr Thir Gly Asn Ser Trp Asn Ser Thir Wall Cys 65 70

Ser Asp Pro Thir Thir Ala Glin Arg Cys Ala Lell Glu Gly Ala Asn 85 90 95

Glin Glin Thir Tyr Gly Ile Thir Thir Asn Gly Asp Ala Luell Thir Ile 105 11 O

Phe Luell Thir Ser Glin Glin Thir Asn Wall Gly Ala Arg Wall 115 12 O 125

Lell Met Glu Asn Glu Asn Arg Glin Met Phe Asn Lell Luell Asn US 9, 187,739 B2 175 176 - Continued

13 O 135 14 O

Glu Phe Thir Phe Asp Wall Asp Wall Ser Wall Pro Gly Ile Asn 145 150 155 160

Gly Ala Luell Tyr Phe Ile Glin Met Asp Ala Asp Gly Gly Met Ser 1.65 17s

Glin Pro Asn Asn Arg Ala Gly Ala Lys Gly Thir Gly Tyr Asp 18O 185 19 O

Ser Glin Cys Pro Arg Asp Ile Lys Phe Ile Asp Gly Wall Ala Asn Ser 195

Ala Asp Trp Thir Pro Ser Glu Thir Asp Pro ASn Ala Gly Arg Gly Arg 21 O 215

Tyr Gly Ile Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile 225 23 O 235 24 O

Ser Asn Ala Tyr Thr Pro His Pro Arg Thir Glin Asn Asp Gly Gly 245 250 255

Glin Arg Cys Glu Gly Arg Asp Cys Asn Glin Pro Arg Tyr Glu Gly 26 O 265 27 O

Lell Asp Pro Asp Gly Asp Asn Pro Phe Arg Met Gly Asn 285

Asp Phe Pro Gly Thir Ile Asp Thir Asn Arg Met 29 O 295 3 OO

Thir Wall Wall Thr Gin Phe Ile Thir His Asp ASn Thir Asp Thir Gly Thir 3. OS 310 315

Lell Wall Asp Ile Arg Arg Lell Wall Glin Asp Gly Arg Wall Ile Ala 325 330 335

Asn Pro Pro Thir Asn Phe Pro Gly Luell Met Pro Ala His Asp Ser Ile 34 O 345 35. O

Thir Glu Glin Phe Cys Thir Asp Glin Asn Luell Phe Gly Asp Ser 355 360 365

Ser Phe Ala Arg Asp Gly Gly Luell Ala His Met Gly Arg Ser Luell Ala 37 O 375

Lys Gly His Wall Lieu. Ala Lell Ser Ile Trp ASn Asp His Gly Ala His 385 390 395 4 OO

Met Luell Trp Lieu. Asp Ser Asn Pro Thir Asp Ala Asp Pro Asn 4 OS 415

Pro Gly Ile Ala Arg Gly Thir Pro Thir Thir Gly Gly Thir Pro Arg 42O 425 43 O

Glu Thir Glu Glin Asn His Pro Asp Ala Glin Wall Ile Phe Ser Asn Ile 435 44 O 445

Phe Gly Asp Ile Gly Ser Thir Phe Ser Gly Tyr 450 45.5 460

SEO ID NO 59 LENGTH: 1578 TYPE: DNA ORGANISM: Scytalidium thermophilum FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1578)

< 4 OOs SEQUENCE: 59 atg cgt acc gcc aag tto gcc acc ctic goc goc citt gtg gcc tog gcc 48 Met Arg Thr Ala Lys Phe Ala Thir Lieu Ala Ala Lell Wall Ala Ser Ala 1. 5 15 gcc gcc cag cag gcg tgc citc. acc acc gag agg CaC cott to c citc. 96 Ala Ala Glin Glin Ala Ser Luell Thir Thr Gul Arg His Pro Ser Luell

US 9, 187,739 B2 179 180 - Continued

Glu Ser Thir Ile Pro Gly Wall Glu Gly Asn Ser Ile Thir Glin Asp Trp 34 O 345 35. O tgc gac cgc cag aag gtt gcc titt ggc gac att gac gac ttic aac cgc 104 Cys Asp Arg Glin Lys Wall Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg 355 360 365 aag ggc ggc atg aag Cag atg ggc aag gcc citc. gcc ggc cc c atg gt C 152 Lys Gly Gly Met Lys Glin Met Gly Lys Ala Luell Ala Gly Pro Met Wall 37 O 375 38O

Ctg gt C atg to c atc. tgg gat gac CaC gcc to c aac atg citc. tgg citc. 2OO Lell Wall Met Ser Ile Trp Asp Asp His Ala Ser Asn Met Luell Trp Luell 385 390 395 4 OO gac tog acc ttic cott gtc gat gcc gct ggc aag c cc ggc gcc gag cgc 248 Asp Ser Thir Phe Pro Wall Asp Ala Ala Gly Lys Pro Gly Glu Arg 4 OS 41O 415 ggit gcc tgc cc.g a CC a CC tcg ggt gt C cott gct gag gtt gcc gag 296 Gly Ala Cys Pro Thir Thir Ser Gly Wall Pro Ala Glu Wall Ala Glu 42O 425 gcc cc c aac agc aac gtc gtc ttic to c aac atc. cgc tto cc c at C 344 Ala Pro Asn Ser Asn Wall Wall Phe Ser Asn Ile Arg Phe Pro Ile 435 44 O 445 ggc tog acc gtt gct ggit citc. cc c ggc gcg ggc aac ggc aac aac 392 Gly Ser Thir Wall Ala Gly Lell Pro Gly Ala Gly Asn Gly Asn Asn 450 45.5 460 ggc ggc aac cc c cc.g c cc c cc acc acc acc acc t cc tcg cc.g gcc 44 O Gly Gly Asn Pro Pro Pro Pro Thir Thir Thir Thir Ser Ser Pro Ala 465 470 47s 48O a CC acc acc acc gcc agc gct ggc cc c aag gct ggc CaC tgg cag cag 488 Thr Thr Thr Thr Ala Ser Ala Gly Pro Lys Ala Gly His Trp Gln Gln 485 490 495 tgc ggc ggc at C ggc tto act ggc cc.g acc cag tgc gag gag cc c tac 536 Cys Gly Gly Ile Gly Phe Thir Gly Pro Thir Glin Cys Glu Glu Pro SOO 505 51O act tgc acc aag citc. aac gac tgg tac tot cag tgc Ctg taa Thir Cys Thir Lys Lell Asn Asp Trp Ser Glin Cys Lell 515 52O 525

<210s, SEQ ID NO 60 &211s LENGTH: 525 212. TYPE : PRT &213s ORGANISM: Scytalidium thermophilum

<4 OOs, SEQUENCE: 6 O

Met Arg Thr Ala Lys Phe Ala Thir Luell Ala Ala Lell Wall Ala Ser Ala 1. 5 15

Ala Ala Glin Glin Ala Ser Luell Thir Thir Glu Arg His Pro Ser Luell 2O 25 3O

Ser Trp Lys Cys Thir Ala Gly Gly Glin Cys Glin Thir Wall Glin Ala 35 4 O 45

Ser Ile Thir Luell Asp Ser Asn Trp Arg Trp Thir His Glin Wall Ser Gly SO 55 6 O

Ser Thir Asn Tyr Thir Gly Asn Trp Asp Thir Ser Ile Thir 65 70 7s

Asp Ala Ser Cys Ala Glin Asn Cys Wall Asp Gly Ala Asp 85 90 95

Thir Ser Thir Tyr Gly Ile Thir Thir Asn Gly Asp Ser Lell Ser Luell Lys 1OO 105 11 O

Phe Wall Thir Gly Glin His Ser Thir Asn Wall Gly Ser Arg Thir Tyr 115 12 O 125 US 9, 187,739 B2 181 182 - Continued

Lell Met Asp Gly Glu Asp Lys Glin Thir Phe Glu Lieu. Lieu. Gly Asn 13 O 135 14 O

Glu Phe Thir Phe Asp Wall Asp Wall Ser Asn Ile Gly Gly Luell Asn 145 150 155 160

Gly Ala Luell Tyr Phe Wall Ser Met Asp Ala Asp Gly Luell Ser Arg 1.65 17O 17s

Pro Gly Asn Ala Gly Ala Lys Gly Thir Tyr Asp 18O 185 19 O

Ala Glin Cys Pro Arg Asp Ile Lys Phe Ile ASn Gly Ala Asn Ile 195

Glu Gly Trp Thir Gly Ser Thir Asn Asp Pro ASn Ala Ala Gly Arg 21 O 215

Tyr Gly Thir Cys Ser Glu Met Asp Ile Trp Glu Asn Asn Met 225 23 O 235 24 O

Ala Thir Ala Phe Thir Pro His Pro Thir Ile Ile Glin Ser Arg 245 250 255

Glu Gly Asp Ser Gly Gly Thir Ser Asn Arg Ala 26 O 265 27 O

Gly Wall Cys Asp Pro Asp Gly Cys Asp Phe ASn Ser Tyr Arg Glin Gly 27s 285

Asn Lys Thir Phe Tyr Gly Lys Gly Met Thir Wall Asp Thir Thir 29 O 295 3 OO

Ile Thir Wall Wall Thir Glin Phe Luell Asp Ala Asn Gly Asp Luell Gly 3. OS 310 315

Glu Wall Arg Phe Wall Gln Asp Gly Lys Ile Ile Pro Asn Ser 3.25 330 335

Glu Ser Thir Ile Pro Gly Wall Glu Gly Asn Ser Ile Thir Glin Asp Trp 34 O 345 35. O

Asp Arg Glin Wall Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg 355 360 365

Gly Gly Met Lys Glin Met Gly Ala Luell Ala Gly Pro Met Wall 37 O 375

Lell Wall Met Ser Ile Trp Asp Asp His Ala Ser Asn Met Luell Trp Luell 385 390 395 4 OO

Asp Ser Thir Phe Pro Wall Asp Ala Ala Gly Lys Pro Gly Ala Glu Arg 4 OS 415

Gly Ala Pro Thir Thir Ser Gly Wall Pro Ala Glu Wall Glu Ala Glu 425 43 O

Ala Pro Asn Ser Asn Wall Wall Phe Ser Asn Ile Arg Phe Gly Pro Ile 435 44 O 445

Gly Ser Thir Wall Ala Gly Lell Pro Gly Ala Gly Asn Gly Gly Asn Asn 450 45.5 460

Gly Gly Asn Pro Pro Pro Pro Thir Thir Thir Thir Ser Ser Ala Pro Ala 465 470 47s 48O

Thir Thir Thir Thir Ala Ser Ala Gly Pro Lys Ala Gly His Trp Glin Glin 485 490 495

Gly Gly Ile Gly Phe Thir Gly Pro Thir Glin Glu Glu Pro Tyr SOO 505 51O

Thir Thir Lell Asn Asp Trp Ser Glin Lell 515 52O 525

<210s, SEQ ID NO 61 &211s LENGTH: 519 &212s. TYPE: DNA

US 9, 187,739 B2 187 188 - Continued atg gcg gat act Cag tat aag atg ttic cag citc. aag aac aag gag 432 Met Ala Ser Asp Thir Glin Tyr Met Phe Glin Lell Lys Asn Lys Glu 13 O 135 14 O titt acg titt gat gtt gat gtc tot aat citt cott gga tta aac gga Phe Thir Phe Asp Wall Asp Wall Ser Asn Luell Pro Gly Luell Asn Gly 145 150 155 160 gcg ttg tat titt gtg gag atg gat gcg gat gga atg tog a.a.a. tac 528 Ala Luell Phe Wall Glu Met Asp Ala Asp Gly Met Ser Lys 1.65 17O 17s cc.g tot aat a.a.a. gcc 999 gca a.a.a. tat gga acc tat gat gcg 576 Pro Ser Asn Lys Ala Gly Ala Tyr Gly Thir Cys Asp Ala 18O 185 19 O

Cag CC a Cat gat atc. a.a.a. titt at C aac 999 gca aat citc. Cta 624 Glin Pro His Asp Ile Phe Ile Asn Gly Ala Asn Luell Luell 195 2OO 2O5 gac tgg acg cott toa a CC agc gac a.a.a. aat gcc t cc gga tac 672 Asp Trp Thir Pro Ser Thir Ser Asp Asn Ala Ser Gly Arg 21 O 215

999 acc Cala gaa atg gac at C tgg gaa aac agc atg gca 72 O Gly Thir Glin Glu Met Asp Ile Trp Glu Asn Ser Met Ala 225 23 O 235 24 O a CC gcc aca cc.g Cat c cc gt C toa cott acc cga 768 Thir Ala Thir Pro His Pro Ser Wall Ser Pro Thir Arg 245 250 255 toa gga acc Cala tgt 999 gat ggt tot aac Cat aac gga att 816 Ser Gly Thir Glin Cys Gly Asp Gly Ser Asn His Asn Gly Ile 26 O 265 27 O gat a.a.a. gat ggc tgc gat tto aat to c tac atg ggc aat acg aca 864 Asp Asp Gly Cys Asp Phe Asn Ser Met Gly Asn Thir Thir 27s 28O 285 tto ttic ggc aag gga gca acg gtt aac acc aac t cc a.a.a. titt act gtt 912 Phe Phe Gly Gly Ala Thir Wall Asn Thir ASn Ser Phe Thir Wall 29 O 295 3 OO gta acg Cala ttic atc. a CC t cc gac aac acc toa act gga gcg Cta aag 96.O Wall Thir Glin Phe Ile Thir Ser Asp Asn Thir Ser Thir Gly Ala Luell Lys 3. OS 310 315 32O gag att citt tat att cag aat gga a.a.a. gtc atc. cag aac tog OO8 Glu Ile Arg Arg Lell Ile Glin Asn Gly Wall Ile Glin Asn Ser 3.25 330 335 a.a.a. aat at C ggc atg to a gct tac gac tot ata acc gag gat Asn Ile Ser Gly Met Ser Ala Asp Ser Ile Thir Glu Asp 34 O 345 35. O tto gcc gct Cala a.a.a. a CC gca titt gga gac a Ca aat gac titt aag 104 Phe Ala Ala Glin Thir Ala Phe Gly Asp Thir Asn Asp Phe 355 360 365 gca aag ggc gga titt a Ca aac citt 999 aat gC9 ttg Cala aag gga atg 152 Ala Lys Gly Gly Phe Thir Asn Luell Gly Asn Ala Lell Glin Gly Met 37 O 375 38O gtt ttg gcg ttg att tgg gat gat Cat gct gcg Cag atg citt tgg 2OO Wall Luell Ala Luell Ser Ile Trp Asp Asp His Ala Ala Glin Met Luell Trp 385 390 395 4 OO ttg gat tot tac cc.g citc. gat a.a.a. gac cott tot Cala CC a ggt gtt 248 Lell Asp Ser Ser Tyr Pro Lell Asp Asp Pro Ser Glin Pro Gly Wall 4 OS 41O 415 aag agg ggc gcg tgt gct a CC tot tot ggt a.a.a. cc.g tcg gat gt C gag 296 Arg Gly Ala Cys Ala Thir Ser Ser Gly Pro Ser Asp Wall Glu 42O 425 43 O US 9, 187,739 B2 189 190 - Continued aac cag tot ccg aat gcg tcg gtg act ttt tog aac att aag titt ggg 1344 Asn Glin Ser Pro Asn Ala Ser Wall Thr Phe Ser Asn Ile Llys Phe Gly 435 44 O 445 gat att gga tog act tat t cc tot tag 1371 Asp Ile Gly Ser Thir Ser Ser 450 45.5

<210s, SEQ ID NO 66 &211s LENGTH: 456 212. TYPE : PRT &213s ORGANISM: Pseudoplectania nigrella

<4 OOs, SEQUENCE: 66

Met Luell Ser Asn. Luell Lell Lell Ser Luell Ser Phe Lell Ser Luell Ala Ser 1. 5 1O 15

Gly Glin Asn Ile Gly Thir Asn Thir Ala Glu Ser His Pro Glin Luell Arg 2O 25 3O

Ser Glin Thir Cys Thr Gly Asn Gly Ser Thir Glin Ser Thir Ser 35 4 O 45

Wall Wall Luell Asp Ser Asn Trp Arg Trp Luell His Asn Asn Gly Gly Ser SO 55 6 O

Thir Asn Tyr Thr Gly Asn Ser Trp Asp Ser Thir Lell Pro Asp 65 70

Pro Wall Thir Cys Ala Asn Ala Luell Asp Gly Ala Asp Tyr Ser 85 90 95

Gly Thir Gly Ile Thir Ser Thir Gly Asp Ala Lell Thir Luell Phe 1OO 105 11 O

Wall Thir Glin Gly Pro Ser Thir Asn Ile Gly Ser Arg Wall Luell 115 12 O 125

Met Ala Ser Asp Thr Glin Tyr Met Phe Glin Lell Asn Glu 13 O 135 14 O

Phe Thir Phe Asp Val Asp Wall Ser Asn Luell Pro Gly Luell Asn Gly 145 150 155 160

Ala Luell Tyr Phe Wall Glu Met Asp Ala Asp Gly Met Ser Lys 1.65 17O 17s

Pro Ser Asn Lys Ala Gly Ala Tyr Gly Thir Cys Asp Ala 18O 185 19 O

Glin Pro His Asp Ile Phe Ile Asn Gly Ala Asn Luell Luell 195 2OO

Asp Trp Thir Pro Ser Thir Ser Asp Asn Ala Ser Gly Arg 21 O 215

Gly Thir Cys Glin Glu Met Asp Ile Trp Glu Asn Ser Met Ala 225 23 O 235 24 O

Thir Ala Thir Pro His Pro Ser Wall Ser Pro Thir Arg 245 250 255

Ser Gly Thir Gln Cys Gly Asp Gly Ser Asn Arg His Asn Gly Ile 26 O 265 27 O

Asp Asp Gly Cys Asp Phe Asn Ser Arg Met Gly Asn Thir Thir 27s 285

Phe Phe Gly Ala Thir Wall Asn Thir ASn Ser Phe Thir Wall 29 O 295 3 OO

Wall Thir Glin Phe Ile Thir Ser Asp Asn Thir Ser Thir Gly Ala Luell Lys 3. OS 310 315

Glu Ile Arg Arg Lieu. Tyr Ile Glin Asn Gly Lys Wall Ile Glin Asn Ser 3.25 330 335 US 9, 187,739 B2 191 192 - Continued Llys Ser Asn. Ile Ser Gly Met Ser Ala Tyr Asp Ser Ile Thr Glu Asp 34 O 345 35. O Phe Cys Ala Ala Glin Llys Thr Ala Phe Gly Asp Thr Asn Asp Phe Lys 355 360 365 Ala Lys Gly Gly Phe Thr Asn Lieu. Gly Asn Ala Lieu Gln Lys Gly Met 37 O 375 38O Val Lieu Ala Lieu. Ser Ile Trp Asp Asp His Ala Ala Gln Met Lieu. Trip 385 390 395 4 OO Lieu. Asp Ser Ser Tyr Pro Lieu. Asp Lys Asp Pro Ser Glin Pro Gly Val 4 OS 41O 415 Lys Arg Gly Ala Cys Ala Thir Ser Ser Gly Llys Pro Ser Asp Val Glu 42O 425 43 O Asn Glin Ser Pro Asn Ala Ser Val Thr Phe Ser Asn Ile Llys Phe Gly 435 44 O 445 Asp Ile Gly Ser Thr Tyr Ser Ser 450 45.5

<210s, SEQ ID NO 67 &211s LENGTH: 951 &212s. TYPE: DNA <213> ORGANISM: Phytophthora infestans 22 Os. FEATURE: <221 > NAMEAKEY: misc feature <222s. LOCATION: (1) ... (951) <223> OTHER INFORMATION: Partial CBH1 encoding sequence

<4 OO > SEQUENCE: 67 tgcgatgctg atggttgttga Ctt Caact ct tac CSCC agg gtaac acct C titt Ct atggit 60 gcaggit citta ccgtgaacac caacaaagtt ttcaccgttg taacccalatt cat caccaac 12 O gatggalacag Ctt Cagg tac Cttgaaagaa atc.cgacgat tictatgttca gaatggcgt.c 18O gtgatt coaa act cqcaatc. cacaatcgct ggagttccag gaaattic cat caccgactict 24 O ttctgtgc.cg cacaaaagac tottttggit gacaccalacg aatt.cgctac taagggaggt 3OO

Cttgcc acaa taggaaagc tittggcaaag gg tatgg tac ttgtcatgtc. catttgggat 360 gaccataccg cca acatgtt gtggct cqat gcc ccttacc cagdalaccala atc.cccaagc 42O gccc.caggtg tcactic gagg at catgcagt gct actt cag gta accc.cgt tatgttgaa 48O gcca attct c caggttctitc cqt cacct tc. tcaaacatca agtgggg.tcc catcaactict 54 O acct acactg gatctggagc cgc.cccaagt gttcCaggca ctaca accgt tagct cqgca 6OO cc.cgcatcga Ctgcaact tc aggagctggt ggtgtc.gcta agtatgcc.ca atgtggaggt 660 actggataca gtggagctac cqcttgcgtt t caggcagda cctgttgttgc cct Caac cct 72 O tact actic cc aatgcc-aata gattgttt co ct caggagca attaggtttic caacctaagg 78O ggagagat ct tca caagttct gtacataggg toagctaaat gttgat catt catatt ctitt 84 O catgitattta gttgttgaca atttgaagtt gcaagttcaag acgggaaaac agaag cagga 9 OO aatatatggg acataacaaa gttcaatcgtt tacataagaa cct tctittaa a 951

The invention claimed is: 2. The nucleic acid construct of claim 1, wherein the 1. A nucleic acid construct comprising a polynucleotide 60 polypeptide having cellobiohydrolase I activity has at least encoding a polypeptide having cellobiohydrolase I activity, 85% sequence identity with amino acids 1 to 526 of SEQID wherein the polynucleotide is operably linked to one or more NO: 2. heterologous control sequences that direct the production of 3. The nucleic acid construct of claim 1, wherein the the polypeptide in an expression host, and wherein the is polypeptide having cellobiohydrolase I activity has at least polypeptide has at least 80% sequence identity with amino 90% sequence identity with amino acids 1 to 526 of SEQID acids 1 to 526 of SEQID NO: 2. NO: 2. US 9, 187,739 B2 193 194 4. The nucleic acid construct of claim 1, wherein the 14. The method of claim 12, wherein the polypeptide hav polypeptide having cellobiohydrolase I activity has at least ing cellobiohydrolase I activity comprises the sequence of 95% sequence identity with amino acids 1 to 526 of SEQ ID amino acids 1 to 526 of SEQID NO: 2. 15. A method for producing ethanol from biomass, said NO: 2. method comprising: 5. The nucleic acid construct of claim 1, wherein the 5 (a) contacting the biomass with the polypeptide having polypeptide having cellobiohydrolase I activity has at least cellobiohydrolase I activity, wherein the polypeptide has 97% sequence identity with amino acids 1 to 526 of SEQID at least 80% sequence identity with amino acids 1 to 526 NO: 2. of SEQID NO: 2; (b) converting the degraded biomass 6. The nucleic acid construct of claim 1, wherein the to ethanol; and (c) recovering the ethanol. polypeptide having cellobiohydrolase I activity has at least 10 16. The method of claim 15, wherein the polypeptide hav 98% sequence identity with amino acids 1 to 526 of SEQID ing cellobiohydrolase I activity has at least 85% sequence NO: 2. identity with amino acids 1 to 526 of SEQID NO: 2. 17. The method of claim 15, wherein the polypeptide hav 7. The nucleic acid construct of claim 1, wherein the ing cellobiohydrolase I activity has at least 90% sequence polypeptide having cellobiohydrolase I activity has at least 15 99% sequence identity with amino acids 1 to 526 of SEQID identity with amino acids 1 to 526 of SEQID NO: 2. NO: 2. 18. The method of claim 15, wherein the polypeptide hav 8. The nucleic acid construct of claim 1, wherein the ing cellobiohydrolase I activity has at least 95% sequence polypeptide having cellobiohydrolase I activity comprises identity with amino acids 1 to 526 of SEQID NO: 2. the sequence of amino acids 1 to 526 of SEQID NO: 2. 19. The method of claim 15, wherein the polypeptide hav 9. The nucleic acid construct of claim 1, wherein the ing cellobiohydrolase I activity has at least 97% sequence polypeptide having cellobiohydrolase I activity is a variant identity with amino acids 1 to 526 of SEQID NO: 2. comprising a substitution, deletion, and/or insertion of one or 20. The method of claim 15, wherein the polypeptide hav more amino acids of amino acids 1 to 526 of SEQID NO: 2. ing cellobiohydrolase I activity has at least 98% sequence 10. A recombinant expression vector comprising the identity with amino acids 1 to 526 of SEQID NO: 2. 25 21. The method of claim 15, wherein the polypeptide hav nucleic acid construct of claim 1. ing cellobiohydrolase I activity has at least 99% sequence 11. A recombinant host cell comprising the nucleic acid identity with amino acids 1 to 526 of SEQID NO: 2. construct of claim 1. 22. The method of claim 15, wherein the polypeptide hav 12. A method of producing a polypeptide having cellobio ing cellobiohydrolase I activity comprises the sequence of hydrolase I activity, said method comprising: (a) cultivating 30 the recombinant host cell of claim 11 under conditions con amino acids 1 to 526 of SEQID NO: 2. ducive for production of the polypeptide; and (b) recovering 23. The method of claim 15, wherein the polypeptide hav the polypeptide. ing cellobiohydrolase I activity is a variant comprising a 13. The method of claim 12, wherein the polypeptidehav Substitution, deletion, and/or insertion of one or more amino ing cellobiohydrolase I activity has at least 95% sequence acids of amino acids 1 to 526 of SEQID NO: 2. identity with amino acids 1 to 526 of SEQID NO: 2. ck ck ck ck *k