US 20150211037A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2015/0211037 A1 Graham et al. (43) Pub. Date: Jul. 30, 2015

(54) POLYPEPTIDES FOR USE IN THE DECONSTRUCTION OF CELLULOSE

(75) Inventors: Joel Edward Graham, Baltimore, MD (US); Melinda E. Clark, San Francisco, CA (US); Frank Thomson Robb, Gaithersburg, MD (US); Douglas S. Clark, Orinda, CA (US); Harvey W. Blanch, San Francisco, CA (US)

(73) Assignees: University of Maryland, Baltimore, Baltimore, MD (US); The Regents of the University of California, Oakland, CA (US)

(21) Appl. No.: 13/813,154

(22) PCT Filed: Jul. 14, 2011

(86) PCT No.: PCT/US11/44074
§ 371 (c)(1), (2), (4) Date: Mar. 20, 2014

Related U.S. Application Data

(60) Provisional application No. 61/369,588, filed on Jul. 30, 2010.

Publication Classification

(51) Int. Cl.
C12P 19/14 (2006.01)
C13K 1/02 (2006.01)
C12N 9/42 (2006.01)
C12P 19/02 (2006.01)

(52) U.S. Cl.
CPC: C12P 19/14 (2013.01); C12P 19/02 (2013.01); C13K 1/02 (2013.01); C12N 9/2437 (2013.01)

(57) ABSTRACT

Hydrolysis and degradation of cellulose-containing biomass by use of a polypeptide having cellulase activity is provided. Also provided are polypeptides having cellulase activity, such as archaeal cellulases, polynucleotides encoding the polypeptides, and compositions containing the polypeptides, and methods of use thereof.

Patent Application Publication Jul. 30, 2015 Sheet 1 of 28 US 2015/0211037 A1

[Sheets 2-4 of 28: photographs and gel images, not reproduced. Recoverable labels: incubation times 30 DAYS, 50 DAYS, 65 DAYS; numeric gel labels 200, 66, 31, 215, 6.5. FIG. 4: lanes M, N, B, W1; numeric gel labels 175, 58, 46.]

[FIG. 5 (Sheet 5 of 28): phylogenetic tree, not reproduced. Bootstrap values (71 to 100) are omitted because their branch positions cannot be recovered from the text. Taxa shown, with accession numbers as printed: Clostridium thermocellum CTL-6 (FJ599513); Candidatus Korarchaeum cryptofilum OPF8 (CP000968); Methanopyrus kandleri AV19 (M59932); Sulfolobus tokodaii strain 7 (AB022438); Sulfolobus shibatae (M32504); Sulfolobus solfataricus P2 (D26490); Acidilobus saccharovorans strain 345-15 (AY350586); Sulfophobococcus zilligii K1 (X98064); Desulfurococcus kamchatkensis strain 1221n (EU167539); Desulfurococcus fermentans strain Z-1312 (AY264344); Thermosphaera aggregans (accession illegible); Staphylothermus marinus F1 (X99560); Staphylothermus hellenicus DSM 12710 (AJ012645); IGNISPHAERA-LIKE ORGANISM; Ignisphaera sp. Tok10A.S1 (DQ060322); Ignisphaera aggregans DSM 17230 (DP060321); Ignisphaera sp. Tok37.S1 (DQ000323); Aeropyrum pernix K1 (D83259); Hyperthermus butylicus DSM 5456 (X99553); Pyrodictium occultum PL-19 (M21087); THERMOFILUM-LIKE ORGANISM; Thermofilaceae str. SRI-370 (AF266606); Thermofilum pendens Hvv 3 (X14835); Thermofilum pendens Hrk 5 (CP000505); Vulcanisaeta distributa DSM 14429 (AB063630); Thermoproteus tenax Kra1 (M35966); Pyrobaculum calidifontis JCM 11548 (AB078332); Pyrobaculum neutrophilum (X81886); PYROBACULUM-LIKE ORGANISM; Pyrobaculum islandicum DSM 4184 (L07511).]


[FIG. 12: gel image, not reproduced. Lane labels: dp, m1, m2, 0, 1, 2, 3, 4, 5, 6, 7.]

[FIG. 13: gel image, not reproduced. Title: AMMONIUM SULFATE FRACTIONATION OF EB5326244. Fractions S, 20, 40, 60, 90, each loaded at dilutions 1.0, 0.4, 0.2.]

[FIG. 14: graph, not reproduced. Labels: FLOW THROUGH FRACTION; WASH (10 CV).]

[Sheet 20 of 28: gel image, not reproduced. Recoverable label: 25 kD.]

[FIG. 16: graph, not reproduced. y-axis 0 to 0.8; x-axis TEMPERATURE (° C.), 70 to 115.]

[Sheet 21 of 28: graph with inset, not reproduced. Main x-axis TEMPERATURE (C), 60 to 120; inset x-axis TEMPERATURE (C).]

[FIG. 18: graph, not reproduced. Title: DNS ASSAY: 1/2 FP WHATMAN #1 + EBI-244 SUPER (6.7.2010) IN 50 mM SODIUM ACETATE, pH 5.0. x-axis TEMPERATURE (C), 85 to 100; y-axis 0 to 18.]

[FIG. 19: graph, not reproduced. Title: THERMOSTABILITY OF EB5326244. y-axis ACTIVITY (logarithmic scale); series 100 C and 105 C; annotation T1/2 = 34 min; x-axis HOURS PRETREATMENT.]

[FIG. 20: graph, not reproduced. y-axis Ln REMAINING ACTIVITY; x-axis TIME (HOURS), 0 to 5.]

[FIG. 21: graph, not reproduced. y-axis Ln REMAINING ACTIVITY; x-axis TIME (min), 0 to 110.]

[Sheet 23 of 28, following FIG. 21: image panels, not reproduced. Panel labels: 0 M, 2 M, 4 M, SATURATING; 0 M, 1 M, 2 M, SATURATING.]

[FIG. 23: graph, not reproduced. Title: CMCASE ACTIVITY OF EB5326244 IN HIGH SALT IN HEPPS pH 6.8 AND 90 C. y-axis MICROGRAMS PRODUCT, 0 to 1200; series CMC (90 C), NaCl (2.5 M), KCl (3.0 M); x-axis HOURS.]

[FIG. 24: bar chart, not reproduced. y-axis FOLD INCREASE, 0.2 to 2.4; categories BUFFER ONLY, TWEEN 20, TRITON X-100, NP-40 SUB, CHAPS.]

[FIG. 25: graph, not reproduced. y-axis PRODUCT, 0 to 1100; x-axis TIME (hrs), 0 to 2.2.]

[FIG. 26: graph, not reproduced. y-axis PRODUCT (MICROGRAMS); x-axis TEMPERATURE (C), 64 to 80.]

[FIG. 27: graph, not reproduced. Title: TEMPERATURE MAXIMUM VS IONIC LIQUID [DMIM]DMP. y-axis TEMPERATURE (C), 70 to 110; series T-MAX; x-axis % [DMIM]DMP, 0 to 60.]

[FIG. 28: graph, not reproduced. Series BUFFER, 40%, 50%; x-axis 75 to 101; y-axis 0 to 10.]

[Sheet 27 of 28: gel image, not reproduced. Title: THERMOSTABILITY OF RECOMBINANT EB-5326244 AT 100 C. Lanes at 0, 10, 20, 40 min in PHOSPHATE BUFFER ONLY and in PHOSPHATE BUFFER + SDS; numeric gel labels 230, 150, 80, 50.]

[FIG. 30: graph, not reproduced. Title: pH PROFILE OF EBI244. x-axis pH.]

[Sheet 28 of 28: graph, not reproduced. x-axis 2 to 11; y-axis 0 to 1.8.]

POLYPEPTIDES FOR USE IN THE DECONSTRUCTION OF CELLULOSE

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of U.S. Provisional Application No. 61/369,388, filed Jul. 30, 2010, which is hereby incorporated by reference in its entirety.

SUBMISSION OF SEQUENCE LISTING AS ASCII TEXT FILE

[0002] The content of the following submission on ASCII text file is incorporated herein by reference in its entirety: a computer readable form (CRF) of the Sequence Listing (file name: 677792.000940SEQLIST.txt, date recorded: Jun. 30, 2011, size: 206 KB).

FIELD

[0003] The present disclosure relates to hydrolysis of cellulose-containing polysaccharides and degradation of biomass using polypeptides having cellulase activity, including hyperthermophilic polypeptides. In particular, the present disclosure relates to archaeal polynucleotides encoding the polypeptides, the polypeptides themselves, and compositions, methods and uses thereof.

BACKGROUND

[0004] Cellulose, the major component of plant biomass, is considered the most abundant biopolymer. Bayer E A, Chanzy H, Lamed R, Shoham Y (1998), "Cellulose, cellulases and cellulosomes," Curr. Opin. Struct. Biol. 8: 548-557. Certain microorganisms are able to convert the monomer of cellulose, glucose, into various products useful in the production of biofuels and other methods. Cellulose is highly stable, has a high storage potential, low cost, and plentiful supply. Based on these and other properties, cellulose and enzymes capable of degrading and hydrolyzing it are useful in the sequestration, storage, and production of bioenergy. Lynd L R, Weimer P J, van Zyl W H, Pretorius I S (2002), "Microbial cellulose utilization: fundamentals and biotechnology," Microbiol Mol Biol Rev 66: 506-577.

[0005] Crystalline cellulose is composed of linear polymers of β-1,4-linked glucose, held in a tightly crosslinked crystalline lattice by a high degree of intermolecular hydrogen bonding. This structure confers stability but also hinders efficient deconstruction of cellulose. Strategies for commercial depolymerization of cellulose typically combine pretreatment to disrupt the crystalline structure, followed by enzymatic hydrolysis. Hilden L, Johansson G (2004), "Recent developments on cellulases and carbohydrate-binding modules with cellulose affinity," Biotechnol Lett 26: 1683-1693. Disruption of the crystalline structure and chemical hydrolysis typically require high temperatures and low pH. See Kim J S, Lee Y Y, Torget R W (2001), "Cellulose hydrolysis under extremely low sulfuric acid and high-temperature conditions," Appl. Biochem. Biotechnol. 91-93: 331-340. Enzymatic hydrolysis generally occurs under milder conditions. The degree of pretreatment required and the expense of subsequent cleanup steps are affected by properties of the enzymes used.

[0006] Bacteria capable of degrading cellulose include those belonging to the genera Aquifex, Rhodothermus, Thermobifida, Anaerocellum, and Caldicellulosiruptor. A recombinant thermostable endoglucanase of Aquifex aeolicus produced in E. coli showed maximal activity at 80° C. and pH 7.0 with a half-life of 2 h at 100° C. (Kim J S, Lee Y Y, Torget R W (2001), "Cellulose hydrolysis under extremely low sulfuric acid and high-temperature conditions," Appl. Biochem. Biotechnol. 91-93: 331-340). The endoglucanases produced by Anaerocellum thermophilum and Caldicellulosiruptor saccharolyticus are multidomain enzymes composed of two catalytic domains, linked to carbohydrate binding domains by proline-threonine-rich regions (Zverlov V, Mahr S, Riedel K, Bronnenmeier K (1998a), "Properties and gene structure of a bifunctional cellulolytic enzyme (CelA) from the extreme thermophile Anaerocellum thermophilum with separate glycosyl hydrolase family 9 and 48 catalytic domains," Microbiology 144 (Pt 2): 457-465; Te'o V S, Saul D J, Bergquist P L (1995), "celA, another gene coding for a multidomain cellulase from the extreme thermophile Caldocellum saccharolyticum," Appl Microbiol Biotechnol 43: 291-296; Saul et al. 1990). The recombinant endoglucanase of Rhodothermus marinus has a pH optimum of 6.0-7.0 and a temperature optimum at 100° C. (Halldórsdóttir S, Thórólfsdóttir E T, Spilliaert R, Johansson M, Thorbjarnardóttir S H, Palsdottir A, Hreggvidsson G O, Kristjánsson J K, Holst O, Eggertsson G (1998), "Cloning, sequencing and overexpression of a Rhodothermus marinus gene encoding a thermostable cellulase of glycosyl hydrolase family 12," Appl Microbiol Biotechnol 49: 277-284). The aerobic thermophilic bacterium Thermus caldophilus also produces an endoglucanase which exhibits high activity on CMC with cellobiose and cellotriose as products (Kim D, Park B H, Jung B-W, Kim M-K, Hong S I, Lee D S (2006), "Identification and molecular modeling of a family 5 endocellulase from Thermus caldophilus GK24, a cellulolytic strain of Thermus thermophilus," Int J Mol Sci 7: 571-589). In contrast, high-temperature, crystalline deconstructing cellulases from hyperthermophilic Archaea are few in number, despite efforts to identify such enzymes. Hyperthermophilic enzymes that act on cellulose typically lack identifiable cellulose binding domains.

[0007] Thus there is a need for improved cellulases, including cellulases encoded by hyperthermophilic archaea, and cellulases having high stability and tolerance to a range of chemical and physical parameters, including cellulases with activity at high temperatures and over a broad range of temperatures and pH, cellulases with higher catalytic activity and rate of conversion, and activity in the presence of salts, ionic detergents, sulfhydryl reagents, and ionic liquids. Provided are polypeptides, compositions and methods that meet this need.

BRIEF SUMMARY

[0008] The present disclosure relates to isolated polypeptides, and in particular cellulases, including cellulases encoded by hyperthermophilic archaea, and cellulases having high stability and tolerance to a range of chemical and physical parameters, including cellulases with activity at high temperatures and over a broad range of temperatures and pH, cellulases with higher catalytic activity and rate of conversion, and activity in the presence of salts, ionic detergents, sulfhydryl reagents, and ionic liquids. For example, provided are polypeptides, such as EBI244, having cellulase activity, e.g., endoglucanase, exoglucanase and/or β-glucosidase or β-glucosidase glucohydrolase activity, such as cellulases produced by archaea. Certain aspects of the present disclosure relate to an isolated EBI244 having the amino acid
sequence of SEQ ID NO: 1, and variants and fragments thereof. The present disclosure also relates to isolated polynucleotides encoding the polypeptides, as well as vectors and genetically modified host cells containing such isolated polynucleotides.

[0009] The present disclosure further relates to compositions comprising the isolated polypeptides or enriched in such polypeptides. Moreover, the present disclosure relates to methods for the identification and production of the polypeptides, and methods for their use in the degradation and hydrolysis of poly- and oligo-saccharides, such as biomass, e.g., hemicellulose, for example, in the conversion of biomass, such as lignocellulocytic biomass, including pretreated lignocellulocytic biomass, into soluble sugars, including for use in the fermentive production of biofuels, polishing of cotton fabrics, production of laundry detergents, production of polished crystalline cellulose, assays of cellulases, expansins, and cellulose binding proteins, and in pulping cellulolytic materials.

[0010] In some embodiments, the provided polypeptides are isolated proteins that include a domain having an amino acid sequence at least at or about 30%, 40%, 50%, 60%, typically at least at or about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or at or about 100% identical to a domain of SEQ ID NO: 1, such as to amino acids 250-580 of SEQ ID NO: 1, where the protein is a cellulase. In some embodiments, the protein includes or further includes a domain at least at or about 30%, 40%, 50%, 60%, typically at least at or about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or at or about 100% identical to amino acids 130-250 of SEQ ID NO: 1. In some embodiments, the protein includes or further includes a domain at least at or about 30%, 40%, 50%, 60%, typically at least at or about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or at or about 100% identical to amino acids 750-842 of SEQ ID NO: 1. In some embodiments, the protein includes or further includes a domain at least at or about 30%, 40%, 50%, 60%, typically at least at or about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or at or about 100% identical to amino acids 580-750 of SEQ ID NO: 1.

[0011] In one aspect, the protein contains a domain having an amino acid sequence at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to amino acids 250 through 580 of SEQ ID NO: 1, amino acids 130-250 of SEQ ID NO: 1, amino acids 750-842 of SEQ ID NO: 1, or amino acids 580-750 of SEQ ID NO: 1, where the protein is a cellulase.

[0012] In one embodiment, the isolated protein has an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to SEQ ID NO: 1. In another embodiment, the protein is a mature cellulase protein, containing an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity or 100% identity to (i) amino acids 5-842 of SEQ ID NO: 1; (ii) amino acids 10-842 of SEQ ID NO: 1; (iii) amino acids 15-842 of SEQ ID NO: 1; (iv) amino acids 20-842 of SEQ ID NO: 1; (v) amino acids 24-482 of SEQ ID NO: 1; (vi) amino acids 25-482 of SEQ ID NO: 1; (vii) amino acids 30-842 of SEQ ID NO: 1; (viii) amino acids 35-842 of SEQ ID NO: 1; (ix) amino acids 40-842 of SEQ ID NO: 1; (x) amino acids 45-842 of SEQ ID NO: 1; (xi) amino acids 50-842 of SEQ ID NO: 1; or (xii) amino acids 130-842 of SEQ ID NO: 1. In one such aspect, the isolated mature cellulase protein includes an amino acid sequence having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity or 100% identity to amino acids 24-482 of SEQ ID NO: 1.

[0013] In one embodiment, the protein contains an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to SEQ ID NO: 1, such as a protein of SEQ ID NO: 1 or a protein variant thereof. In one aspect, the protein has identity at glutamates 413 and 506 of SEQ ID NO: 1. In another embodiment, the protein contains an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to SEQ ID NO: 5. In yet another embodiment, the protein contains an amino acid sequence that is at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to SEQ ID NO: 16.

[0014] In another embodiment, the protein contains an amino acid sequence encoded by a nucleic acid sequence with at least 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity or 100% identity to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 15.

[0015] In some embodiments, the protein is a protein of SEQ ID NO: 1, SEQ ID NO: 5, or SEQ ID NO: 16.

[0016] Other aspects of the present disclosure relate to an isolated protein having amino acids 250-580 of SEQ ID NO: 1, where the protein is a cellulase. Still other aspects of the present disclosure relate to an isolated protein having amino acids 130-250 of SEQ ID NO: 1, where the protein is a cellulase. Yet other aspects of the present disclosure relate to an isolated protein having amino acids 750-842 of SEQ ID NO: 1, where the protein is a cellulase. Further aspects of the present disclosure relate to an isolated protein having amino acids 580-750 of SEQ ID NO: 1, where the protein is a cellulase.

[0017] In some embodiments, the protein further includes a domain, such as a catalytic domain or cellulose binding domain of a bacterial or archaeal enzyme. In one aspect, such proteins are fusion proteins, containing one or more domains of SEQ ID NO: 1, 5, or SEQ ID NO: 6-13, such as a catalytic or cellulose binding domain, and one or more domains of another protein, such as another cellulase. In one embodiment, the domain, e.g., catalytic domain or cellulose binding domain, is from another organism, for example, B. fibrisolvens, S. solfataricus, A. cellulolyticus, P. furiosus, P. horikoshii, P. abyssi, A. cellulolyticus, S. lividans, B. fibrisolvens, or T. reesei, or other cellulase-encoding organism disclosed herein or well known in the art.

[0018] In some embodiments, the protein includes a modification, such as a tag, for example, an N-terminal or C-terminal histidine tag.

[0019] In some embodiments, the protein exhibits cellulase activity, for example, one or more of endoglucanase activity, exoglucanase activity, and β-glucosidase activity. In some embodiments, the protein exhibits such activity over a range of physical and chemical conditions, such as at a high temperature or over a broad temperature range, such as at a temperature greater than 105° C., 95° C. to 110° C., or at a temperature exceeding 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100° C., or over a broad temperature range, such as
between at or about 60° C. and 110° C. or between 65° C. and 110° C., such as between 90 and 110° C., between 65 and 70° C., between 85 and 105° C., or between 95 and 105° C.

[0020] In some embodiments, the activity has a half-life of at least one, two, three, four, or five hours at 100° C. or 105° C., for example, a half-life of at least five hours at 100° C., or a half-life of at least one hour at 105° C., at a pH of about 6.8. In some embodiments, the activity has a half-life of at least five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 45, or 60 minutes at 108° C., for example, a half-life of at least 15 minutes at 108° C., at a pH of about 6.8. In some embodiments, the protein exhibits the activity at 90° C. in a solution containing up to 50% ionic liquid, 3.2 M KCl, or 4 M NaCl. In some embodiments, the cellulase activity is at least 50% of maximum over a pH range of between about 4.5 and 8.75, or is at least 70% of maximum at a pH of greater than about 7 or at a pH of about 8.5.

[0021] Also provided are compositions containing the isolated proteins, and nucleic acids encoding the proteins, such as polynucleotides encoding any of the proteins, for example, an isolated nucleic acid encoding a protein that comprises an amino acid sequence at least 30%, 40%, 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to SEQ ID NO: 1, SEQ ID NO: 5, or SEQ ID NO: 16, and isolated nucleic acids having a nucleotide sequence at least 30%, 40%, 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical or 100% identical to SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, or SEQ ID NO: 15.

[0022] Also provided are expression vectors containing the isolated nucleic acids, operably linked to a regulatory sequence, and host cells containing the expression vectors, and methods for producing a protein by culturing the host cell in a culture medium, under suitable conditions to produce a protein encoded by the expression vector. Also provided are compositions containing the host cells in culture medium, and compositions containing the provided proteins in the supernatant of culture medium.

[0023] In some embodiments, the composition contains a high salt or ionic solution, such as a solution including NaCl or KCl at a concentration of at least 1 M, 2 M, 3 M, or 4 M. In some embodiments, the composition has a pH of at least at or about 5.5, 6.5, 7, 7.5, 8, or 8.5. In some embodiments, the composition includes an ionic liquid at a concentration of between at or about 20% and 50% or up to at or about 50%.

[0024] Also provided are methods of reducing the viscosity of a pretreated biomass mixture, by contacting a pretreated biomass mixture having an initial viscosity with the provided compositions and/or proteins, and incubating the contacted biomass mixture under conditions sufficient to reduce the initial viscosity of said pretreated biomass mixture.

[0025] Also provided are methods for converting a biomass to sugars, hydrolyzing or degrading a biomass, by contacting the biomass with the provided compositions and/or proteins. Also provided are methods for producing a fermentation product by contacting biomass with the compositions or proteins to form a first product, and then culturing the first product with one or more fermentive microorganisms under conditions sufficient to produce a fermentation product, or incubating the first product with a chemical solution, under conditions sufficient to produce a fermentation product by a chemical process. Also provided are methods for producing a fermentation product, by hydrolyzing or degrading biomass with the provided compositions and proteins to form a first product and then culturing the first product with one or more fermentive microorganisms under conditions sufficient to produce a fermentation product, or incubating the first product with a chemical solution, under conditions sufficient to produce a fermentation product by a chemical process. In some aspects, the fermentation product is a biofuel.

[0026] Also provided are methods for fermenting biomass by fermenting the biomass with one or more fermenting microorganisms, wherein the biomass is or has been treated by a provided composition or protein.

[0027] Also provided are methods for producing a fuel by contacting a biomass with the composition or protein to yield a sugar solution and culturing the sugar solution with a fermentative microorganism under conditions sufficient to produce a fuel or under conditions sufficient to produce a fermentation product by a chemical process.

[0028] Also provided are methods for food production, by contacting a plant material with the provided composition or protein to yield a treated plant material, and methods for textile cleaning by contacting a soiled textile with the composition or protein to yield a clean textile. Also provided are methods for paper pulp bleaching by contacting paper pulp with the composition or protein to yield bleached paper pulp.

[0029] Also provided are laundry detergent compositions, containing the provided proteins and detergent, and methods for use of such compositions in cleaning, anti-deposition, or color care, by contacting the laundry detergent composition with a textile.

[0030] In some aspects, the methods, e.g., the contacting, are conducted at a pH between 4.5 and 8.5, such as a pH of at least 5.5 or at least 6.5, for example, at least 7, at least 7.5, at least 8, or at least 8.5. In some aspects, the methods or contacting are performed at a temperature between 90 and 110° C., between 60 and 70° C., between 95 and 105° C., or at least 100° C. In some aspects, the method or contacting is performed in a solution containing KCl or NaCl, for example, at a concentration of at least 1 M, 2 M, 3 M, or 4 M, or at a saturating condition. In one aspect, the method or contacting is performed in a solution containing at least 10%, at least 20%, at least 30% or at least 40% ionic liquid.

[0031] In some aspects, the biomass is a lignocellulose. In some embodiments, the biomass is pretreated prior to contacting.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] FIG. 1A shows the source of lignocellulose-degrading consortium of hyperthermophilic Archaea enrichment, and degradation of filter paper. A circumneutral geothermal pool at 94° C., with a level-maintaining syphon. Sediment from the floor of this site was enriched on pulverized Miscanthus at 90° C. and subsequently transferred to filter paper enriched media. FIG. 1B depicts the degradation of filter paper by the enrichment culture in a spherical 2 L culture flask. Circular discs of Whatman® #3 filter paper were shredded and partially dissolved after incubation for 30 days at 90° C. FIG. 1C depicts control Whatman® #3 filter paper discs. Incubation as in panel B.

[0033] FIG. 2 depicts results of additional experiments showing filter degradation by the enrichment. FIG. 2A shows Whatman® #1 filter paper in media without (C) and with (E) inoculation with the enrichment at 90° C. Lettering was applied with a number 2 graphite pencil. FIG. 2B shows Whatman® #3 filter paper strip (2 mm x 40 mm) in growth media supported by a glass tube, without (C) and with (E) inoculation with the enrichment at 90° C. The inoculated sample showed complete dissolution of the filter paper strip.

[0034] FIG. 3 shows endoglucanase activity of protein in three-organism hyperthermophilic Archaea consortium enriched on Avicel® as described in Example 1A, measured by zymograms on SDS-PAGE fractions from detergent wash of Avicel® from the enrichment culture. The lanes are labeled as follows: (1) marker; (2) 1% SDS wash (experiment 1); (3) whole cell extract; (4) Avicel®; (5) CHAPS fraction; (6) pellet after CHAPS wash; (7) 1% SDS wash (experiment 2).

[0035] FIG. 4 shows protein extraction and detection of CMCase activity from proteins eluted from Avicel® particles after deconstruction by enrichment at 90° C. for 8 days. Image shows SDS-PAGE gradient zymogram, 10%-15% acrylamide, with 0.2% CMC embedded in gel. Lanes: M=marker; N=native whole SDS extract; B=buffer-only soluble extract; W1=0.6% CHAPS extract; W2=1% CHAPS, 5% cellobiose extract #1 (1 hr incubation at 90° C.); W3=1% CHAPS, 5% cellobiose extract #2 (1 hr incubation at 90° C.); S=1% SDS extract final wash (15 minute incubation at 100° C.). For lanes B through S, the Avicel® pellet was sonicated continuously for 2 minutes in the wash solution.

[0036] FIG. 5 depicts a maximum likelihood 16S rRNA phylogenetic tree, showing the relationship of full-length 16S rRNAs from the three component organisms of the assembled metagenome. Branches in bold and labeled with larger type represent the three sequences from the metagenome.

[0037] FIG. 6 displays a phylogenetic tree, showing relationship of three reverse gyrases, from the metagenome described in Example 1A, to other archaeal reverse gyrases. Reverse gyrase 1 and 2, found on high-read density contigs, appear closely allied with the two reverse gyrases encoded by Ignisphaera aggregans. The reverse gyrase of the bacterium Dictyoglomus turgidum was set as the root.

[0038] FIG. 7 depicts the phylogeny of the EBI244 protein putative catalytic domain. A phylogenetic tree was produced showing the relationship of EBI244's catalytic domain to the closest characterized GH families. Tree entry information: UniProt identifier; enzyme function (if known); organism name; Pfam hit GH family (asterisk indicates characterized enzyme in CAZy database); and E-value (no GH listed indicates no Pfam hits).

[0039] FIG. 8A displays schematically the predicted domain architecture of EBI244 protein sequence, with approximate amino acid positions of domain boundaries labeled. FIG. 8B displays similar N-terminal protein regions among genes identified in the hyperthermophilic Archaea consortium metagenome in which EBI244 was discovered, as described in Example 1. The top sequence is EBI244. FIG. 8C shows a multiple sequence alignment of a non-redundant sample of the thirty-eight sequences identified using Hidden Markov Model (HMM) searching and analysis based on domain 1 of EBI244, as described in Example 1. FIG. 8D shows a multiple sequence alignment of EBI244 domain 2 with sequences identified in the domain 1 HMM search/analysis (see FIG. 8C). Catalytic residues of EBI244 predicted from Pfam analysis (glutamates 413 and 506) are highlighted in yellow. FIG. 8E shows a multiple sequence alignment of all hits to domain 4 HMM searching. Domain 4 search area is highlighted in orange. All sequences were globally aligned using the MUSCLE program.

[0040] FIG. 9 shows a homology structural model of the EBI244 domain 2, constructed by the I-TASSER server, built from multiple GH5 domain structures in the PDB database, showing the common TIM-barrel architecture with 8 beta sheets inside 8 alpha-helices.

[0041] FIG. 10 shows schematically a relationship of the glycolytic domain of EBI244 to known glycosyl hydrolase family 5 proteins.

[0042] FIG. 11 shows zymogram activity of recombinant protein fractions, compared to native protein fraction. M=prestained molecular weight standard; P=BL21 (pET16b::5326244 (His-tagged EBI244 protein)), pellet fraction; S=BL21 (pET16b::5326244 (His-tagged EBI244 protein)), boiled fraction; N=native protein from J1 enrichment eluted from Avicel® with 2% SDS. Cleared areas (white) represent activity, while dark areas represent intact carboxymethylcellulose. Recombinant protein fractions (P and S) were insoluble or soluble portions of the E. coli extract. Native fraction was eluted from Avicel® with boiling SDS. The lower band represents an internal control, E. coli endoglucanase.

[0043] FIG. 12 shows the Fluorophore Assisted Carbohydrate Electrophoresis (FACE) results of time course of EBI244 on cellohexaose. Reaction condition was 10 µg enzyme, 0.33 mM cellohexaose in 25 mM HEPPS pH 6.8, 95° C. in 100 µL volume. The experiment tracked degree of polymerization (dp) over time. FIG. 12A depicts cellohexaose (0.33 mM) substrate. FIG. 12B depicts cellopentaose (0.4 mM) and glucose (0.4 mM) substrates. FIG. 12C depicts cellotriose (0.67 mM) substrates. FIG. 12D depicts cellobiose (1 mM) substrate. Standards were a mixture of glucose, cellotriose and cellopentaose (m1) and mixture of cellobiose, cellotetraose and cellopentaose (m2). Time points (minutes, label) were (0, 0), (1:20, 1), (2:40, 2), (6:20, 3), (12:40, 4), (25:20, 5), (50:40, 6), (120:00, 7). Oligomers higher than cellohexaose, up to dp 11, were rapidly formed then degraded over time.

[0044] FIG. 13 shows results of a zymogram assay, showing EBI244 activity distributed among 20-40% saturating ammonium sulfate fractions. Each fraction is represented by three lanes: undiluted (1.0), dilution 2 in 5 (0.4), and dilution 1 in 5 (0.2). Initial sample was soluble recombinant protein after pretreatment at 80° C. for 30 minutes. Protein was precipitated using 20, 40, 60, and 90% saturating ammonium sulfate.

[0045] FIG. 14 shows a graph of endoglucanase activity, measured by DNS assay, with 1% low-viscosity carboxymethylcellulose as the substrate. Fractions 1-11 represent a linear gradient from 1 M to 0 M ammonium sulfate in potassium phosphate buffer, pH 7.0.

[0046] FIG. 15 shows a picture of a Coomassie-stained SDS-PAGE gel, demonstrating stepwise purification of EBI244 to approximately 60% purity. M=marker; L=whole cell lysate; AS=20-40% ammonium sulfate fraction; HIC=pooled active fraction, purified using Macro-Prep t-butyl hydrophobic interaction chromatography (HIC) support (methacrylate-based, 50 µm beads) (butyl HIC). The sample was heated to 80° C. prior to ammonium sulfate fractionation.

[0047] FIG. 16 shows an activity-temperature profile of EBI244 on 1% CMC (carboxymethyl cellulose) (DNS assay).

[0048] FIG. 17 shows the temperature profile of EBI244. The temperature vs. activity profile was measured by 20-min assay in 1% CMC in 25 mM sodium acetate buffer, pH 6.0. The products were detected by DNS reducing sugar assay and normalized to a cellobiose standard. Error for this experiment was below 15%. Inset: Differential scanning calorimetry results of enzyme from 102-116° C. A dual Tm was observed at 111.5° C. and 113° C.

[0049] FIG. 18 shows results of a DNS assay using Whatman® #1 filter paper in 10 mM sodium acetate pH 5.0 curve, demonstrating enzyme activity on filter paper over a range of temperatures.

[0050] FIG. 19 shows thermostability of EBI244 activity, preincubated at 100° C. or 105° C. in buffer, then assayed for activity on 1% CMC at 95° C.

[0051] FIG. 20 shows the thermostability of EBI244 at 100° C. (O) and 105° C. (o) in 50 mM HEPPS buffer, pH 6.8. Data points represent the mean of four assays. Enzyme was incubated at the appropriate temperature, samples were collected at 1 hour intervals, and activity was measured using the DNS assay with cellobiose as a standard.

[0052] FIG. 21 shows thermostability of EBI244 at 108° C. with (o) and without (O) 0.5% w/v Avicel® in 25 mM sodium acetate buffer, pH 6.0. Enzyme was pretreated for 30 min at 90° C. prior to incubation at 108° C. to allow for interaction with the cellulose. Samples were removed at time intervals and activity was measured in triplicate using the DNS assay using cellobiose as a standard.

[0053] FIG. 22 shows zymogram assay results following incubation of recombinant EBI244 enzyme at 90° C. in phosphate buffer, at various salt concentrations. Upper panel: NaCl; lower panel: KCl.

[0054] FIG. 23 shows DNS assay results showing product formation for EBI244 with 1% CMC in HEPPS buffer with no added salt, 2.5 M NaCl, or 3.0 M KCl.

[0055] FIG. 24 depicts activity of EBI244 against PNP-cellobioside at 95° C. in the presence of various detergents. Conditions tested were 25 mM potassium phosphate buffer, pH 6.8 alone or buffer plus 0.1% of either Tween® 20, Triton® X-100, NP-40 substitute or CHAPS. After a 20 min incubation, sodium hydroxide was added to 50 mM and absorbance was measured at 410 nm. Values were calculated via paranitrophenol standard in the same buffer. Ratios were calculated based on activity in buffer alone.

[0056] FIG. 25 depicts a time course of EBI244 activity against 1% CMC while in the presence of salts or ionic liquids. All assays were done in HEPPS buffer, pH 6.8 at 90° C. (shown with O) and either 2.5 M sodium chloride (A), 3.0 M potassium chloride (), 25% (v/v) DMIMDMP (o), or 25% (v/v) EMIM acetate (A). Activity was measured using DNS assay after each time point using cellobiose as a standard. Error bars represent the standard error of the mean of four assays.

[0057] FIG. 26 depicts temperature profiles showing CMC activity of EBI244 in 50% ionic liquid. Enzyme activity was measured in 50% (v/v) DMIMDMP in 25 mM phosphate, pH 6.8 (O) and 25 mM potassium phosphate buffer, pH 6.8 alone (o). Activity was measured using DNS assay after 2 hours using cellobiose as a standard.

[0058] FIG. 27 shows results of a DNS assay, representing temperature optima compiled from activity-temperature profiles of EBI244 in increasing amounts of the ionic liquid DMIMDMP.

[0059] FIG. 28 shows results of a DNS assay, showing activity of EBI244 on 1% CMC in buffer alone, and in the presence of 40% and 50% DMIMDMP.

[0060] FIG. 29 shows the results of a zymogram assay of EBI244, after pretreatment in phosphate buffer or phosphate buffer plus 0.1% sodium dodecyl sulfate at 100° C., demonstrating the thermostability of recombinant EBI244.

[0061] FIG. 30 shows a pH-profile of EBI244 activity, based on DNS assays of CMC hydrolysis.

[0062] FIG. 31 shows a pH profile of EBI244 activity measured against PNP-cellobioside at 95° C. Buffers used were sodium acetate/acetic acid (pH 2.5-5.5), MED (pH 6.5), HEPPS (pH 7.5-8.5), and CAPS (pH 9.5-10.5). After 20 min incubation, sodium hydroxide was added to a final concentration of 50 mM and absorbance was measured at 410 nm. Values were calculated by a paranitrophenol standard in the same buffer. Error bars are standard deviations of the mean of four duplicate assays.

DEFINITIONS

[0063] The term “catalytic activity” or “activity” describes quantitatively the conversion of a given substrate under defined reaction conditions. The term “residual activity” is defined as the ratio of the catalytic activity of the enzyme under a certain set of conditions to the catalytic activity under a different set of conditions. The term “specific activity” describes quantitatively the catalytic activity per amount of enzyme under defined reaction conditions.

[0064] The term “thermostability” describes the property of a protein to withstand a limited exposure to certain temperatures, such as high temperatures, without losing the activity it possesses at temperatures where its activity is measurable or is optimal. The term “thermoactive” describes a property of a protein which retains activity at high temperatures.

[0065] The term “pH-stability” describes the property of a protein to withstand a limited exposure to pH-values significantly deviating from the pH where its stability is optimal (e.g., more than one pH-unit above or below the pH-optimum), without losing its activity under conditions where its activity is measurable. The term “pH active” describes a property of a protein which retains activity at a pH value deviating significantly from pH values typically optimal for such activities.

[0066] The term “cellulase” refers to an enzyme (or enzymatic activity thereof) that catalyzes an enzymatic reaction in which cellulose is hydrolyzed into glucose, cellobiose, or cello-oligosaccharides, including enzymes having endoglucanase, exoglucanase, e.g., glucanohydrolase or cellobiohydrolase, β-glucosidase or β-glucosidase glucohydrolase activity, and the corresponding enzymatic activity of such enzymes.

[0067] The term “lignocellulose” refers to any material primarily consisting of cellulose, hemicellulose, and lignin.

[0068] The term “hemicellulose” refers to a polymer of short, highly-branched chains of mostly five-carbon pentose sugars (e.g., xylose and arabinose) and to a lesser extent six-carbon hexose sugars (e.g., galactose, glucose and mannose).

[0069] The term “renewable resources” refers to biomass substrates that are grown and harvested, like crops, straw, wood and wood products. The term “biological fuels” refers to solid, liquid, or gas fuel including or derived from biomass, such as biodiesel, biogas, vegetable oil, bioethanol, and biohydrogen.
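By way of illustration only, the terms “residual activity” and “specific activity” defined above are simple ratios and can be sketched in Python as follows. The function names, the units, and the numeric values are hypothetical and are not taken from the disclosure; they show the arithmetic only.

```python
def specific_activity(activity_units: float, enzyme_mg: float) -> float:
    """Catalytic activity per amount of enzyme (here: units per mg, an assumed unit)."""
    if enzyme_mg <= 0:
        raise ValueError("enzyme amount must be positive")
    return activity_units / enzyme_mg


def residual_activity(activity_test: float, activity_reference: float) -> float:
    """Ratio of activity under one set of conditions to activity under another set."""
    if activity_reference <= 0:
        raise ValueError("reference activity must be positive")
    return activity_test / activity_reference


# Hypothetical numbers: 12 activity units measured with 4 mg of enzyme.
units_per_mg = specific_activity(12.0, 4.0)      # 3.0

# Hypothetical numbers: 7.5 units after a treatment versus 10 units untreated.
fraction_remaining = residual_activity(7.5, 10.0)  # 0.75, i.e. 75% residual activity
```

Both ratios are only meaningful when the two measurements use the same assay and the same units, which is why the definitions tie each term to defined reaction conditions.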

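As a non-limiting sketch of the percent-identity arithmetic used in the identity definitions of this section (for example, two 100-residue polypeptides differing at 10 positions being approximately 90% identical), the following Python function counts identical positions in two sequences that are assumed to be already aligned to equal length. The function name, the use of "-" as a gap character, the choice of alignment length as the denominator, and the toy sequences are assumptions made for illustration; alignment programs differ in how they treat gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding identical residues.

    Assumes both inputs are pre-aligned to the same length, with '-' marking gaps.
    A column containing a gap is counted as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy 100-residue sequences (hypothetical): 90 identical positions, 10 substitutions.
first = "A" * 100
second = "A" * 90 + "G" * 10
identity = percent_identity(first, second)  # 90.0
```

The ten differences here are clustered at one end, but the result is the same if they are scattered along the length, since only the count of identical columns enters the ratio.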
[0070] As used herein, when it is generally stated that a polypeptide or nucleic acid molecule or region thereof contains or has “identity” or “homology,” per se (without specifying a particular percent identity), to another polypeptide or nucleic acid molecule or region thereof, the two molecules and/or regions share at least at or about 40%, and typically at least at or about 50%, 60% or 70% sequence identity, such as at least at or about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity. The precise percentage of identity can be specified.

[0071] Sequence “identity” has an art-recognized meaning. The percentage of sequence identity between two nucleic acid or polypeptide molecules and/or regions can be calculated using well-known and published techniques, such as those described below. In general, for determination of the percentage sequence identity, sequences are aligned so that the highest order match is obtained (see, e.g.: Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M. Stockton Press, New York, 1991; Carrillo et al. (1988) SIAM J Applied Math 48:1073). For sequence identity, the number of conserved amino acids or nucleotides is determined by standard alignment algorithm programs, and can be used with default gap penalties established by each supplier. Substantially homologous nucleic acid molecules specifically hybridize typically at moderate stringency or at high stringency all along the length of the nucleic acid of interest.

[0072] The term “identity,” when associated with a particular number, represents a comparison between the sequences of a first and a second polypeptide or polynucleotide or regions thereof. As used herein, the term at least “90% identical to” refers to percent identities from 90 to 99.99 of one nucleotide or amino acid sequence to the other. Identity of 90% or more is indicative of the fact that, assuming for exemplification purposes, the full length of a first and second polypeptide, each 100 amino acids in length, are compared, no more than 10% (i.e., 10 out of 100) of the amino acids in the first polypeptide differs from that of the second polypeptide. Similar comparisons can be made between first and second polynucleotides. Such differences among the first and second sequences can be represented as point mutations randomly distributed over the entire length of a polypeptide or they can be clustered in one or more locations of varying length up to the maximum allowable, e.g. 10/100 amino acid difference (approximately 90% identity). Differences are defined as nucleotide or amino acid residue substitutions, insertions, additions or deletions. At the level of homologies or identities above about 85-90%, the result should be independent of the program and gap parameters set; such high levels of identity can be assessed readily, often by manual alignment without relying on software.

[0073] Sequence identity can be measured along the full length of a polynucleotide or polypeptide or along a region thereof. Sequence identity compared along the full length of two polynucleotides or polypeptides refers to the percentage of identical nucleotide or amino acid residues along the full length of the molecule. Alternatively, sequence identity can be compared along the length of a molecule, compared to a region of another molecule.

DETAILED DESCRIPTION

[0074] Crystalline cellulose is composed of linear polymers of β1-4 linked glucose, held in the crystalline lattice by a high degree of intermolecular hydrogen bonding. The tightly crosslinked structure is primarily responsible for the inherent stability of cellulose, but also can hinder efficient deconstruction. The conversion of cellulose to glucose is generally accomplished by chemical hydrolysis (typically using a single step of acid treatment) or enzymatic hydrolysis (generally involving acid pretreatment followed by hydrolysis with cellulase enzymes). High temperatures combined with low pH are generally required for the disruption of the crystalline structure and chemical hydrolysis. See Kim J. S., Lee Y. Y., Torget, R. W. (2001) “Cellulose hydrolysis under extremely low sulfuric acid and high-temperature conditions,” Appl. Biochem. Biotechnol. 91-93, 331-340. Enzymatic hydrolysis generally occurs under milder conditions. Strategies for commercial depolymerization of cellulose typically combine pretreatment and enzymatic hydrolysis. Hilden L., Johansson G. (2004). “Recent developments on cellulases and carbohydrate-binding modules with cellulose affinity.” Biotechnol Lett, 26: 1683-1693. The degree of pretreatment required and the expense of subsequent cleanup steps required depend upon the properties of the enzymes that will be used.

EMBODIMENTS

[0075] The present disclosure relates to isolated polypeptides, including cellulases and other polypeptides, for example, cellulases having endoglucanase, exoglucanase and/or β-glucosidase or β-glucosidase glucohydrolase activity, including those produced by archaea, such as an EBI244 polypeptide (SEQ ID NO: 1) and variants and fragments thereof. The present disclosure also relates to isolated polynucleotides encoding the polypeptides, as well as vectors and genetically modified host cells containing such isolated polynucleotides. The present disclosure further relates to compositions comprising the isolated polypeptides or enriched in such polypeptides. Moreover the present disclosure relates to methods for the identification and production of the polypeptides, and methods for their use in the degradation and hydrolysis of poly- and oligo-saccharides, such as biomass, e.g., hemicellulose, for example, in the conversion of biomass, such as lignocellulosic biomass, including pretreated lignocellulosic biomass, into soluble sugars, including for use in the fermentative production of biofuels, polishing of cotton fabrics, production of laundry detergents, production of polished crystalline cellulose, assays of cellulases, expansins, and cellulose binding proteins, and in pulping cellulolytic materials. Also provided herein are hyperthermophilic organisms and polypeptides encoded by the organisms, capable of utilizing crystalline cellulose, and methods for their identification and production.

[0076] Polypeptides

[0077] The present disclosure relates to isolated polypeptides having cellulase activity and fragments thereof. In particular, the present disclosure provides polypeptides of SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 16, and fragments and variants thereof. In some embodiments, the polypeptide
includes a sequence having at least 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1, or to one or more regions or domains thereof, including amino acid residues 1-25 of SEQ ID NO: 1, amino acid residues 30-130 of SEQ ID NO: 1, amino acid residues 250 through 580 of SEQ ID NO: 1 (Domain 2), amino acids 130-250 of SEQ ID NO: 1 (Domain 1), amino acids 750-842 of SEQ ID NO: 1 (Domain 4), or amino acids 580-750 proline-threonine rich region, Domain 1, Domain 2, Domain 3, or Domain 4 of SEQ ID NO: 1, where the polypeptide is a cellulase.

[0078] In some embodiments, the polypeptide is a variant or fragment of SEQ ID NO: 1, SEQ ID NO: 5, or SEQ ID NO: 16, with one or more amino acid deletions, insertions, modifications, or substitutions, such as a polypeptide having at least 30%, 40%, 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 1, SEQ ID NO: 5, or SEQ ID NO: 16, or containing a domain, such as a catalytic domain or carbohydrate binding motif (CBM) that is at least 30%, 40%, 50%, 60%, typically at least 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to a domain thereof. Typically, the variant or fragment retains a substantial amount of the cellulase or other enzymatic activity or cellulose binding capability of the wild-type protein. For example, in some embodiments, the variant or fragment retains one, typically both, of the wild-type active site residues at E413 and E506. In some embodiments, the variants include a protein comprising the sequence of a protein listed in any of Tables 1, 2, 3, and 4, such as a polypeptide having a sequence at least 30%, 40%, 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to SEQ ID NO: 6, 7, 8, 9, 10, 11, 12, or 13.

[0079] Whether any two nucleic acid or polypeptide molecules have sequences that contain, or contain at least, a certain percent (e.g. 60%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99%) sequence identity can be determined using known computer algorithms such as the “FASTA” program, using for example, the default parameters as in Pearson et al. (1988) Proc. Natl. Acad. Sci. USA 85:2444 (other programs include the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(I):387 (1984)), BLASTP, BLASTN, FASTA (Altschul, S. F., et al., J Molec Biol 215:403 (1990); Guide to Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego, 1994, and Carrillo et al. (1988) SIAM J Applied Math 48:1073). For example, the BLAST function of the National Center for Biotechnology Information database can be used to determine identity. Other commercially or publicly available programs include, DNAStar “MegAlign” program (Madison, Wis.) and the University of Wisconsin Genetics Computer Group (UWG) “Gap” program (Madison Wis.)). The extent of sequence identity (homology) and complementarity may be determined using any computer program and associated parameters, including those described herein, such as BLAST 2.2.2. or FASTA version 3.0t78, with the default parameters. It is understood that for the purposes of determining sequence identity among DNA and RNA sequences thymidine nucleotide is equivalent to (represents identity with) a uracil nucleotide. Percent identity further can be determined, for example, by comparing sequence information using a GAP computer program (e.g., Needleman et al. (1970) J. Mol. Biol. 48:443, as revised by Smith and Waterman ((1981) Adv. Appl. Math. 2:482). Briefly, the GAP program defines similarity as the number of aligned symbols (i.e., nucleotides or amino acids), which are similar, divided by the total number of symbols in the shorter of the two sequences. Default parameters for the GAP program can include: (1) a unary comparison matrix (containing a value of 1 for identities and 0 for non-identities) and the weighted comparison matrix of Gribskov et al. (1986) Nucl. Acids Res. 14:6745, as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, pp. 353-358 (1979); (2) a penalty of 3.0 for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps. Various programs and methods for assessing identity are known to those of skill in the art. High levels of identity, such as 90% or 95% identity, readily can be determined without software.

[0080] In some embodiments, the polypeptides are produced recombinantly, while in others the polypeptides are produced synthetically, or are purified from a native source, such as an archaea, such as one described herein.

[0081] The provided polypeptides generally have cellulase activity, for example, endoglucanase, exoglucanase, e.g., glucanohydrolase or cellobiohydrolase, β-glucosidase or β-glucosidase glucohydrolase activity, and/or cellulose binding ability. In one aspect, the provided polypeptides exhibit the cellulase activity or binding ability, for example, an activity or binding ability of at least 40%, 50%, 60%, 70%, 75%, or more of maximum (or with a half-life of activity or binding ability of at least 10 minutes, 15 minutes, 20 minutes, 25 minutes, 30 minutes, 35 minutes, 40 minutes, 45 minutes, 50 minutes, 55 minutes, 1 hour, 1.25 hours, 1.5 hours, 1.75 hours, 2 hours, 3 hours, 4 hours, 5 hours, or more hours) over a broad range of conditions, for example, over a range of conditions that is broader than that observed for one or more known cellulases, such as bacterial cellulases, including those produced by Anaerocellum thermophilum, Caldicellulosiruptor saccharolyticus, Rhodothermus marinus, or Thermus caldophilus. For example, in some aspects, the polypeptides exhibit activity or binding ability in the presence of high salt solution, such as in the presence of a saturating concentration of salt, such as in a solution containing sodium chloride (NaCl) at a concentration of at least at or about 0.5 M, 1 M, 1.5 M, 2 M, 2.5 M, 3 M, 3.5 M, or 4 M sodium chloride, or potassium chloride (KCl), at a concentration at or about 0.5 M, 1 M, 1.5 M, 2 M, 2.5 M, 3.0 M or 3.2 M KCl and/or ionic liquids, such as 1,3-dimethylimidazolium dimethyl phosphate (DMIMDMP) or EMIMOAc, or in the presence of one or more detergents, such as ionic detergents (e.g., SDS, CHAPS), sulfhydryl reagents, such as in saturating ammonium sulfate or ammonium sulfate between at or about 0 and 1 M.

[0082] In some aspects, the polypeptides exhibit the activity or binding ability at high temperatures, such as a temperature exceeding 90° C., 91° C., 92° C., 93° C., 94° C., 95° C., 96° C., 97° C., 98° C., 99° C., 100° C., 101° C., 102° C., 103° C., 104° C., 105° C., 106° C., 107° C., 108° C., 109° C., or 110° C., or over a broad temperature range, such as between at or about 60° C. and 110° C. or between 65° C. and 110° C., such as between 90° C. and 110° C., between 65° C. and 70° C., between 85° C. and 105° C., between 85° C. and 110° C., between 95° C. and 105° C., or between 95° C. and 110° C. In some aspects, the polypeptides exhibit the activity or binding
ability over a broad pH range, for example, at a pH of between about 4.5 and 8.75, at a pH of greater than 7 or at a pH of 8.5, or at a pH of at least 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, or 8.5.

[0083] Cellulase activity and binding capabilities can be measured by a number of well-known methods, including zymograms, reducing sugar assays (e.g., DNS Micro or Macro, Nelson-Somogyi Micro or Macro, Nelson Semi-Micro, Ferricyanide-1, Ferricyanide-2, PAHBAH Micro or Macro, BCA, and Modified BCA), assays using paranitrophenol-labeled glycosides, product analysis, total sugar assays, such as Phenol-H2SO4 or Anthrone-H2SO4, enzymatic glucose assays, and cellulose binding assays, for example, using the methods described herein.

[0084] Substrates for cellulase activity and binding assays include soluble and insoluble substrates. Soluble substrates include, for example, cellodextrins and their derivatives, including radiolabelled versions thereof, short chain cellulase, β-methylumbelliferyl-oligosaccharides, p-nitrophenol oligosaccharides, long chain cellulose derivatives, carboxymethyl cellulose (CMC), hydroxyethyl cellulose (HEC), dyed CMC. Insoluble substrates include, for example, cotton, Whatman No. 1 filter paper, pulp (e.g., Solka Floc), crystalline cellulose, such as cotton, microcrystalline cellulose (e.g., Avicel®), valonia cellulose, bacterial cellulose, amorphous cellulose (e.g., PASC, alkali-swollen cellulose), dyed cellulose, fluorescent cellulose, chromogenic and fluorophoric derivatives, such as trinitrophenyl-carboxymethylcellulose (TNP-CMC) and fluram-cellulose, practical cellulose-containing substrates, α-cellulose, and pretreated lignocellulosic biomass.

[0085] In some embodiments, the polypeptides are produced as N- and/or C-terminal fusion proteins, for example to aid in extraction, detection and/or purification and/or to add functional properties to the cellulases. Examples of fusion protein partners include, but are not limited to, glutathione S-transferase (GST), 6x His, GAL4 (DNA binding and/or transcriptional activation domains), FLAG-, MYC-tags or other tags well known to anyone skilled in the art. In some embodiments, a proteolytic cleavage site is provided between the fusion protein partner and the protein sequence of interest to allow removal of fusion protein sequences. Preferably, the fusion protein does not hinder the cellulase activity of the polypeptides.

[0086] In some embodiments, the polypeptide is fused to one or more domains, for example, of other proteins, such as other cellulases or sugar-reducing enzyme, including a bacterial, archaeal, and/or hyperthermophilic cellulase or enzyme, for example, cellulases and enzymes belonging to glycosylhydrolase family GH5 or GH12 or CBM family 1 or 2, such as those encoded by mesophiles, such as B. fibrisolvens, and cellulases encoded by thermophiles such as S. solfataricus, R. marinus, A. cellulolyticus, P. furiosus, P. horikoshii, P. abyssi, or A. cellulolyticus, S. lividans, B. fibrisolvens, or T. reesei.

[0087] Such domains can include a leader peptide, propeptide, binding domain and/or catalytic domain. Suitable binding domains include, but are not limited to, carbohydrate binding domains (e.g., CBM) of various specificities, providing increased affinity to carbohydrate components present during the application of the cellulase. Suitable enzymatically active domains possess an activity that supports the action of the polypeptide in producing the desired product. Non-limiting examples of catalytic domains include: cellulases, hemicellulases such as xylanase, mannanases, exo-mannanases, glucanases, arabinases, galactosidases, pectinases, and/or other activities such as proteases, lipases, acid phosphatases and/or others or functional fragments thereof.

[0088] In some embodiments, the fusion proteins contain the catalytic or enzymatically active domain of another cellulase or sugar-reducing enzyme, such as fused to one or more domains of the provided polypeptides, such as a CBM domain, for example, to Domain 1, Domain 4, or Domain 3 of SEQ ID NO: 1 or a variant thereof. In another embodiment, the fusion protein contains a catalytic domain of one of the provided peptides, such as a domain having a certain percent identity to domain 2 of SEQ ID NO: 1, amino acid residues 250 through 580. Typically, the fusion proteins exhibit improved stability, cellulase activity, tolerance for various conditions, and/or cellulose binding compared to the other enzyme, e.g., cellulase, alone.

[0089] For example, the tight binding of the provided polypeptides to crystalline cellulose as described herein makes them useful in methods for identifying and producing new hyperstable cellulases. In one embodiment, the hyperstable cellulases are produced using well-known engineering methods, which have been used to engineer thermophilic and hyperthermophilic cellulases to improve the activity on crystalline substrates. In one example, the methods involve the addition of a thermostable cellulose binding domain provided herein to a catalytic domain, for example, as carried out to introduce chitin binding domains to increase binding and activity toward crystalline cellulose.

[0090] Domains of the fusion proteins are optionally linked to the polypeptides through a linker sequence that simply joins the provided cellulose polypeptide or fragment thereof and the fusion domain without significantly affecting the properties of either component, or the linker optionally has a functional importance for the intended application.

[0091] In some embodiments, the provided polypeptides are used in conjunction with one or more additional proteins of interest. Non-limiting examples of proteins of interest include: hemicellulases, alpha-galactosidases, beta-galactosidases, lactases, beta-glucanases, endo-beta-1,4-glucanases, cellulases, xylosidases, xylanases, xyloglucanases, xylan acetyl-esterases, galactanases, exo-mannanases, pectinases, pectin lyases, pectinesterases, mannanases, polygalacturonases, arabinases, rhamnogalacturonases, laccases, reductases, oxidases, phenoloxidases, ligninases, proteases, amylases, phosphatases, lipolytic enzymes, cutinases and/or other enzymes.

[0092] Polynucleotides

[0093] Also provided are isolated and/or purified nucleic acid molecules, e.g., polynucleotides, encoding the provided polypeptides, e.g., cellulases. In some embodiments, the isolated polynucleotide encodes SEQ ID NO: 1, SEQ ID NO: 5, SEQ ID NO: 15, or a fragment or variant thereof, such as fragments thereof including amino acid residues 1-25 of SEQ ID NO: 1, amino acid residues 30-130 of SEQ ID NO: 1, amino acid residues 250 through 580 of SEQ ID NO: 1 (Domain 2), amino acids 130-250 of SEQ ID NO: 1 (Domain 1), amino acids 750-842 of SEQ ID NO: 1 (Domain 4), or amino acids 580-750 proline-threonine rich region, Domain 1, Domain 2, Domain 3, or Domain 4 of SEQ ID NO: 1, or containing a sequence of SEQ ID NO: 1, or a sequence that is at least 30%, 40%, 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to such a protein or region thereof. For example, provided are polynucleotides encoding
polypeptides containing a domain of the provided polypeptide, such as a catalytic domain or carbohydrate binding motif (CBM) that is at least 50%, 60%, typically at least 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% identical to such a domain thereof, where the encoded polypeptide is a cellulase. In one embodiment, provided are polynucleotides containing a nucleic acid sequence having at least 50%, 60%, typically at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 15, or to one or more regions or domains thereof. Also provided are polynucleotides encoding the polypeptides listed in Tables 1-4.

[0094] Typically, the variant or fragment encodes a protein retaining a substantial amount of the cellulase or other enzymatic activity or cellulose binding capability of the wild-type protein. For example, in some embodiments, the variant or fragment retains one, typically both, of the wild-type active site residues at E413 and E506. In some embodiments, the variants include polynucleotides encoding a protein comprising the sequence of a protein listed in any of Tables 1, 2, 3, and 4.

[0095] Methods for the Identification of Hyperthermophiles and Hyperthermophilic Cellulases and Characterization Assays

[0096] There is an absence of known archaeal hyperthermophiles subsisting on plant biomass as exclusive carbon sources. Despite the discovery of multiple endo- and exocellulases in thermophiles, the upper temperature limit for organisms known to grow on crystalline cellulose has risen slowly. The compositions provided here are based in part on the discovery that a limitation of known methods for identifying thermophilic cellulases is the isolate-centric nature of these studies. Thus, provided are high throughput metagenomic, transcriptomic, and proteomic methods for identification of cellulases, including for the identification of hyperthermophilic cellulases. For example, provided is a metagenomic approach for identification of cellulases, such as stable and thermoactive endoglucanase from a lignocellulose-degrading consortium of hyperthermophilic Archaea.

[0097] In one embodiment, such methods are carried out by cultivating archaea growing on a cellulose-containing carbon source, such as crystalline cellulose, at above a certain temperature, such as at or about at least 90° C., 94° C., or 100° C., and selection of organisms capable of utilizing cellulose under these conditions. In one aspect, the method allows for selection of a minimal consortium, rather than a single isolate. An exemplary method is the isolation described herein in Example 1.

[0098] Also provided are methods for identifying and producing new hyperstable cellulases by mutating known enzymes to include one or more domains, such as the cellulose binding domain, for example, any of domains 1, 3, and/or 4, of the provided polypeptides, for example, to improve the activity on crystalline substrates. In one example, the methods involve the addition of a thermostable cellulose binding domain provided herein to a catalytic domain, for example, as carried out to introduce chitin binding domains to increase binding and activity toward crystalline cellulose.

[0099] Also provided are methods using the provided polypeptides for the characterization of cellulose degradation and production of polished crystalline cellulose for assays of cellulases, expansins, and cellulose binding proteins.

[0100] Vectors and Host Cells

[0101] Also provided are vectors, host cells, and methods for the production of the provided polypeptides and polynucleotides. In some embodiments, DNA encoding the polypeptide is chemically synthesized based on the provided sequences or obtained directly from host cells harboring the gene (e.g., by cDNA library screening or PCR amplification). In some embodiments, the provided polynucleotide is included in an expression cassette and/or cloned into a suitable expression vector by standard molecular cloning techniques. Such expression cassettes or vectors contain sequences that assist initiation and termination of transcription (e.g., promoters and terminators), and generally contain a selectable marker.

[0102] Expression vector/host cell combinations are well known and can be used in the provided methods. Typically, the expression cassette or vector is introduced in a suitable expression host cell, which then expresses the corresponding polypeptide. Particularly suitable expression hosts are bacterial expression host genera including Escherichia (e.g., Escherichia coli), Pseudomonas (e.g., P. fluorescens or P. stutzerei), Proteus (e.g., Proteus mirabilis), Ralstonia (e.g., Ralstonia eutropha), Streptomyces, Staphylococcus (e.g., S. carnosus), Lactococcus (e.g., L. lactis), or Bacillus (subtilis, megaterium, licheniformis, etc.). Also particularly suitable are yeast expression hosts such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Yarrowia lipolytica, Hansenula polymorpha, Kluyveromyces lactis or Pichia pastoris, and fungal expression hosts such as Aspergillus niger, Chrysosporium lucknowense, Aspergillus (e.g., A. oryzae, A. niger, A. nidulans, etc.) or Trichoderma reesei. Also suited are mammalian expression hosts such as mouse (e.g., NS0), Chinese Hamster Ovary (CHO) or Baby Hamster Kidney (BHK) cell lines. Other eukaryotic hosts such as insect cells or viral expression systems (e.g., bacteriophages such as M13, T7 phage or Lambda, or viruses such as Baculovirus) are also suitable.

[0103] Promoters and/or signal sequences associated with secreted proteins in a particular host of interest are candidates for use in the heterologous production and secretion of the provided polypeptides in that host or in other hosts. Such sequences are well known. In some embodiments, the provided polynucleotide is recombinantly associated with a polynucleotide encoding a suitable homologous or heterologous signal sequence that leads to secretion of the enzyme into the extracellular (or periplasmic) space, thereby allowing direct detection of enzyme activity in the cell supernatant (or periplasmic space or lysate). Particularly suitable signal sequences for Escherichia coli, other Gram negative bacteria and other organisms known in the art include those that drive expression of the HlyA, DsbA, Pbp, PhoA, PelB, OmpA, OmpT or M13 phage GIII genes. For Bacillus subtilis, Gram positive organisms and other organisms known in the art, particularly suitable signal sequences further include those that drive expression of the AprE, NprB, Mpr, AmyA, AmyE, Blac, SacB, and for S. cerevisiae or other yeast, include the killer toxin, Bar1, Suc2, Mating factor alpha, Inu1A or Ggp1p signal sequence. Signal sequences can be cleaved by a number of signal peptidases, thus removing them from the rest of the expressed protein. In some embodiments, the provided polypeptide is expressed alone or as a fusion with other peptides, tags or proteins located at the N- or C-terminus (e.g., 6x His, HA or FLAG tags). Suitable fusions include tags, peptides or proteins that facilitate affinity purification or detec

tion (e.g., 6x His, HA, chitin binding protein, thioredoxin or 0110. In some embodiments, the disclosed methods are FLAG tags), as well as those that facilitate expression, secre carried out as part of a pretreatment process. The pretreatment tion or processing of the provided polypeptide. Suitable pro process may include the additional step of adding any of the cessing sites include enterokinase, STE13, Kex2 or other polypeptides or compositions of the present disclosure to protease cleavage sites for cleavage in vivo or in vitro. pretreated biomass mixtures after the step of pretreating the 0104. In some embodiments, the provided polynucle biomass under high temperature, and incubating the pre otides are introduced into expression host cells by any of a treated biomass with the polypeptides or compositions under number of transformation methods including, but not limited conditions sufficient to reduce the viscosity of the mixture. to, electroporation, lipid-assisted transformation or transfec The polypeptides or compositions may be added to the pre tion ("lipofection'), chemically mediated transfection (e.g., treated biomass mixture while the temperature of the mixture CaCl and/or CaP), lithium acetate-mediated transformation is high, or after the temperature of the mixture has decreased. (e.g., of host-cell protoplasts), biolistic "gene gun' transfor In some embodiments, the methods are carried out in the mation, PEG-mediated transformation (e.g., of host-cell pro same vessel or container where the heat pretreatment was toplasts), protoplast fusion (e.g., using bacterial or eukaryotic performed. In other embodiments, the methods are carried protoplasts), liposome-mediated transformation, Agrobacte out in a separate vessel or container where the heat pretreat rium tumefaciens, adenovirus or other viral or phage trans ment was performed. formation or transduction. 0111. In some embodiments, the methods are carried out 0105. 
Alternatively, the polypeptides are expressed intra in the presence of high salt, such as solutions containing cellularly. Optionally, after intracellular expression of the saturating concentrations of salts, Solutions containing polypeptides, or secretion into the periplasmic space using sodium chloride (NaCl) at a concentration of at least at or signal sequences such as those mentioned above, a permeabi about 0.5 M, 1 M, 1.5 M, 2 M, 2.5 M, 3 M, 3.5 M, or 4 M lisation or lysis step can be used to release the cellulase into Sodium chloride, or potassium chloride (KCl), at a concen the Supernatant. The disruption of the membrane barrier is tration at or about 0.5M, 1 M, 1.5M, 2 M, 2.5 M3.0 M or 3.2 effected by the use of mechanical means such as ultrasonic MKCl and/or ionic liquids, such as 1,3-dimethylimidazolium waves, pressure treatment (French press), cavitation or the dimethyl phosphate (DMIMDMP) or EMIMOAc, or in use of membrane-digesting enzymes such as lysozyme or the presence of one or more detergents, such as ionic deter enzyme mixtures. As a further alternative, the polynucle gents (e.g., SDS, CHAPS), sulfydryl reagents, such as in otides encoding the polypeptides are expressed by use of a saturating ammonium sulfate or ammonium sulfate between suitable cell-free expression system. In cell-free systems, the at or about 0 and 1 M. In other embodiments, the polypeptides polynucleotide of interest is typically transcribed with the or compositions of the present disclosure are contacted with assistance of a promoter, but ligation to form a circular the pretreated biomass mixture at a temperature exceeding expression vector is optional. In other embodiments, RNA is 90° C., 910 C., 920 C., 93° C., 940 C., 95°C., 96° C., 97° C., exogenously added or generated without transcription and 98° C. 99 C., 100° C., 101° C., 102°C., 103° C., 104°C., translated in cell free systems. 
105°C., 106°C., 107°C., 108°C., 109°C., or 110°C., or over abroad temperature range, such as between at or about 60° C. 0106 Reduction of the Viscosity of Pretreated Biomass and 110° C. or between 65° C. and 110°C., such as between Mixtures 90° C. and 110°C., between 65° C. and 70° C., between 85° 0107 The provided polypeptides and compositions con C. and 105°C., between 85°C. and 110°C., between 95°C. taining the polypeptides find use in a variety of industrial and 105° C., or between 95°C. and 110°C. In some aspects, applications, including in the reduction of the Viscosity of the polypeptides exhibit the activity or binding ability over a pretreated biomass mixtures prior to their degradation into broad pH range, for example, at a pH of between about 4.5 monosaccharides and oligosaccharides, for example, in bio and 8.75, at a pH of greater than 7 or at a pH of 8.5, or at a pH fuel production. of at least 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 83.0, or 8.5. 0108 Biomass that is used for as a feedstock, for example, 0112 Biomass includes, but is not limited to, plant mate in biofuel production generally contains high levels of lignin, rial, municipal Solid waste, and wastepaper, including ligno which can block hydrolysis of the cellulosic component of the cellulosic feedstocks, e.g., agricultural residues such as corn biomass. Typically, biomass is pretreated with, for example, stover, wheat Straw, barley Straw, oat Straw, rice Straw, canola high temperature and/or high pressure to increase the acces Straw, and soybean stover, grasses such as Switch grass, mis sibility of the cellulosic component to hydrolysis. However, canthus, cord grass, and reed canary grass, fiber process pretreatment generally results in a biomass mixture that is residues such as corn fiber, beet pulp, pulp mill fines and highly viscous. 
The high viscosity of the pretreated biomass rejects and Sugarcane bagasse, forestry wastes such as aspen mixture can also interfere with effective hydrolysis of the wood, other hardwoods, softwood and sawdust, and post pretreated biomass. Advantageously, the polypeptides and consumer waste paper products; palm kernel, coconut, kon compositions of the present disclosure can be used to reduce jac, locust bean gum, gum guar, soybeans. Suitable crop the viscosity of pretreated biomass mixtures prior to further residue for production of biomass includes but is not limited degradation of the biomass. to palm kernel meal, palm kernel expellers, copra meal, copra 0109 Accordingly, certain embodiments of the present pellets and soybean hulls. disclosure relate to methods of reducing the Viscosity of a 0113 Degradation of Biomass to Mono- and Oligosaccha pretreated biomass mixture, by contacting a pretreated bio rides mass mixture having an initial viscosity with any of the 0114. The polypeptides, polynucleotides, vectors, and polypeptides or compositions of the present disclosure; and host cells of the present disclosure find use in a variety of incubating the contacted biomass mixture under conditions industrial applications, including in the degradation of biom sufficient to reduce the initial viscosity of the pretreated bio ass, e.g., cellulase and lignocellulose, into monosaccharides mass mixture. and oligosaccharides, for example, in biofuel production, US 2015/021 1 037 A1 Jul. 30, 2015

textile methods, including cleaning, cotton Softening, and trins and their derivatives, short chain cellulase, B-methylum denim finishing, in production and uses of detergents, for belliferyl-oligosaccharides, p-nitrophenol-oligosaccharides, example, for color care, cleaning, and anti-deposition; for long chain cellulose derivatives, carboxymethyl cellulose food-based methods, including food processing and mashing; (CMC), hydroxyethyl cellulose (HEC), and insoluble sub for pulp and paper methods, such as paper pulp bleaching, strates, including cotton, Whatman No. 1 filter paper, Pulp deinking, drainage improvement, and fiber modification. (e.g., Solka Floc), Crystalline cellulose, Such as cotton, Thus, also provided are methods and uses of the provided microcrystalline cellulose (e.g., Avicel(R), Valonia cellulose, polypeptides, polynucleotides, and compositions for Such bacterial cellulose. Amorphous cellulose (e.g., PASC, alkali purposes, for example, in degrading or hydrolyzing cellulose swollen cellulose), dyed cellulose, fluorescent cellulose, containing compositions to produce soluble Sugars, for chromogenic and fluorephoric derivatives, such as trinitro example, followed by enzymatic or chemical fermentation. phenyl-carboxymethylcellulose (TNP-CMC) and Fluram 0115. In some embodiments, the methods are carried out cellulose, practical cellulose-containing Substrates, C.-cellu in the presence of high salt, Such as Solutions containing lose, and pretreated lignocellulosic biomass. 
saturating concentrations of salts, Solutions containing 0118 Biofuel Production sodium chloride (NaCl) at a concentration of at least at or 0119 The provided polypeptides and compositions con about 0.5 M, 1 M, 1.5 M, 2 M, 2.5 M, 3 M, 3.5 M, or 4 M taining the polypeptides find use in the degradation and Sodium chloride, or potassium chloride (KCl), at a concen hydrolysis of cellulase and cellulase-containing biomass and tration at or about 0.5M, 1 M, 1.5M, 2 M, 2.5 M3.0 M or 3.2 feedstocks, for example, for the production of monosaccha MKCl and/or ionic liquids, such as 1,3-dimethylimidazolium rides, disaccharides, and oligosaccharides from biomass, dimethyl phosphate (DMIMDMP) or EMIMOAc, or in Such as chemical or fermentation feedstocks, for the produc the presence of one or more detergents, such as ionic deter tion of biofuel. Such as ethanol, butanol, other products, and gents (e.g., SDS, CHAPS), sulfydryl reagents, such as in intermediates. Provided are methods and compositions for saturating ammonium Sulfate or ammonium sulfate between Such uses of the provided polypeptides, such as conversion of at or about 0 and 1 M. In some embodiments, the conversion lignocellulocytic biomass into Soluble Sugars for fermenta occurs at a temperature exceeding 90° C., 91° C., 92°C., 93° tive production of biofuels, conversion of pretreated lignocel C., 94°C.,950 C., 96° C., 97° C., 98°C.,990 C., 100° C. 1010 luose into Soluble Sugars, conversion of lignocellulose into C., 102°C., 103°C., 104°C., 105°C., 106° C., 107 C., 108° soluble Sugars in the presence of high salt or ionic liquids, C., 109°C., or 110°C., or over a broad temperature range, conversion of crystalline cellulose into soluble Sugars at high Such as between at or about 60° C. and 110° C. or between 65° temperatures, such as those exceeding 90° C., 91° C., 92°C., C. and 110°C., such as between 90° C. and 110°C., between 93°C., 940 C., 95°C., 96° C., 97° C., 98° C. 999 C., 100° C., 65° C. 
and 70° C., between 85°C. and 105° C., between 85° 101° C., 102°C., 103°C., 104°C., 105° C., 106° C., 107°C., C. and 110°C., between 95°C. and 105°C., or between 95° 108°C., 109°C., 110°C., or over a broad temperature range, C. and 110°C. In some aspects, the polypeptides exhibit the Such as betweenator about 60° C. and 110° C. or between 65° activity or binding ability over abroad pH range, for example, C. and 110°C., such as between 90° C. and 110°C., between at a pH of between about 4.5 and 8.75, at a pH of greater than 65° C. and 70° C., between 85°C. and 105° C., between 85° 7 or at a pH of 8.5, or at a pH of at least 5.0, 5.5, 6.0, 6.5, 7.0, C. and 1110°C., between 95°C. and 105°C., or between 95° 7.5, 83.0, or 8.5. C. and 110°C., or under other conditions as described herein 0116 Bioenergy feedstocks consist primarily of the plant above. cell wall components cellulose and hemicellulose. Hydroly sis of these polysaccharides to their monomeric Sugars I0120 In one embodiment, the provided composition involves a set of enzymes acting synergistically to cleave the includes the peptide in a composition of crude fermentation different chemical linkages (Dodd and Cann, GCB Bioen broth, with or without the cells removed, or in the form of a ergy, 1:2, 2009). Cellulose is the predominant polysaccharide semi-purified or purified enzyme preparation. In another in biomass (with others including hemicellulose, lignin, and embodiment, the provided host cells are used as a source of pectin). Cellulose is a homopolymer of anhydrocellobiose (a the polypeptide in a fermentation process with the biomass. linear beta-(1-4)-D-glucan), and includes glucose units I0121. In one embodiment, the polypeptides of the present linked together in B-1,4-glycosidic linkages. The hemicellu disclosure find use in the degradation of cellulose to aid in the losic component can vary in chemical composition. Hemicel degradation of biomass, to form biofuels, such as ethanol. 
luloses include a variety of compounds, such as Xylans, Xylo Ethanol is produced by enzymatic degradation of biomass glucans, arabinoxylans, and mannans in complex branched and conversion of the released saccharides to ethanol (often structures with a spectrum of Substituents. Although gener referred to as bioethanol or biofuel, used as a fuel additive or ally polymorphous, cellulose is found in plant tissue prima extender in blends of from less than 1% and up to 100% (a fuel rily as an insoluble crystalline matrix of parallel glucan substitute)). In one embodiment, for the production of biofu chains. els from biomass, the provided polypeptides, compositions, 0117 The provided polypeptides may be used to degrade and methods are used in the conversion of cellulose to its various types of cellulosic biomass, which are well-known in monomer (glucose) or other soluble Sugar, for Subsequent the art, including plant biomass, microbial biomass, purified conversion to biofuel (e.g., ethanol) by fermentation, such as cellulose, and lignocellulosic feedstocks. Cellulosic biomass by microbial or chemical fermentation. For example, the pro includes lignocellulose biomass, containing cellulose, hemi vided polypeptides and methods may be used for Such con cellulose, and lignin. Purified celluloses include holocellu version by enzymatic hydrolysis, optionally including acid lases, such as Solka Flok, microcrystalline celluloses, such as pretreatment, typically carried out at high temperatures, fol Avicel(R) and SigmacellR), and the highly soluble cellulose lowed by hydrolysis with the provided polypeptides. ether, carboxymethylcellulose (CMC). Cellulose-containing I0122. In one embodiment, the polypeptides are used in Substrates include Soluble and Substrates, such as cellodex combination with other carbohydrases (e.g., mannanases, US 2015/021 1 037 A1 Jul. 30, 2015 glucanase, Xylanase, alpha-galactosidase and/or cellulase) granule, a paste or a liquid. 
A liquid detergent is generally for more extensive hydrolysis of the plant material. aqueous, typically containing up to 70% water and 0-30% (0123 Food Processing organic solvent(s), or non-aqueous component(s). Typically, 0124 Compositions comprising the polypeptides of the the detergent composition comprises one or more present disclosure also find use in the processing and manu (e.g., non-ionic including semi-polar, anionic, cationic and/or facturing of food or animal feed. Such as in mashing. Provided Zwitterionic). The surfactants are typically present at a level are methods employing the provided compositions in Such of from 0.1% to 60% by weight. When included, detergents uses. Several anti-nutritional factors limit the use of specific typically contain from about 1% to about 40% of an anionic plant material in the preparation of animal feed and food for Such as linear alkylbenzenesulfonate, alpha-olef humans. Plant material containing oligosaccharides can insulfonate, alkyl sulfate (fatty alcohol sulfate), alcohol reduce the digestibility and absorption of nutritional com ethoxysulfate, secondary alkanesulfonate, alpha-sulfo fatty pounds such as minerals, vitamins, Sugars and fats by the acid methyl ester, alkyl- or alkenylsuccinic acid, or soap. animals. Provided are methods for food processing using the When included, detergents typically contain from about 0.2% provided compositions. In one embodiment, the polypeptides to about 40% of a non-ionic surfactant such as alcohol and compositions are used to degrade or hydrolyze polymers ethoxylate, nonylphenol ethoxylate, alkylpolyglycoside, into simpler Sugars, which can be more readily assimilated to alkyldimethylamineoxide, ethoxylated fatty acid monoetha provide additional energy. 
nolamide, fatty acid monoethanolamide, polyhydroxy alkyl 0.125 Polypeptides of the present disclosure also are use fatty acid amide, or N-acyl N-alkyl derivatives of glu ful as additives to feed for monogastric animals such as poul cosamine (glucamides). try and Swine, as well as for human food. In some embodi I0131 Detergent compositions optionally comprise 0-65% ments, the polypeptides are used to pretreat the feedinstead of of a detergent builder or complexing agent such as Zeolite, as a feed additive. In some embodiments, the polypeptides are diphosphate, triphosphate, phosphonate, carbonate, citrate, added to or used to pretreat feed for weanling pigs, nursery nitrilotriacetic acid, ethylenediaminetetraacetic acid, diethyl pigs, piglets, fattening pigs, growing pigs, finishing pigs, enetriaminepentaacetic acid, alkyl- or alkenylsuccinic add, laying hens, broiler chicks, turkeys, for example, added to or soluble silicates, or layered silicates. Detergent compositions used to pretreat feed from plant material Such as palm kernel, optionally comprise one or more polymers such as carboxym coconut, konjac, locust bean gum, gum guar, soybeans, bar ethylcellulose (CMC), poly(vinylpyrrolidone), poly (ethyl ley, oats, flax, wheat, corn, linseed, citrus pulp, cottonseed, ene glycol), poly(vinyl alcohol), poly(vinylpyridine-N-ox groundnut, rapeseed, Sunflower, peas, and lupines. ide), poly(vinylimidazole), polycarboxylates such as 0126. Because of their stability, e.g., thermostability, they polyacrylates, maleic/acrylic acid copolymers, and lauryl find used in processes of producing pelleted feed in which methacrylate/acrylic acid copolymers. The detergent option heat is applied to the feed mixture before the pelleting step, as ally comprises a bleaching system (e.g., hydrogen peroxide it is the case in most commercial pellet mills. 
In one example, Source) Such as perborate or percarbonate, which may be the polypeptides are added to the other feed ingredients in combined with a peracid-forming bleach activator Such as advance of the pelleting step or after the pelleting step to the tetraacetylethylenediamine or nonanoyloxybenzene already formed feed pellets. Sulfonate. Alternatively, the bleaching system comprise per 0127. In some embodiments, the provided compositions oxyacids of the amide, imide, or Sulfone type. containing the provided polypeptide for use in food process 0.132. In one embodiment, the provided polypeptides are ingoras a feed Supplement contain other Substituents, such as added to the detergent composition in an amount correspond coloring agents, aroma compounds, stabilizers, Vitamins, ing to 0.01-100 mg of enzyme protein per liter of wash liquor, minerals, other feed or food enhancing enzymes and the like. preferably 0.05-5 mg of enzyme protein per liter of wash This applies in particular to the so-called pre-mixes. Food liquor, in particular 0.1-1 mg of enzyme protein per liter of additives according to this present disclosure may be com wash liquor. bined with other food components to produce processed food (0.133 Paper Pulp Processes products. The resulting, combined food additive is mixed in I0134. In another embodiment, the provided compositions an appropriate amount with other food components such as and polypeptides find use in pulp and paper methods. Such as cereal or plant proteins to form a processed food product. in paper pulp bleaching, deinking, drainage improvement, 0128 Textile Cleaning and Laundry Detergents and fiber modification, for example, in high temperature 0129. The provided polypeptides, methods, and composi applications for the pulping of cellulolytic materials. 
Pro tions also find use in textile methods, including cleaning, vided are methods and compositions for use of the provided cotton Softening, and denim finishing, the polishing of cotton polypeptides for Such purposes. For example, in some fabrics under high temperature treatments, and in production embodiments, the polypeptides find use in the enzyme aided and uses of detergents, for example, for color care, cleaning, bleaching of paper pulps such as chemical pulps, semi-chemi and anti-deposition. For example, the provided polypeptides cal pulps, kraft pulps, mechanical pulps or pulps prepared by find use in detergent compositions to facilitate the removal of the sulfite method. In some embodiments, the pulps are chlo cellulose-containing stains and soils. In one embodiment, the rine free pulps bleached with oxygen, oZone, peroxide or polypeptides are used in detergent compositions; provided peroxyacids. In some embodiments, the provided polypep are such detergent compositions and methods for their use. In tides are used in enzyme aided bleaching of pulps produced one embodiment, the detergent compositions contain the by modified or continuous pulping methods that exhibit low polypeptides in combination with other enzymes from the lignin contents. In some embodiments, the provided polypep group of amylases, mannases, cellulases, lipases, pectinases, tides are applied alone; in other embodiments, they are pro proteases, endoglucanases, and exoglucanases. vided in combination with other enzymes, such as Xylanase 0130. The detergent compositions include those in any and/or endoglucanase and/or alpha-galactosidase and/or cel convenient form, including in a bar, a tablet, a powder, a lobiohydrolase enzymes. US 2015/021 1 037 A1 Jul. 30, 2015

0135 The following examples are offered to illustrate pro dients (except vitamins, bicarbonate, cellobiose and sulfide) vided embodiments and are not intended to limit the scope of dissolved, boiled for 1 min., then cooled to room temperature the invention. under 80% N2 and 20% CO2 gas atmosphere, adding vita mins, feedstock Solutions and bicarbonate from a sterile stock EXAMPLES solution, prior to inoculation, adjusted to a pH of 7.1-7.3. 0142. After incubation for three weeks at 90° C., a sec 0136. The following examples describe the results of a ondary enrichment was performed by innoculating with metagenomic approach to identify extremely stable and ther microcrystalline cellulose, with ~50 um particle size moactive endoglucanases from a lignocellulose-degrading (Avicel(R) pH101 Fluka, Ireland), as the carbon source. The consortium of hyperthermophilic Archaea, including the minimal enrichment obtained on microcrystalline cellulose endoglucanase EBI244, with a capacity to tightly bind micro (Avicel(R) was transferred to the same salts medium crystalline cellulose (Avicel(R) PH-101). described above, with Whatman(R) #3 (Qualitative Grade 3) Filter Paper as a carbon source, (FIGS. 1B and 1C). Enrich Example 1 ment on Avicel(R) was chosen for scaled up production of the consortium because this finely divided crystalline substrate Enrichment of Hyperthermophilic Archaea and resulted in more rapid growth. Metagenomic Sequencing 0143. This enrichment strategy yielded a three-organism 0.137 Hyperthermophilic Archaea were enriched on pull consortium, capable of deconstructing crystalline filter paper verized plant biomass (microcrystalline cellulose). For this at 90° C., as demonstrated by pitting, shredding or complete process, a sample of sediment collected from a continental dissolution of strips of Whatman(R) #1 (Qualitative Grade 1) volcanic hot spring at 94° C. and neutral pH was selectively or Whatman(R) #3 (Qualitative Grade 3) filter paper (FIG. 2). 
enriched to obtain a consortium of hyperthermophilic Specifically, the consortium degraded a strip of Whatman #1 Archaea growing on lignocellulose as sole carbon Source. A filter paper Supported by glass tubing, a circular piece of secondary minimal enrichment of three hyperthermophilic Whatman(R) #3 filter paper (confirmed by visible pits). Pits Archaea was isolated on minimal salts medium containing were more often seen with the thicker Whatman(R) #3 filter microcrystalline cellulose (Avicel(R) as the major carbon paper (FIG. 2B), while shredding/dissolution was more often SOUC. seen with the thinner Whatman(R) #1 filter paper (FIG. 2A). 0138 Source Material 0144 Repeated efforts to separate the three species of the 0139 Sediment was sampled from great boiling springs consortium failed. near Gerlach Nev., from a pool having a temperature of 94° 0145 Extraction, Purification, and Analysis of Native Pro C., known to maintain temperatures around 90° C. (FIG. 1A). tein A Small glass jar (4 oZ) was filled with sediment, topped off 0146 Avicel(R) from a 17.5 L enrichment, grown on with spring water, closed, and sealed with Parafilm R. M. Avicel(RPH 101 in a 20 L specialized fermentor, was washed Samples were transported on ice; long-term storage was car and extracted with CHAPS detergent and SDS as follows. ried out in anaerobic jars at 4°C. The enrichment was harvested by centrifugation and the pel 0140. Enrichment of Hyperthermophilic Archaea let, principally Avicel(R), was washed 3 times with Tris buffer 0141 Approximately 3 mL of sediment was used as (100 mM sodium chloride and 0.05% Tween R. 20) to remove inoculum to generate an anaerobic microbial enrichment on soluble proteins. The remaining pellet was washed with 0.6% minimal salts medium (90 mL). 
The medium was similar to CHAPS detergent in TE (Tris-EDTA) buffer, then twice with DSMZ medium #516 (ANAEROCELLUM MEDIUM), 2% CHAPS in TE buffer, 20 minutes each, at 90° C., then except that pulverized lignocellulosic feedstock Miscanthus boiled in 1% SDS for 20 minutes, and in 2% SDS for 20 gigas, ground to 80 uM particle size, was used as the carbon minutes. The 1% SDS and 2% SDS fractions contained pro Source feedstock, and yeast extract was reduced to 0.2 g/L. teins determined to have been transferred to Avicel(R) during Specifically, the medium contained NH4C1 (0.33 g). growth, and tightly bound to partly digested cellulose fibrils. KH2PO4 (0.33 g), KCl (0.33 g), MgCl2x6 H2O (0.33 g), 0147 Preliminary Assay of Endoglucanase Function CaCl2x2 H2O (0.33 g). Trace element solution (Nitrilotriace Using Zymograms tic acid 1.500 g, MgSO4x7 H2O 3.000 g, MnSO4xH2O 0.148 Zymograms were used as a preliminary assay to 0.500 g, NaCl 1.000 g, FeSO4x7 H2O 0.100 g, CoSO4x7 screen the fractions for endoglucanase activity. As shown in H2O 0.180g, CaCl2x2 H2O 0.100g, ZnSO4x7 H2O 0.180g, FIG. 3, Zymograms performed on the protein extractions CuSO4x5 H2O 0.010 g, KAl(SO4)2x12 H2O 0.020 g, from the Avicel(R) enrichment demonstrated detectable activ H3BO3 0.010 g, Na2MoC)4x2 H2O 0.010 g, NiCl2x6 H2O ity in a split band at apparent molecular weights ranging from 0.025 g, Na2SeO3x5 H2O 0.300 mg, Distilled water 1000. 80 to 250 kDa for the 2% CHAPS fractions. As shown in FIG. 000 ml, made by first dissolving nitrilotriacetic acid and 3. subsequent washes with 1-2% SDS yielded the most activ adjusting pH to 6.5 with KOH, then adding minerals, adjust ity, localized in a small number of distinct protein bands. ing pH to 7.0 with KOH), Distilled water 1000.000 ml) (1.00 0149. The 1% CHAPS/5% cellobiose fraction showed ml), Yeast extract (0.2g), ResaZurin (0.50 mg), Vitamin solu detectable CMCase activity on Zymograms. Active cellulases tion (Biotin 2.000 mg. Folic acid 2.000 mg. 
Pyridoxine-HCl with apparent molecular weights of about 40kDa and 80 kDa 10.000 mg. Thiamine-HClx2H2O 5.000 mg, Riboflavin were detected (FIG. 4). Subsequent washes with 1% SDS at 5.000 mg. Nicotinic acid 5.000 mg, D-Ca-pantothenate 5.000 100° C. yielded the release of additional hyperstable, high mg, Vitamin B12 0.100 mg. p-Aminobenzoic acid 5.000 mg. molecular weight enzymes with CMCase activity as indi Lipoic acid 5.000 mg) (10.00 ml), NaHCO3 (1.50g), pulver cated by the activity in a smaller number of more distinct ized lignocellulosic feedstock Miscanthus gigas, ground to bands with apparent molecular weights of about 80 kDa and 80 uM particle size for use as the carbon source (5.00 g), 180 kDa (FIG. 4). It was apparent that this consortium was Na2Sx9H2O 0.50 g. Distilled water 1000.00 ml, with ingre producing cellulases that could bind to Avicel(R)particles, and US 2015/021 1 037 A1 Jul. 30, 2015

were able to withstand boiling in 1% SDS, abilities not yet observed in well-characterized cellulases from hyperthermophilic archaea. Therefore, metagenomics was employed to identify potential cellulases from this consortium.

[0150] Extraction of High Molecular Weight DNA from Avicel® Enrichment

[0151] Standard protocols were used to extract high molecular weight DNA from the Avicel® enrichment using the CTAB method (Ausubel et al., Current Protocols in Molecular Biology, Vol. 2 (John Wiley & Sons Inc., 1994)) with volumes increased 4-fold. Using this method, approximately 20 µg of high molecular weight DNA was obtained from a 1.5 L enrichment grown on 5 g Avicel®/L. The average size of the DNA was determined by pulsed-field electrophoresis to be about 50 kDa.

[0152] Sequencing and Sequence Analysis

[0153] Metagenomic analysis performed on the minimal enrichment identified multiple endoglucanase homologs in the metagenome.

[0154] Metagenomic sequencing was performed on DNA from the consortium. Library preparation and sequencing were performed at the University of Illinois, W. M. Keck Center for Comparative and Functional Genomics. Sequencing was done via Roche 454 Titanium Shotgun Sequencing. Initial automated assembly was done at the Center by the Newbler Assembly program (Newbler Assembler software, 454 Sequencing/Roche). Automated annotation was done using a local MANATEE database and the nr BLAST database, available through NCBI. In addition, further annotation was conducted through the MicrobesOnline Comparative Genomics Database (VIMSS funded by DOE Genomics:GTL), which includes protein coding prediction using CRITICA and Glimmer3, followed by annotation using the VIMSS genome pipeline composed of all publicly available sequence databases.

[0155] The consortium of three Archaea contained a dominant organism related to Ignisphaera aggregans, but sufficiently distinct to be assigned to a different genus, as well as two Archaea related to Pyrobaculum islandicum and Thermofilum pendens. The major organism is designated Pyrosphaera cellulolytica Candidatus Nov Gen Nov Sp (P. cellulolytica). The incomplete genome of this hyperthermophilic Archaeon shares several features of the genome of I. aggregans, including a pair of homologous but somewhat distantly related genes encoding reverse gyrase. The genome of P. cellulolytica indicates that the strain is specialized for heterotrophic utilization of a variety of carbohydrates. The draft genome has significant coding capacity for glycolytic enzymes including putative endo- and exocellulases, glucosidases and hemicellulases.

[0156] Metagenomic sequencing yielded 1,283,902 reads, with a total of 497,707,575 bases. Assembly yielded 4206 contigs representing 6,954,058 bases. One complete 16S RNA and two fragmented 16S RNAs were identified, which matched most closely to characterized organisms Ignisphaera aggregans DSM 17320 (95%), Pyrobaculum islandicum DSM 4184 (98%), and Thermofilum pendens Hrk (93%), respectively. A maximum likelihood 16S rRNA gene phylogenetic tree is shown in FIG. 5.

[0157] Proteomics analysis was done by tandem mass spectrometry conducted at the California Institute for Quantitative Biosciences Proteomics/Mass Spectrometry Core Facility. Briefly, gel slices were prepared by vortexing with 25 mM ammonium bicarbonate 1:1 acetonitrile/water for 10 min and discarding the supernatant. This step was repeated three times. Slices were vacuum-dried, then reduced by incubation with 10 mM DTT in 25 mM ammonium bicarbonate with 10% acetonitrile and alkylated with 55 mM iodoacetamide in 25 mM ammonium bicarbonate. Proteins were then digested with one volume of trypsin for 6 h at 37° C. After digestion, the slices were washed with water and the supernatant saved. Gel slices were then washed twice with a solution of 45% water, 50% acetonitrile, and 5% formic acid; all supernatants were saved. Supernatants containing the peptides were reduced to a volume of 10 µL and then analyzed with tandem mass spectrometry. Peptide sequences were annotated using the annotated genome created by MicrobesOnline.

[0158] Similar topology and bootstrap support were obtained for the Neighbor-joining method (results not shown). The 16S rRNA gene from the Ignisphaera-like organism was 99% identical to 16S rRNA clones from uncultured archaea from geothermal systems in both Nevada (accession number HM448083.1) and Montana (accession number EU635921.1). The Ignisphaera-like 16S RNA was 94% identical to the type species and represented the dominant organism in the enrichment, based on the large number of reads per kilobase of sequence (~300) for 16S RNA and the hyperthermophilic housekeeping gene reverse gyrase, compared to read densities (<20) for 16S RNA fragments and reverse gyrases from the other organisms. Like Ignisphaera aggregans, the Ignisphaera-like organism appeared to have two reverse gyrase genes, as shown in FIG. 6. The sum of the high read density contigs represented about 1.8 Mb, or most of the expected coding sequence of a single hyperthermophile (~2.0 Mb). Sequence analysis found a large number of glycosyl hydrolases (>40) and 21 contigs containing potential cellulases, based on automated annotation.

Example 2

Identification of Carbohydrate Active Enzymes

[0159] Annotation analysis found a large number of GHs (37) and included 4 potential GH family 5 endoglucanases, based on automated annotation. Twelve of these GHs were encoded by the closed genome of the dominant strain. One predicted GH, designated EBI244 (accession number JF509452), was chosen for further study because it was a potential multi-domain cellulase, 842 amino acids in length, and a member of the TIM barrel glycosyl hydrolase superfamily (β/α)8. Large multidomain cellulases are ubiquitous amongst cellulolytic organisms but have not been previously found in hyperthermophilic archaea. The central domain of this enzyme (AA250-580) had a Pfam match (E-value 1X e') to the GH family 5 (GH5). The gene encoding EBI244 was found on the chromosome of the dominant organism, and at 94 kDa EBI244 was the largest of three proteins encoded on the chromosome with Pfam hits to GH family 5 (GH5); the others were a 43 kDa Pfam match (E-value 6.3 E7) and a 44 kDa Pfam match (E-value 8 E°).

[0160] Potential homologs were gathered with PSI-BLAST (Johnson, M et al., Nucleic Acids Res. 36, W5-9, 2008) using each putative domain of EBI244 as the query sequence against the nr protein sequence database. The SAM software package (Karplus et al., Bioinformatics 14, 846-856, 1998) was used to build hidden Markov models (HMMs), score the potential homolog sequences, and create alignments for building new models. This method was used iteratively with each putative domain to build more general models in order to detect distant homologs. JalView (Waterhouse, A. M. et al., Bioinformatics 25, 1189-1191, 2009) was used to view and edit multiple sequence alignments. The resulting alignments allowed for approximate domain boundary determination.

[0161] According to BLASTp searches, EBI244 is a weak match to its closest apparent homolog, an uncharacterized hypothetical protein from Caldicellulosiruptor saccharolyticus (35% identity). The conserved central domain (AA250-580) had only 9 significant hits (NCBI nonredundant protein database) with BLAST E-values less than 1E-20, including proteins from Herpetosiphon aurantiacus ATCC 23779, Spirochaeta thermophila DSM 6578, Spirochaeta thermophila DSM 6192, Opitutus terrae PB90-1, Chitinophaga pinensis DSM 2588, Zunongwangia profunda SM-A87, Clostridium leptum DSM 753, and Victivallis vadensis ATCC BAA-548, with % identities ranging from 25-35%.

Example 3

Analysis of a Hyperthermophilic Cellulase-Encoding Gene (ebi244) and Polypeptide Encoded Thereby (EBI244 Protein)

[0162] Based on sequencing and analysis, one gene and the polypeptide encoded thereby were chosen for further study, based on the gene's homology to the cellulase superfamily/glycosylhydrolase family 5/EC 3.2.1.4. The gene/protein was designated ebi244/EBI244. The EBI244 protein had apparent but distant similarity to type 5 glycosyl hydrolases (cellulase superfamily). The gene mapped to a high-read density contig embedded in a sequence flanked by other assembled genes. The contig did not display synteny or detectable homology to the draft genome sequence of I. aggregans (web site genome.ornl.gov/microbial/iag17230/).

[0163] Sequence analysis revealed that ebi244 was a putative cellulase-encoding gene, isolated from a hyperthermophilic archaeal consortium metagenome, having no global identity to any previously characterized protein or enzyme. The predicted open reading frame (ORF) encodes a protein having a deduced sequence 842 amino acids in length, set forth as SEQ ID NO: 1. The recombinant forms generally add a terminal methionine (Met), bringing the total to 843 amino acids (SEQ ID NO: 14). Archaeal proteins sometimes start with amino acids other than Met, such as leucine (Leu).

[0164] Sequence comparison revealed that the protein contained no close global identity to any previously characterized protein or enzyme. A central region of the protein (Domain 2) showed similarity to the known glycosyl-hydrolase family 5 (GH5) domain, present in a family of glycosyl hydrolases, which was evidence of cellulase or similar sugar hydrolase activity. Aside from this glycosylhydrolase domain, none of the remainder of the amino acid sequence shows any similarity to any known domain or protein in the major databases.

[0165] Phylogenetic analysis of EBI244 was carried out using the sequence of domain 2 (GH5 match) in order to determine its evolutionary relationship to characterized enzymes (FIG. 7). The phylogenetic tree was built using the SATCHMO-JS server (Hagopian, R et al., Nucleic Acids Res. 38, W29-34, 2010). All sequences were aligned with the Expresso server (Armougom, F. et al., Nucleic Acids Res. 34, W604-608, 2006) in order to trim sequences down to only the structurally related GH domain. All characterized GH family 5 and GH family 42 sequences in the CAZy database (Cantarel, B. et al., Nucleic Acids Res. 37, D233-238, 2009) were used initially to compare to EBI244 and its closest homologs. The size of the tree was reduced by using JalView's remove redundancy function, thereby also preserving the diversity of each family. The Pfam web server (Finn, R D et al., Nucleic Acids Res. 38, D211-222, 2010) was used to score the sequences against Pfam HMM models of the GH families.

[0166] The catalytic domain of EBI244 clustered with a unique subset of TIM barrel sequences that show distant relationships to both GH families 5 and 42 in the calculated phylogenetic tree. In this analysis, three members of Family 30 formed a distant out-group although they are assigned to the Clan A structural clade that includes the families GH5 and GH12. EBI244 clusters with three characterized mannanases that have been classified in the GH5 family. The eight closest homologs of the EBI244 catalytic domain include six that have a GH Pfam match (five from GH5, one from GH42), and two with no predictive matches (E-values shown in FIG. 7). Given this uncertain association, the unique architecture, and the diversity of the GH5 family, it is unclear whether the sequence cluster containing the EBI244 catalytic domain is a divergent subfamily of the GH5 family or the nucleus of a new family of glycoside hydrolases.

[0167] EBI244 Domain Architecture:

[0168] Protein database searches and bioinformatic server predictors indicated that EBI244 contains four structural domains, one unstructured region, and an N-terminal signal or lipid-anchor sequence. The domains and regions are shown schematically in FIG. 8A, with approximate amino acid positions indicated for each.

[0169] N-Terminal Sequence:

[0170] The analysis revealed that the first approximately 25 amino acids of the native EBI244 enzyme are highly hydrophobic and likely represent a signal peptide (for directing protein localization with eventual cleavage) or membrane/lipid anchor (to hold the protein on the cell surface). While signal sequence and transmembrane (TM) region prediction servers are not built with archaeal sequences, they can be useful for some guidance. Thus, various servers were used to analyze this region of EBI244, giving mixed results, with some predicting a TM region (e.g. Phobius: TM region a.a. 6-25; TMHMM: TM region a.a. 5-27), some predicting a signal peptide (e.g. SignalP 3.0: predicted cleavage between a.a. 22 and 23), and others giving inconclusive predictions (e.g. SIGPred: eukaryote predicted signal sequence with cleavage between a.a. 18 and 19, but no prokaryotic signal sequence predicted).

[0171] Given the varied results using server predictions, further studies were carried out to identify similar N-terminal protein regions among genes found in the metagenome (FIG. 8B). Two representative sequences are shown in illustration 2, VIMSS5327647 (Pfam hit: Extracellular solute-binding protein family 5) and VIMSS5324142 (Pfam hit: Extracellular solute-binding protein family 1). This type of protein (according to Pfam's description) is known in gram (+) bacteria (containing no outer membrane) to be bound in the membrane via N-terminal lipid-anchors, indicating that EBI244 may also be attached to the extracellular side of the lipid membrane with its N-terminal hydrophobic amino-acid region.

[0172] Well-known methods, such as those employing software (free and commercially available services), may be used to predict signal sequences (see, for example, the Transmembrane helix and signal peptide prediction list available on the World Wide Web at the URL cmgm.stanford.edu/WWW/www_predict.html, and the program "SignalP 3.0 Server," available on the World Wide Web at www.cbs.dtu.dk/services/SignalP). The SignalP 3.0 program was used to predict the location of a signal sequence for the polypeptide of SEQ ID NO: 1. Using this method, a cleavage site was predicted between amino acids 23 and 24. Thus, the predicted mature protein is 24-842 of SEQ ID NO: 1.

[0173] Proline/Threonine-Rich Region

[0174] The analysis revealed that the N-terminal putative signal peptide is followed by a one hundred (100) amino acid region, rich in threonine and proline. Threonine/proline rich regions are generally highly unstructured, often serving as flexible linkers in cellulases. Such sequences are known to be found in many types of proteins, including cellulases. The size of the threonine/proline-rich region in EBI244, as well as the degree of enrichment for threonine (44% for the region 33-126) and proline (24% for the region 33-126), are highly unusual. In many cellulases, threonine/proline rich regions serve as linker domains, connecting different domains (e.g., a catalytic domain connected to a cellulose-binding domain). In EBI244, however, this region is positioned too close to the N-terminus to be positioned between functional domains. Other deduced carbohydrate enzymes from the metagenome also showed threonine rich motifs at N- or C-termini. None was as dramatic as the region from EBI244.

[0175] Domains 1-4

[0176] Based on Hidden Markov Modeling (HMM), the remainder of the protein was predicted to encode up to four structural domains (Domains 1-4).

[0177] Hidden Markov Model (HMM) searching and analysis was carried out on the domain 1 region of EBI244. This searching and analysis identified sequences of thirty eight (38) proteins, a non-redundant sample of which is shown in FIG. 8C. Table 1 lists the ID (GenBank Accession number or UniProt ID), start and stop amino acid positions for the domain with identity to domain 1, e-value, protein length, and organism for each hit. The same information also is provided for EBI244 (with VIMSS5326244 listed as the ID). VIMSS5326244 is the identifier electronically designated by the sequence analysis software (MicrobesOnline) for specific open reading frames (ORFs). Prior to this work, none of the identified proteins had been experimentally characterized; almost all had only electronically-inferred annotations. Annotations varied among sequences, with a good number of glycoside hydrolases; many had no annotations.

[0178] Global alignment of sequences identified by domain 1 HMM revealed that the next domain in the carboxy direction (domain 2 in EBI244) was related among all these sequences. Thus, based on the HMM multiple sequence analysis, Domain 1 appeared always to be accompanied by Domain 2.

TABLE 1
Protein sequence hits and e-values from domain 1 HMM searching.

Protein ID       Start  Stop  e-value   Length  Organism
A9AYF5_HERA2     60     168   7.52E-29  591     Herpetosiphon aurantiacus (strain ATCC 23779/DSM 785)
VIMSS5326244     157    273   1.56E-27  842     94C Metagenome (EBI244)
A4XMG8_CALS8     62     203   7.04E-27  611     Caldicellulosiruptor saccharolyticus (strain ATCC 43494/DSM 8903)
YP_003585990.1   26     127   4.45E-26  531     Zunongwangia profunda SM-A87
C7PTR3_CHIPD     56     153   3.35E-25  557     Chitinophaga pinensis (strain ATCC 43595/DSM 2588/NCIB 11800/UQM 2034)
B1ZN60_OPITP     52     169   2.46E-24  749     Opitutus terrae (strain DSM 11246/PB90-1)
D1N449_9BACT     214    330   4.36E-24  777     Victivallis vadensis ATCC BAA-548
ZP_03628444.1    63     157   2.32E-23  559     bacterium Ellin514
NP_870950.1      148    240   2.84E-23  634     Rhodopirellula baltica SH 1
ZP_03626656.1    66     170   4.66E-23  1596    bacterium Ellin514
ZP_01717989.1    53     140   1.13E-22  542     Algoriphagus sp. PR1
YP_003323724.1   24     121   2.17E-22  528     Thermobaculum terrenum ATCC BAA-798
A7VX72_9CLOT     37     157   1.49E-21  787     Clostridium leptum DSM 753
YP_001297703.1   -      -     2.91E-21  534     Bacteroides vulgatus ATCC 8482
ZP_05256313.1    -      -     2.94E-21  534     Bacteroides sp. 4_3_47FAA
ZP_06742086.1    -      -     2.94E-21  534     Bacteroides vulgatus PC510
NP_228758.1      -      -     3.68E-21  509     Thermotoga maritima MSB8
ZP_04540112.1    -      -     1.28E-20  518     Bacteroides sp. 9_1_42FAA
ZP_03298724.1    -      -     1.30E-20  534     Bacteroides dorei DSM 17855
ZP_04555706.1    -      -     1.30E-20  534     Bacteroides sp. D4
YP_003548440.1   382    466   5.86E-20  1258    Coraliomargarita akajimensis DSM 45221
YP_001819159.1   -      -     1.26E-19  536     Opitutus terrae PB90-1
YP_003195709.1   52     137   4.31E-19  1160    Robiginitalea biformata HTCC2501
ZP_03628309.1    55     148   6.44E-19  725     bacterium Ellin514
P_003243090.1    -      -     1.18E-18  481     Geobacillus sp. Y412MC10
P_764889.1       -      -     2.53E-18  506     -
P_001819827.1    -      -     3.02E-17  570     -
P_002278657.1    -      -     3.46E-17  506     -
P_869354.1       472    574   4.06E-17  1043    Rhodopirellula baltica SH 1
P_001818722.1    -      -     4.95E-16  648     Opitutus terrae PB90-1
P_826861.1       -      -     1.18E-15  604     -
P_001820771.1    62     148   1.35E-15  859     Opitutus terrae PB90-1
P_003547883.1    -      -     4.64E-15  606     Coraliomargarita akajimensis DSM 45221
A7HFC4_ANADF     60     150   9.41E-15  566     Anaeromyxobacter sp. (strain Fw109-5)
ZP_04488111.1    -      -     1.60E-14  526     -
YP_003387974.1   -      -     1.61E-14  534     Spirosoma linguale DSM 74
ZP_02918195.1    -      -     2.30E-14  529     Bifidobacterium dentium ATCC 27678
YP_003547687.1   718    821   1.10E-12  1853    Coraliomargarita akajimensis DSM 45221
YP_003011267.1   -      -     3.54E-09  554     -
YNP18_461130     -      -     -         311     Microbial community from Yellowstone Hot Springs (Washburn Springs #1)
BISONR_127760    -      -     -         597     Bison Hot Spring Pool, Yellowstone (11FEB08 BISONR)
BISONS_6715      -      -     -         777     Bison Hot Spring Pool, Yellowstone (14JAN08 BISONS)
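The columns of Table 1 lend themselves to a small worked example. The sketch below is an illustration only: it takes four rows whose start, stop, e-value and length are legible in Table 1, computes the length of the matched region (stop - start + 1) and its share of the full protein, and ranks the hits by e-value. The `Hit` class, the `rank_hits` helper and the 1e-20 cutoff are hypothetical names and values chosen for the example and are not part of the analysis described above; the underscores in the accession numbers are assumed from standard GenBank formatting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One row of Table 1 (illustrative container, not from the source)."""
    protein_id: str
    start: int       # first residue of the region matching domain 1
    stop: int        # last residue of the region matching domain 1
    e_value: float
    length: int      # full protein length in amino acids

    @property
    def span(self) -> int:
        # Inclusive residue count of the matched region.
        return self.stop - self.start + 1

    @property
    def coverage(self) -> float:
        # Fraction of the full protein covered by the matched region.
        return self.span / self.length


# Four rows transcribed from Table 1 (order deliberately shuffled).
HITS = [
    Hit("YP_003548440.1", 382, 466, 5.86e-20, 1258),
    Hit("ZP_03626656.1", 66, 170, 4.66e-23, 1596),
    Hit("VIMSS5326244", 157, 273, 1.56e-27, 842),   # EBI244 query
    Hit("YP_003585990.1", 26, 127, 4.45e-26, 531),
]


def rank_hits(hits, max_e_value=1e-20):
    """Keep hits at or below the assumed cutoff, most significant first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: h.e_value)


if __name__ == "__main__":
    for hit in rank_hits(HITS):
        print(f"{hit.protein_id:16s} span={hit.span:4d} "
              f"coverage={hit.coverage:.1%} e={hit.e_value:.2e}")
```

With the assumed cutoff, the EBI244 query ranks first, and its matched region (residues 157 to 273) spans 117 residues.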

[0179] Domain 2 represents the largest predicted domain of EBI244, and is the region having similarity to the known glycosyl-hydrolase family 5 (GH5) domain family of glycosyl hydrolases, evidencing the protein's cellulase or similar sugar hydrolase activity. The sequence of the GH5 domain was determined to be highly divergent (Pfam server analysis: e-value=1e-12) compared to previously characterized GH5 proteins. FIG. 8D shows a number of highly conserved residues across all sequences in the domain 2 region, including the two predicted catalytic residues of EBI244 (highlighted in yellow; glutamates 413 and 506).

[0180] Despite low sequence identity in this region across all sequences, the conservation of key residues, including the predicted catalytic residues glutamate 413 and glutamate 506, suggests a similar fold in this region. Other structural predictions revealed that the protein is a member of the glycosidase superfamily, within the TIM-barrel fold (InterProScan; e-values ~1e-27; see FIG. 9). FIG. 10 shows a schematic representation of the relationship of domain 2 of EBI244 to other glycosylhydrolases in this superfamily. Many of the known glycoside hydrolase families are within the TIM-barrel fold (the CAZY database shows at least 18), which includes GH5 (see Illustration 5). HMM analysis/searching carried out for domain 2 of EBI244 identified a very large number of significant hits.

[0181] Table 2 lists the ID (GenBank Accession number, UniProt ID), e-value, protein length, and organism for each hit, with the same information provided for EBI244 (listing VIMSS5326244 as the ID). As shown in Tables 1 and 2, many of the top hits (eight top hits) were the same protein sequences identified as top hits in the domain 1 searching. However, beyond those first eight, most of the hits were not identified in other domain searches, indicating that they do not have very similar domains outside of domain 2.

[0182] Even though the sequence identity is very low in this region across all sequences, the high conservation of a number of residues, especially the predicted catalytic residues of EBI244, indicates that all of these sequences have the possibility of a similar fold in this region. The observation that the domain 1 region did not appear to be present in any protein not having this similar domain 2 region indicates that the function of the domain 1 region may be dependent on or affect the function of the domain 2 region.

TABLE 2
Protein sequence hits and E-values from domain 2 HMM searching.

Protein ID             e-val      length
VIMSS5326244 (EBI244)  7.01E-143  842
A4XMG8_CALS8           1.93E-134  611
A9AYF5_HERA2           3.55E-125  591
B1ZN60_OPITP           2.46E-123  749
A7VX72_9CLOT           2.26E-118  787
B7BGD2_9PORP           1.02E-116  470
D1N449_9BACT           1.39E-110  777
C7PTR3_CHIPD           3.82E-69   557
YP_003585990.1         1.01E-61   531
ZP_04378853.1          1.14E-56   446
C7M3Y3_CAPOD           1.16E-56   470
ZP_03390557.1          6.80E-54   466
A7HFC4_ANADF           9.39E-46   566
B0UPR0_METS4           3.32E-32   504
B9RNO3_RICCO           1.67E-28   404
C6T835_SOYBN           2.75E-28   418
XP_002264115.1         2.44E-25   433
C7A7X8_MALDO           2.58E-25   429
VIMSS9423033           9.82E-25   431
MAN7_ARATH             9.86E-25   431
XP_002281804.1         1.92E-24   433
C7A7X6_9ERIC           2.98E-24   433
B2BMP9_PRUPE           3.27E-24   431
B9H4D6_POPTR           4.59E-24   420
XP_002272344.1         4.61E-24   402
C7A7X7_MALDO           4.84E-24   428
Q9FTO3_COFAR           5.55E-24   416
Q9P893_AGABI           5.91E-24   439
B9GRV2_POPTR           7.04E-24   415
C6TAY0_SOYBN           7.20E-24   431
XP_002270023.1         1.72E-23   403
B9R7X5_RICCO           1.90E-23   432
VIMSS9886800           2.25E-23   379
B2BMQ0_PRUPE           2.91E-23   433
Q2IO11_HORVD           2.92E-23   380
B0FPH4_9ROSA           3.26E-23   433
Q0ZR47_THEHA           3.56E-23   431

[0183] HMM searching on Domain 3 revealed only one significant hit (B1ZN60_OPITP), which was also a hit in searching the other three domains. This hit appears co-linear with EBI244 except for the threonine rich N-terminus. Secondary structure predictions show mostly beta-sheets. Table 3 lists the ID (GenBank Accession number or UniProt ID) for each hit. The same information also is provided for EBI244 (with VIMSS5326244 listed as the ID). The start and stop positions of domain 3 in EBI244 also are listed. YP_003379646.1 was unlikely a true domain hit because of low sequence identity.
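The overlap between the domain 1 and domain 2 hits noted above (Tables 1 and 2) can be checked with a simple set comparison. The sketch below is an illustration only: it uses the thirteen best-scoring IDs of Table 2 and a subset of the IDs of Table 1, as transcribed from the tables (underscores and a few characters are assumed where the print is unclear), and the names `DOMAIN2_TOP`, `DOMAIN1_IDS`, `shared_hits` and `domain2_only` are hypothetical.

```python
QUERY = "VIMSS5326244"  # identifier used for EBI244 in both tables

# Thirteen best-scoring IDs from Table 2, in rank order (transcribed).
DOMAIN2_TOP = [
    "VIMSS5326244", "A4XMG8_CALS8", "A9AYF5_HERA2", "B1ZN60_OPITP",
    "A7VX72_9CLOT", "B7BGD2_9PORP", "D1N449_9BACT", "C7PTR3_CHIPD",
    "YP_003585990.1", "ZP_04378853.1", "C7M3Y3_CAPOD", "ZP_03390557.1",
    "A7HFC4_ANADF",
]

# Subset of the IDs listed in Table 1 (transcribed; not the full table).
DOMAIN1_IDS = {
    "A9AYF5_HERA2", "VIMSS5326244", "A4XMG8_CALS8", "YP_003585990.1",
    "C7PTR3_CHIPD", "B1ZN60_OPITP", "D1N449_9BACT", "ZP_03628444.1",
    "NP_870950.1", "ZP_03626656.1", "ZP_01717989.1", "YP_003323724.1",
    "A7VX72_9CLOT", "A7HFC4_ANADF",
}


def shared_hits(ranked_ids, other_ids, query=QUERY):
    """IDs present in both searches, in rank order, excluding the query."""
    return [pid for pid in ranked_ids if pid in other_ids and pid != query]


def domain2_only(ranked_ids, other_ids):
    """IDs found by the domain 2 search but absent from the domain 1 set."""
    return [pid for pid in ranked_ids if pid not in other_ids]


if __name__ == "__main__":
    both = shared_hits(DOMAIN2_TOP, DOMAIN1_IDS)
    print(len(both), "shared hits:", ", ".join(both))
    print("domain 2 only:", ", ".join(domain2_only(DOMAIN2_TOP, DOMAIN1_IDS)))
```

On these transcribed IDs the comparison returns eight shared proteins besides the EBI244 query, while four of the thirteen domain 2 hits have no counterpart in the domain 1 subset.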

TABLE 3
Protein sequence hits from domain 3 HMM searching.

Domain 3         Start  Stop
VIMSS5326244     605    734
B1ZN60_OPITP     -      -
YP_003379646.1   -      -

[0184] Domain 4 is the C-terminal domain of EBI244. HMM search analysis of Domain 4 returned seven significant hits. As shown in FIG. 8E, all seven of these sequences aligned globally with EBI244, except over the domain 3 and T/P rich regions. This result indicates that Domain 4 is related in some way to domains 1 and 2. Given that only one other sequence aligned in the domain 3 region, domain 3 may have been added to EBI244 at some point in evolution or it was removed from an ancestor to the other proteins. Each of these seven sequences was a top hit in the domain 2 searching; 6 of them showed up in the hits of domain 1 searching, providing further evidence of the link between domain 4 and the rest of the protein. Many of the domain 1 hits do not have a region related to domain 4.

[0185] Table 4 lists the ID (GenBank Accession number or UniProt ID), start and stop amino acid positions for the domain with identity to domain 4, e-value, and organism for each hit. The same information also is provided for EBI244 (with VIMSS5326244 listed as the ID).

TABLE 4
Protein sequence hits and E-values from domain 4 HMM searching.

Domain 4       Start    Stop  e-val     Length  Organism
A4XMG8_CALS8   534/557  604   5.64E-23  611     Caldicellulosiruptor saccharolyticus (strain ATCC 43494/DSM 8903)
VIMSS5326244   759/785  838   1.87E-21  842     94C Metagenome
A7VX72_9CLOT   519/544  597   4.65E-20  787     Clostridium leptum DSM 753
B1ZN60_OPITP   667/692  745   1.68E-18  749     Opitutus terrae (strain DSM 11246/PB90-1)
B0UPR0_METS4   423/447  501   1.10E-16  504     Methylobacterium sp. (strain 4-46)
D1N449_9BACT   698/718  771   4.54E-16  777     Victivallis vadensis ATCC BAA-548
A7HFC4_ANADF   493/510  562   1.59E-15  566     Anaeromyxobacter sp. (strain Fw109-5)
A9AYF5_HERA2   537      588   1.46E-14  591     Herpetosiphon aurantiacus (strain ATCC 23779/DSM 785)

[0186] In summary, no highly similar BLAST hits resulted during searches with EBI244, implying that no known and sequenced Archaea or other hyperthermophiles in the NCBI non-redundant protein database have cellulase-encoding genes with the same domain structure as this enzyme. This enzyme occupies a highly divergent sequence space with less than 30% identity to the catalytic domain of the nearest characterized endoglucanase. Consideration of the weak homologs identified established that none are biochemically characterized, and the conserved glycosylhydrolase family 5 catalytic domain of the hyperthermophilic cellulase is extremely divergent from characterized proteins of the family, with its nearest BLAST hits separated from known members of this family. Thus, this enzyme may represent the first characterized member of a highly divergent branch of the glycosyl hydrolase family 5 catalytic motif, or alternatively should be classified as the prototype of a new glycosylhydrolase family.

[0187] Thus, the EBI244 cellulase appears to represent a highly unusual type of glycosidase, based on structural alignments and sequence-based homolog searches. For example, the enzyme contains a highly divergent core catalytic domain and unusual domains flanking the catalytic domain. The few distant homologs of EBI244 in the public databases are distributed in organisms that occupy a broad swath of habitats, from rice paddies to mammalian intestines.

Example 4

Expression and Analysis of Synthetic Protein

[0188] An ebi244 protein-coding region, having the nucleic acid sequence set forth in SEQ ID NO: 2 (original sequence with hyperthermophilic codon usage), was synthesized de novo by GenScript, Ltd (Piscataway, N.J.). A second version of the coding region, which was codon-optimized for expression in E. coli (SEQ ID NO: 3), also was synthesized by DNA 2.0 (Menlo Park, Calif.).

[0189] Protein Expression and Purification

[0190] The 94 kDa protein was expressed by autoinduction in E. coli and purified. Expression of the recombinant EBI244 protein in E. coli was carried out by the auto-induction method (Studier, F. W., Protein Expres. Purif. 41, 207-234, 2005).

[0191] Using this method, EBI244 was successfully expressed in two E. coli strains, BL21(DE3) and Rosetta cells (Invitrogen, Carlsbad, Calif.), as an N-terminally His-tagged protein, from the plasmid pET16b, in shaking flasks or in a 17.5 L fermenter. For expression, each strain was transformed with plasmid and plated on YT media supplemented with 0.8% glucose at 35° C. The pET16b N-terminal His-tagged gene appeared to be toxic, producing variable colony size. Only smaller colonies picked from freshly transformed plates resulted in significant expression. These were picked into a small volume of ZYP-0.8G media, 5 mL-50 mL, and incubated at 25° C. until cells reached an optical density at 550 nm of ~0.4. Then about 2.5 mL was inoculated per liter of ZYP-5052 rich media for auto-induction.

[0192] Cells were then incubated with shaking at 20° C. or 25° C. for 48 hours or 36 hours, respectively. Expression was optimized in 1 liter shake flask cultures, and subsequently scaled up to 17.5 L in a specialized New Brunswick Bioflow IV fermentor. Cells were grown to an OD550 nm of approximately 2.5-3.0, then harvested by centrifugation at 6,000xg. Expression in the fermentor yielded 3-5 times higher levels of cellulase activity as compared with shake flasks. Cells were lysed by French Pressure Cell in 50 mM Na phosphate buffer or 50 mM HEPPS buffer and incubated for 30 min at 90° C. Denatured host proteins were removed by centrifugation at

8,000xg for 15 minutes followed by 100,000xg for 30 minutes, and the cleared supernatant, representing a partially purified soluble fraction, was used for immediate and downstream assays or purification.

[0193] Expression levels were low (50 micrograms per g cells) but the protein was readily obtained in soluble form after heating whole cell extracts to 90° C.

[0194] The C-terminal poly-His-tagged, codon-optimized gene is expressed by a similar process, using well-known methods and plasmids. Recombinant protein was purified as follows: Clarified supernatants were fractionated by ammonium sulfate precipitation. The initial supernatant was brought to 20% saturating ammonium sulfate, centrifuged at 10,000xg, and decanted. The supernatant was then brought to 40% saturating ammonium sulfate and centrifuged at 10,000xg. The pellet fraction was resuspended in 50 mM phosphate buffer. The buffer was exchanged twice on a PES membrane centrifugal concentrator (Sartorius). Ammonium sulfate was added to a concentration of 500 mM (sans potassium chloride) and the protein was loaded on a HiTrap butyl hydrophobic interaction column (GE Healthcare, Piscataway, N.J.) and eluted with a linear gradient from 1 M KCl to 0 M KCl in 50 mM phosphate pH 7.0. The most active fractions were then pooled, buffer exchanged in 50 mM borate (pH 9.5), loaded on a Q sepharose fast flow column (GE Healthcare, Piscataway, N.J.) and eluted with a potassium chloride gradient from 0 M to 500 mM.

[0195] Additionally, an ebi244 gene construct was generated by replacing the native signal peptide sequence of ebi244 with the ompA signal peptide sequence from E. coli. The construct was generated by two rounds of amplification by PCR with primers that collectively reconstruct the signal peptide sequence from ompA in place of the native signal peptide sequence. The construct was subcloned into pET16b and expressed in E. coli Rosetta cells by standard IPTG induction at 25° C. or autoinduction at 25° C. The replacement of the archaeal signal peptide with the ompA signal peptide resulted in increased expression of the new construct ebi244-OA in E. coli as compared to the unmodified sequence ebi244.

[0196] The nucleotide sequence ebi244-OA is set forth as SEQ ID NO: 15. The amino acid sequence of the EBI244 encoded by ebi244-OA is set forth as SEQ ID NO: 16.

[0197] The results of a comparison of the expression levels of EBI244 and EBI244-OA when induced with IPTG are shown in Table 5. Expression via auto-induction resulted in a 5-fold increase in the expression of EBI244-OA as compared to EBI244.

Expression Results:

[0198]

TABLE 5
EBI244-OA Expression

Method (25° C.)  EBI244               EBI244-OA
IPTG             N.D.                 18 µg/g cell pellet
Auto-induction   20 µg/g cell pellet  100 µg/g cell pellet

[0199] Analysis of Purified Protein

[0200] Activity of the recombinant protein was analyzed by a number of methods, as follows.

[0201] Zymograms

[0202] Zymograms were performed as described above, with gels made as standard 8% SDS-PAGE gels, with 0.25% medium viscosity carboxymethyl cellulose incorporated into the gel. In the case of gradient gels, the gels were 10% to 15% acrylamide and contained 0.20% CMC. Standard SDS-PAGE protocols were used, with standard loading buffer, with the exception that samples were kept at 20° C. and were not boiled prior to loading. Gels were gently agitated for 30 minutes in 50 mM Tris buffer, pH 6.8, with 2% Triton X-100, and then for 30 minutes in 50 mM Tris buffer, pH 6.8, to reactivate cellulases. Gels were then incubated in 50 mM potassium phosphate, pH 6.8, or 50 mM HEPPS buffer, pH 6.8, for 3 hours at 90° C. After incubation, the gels were cooled to 20° C. and stained with 0.5% Congo Red (sodium salt of benzidinediazo-bis-1-naphthylamine-4-sulfonic acid; formula: C32H22N6Na2O6S2; molecular weight: 696.66 g/mol) for 40 minutes, then destained with 1 M Tris buffer, pH 6.8, for approximately 15 minutes. The dye then was set in 1 M MgCl2.

[0203] Reducing Sugar Assays

[0204] Reducing sugar assays were performed to detect the presence of reducing sugars. Dinitrosalicylic acid (DNS) reagent was made according to International Union of Pure and Applied Chemistry (IUPAC) guidelines. Results were calibrated to standard solutions of cellobiose. Assays on CMC (carboxymethyl cellulose), Avicel®, ionic liquid pretreated Avicel® and Whatman® #1 filter paper were carried out in 50 mM potassium phosphate pH 6.8 or 50 mM sodium acetate pH 5.0. Assays with high concentrations of salts or ionic liquids were carried out in phosphate buffer. To compare activity at various pH levels, the following buffers were used: 50 mM sodium acetate/acetic acid: pH 3.5, 4, 4.5, 5, 5.6; 50 mM sodium phosphate buffer: pH 6, 6.5; 50 mM MOPS: pH 7, 7.5; 50 mM EPPS: pH 6.8, 8, 8.5, 9; 50 mM CAPS: pH 9.5-11.1. Assays were generally conducted in 100 µL of buffer, in dome-capped PCR tubes, for temperatures of less than 99° C., incubated in a Bio-Rad MyCycler thermocycler with heated lid. Screw cap 1.5 mL polypropylene tubes in a silicone oil bath were used for the temperature range from 99-114° C. Alternatively, assays from 100-130° C. were conducted in 10 ml sealed serum stoppered Hungate tubes over-pressured with 30 psi of N2, then incubated in a Binder oven. In the case of the Hungate tubes, controls were removed from the oven at the calculated time of temperature equilibration (equilibration times were calculated using standard equations for unsteady-state heat conduction; see, for example, J. R. Welty, C. E. Wicks, and R. E. Wilson, Fundamentals of Momentum, Heat, and Mass Transfer, 3rd Edition, John Wiley & Sons, 1984, pp. 297-304) and stopped with the addition of an equal volume of cold 0.1 M sodium hydroxide.

[0205] Assays on alternative substrates described in Table 6 were done as follows: Pretreated substrates were treated as previously described (Kim, T et al., Biotechnol. Bioeng, 2010). All cellulolytic assays for insoluble substrates were carried out in quadruplicate in a final volume of 70 µL containing 1% (w/v) substrate (glucan loading), 0.2 µM of the EBI244 and 100 mM sodium acetate buffer, pH 5.5, at 90° C. in a thermal cycler (Applied Biosystems). Cellulase activities were measured for Avicel®, Lichenan, AFEX pretreated corn stover, ionic-liquid pretreated Avicel® (IL-Avicel®), Miscanthus (IL-Miscanthus), and corn stover (IL-corn stover). The mixtures were incubated at 90° C. for 15 h, after which they were cooled to 4° C. prior to measuring the amount of soluble reducing sugar released using the glucose oxidase peroxidase assay as previously described (Kim, T et al., Biotechnol. Bioeng, 2010).

[0206] Paranitrophenol-Labeled Glycosides

[0207] The chromogenic substrate 4-nitrophenyl-beta-D-glucopyranoside was utilized at 2.5 mM in sodium acetate buffer pH 5.0. Alternatively, the chromogenic substrate 4-nitrophenyl-beta-D-cellobioside was utilized as a substrate in 100 mM sodium acetate buffer. Sodium acetate buffer containing 4-nitrophenol was used as a standard and reagent blank during assays at 95° C. Absorbance was measured at 410 nm. To compare activity at various pH levels, the following buffers were used at a buffer strength of 50 mM: pH 2.5-5.5 acetate/acetic acid, pH 6.5 MES, pH 7.5-8.5 HEPPS, pH 9.5-10.5 CAPS.

(FIG. 12A). The transglycosylation activity was not greatly enhanced by the presence of glucose (FIG. 12B) and the enzyme showed no significant activity on cellotriose or cellobiose (FIGS. 12C and 12D).

TABLE 6
The specific activity of EBI244 endoglucanase on different substrates.

Substrate         Activity  Error (%)
pNP-cellobioside  178       1
CMC               138       5
Barley Glucan     518       7
Lichenan          6296      5
Avicel            1241      3
All assays on PNP-substrates and stan IL-Avice 8261 2 dards were adjusted with an equal volume of 100 mM sodium IL-Miscanthus 1002 4 hydroxide before recording the absorbance at 410 nm. IL-Cornstower 1318 2 0208 Dionex Product Analysis AFEX Cornstower 89 5 Xylan NA 0209 For Dionex product analysis, assay conditions were Mannan NA the same as those utilized for the DNS assay. Reactions were stopped with the addition of an equal volume of 0.1 M sodium hydroxide. 0215. In Table 6 above, “a” represents umol GE/umol 0210 Cellulose Binding Assay Enzyme/min, “b’ represents umol GE/umol Enzyme/15 hr, 0211 Cellulose binding assays were carried out as fol and “GE' represents glucose equivalents. Substrates pre lows. Soluble extract was adjusted to 50 mL in 25 mM treated with Ionic Liquid (IL) and Ammonia Fiber Expansion HEPPS buffer pH 6.8 with 1 g of Avicel(R), then incubated at (AFEX) are indicated. “NA’ indicates no measurable activ 80° C. for 30 minutes with shaking. The suspension was ity. centrifuged at 8,000xg, the supernatant removed and the 0216 Truncated versions of the EBI244 protein were ana Avicel(R) resuspended in 5 mL of HEPPS buffer with 0.6% lyzed for activity on PNP-cellobiose, CMC, and Avicel(R) to CHAPS detergent added. The suspension was centrifuged at determine potential functions for each domain. A truncation 8,000xg, the supernatant removed, and the Avicel(R) resus variant (EBI244A1-127 V128M-hereafter EBI244AN) lack pended in 5 mL 0.6% CHAPS buffer, heated to 80°C., for 15 ing the Thr/Pro rich region, maintained similar activity as the min with shaking. The suspension was centrifuged at 8,000x full length version on the PNP-cellbioside and CMC (data not g, the Supernatant removed, and the Avicel(R) re-suspended in shown). This result is expected because the threonine/proline 5 ml of 2.0% CHAPS at 25° C. and shaken. 
The suspension rich region is predicted to be a highly flexible low complexity then was centrifuged at 8,000xg, the supernatant removed region. Domains 3 and 4 do not align to experimentally char and the Avicel(R) re-suspended in 5 mL of 2% CHAPS and acterized domains, thus it is possible that these domains act as incubated at 80° C. for 15 minutes. The suspension was a cellulose binding domain (CBD) or function is protein centrifuged at 8,000xg, the supernatant removed, and the protein interactions. Truncations removing both domains 3 Avicel(R) re-suspended in 5 mL of 2% CHAPS and incubated and 4, or just domain 4 alone, were constructed and expressed at 90° C. for 30 minutes. The suspension was centrifuged at at higher levels than the full length protein, but were inactive 8,000xg and the supernatant removed. againstall Substrates. This result indicates that domain 3, and 0212 Endoglucanase Activity of Recombinant EBI244 on possibly 4 as well, is required for the enzyme to remain active, a Wide Range of High Molecular Weight Carbohydrate Sub possibly due to a stabilizing effect on the enzyme. Treatment strates Containing B1-4 Linked Glucose. of the recombinant enzyme with proteinase Kat 50° C. for 30 0213 Zymograms performed on recombinantly expressed minutes, resulted in a uniform N-terminal truncation to threo EBI244 proteins revealed endoglucanase activity of recom nine-121, determined by N-terminal Edman degradation. The binant EBI244, both with and without a refolding step. As proteinase treated enzyme showed similar mobility and activ shown in FIG. 11, the behavior of the protein on Zymogram ity to the EBI244AN variant, suggesting that the remainder of gels was similar to that observed for active endoglucanase the proteinforms an integrated structure that is inaccessible to fractions from the archaeal enrichment. The enzyme was proteinase Kat 50° C. active on carboxymethyl cellulose in liquid assays as well. 
0217 Amenability of the Enzyme to Ammonium Sulfate 0214. The enzyme also showed activity on a range of high Fractionation and Purification molecular weight carbohydrate substrates that contained 0218. The EBI244 enzyme also proved amenable to B1-4 linked glucose (Table 6). Product analysis by fluoro ammonium sulfate fractionation (see FIG. 13, showing phore-assisted carbohydrate electrophoresis (FACE) results of a Zymogram assay showing activity distributed revealed release of oligomers from Avicel(R) (FIG. 12). Puri among the 20-40% saturating ammonium sulfate fractions, fied EBI244 was supplied with various cellulose oligomers at each represented by three lanes (undiluted (1.0), dilution 2 in 95° C. and the reaction was monitored over two hours. The 5 (0.4), and 1 in 5 (0.2); initial sample was soluble recombi reactions show the conversion of higher order oligomers into nant protein after pretreatment at 80° C. for 30 minutes: mixtures of cellobiose, cellotriose and cellotetraose. The protein was precipitated using 20, 40, 60, and 90% saturating reactions show a dramatic pattern of trans-glycosylation ammonium sulfate), hydrophobic interaction chromatogra resulting in transient formation of oligomers up to dip (degree phy (see FIG. 14, showing results of a DNS assay using 1% of polymerization) of eleven when starting with cellohexaose low-viscosity carboxymethylcellulose as the substrate, with US 2015/021 1 037 A1 Jul. 30, 2015

fractions 1-11 representing a linear gradient from 1 M to 0 M ammonium sulfate in potassium phosphate buffer, pH 7), and anion exchange chromatography.

[0219] The N-terminal histidine tagged enzyme, however, did not interact with a nickel or cobalt affinity column, presumably because the threonine rich N-terminal region occluded the tag. FIG. 15 shows a Coomassie stained SDS-PAGE gel demonstrating stepwise purification to 60% purity, with the sample heated prior to ammonium sulfate fractionation. While this figure shows EBI244 that is approximately 60% pure, purities over 95% have been obtained.

[0220] Thermostability

[0221] When assayed on 1% CMC (carboxymethyl cellulose) (DNS assay), 50 mM HEPPS buffer, the enzyme demonstrated almost no activity at 75° C., 50% maximal activity at ~92° C., and maximal activity at about 109° C. The results are shown in FIGS. 16 and 17, showing activity-temperature profiles of EBI244 on 1% CMC.

[0222] The temperature profile of the enzyme on Whatman® #1 paper showed a similar trend, with overall activity decreasing with the increasing crystalline nature of the substrate (FIG. 18).

[0223] To assess thermostability, the enzyme was preincubated at 100° C. or 105° C. in HEPPS buffer, then assayed for activity on 1% CMC at 90° C. The results, shown in FIGS. 19 and 20, demonstrate that the enzyme had a half-life of about 4.5 hours at 100° C., and about 34 minutes at 105° C. Additionally, the enzyme had a half-life of 10 min in HEPPS buffer, pH 6.8, at 108° C. in the presence of microcrystalline cellulose (0.5% Avicel®) (FIG. 21). Differential scanning calorimetry of the enzyme (FIG. 17, inset) showed a bifurcated transition with two Tms of 111° C. and 113° C.

[0224] Stability and Activity in High Ionic Strength

[0225] Zymogram assays also revealed that the recombinant enzyme is active in solutions of high ionic strength. For this study, zymogram gels were made as described, then equilibrated to various salt concentrations at room temperature prior to incubation at 90° C. The results are presented in FIG. 22, showing the enzyme exhibited zymogram activity in up to 4 M sodium chloride and up to saturating potassium chloride.

[0226] A DNS assay was used to measure product formation for EBI244 with 1% CMC in HEPES buffer with no salt added, 2.5 M sodium chloride, and 3.0 M KCl. The results, shown in FIG. 23, revealed that the initial reaction kinetics of the enzyme were linear in up to 2.5 M sodium chloride and 3.0 M potassium chloride, at rates about 40% of that of buffer alone. These results indicate that the enzyme is very halotolerant but functions better at lower salt concentrations. Moreover, ionic detergents, including SDS, had little effect on enzyme activity or stability and both non-ionic and non-denaturing ionic detergents such as CHAPS stimulated activity (FIG. 24).

[0227] Given that EBI244 remained active under high (NaCl) to near-saturating (KCl) salt conditions (FIG. 25), its activity was measured in the presence of the ionic liquids 1,3-dimethylimidazolium dimethyl phosphate ([DMIM]DMP) and 1-ethyl-3-methylimidazole acetate ([EMIM]OAc), which could potentially be used to pretreat substrates like Miscanthus (17). The concentrations tested, 25% and 50% (v/v), are well above the expected residual ionic liquid of 10-15% that may be carried over after pretreatment (18). CMCase activity was demonstrated in zymograms incubated at 90° C. in 25% (v/v) of either ionic liquid (pH 6.8). EBI244 remained stable and active at 90° C. in 25% [DMIM]DMP (FIG. 23). Interestingly, in these assays, the enzyme's Topt decreased in the presence of ionic liquids (FIG. 26), suggesting that denaturing effects of the ionic liquids may stimulate activity at lower temperatures at which the enzyme would otherwise be inactive.

[0228] The enzyme was also equilibrated in buffer with ionic liquid added in both zymogram assays and liquid DNS assays, with carboxymethylcellulose as the substrate. The enzyme was tested in two different ionic liquids, [DMIM]DMP and [EMIM]OAc. Zymogram activity was detected in gels incubated in 25% of either ionic liquid at 90° C. in 50 mM phosphate buffer at pH 6.8. The enzyme was shown to be active in up to 50% 1,3-dimethylimidazolium dimethyl phosphate. The temperature of maximum activity was determined for different concentrations of this ionic liquid. FIG. 27 shows results from a DNS assay, representing temperature optima compiled from activity-temperature profiles of EBI244 in increasing amounts of the ionic liquid [DMIM]DMP. While the maximum active temperature declined with increasing ionic liquid, purified EBI244 was demonstrated to be active in liquid assays at high concentrations of ionic liquids through a wide range of temperatures.

[0229] FIG. 28 shows the results of a DNS assay measuring activity of EBI244 on 1% CMC in buffer alone, and in the presence of 40% and 50% [DMIM]DMP. As shown, the highest activities in the low temperature range from 50-80° C. were recorded in the presence of ionic liquid, implying that the enzyme is activated at low temperature by the addition of ionic liquids.

[0230] Tolerance for Various Detergents

[0231] All detergents tested, including SDS at 100° C., had little effect on enzyme stability. No loss of activity was observed in non-ionic detergents, Triton X-100, NP-40, Tween 20. The enzyme was stable in up to 2% CHAPS (ionic non-denaturing detergent). Zymogram activity was retained after SDS-PAGE without the customary wash and refold steps, indicating a tolerance for 0.1-1% SDS at room temperature. The recombinant enzyme was pretreated at 100° C. with and without the addition of 0.1% SDS, then assayed by zymography at 90° C., showing thermostability at 100° C. in the presence of 0.1% SDS (FIG. 29).

[0232] Activity Over a Broad pH Range

[0233] The enzyme retained activity over a very broad pH range with significant activity up to pH 8.5, as shown in FIG. 30 (showing results of a DNS assay of CMC hydrolysis over a broad pH range). Moreover, the enzyme had an optimum of about pH 5.5 (FIG. 31).

[0234] The results of this study demonstrate that the recombinant enzyme has cellulolytic activity, releasing reducing sugars from carboxymethyl-cellulose, microcrystalline cellulose (Avicel®) and Whatman® #1 filter paper, at reaction temperatures exceeding 105° C., with an optimal temperature range from 95-110° C. The results further demonstrate that the enzyme has a half-life of greater than five hours at 100° C. and tolerates sodium chloride in near saturating concentrations (4 M) at 90° C. and potassium chloride at saturating concentration (~3.2 M) at 90° C. The results further show that the enzyme is active toward carboxymethylcellulose in the presence of the ionic detergents CHAPS (2%) and sodium dodecyl sulfate (0.1%), functions in up to 50% ionic liquids (i.e., 1,3-dimethylimidazolium dimethyl phosphate) at 90°

C., and functions over an unusually broad range of pH, with greater than 50% of the maximum activity exhibited from pH 4.5-8.75.

[0235] The results demonstrate that the EBI244 enzyme is an extremely thermostable, thermoactive cellulose-binding endoglucanase, with a unique sequence composition. Because the enzyme maintains a high proportion of its activity over an exceptionally broad range of salinities, ionic strength, detergents, and pH, the enzyme is useful in providing cellulase activity suitable for long-term use under the broad and variable range of conditions encountered in industrial conditions. Furthermore, given the ability of EBI244 to bind tightly to crystalline cellulose, the enzyme will be useful in engineering hyperstable endocellulases for greater activity on crystalline substrates, for example, by the addition of a thermostable cellulose binding domain, e.g., the N-terminal and/or C-terminal domain(s) of EBI244, to catalytic domains.

[0236] Throughout this application, various website data content, publications, patent applications and patents are referenced. (Websites are referenced by their Uniform Resource Locator, or URL, addresses on the World Wide Web.) The disclosures of each of these references are hereby incorporated by reference herein in their entireties.

[0237] The present invention is not to be limited in scope by the embodiments disclosed herein, which are intended as single illustrations of individual aspects of the invention, and any that are functionally equivalent are within the scope of the invention. Various modifications to the compositions and methods of the invention, in addition to those described herein, will become apparent to those skilled in the art from the foregoing description and teachings, and are similarly intended to fall within the scope of the invention. Such modifications or other embodiments can be practiced without departing from the true scope and spirit of the invention.
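By way of non-limiting illustration, the half-lives reported in the Thermostability section (about 4.5 hours at 100° C., about 34 minutes at 105° C., and 10 minutes at 108° C. in the presence of 0.5% Avicel®) can be converted into an estimate of the activity remaining after a given preincubation time. The sketch below assumes simple first-order (single-exponential) inactivation, which the disclosure does not state and which may not hold for this enzyme. The names `REPORTED_HALF_LIFE_MIN`, `decay_constant` and `residual_activity` are hypothetical helpers introduced only for this example.

```python
import math

# Half-lives in minutes, keyed by temperature in degrees C, as reported in the text.
# The 108 C value was measured in the presence of 0.5% Avicel.
REPORTED_HALF_LIFE_MIN = {100: 4.5 * 60, 105: 34.0, 108: 10.0}


def decay_constant(half_life_min: float) -> float:
    """Return the first-order rate constant k (1/min) for a half-life: k = ln(2) / t_half."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


def residual_activity(temp_c: int, minutes: float) -> float:
    """Fraction of initial activity left after `minutes` at `temp_c`.

    ASSUMPTION: inactivation is first-order, A(t) = A0 * exp(-k * t).
    """
    k = decay_constant(REPORTED_HALF_LIFE_MIN[temp_c])
    return math.exp(-k * minutes)


if __name__ == "__main__":
    for temp_c in sorted(REPORTED_HALF_LIFE_MIN):
        frac = residual_activity(temp_c, 60.0)
        print(f"{temp_c} C, 60 min: {frac:.1%} of initial activity remaining")
```

Under this first-order assumption, one hour at 105° C. would leave roughly 29% of the initial activity; measured inactivation curves such as those of FIGS. 19 and 20 should be preferred wherever they are available.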

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 37

<210> SEQ ID NO 1
<211> LENGTH: 842
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBI5326244 deduced protein sequence

<400> SEQUENCE: 1
Leu Lys Lys Val His Ile Ile Ala Ile Val Val Ile Ile Ala Ile Ala 1 5 10 15
Phe Ala Leu Ile Leu Ala Arg Tyr Tyr Thr Met Gln Arg Gly Tyr Glu 20 25 30
Thr Val Thr Pro Thr Thr Pro Pro Gln Gln Thr Thr Thr Thr Glu Thr 35 40 45
Thr Pro Val Pro Thr Glu Ala Gly Thr Thr Thr Pro Ile Thr Glu Ala 50 55 60
Thr Val Thr Gln Pro Pro Gln Thr Pro Thr Thr Pro Ser Pro Gln Thr 65 70 75 80
Pro Thr Thr Pro Thr Ala Leu Pro Thr Pro Ser Pro Thr Pro Thr Ala 85 90 95
Pro Ser Ala Thr Val Thr Glu Thr Thr Ser Pro Gln Thr Pro Thr Thr 100 105 110
Thr Ile Thr Thr Glu Thr Thr Thr Thr Pro Ala Pro Gln Pro Gln Val 115 120 125
Val Phe Leu Lys Leu Pro Glu Gly Glu Glu Pro Lys Phe Gly Leu Val 130 135 140
Glu Ile Ala Phe Asn Ile Ser Gly Leu Ser Tyr Ser Asn Pro Phe Asp 145 150 155 160
Thr Ser Asp Ile Asp Val Trp Val His Ile Glu Thr Pro Ser Gly Ser 165 170 175
Arg Val Ala Val Pro Ala Phe Tyr Phe Gln Asn Tyr Thr Val Lys Arg 180 185 190
Leu Gly Pro Gly Glu Glu Ile Ile Val Arg Val Gly Arg Pro Tyr Trp 195 200 205
Leu Ala Arg Phe Ala Pro Val Glu Glu Gly Val His Lys Phe Tyr Val 210 215 220
Lys Ala Val Asp Gly Arg Gly Ser Ala Val Val Ser Glu Ile Arg Glu 225 230 235 240
Phe Met Val Lys Gly Val Ala Gly Arg Gly Phe Val Arg Val Asp Ser 245 250 255
Gly Lys Arg Leu Phe Val Phe Asp Ser Gly Glu Ser Met Phe Met Leu 260 265 270
Gly Ile Asp Val Ala Trp Pro Pro Asp Arg Arg Ser Ser Ile Ser Phe 275 280 285
Tyr Glu Gln Trp Phe Asp Lys Leu Asn Lys Ser Gly Ile Lys Val Val 290 295 300
Arg Ile Gly Leu Val Pro Trp Ala Leu Thr Leu Glu Trp Ser Lys Leu 305 310 315 320
His Tyr Tyr Ser Leu Asp Asp Ala Ala Arg Ile Asp Glu Ile Val Lys 325 330 335
Leu Ala Glu Lys Tyr Asp Ile Tyr Ile Val Phe Val Phe Met Trp His 340 345 350
Gly Glu Leu Ala Asp Asn Trp Gly Asp Asn Pro Tyr Asn Ala Ala Arg 355 360 365
Gly Gly Pro Leu Gln Ser Pro Glu Glu Phe Trp Ser Asn Ala Val Ala 370 375 380
Ile Ser Ile Phe Lys Asp Lys Val Arg Tyr Ile Ile Ala Arg Trp Gly 385 390 395 400
Tyr Ser Thr His Ile Leu Ala Trp Glu Leu Ile Asn Glu Ala Asp Leu 405 410 415
Thr Thr Asn Phe Phe Ser Ala Arg Ser Ala Phe Val Ser Trp Val Lys 420 425 430
Glu Ile Ser Ser Tyr Ile Lys Ser Val Asp Pro Tyr Asn Arg Ile Val 435 440 445
Thr Val Asn Leu Ala Asp Tyr Asn Ser Glu Pro Arg Val Trp Ser Val 450 455 460
Glu Ser Ile Asp Ile Ile Asn Val His Arg Tyr Gly Pro Glu Gly Phe 465 470 475 480
Lys Asp Ile Ala Leu Ala Ile Pro Ser Ile Val Glu Gly Leu Trp Asn 485 490 495
Thr Tyr Arg Lys Pro Ile Ile Ile Thr Glu Phe Gly Val Asp Tyr Arg 500 505 510
Trp Ile Gly Tyr Pro Gly Phe Lys Gly Thr Pro Tyr Trp Ala Tyr Asp 515 520 525
Lys Ser Gly Val Gly Leu His Glu Gly Leu Trp Ser Ser Ile Phe Ser 530 535 540
Leu Ser Pro Val Ser Ala Met Ser Trp Trp Trp Asp Thr Gln Ile Asp 545 550 555 560
Ser Tyr Asn Leu Trp Tyr His Tyr Lys Ala Leu Tyr Glu Phe Leu Lys 565 570 575
Ser Val Asp Pro Val Arg Gly Gly Leu Gly Lys Ala Arg Ala Ser Leu 580 585 590
Val Ile Thr Asp Val Thr Pro Ser Ser Ile Thr Leu Tyr Pro Leu Ala 595 600 605
Gly Trp Val Trp Val Ser Pro Val Arg Glu Asn Arg Leu Val Ile Arg 610 615 620
Pro Asp Gly Ala Ile Glu Gly Arg Val Asp Leu Leu Ser Gly Phe Ile 625 630 635 640
Tyr Gly Thr Cys His Ser Gln Arg Thr Leu Asn Pro Val Phe Thr Val 645 650 655
Met Phe Ile Asp Arg Gly Arg Val Val Leu His Ile Asn Ser Val Gly 660 665 670
Arg Gly Ser Ala Lys Leu Val Ile Tyr Val Asn Gly Ser Leu Ala Thr 675 680 685
Gln Leu Asp Leu Pro Asp Lys Asp Gly Lys Ser Asp Gly Ser Ala Asn 690 695 700
Glu Tyr Asp Met Asp Val Glu Leu Trp Phe Glu Pro Gly Thr Tyr Glu 705 710 715 720
Ile Lys Ile Asp Ser Glu Ala Cys Asp Trp Phe Thr Trp Asp Tyr Ile 725 730 735
Val Phe Glu Asn Ala Val Tyr Arg Ala Ala Lys Val Asp Leu Tyr Ala 740 745 750
Leu Ala Asn Ser Thr Phe Ala Met Leu Trp Val Arg Asn Lys Asp Tyr 755 760 765
Asn Trp Trp Asn Val Val Val Leu Asn Lys Thr Leu Glu Pro Ala Glu 770 775 780
Gly Val Glu Val Glu Ile Arg Gly Leu Gln Asp Gly Val Tyr Arg Val 785 790 795 800
Glu Phe Trp Asp Thr Cys Arg Gly Val Val Val Lys Ser Met Glu Val 805 810 815
Gln Val Ser Asn Gly Val Ala Arg Val Pro Val Gly Ser Val Glu Lys 820 825 830
Asp Ile Ala Met Lys Ile Thr Arg Ala Gly 835 840

<210> SEQ ID NO 2
<211> LENGTH: 2532
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBI5326244 coding region

<400> SEQUENCE: 2
atgttgaaaa aggttcacat tattgctata gtagttataa tagcdatago ttittgcactt 60
atactagdac gig tact acac aatgcagaga gqctatogaaa cagtgacacic tacaa.cacca 120
cct cagcaaa c tactaccac agaaacaact c cagtgccta cagaggcagg tact acaa.ca 180
c caataactg aggccactgt gacticaac cc cct caaacco ctacaac acc titcaccacaa 240
acaccaacaa caccaa.ca.gc gttgcc.cacg ccatcaccaa ccc.ccacagc gcc ct ctogcc 300
acagta acag agacta catc gcct caaact c ctacaacta caataactac agaaacaa.ca 360
actacaccag ccc.cccalacc C caggtggtg tttctaaagc taccagaggg agaggagc.ca 420
aagtttggct tagttgaaat agc ctittaac at atctgg to taagctactic aaaccc.ctitt 480
gacacaag.cg at attgatgt gtgggtgcac atagagacgc Caagtggct C tagagtggct 540
gtaccagctt t ct act tcca gaact acact gtgaaaaggc titggaccagg ggaggagat C 600
atagt caggg ttggaaggcc at actggctic gct aggttcg Cacctgttga ggagggggtg 660

Cacaagttct acgtaaaggc agttgatggc agggggagtg Ctgtggtgag cagattaga 72 O gagttt atgg tta agggggt ggctggcagg gggtttgtca gagttgacag tecaaaagg 78O


- Continued cgtttggitta titcgt.ccaga tiggcgcaatc gaaggcc.gcg ttgacctgct gagcdgttt C 1920 atctatggta cqtgtcacag ccagogtacc ctdaatc.cgg tttittacggit catgttcatt 198O gatcgtggtc gcgtggtgct gcacattaac agcgtgggtc gtggttctgc taagctggtg 2O4. O atttacgt.ca atggcagcct ggcgacgcaa citggatttgc cacaaaga C9gcaa.gagc 21OO gacggtagcg cgaacgagta catatggac gtcgagctgt ggttcgagcc ggg tacctac 216 O gagatcaaaa ttgatt CC9a agcttgcgac tigttcacct gggattacat titttitcgaa 222 O aatgcggttt atcgtgcggc aaaggttgat Ctgtatgcct tcaaacag Cacct ttgcc 228O atgctgtggg tacgtaacaa ggattacaat ttggaatg tdgtggtgct gaataagacic 234 O

Ctggaaccgg C9gagggtgt taagtggag atc.cgtggcc ticaagacgg ttgt accgc 24 OO gtcgaattct gggacacgtg cc.gcggtgtg gttgttaaaa goatggaagt C caggtttct 246 O aatggtgtgg cycgtgtc.cc ggttggtagc gtc.gagaaag at attgcaat galagattacc 252O cgtgcaggc 2529

<210> SEQ ID NO 5
<211> LENGTH: 849
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic Construct

<400> SEQUENCE: 5
Met Leu Lys Lys Val His Ile Ile Ala Ile Val Val Ile Ile Ala Ile 1 5 10 15
Ala Phe Ala Leu Ile Leu Ala Arg Tyr Tyr Thr Met Gln Arg Gly Tyr 20 25 30
Glu Thr Val Thr Pro Thr Thr Pro Pro Gln Gln Thr Thr Thr Thr Glu 35 40 45
Thr Thr Pro Val Pro Thr Glu Ala Gly Thr Thr Thr Pro Ile Thr Glu 50 55 60
Ala Thr Val Thr Gln Pro Pro Gln Thr Pro Thr Thr Pro Ser Pro Gln 65 70 75 80
Thr Pro Thr Thr Pro Thr Ala Leu Pro Thr Pro Ser Pro Thr Pro Thr 85 90 95
Ala Pro Ser Ala Thr Val Thr Glu Thr Thr Ser Pro Gln Thr Pro Thr 100 105 110
Thr Thr Ile Thr Thr Glu Thr Thr Thr Thr Pro Ala Pro Gln Pro Gln 115 120 125
Val Val Phe Leu Lys Leu Pro Glu Gly Glu Glu Pro Lys Phe Gly Leu 130 135 140
Val Glu Ile Ala Phe Asn Ile Ser Gly Leu Ser Tyr Ser Asn Pro Phe 145 150 155 160
Asp Thr Ser Asp Ile Asp Val Trp Val His Ile Glu Thr Pro Ser Gly 165 170 175
Ser Arg Val Ala Val Pro Ala Phe Tyr Phe Gln Asn Tyr Thr Val Lys 180 185 190
Arg Leu Gly Pro Gly Glu Glu Ile Ile Val Arg Val Gly Arg Pro Tyr 195 200 205
Trp Leu Ala Arg Phe Ala Pro Val Glu Glu Gly Val His Lys Phe Tyr 210 215 220
Val Lys Ala Val Asp Gly Arg Gly Ser Ala Val Val Ser Glu Ile Arg 225 230 235 240
Glu Phe Met Val Lys Gly Val Ala Gly Arg Gly Phe Val Arg Val Asp 245 250 255
Ser Gly Lys Arg Leu Phe Val Phe Asp Ser Gly Glu Ser Met Phe Met 260 265 270
Leu Gly Ile Asp Val Ala Trp Pro Pro Asp Arg Arg Ser Ser Ile Ser 275 280 285
Phe Tyr Glu Gln Trp Phe Asp Lys Leu Asn Lys Ser Gly Ile Lys Val 290 295 300
Val Arg Ile Gly Leu Val Pro Trp Ala Leu Thr Leu Glu Trp Ser Lys 305 310 315 320
Leu His Tyr Tyr Ser Leu Asp Asp Ala Ala Arg Ile Asp Glu Ile Val 325 330 335
Lys Leu Ala Glu Lys Tyr Asp Ile Tyr Ile Val Phe Val Phe Met Trp 340 345 350
His Gly Glu Leu Ala Asp Asn Trp Gly Asp Asn Pro Tyr Asn Ala Ala 355 360 365
Arg Gly Gly Pro Leu Gln Ser Pro Glu Glu Phe Trp Ser Asn Ala Val 370 375 380
Ala Ile Ser Ile Phe Lys Asp Lys Val Arg Tyr Ile Ile Ala Arg Trp 385 390 395 400
Gly Tyr Ser Thr His Ile Leu Ala Trp Glu Leu Ile Asn Glu Ala Asp 405 410 415
Leu Thr Thr Asn Phe Phe Ser Ala Arg Ser Ala Phe Val Ser Trp Val 420 425 430
Lys Glu Ile Ser Ser Tyr Ile Lys Ser Val Asp Pro Tyr Asn Arg Ile 435 440 445
Val Thr Val Asn Leu Ala Asp Tyr Asn Ser Glu Pro Arg Val Trp Ser 450 455 460
Val Glu Ser Ile Asp Ile Ile Asn Val His Arg Tyr Gly Pro Glu Gly 465 470 475 480
Phe Lys Asp Ile Ala Leu Ala Ile Pro Ser Ile Val Glu Gly Leu Trp 485 490 495
Asn Thr Tyr Arg Lys Pro Ile Ile Ile Thr Glu Phe Gly Val Asp Tyr 500 505 510
Arg Trp Ile Gly Tyr Pro Gly Phe Lys Gly Thr Pro Tyr Trp Ala Tyr 515 520 525
Asp Lys Ser Gly Val Gly Leu His Glu Gly Leu Trp Ser Ser Ile Phe 530 535 540
Ser Leu Ser Pro Val Ser Ala Met Ser Trp Trp Trp Asp Thr Gln Ile 545 550 555 560
Asp Ser Tyr Asn Leu Trp Tyr His Tyr Lys Ala Leu Tyr Glu Phe Leu 565 570 575
Lys Ser Val Asp Pro Val Arg Gly Gly Leu Gly Lys Ala Arg Ala Ser 580 585 590
Leu Val Ile Thr Asp Val Thr Pro Ser Ser Ile Thr Leu Tyr Pro Leu 595 600 605
Ala Gly Trp Val Trp Val Ser Pro Val Arg Glu Asn Arg Leu Val Ile 610 615 620
Arg Pro Asp Gly Ala Ile Glu Gly Arg Val Asp Leu Leu Ser Gly Phe 625 630 635 640
Ile Tyr Gly Thr Cys His Ser Gln Arg Thr Leu Asn Pro Val Phe Thr 645 650 655
Val Met Phe Ile Asp Arg Gly Arg Val Val Leu His Ile Asn Ser Val 660 665 670
Gly Arg Gly Ser Ala Lys Leu Val Ile Tyr Val Asn Gly Ser Leu Ala 675 680 685
Thr Gln Leu Asp Leu Pro Asp Lys Asp Gly Lys Ser Asp Gly Ser Ala 690 695 700
Asn Glu Tyr Asp Met Asp Val Glu Leu Trp Phe Glu Pro Gly Thr Tyr 705 710 715 720
Glu Ile Lys Ile Asp Ser Glu Ala Cys Asp Trp Phe Thr Trp Asp Tyr 725 730 735
Ile Val Phe Glu Asn Ala Val Tyr Arg Ala Ala Lys Val Asp Leu Tyr 740 745 750
Ala Leu Ala Asn Ser Thr Phe Ala Met Leu Trp Val Arg Asn Lys Asp 755 760 765
Tyr Asn Trp Trp Asn Val Val Val Leu Asn Lys Thr Leu Glu Pro Ala 770 775 780
Glu Gly Val Glu Val Glu Ile Arg Gly Leu Gln Asp Gly Val Tyr Arg 785 790 795 800
Val Glu Phe Trp Asp Thr Cys Arg Gly Val Val Val Lys Ser Met Glu 805 810 815
Val Gln Val Ser Asn Gly Val Ala Arg Val Pro Val Gly Ser Val Glu 820 825 830
Lys Asp Ile Ala Met Lys Ile Thr Arg Ala Gly His His His His His 835 840 845
His

<210> SEQ ID NO 6
<211> LENGTH: 611
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: A4XMG8_CALS8

<400> SEQUENCE: 6
Met Arg Lys Lys Ile Thr Ser Leu Ile Ser Tyr Val Ile Ala Phe Leu 1 5 10 15
Ile Leu Leu Thr Leu Ser Val Thr Gly Phe Gly Ala Pro Ser Asn Ile 20 25 30
Lys Ile Thr Asp Phe Lys His Leu Thr Ser Val Ala Tyr Lys Tyr Ser 35 40 45
Lys Phe Glu Ile Ser Phe Lys Thr Pro Ala Phe Lys Gly Asn Cys Phe 50 55 60
Asp Pro Asp Glu Ile Asp Ile Trp Gly Glu Phe Val Ser Pro Ser Gly 65 70 75 80
Lys Lys Tyr Val Met Pro Ala Phe Trp Tyr Gln Asp Tyr Lys Arg Gln 85 90 95
Leu Leu Pro Ile Asn Glu Lys Lys Leu Glu Arg Leu Asn Lys Asn Gly 100 105 110
Ile Gly Gly Thr Ala Ser Asn Asn Pro Asn Glu Pro Gln Gly Lys Glu 115 120 125
Val Leu Thr Lys Val Gly Gln Pro Glu Trp Arg Ile Arg Phe Cys Pro 130 135 140
Val Glu Ile Gly Lys Trp Lys Tyr Thr Ile Tyr Val Lys Ala Lys Gly 145 150 155 160
Arg Val Gln Asp Phe Lys Lys Gly Glu Phe Ser Val Lys Glu Ala Lys 165 170 175
Asn His Gly Phe Ile Arg Val Glu Pro Lys Lys Lys Arg His Phe Val 180 185 190
Phe Asp Asp Gly Thr Pro Tyr Ile Pro Ile Gly Gln Asn Val Ala Trp 195 200 205
Trp Thr Ser Pro Thr Arg Gly Ser Tyr Asp Tyr Asn Val Trp Phe Ser 210 215 220
Lys Met Ala Glu Ser Gly Ala Asn Phe Ala Arg Ile Trp Met Gly Ser 225 230 235 240
Trp Ser Phe Gly Leu Tyr Trp Asn Asp Thr Gly Ile Tyr Asp Phe Thr 245 250 255
Asn Arg Leu Asp Arg Ala Tyr Gln Leu Asp Lys Val Leu Glu Leu Ala 260 265 270
Glu Gln Lys Gly Ile Tyr Ile Met Leu Thr Phe Ile Asn His Gly Gln 275 280 285
Phe Ser Thr Lys Val Asn Pro Gln Trp Asn Glu Asn Pro Trp Asn Lys 290 295 300
Lys Asn Gly Gly Ile Leu Thr Lys Pro Glu Glu Phe Phe Thr Asn Thr 305 310 315 320
Glu Ala Lys Lys Gln Phe Lys Lys Ile Ile Arg Tyr Ile Ile Ala Arg 325 330 335
Trp Gly Tyr Ser Thr Asn Ile Met Ser Trp Glu Leu Phe Asn Glu Val 340 345 350
Ser Trp Thr Asp Asn Tyr Asp Pro Glu Lys Ser Asn Ala Trp His Lys 355 360 365
Glu Met Ala Leu Phe Ile Lys Ser Ile Asp Pro Tyr Lys His Leu Val 370 375 380
Ser Ser Ser Ser Ala Val Leu Tyr Asp Pro Leu Glu Lys Val Lys Glu 385 390 395 400
Leu Asp Phe Ile Asn Ile His Asp Tyr Gly Ile Thr Asn Phe Cys Lys 405 410 415
Asn Ile Pro Ser Lys Gln Arg Asp Ile Ala Asp Met Tyr Asn Lys Pro 420 425 430
Ala Phe Phe Cys Glu Met Gly Ile Ala Ser Asp Pro Thr Thr Thr Lys 435 440 445
Arg Leu Asp Pro Lys Gly Met His Val His Leu Gly Leu Trp Ala Gly 450 455 460
Val Met Gly Gly Gly Ala Gly Thr Gly Met Thr Trp Trp Trp Asp Ser 465 470 475 480
Tyr Val His Pro Leu Asn Leu Tyr Thr Tyr Phe Lys Pro Val Ser Leu 485 490 495
Tyr Val Lys Lys Ile Pro Trp Asn Asp Pro Phe Leu Lys Tyr Ile Asp 500 505 510
Glu Met Gln Leu Asp Ile Ser Asn Phe Asp Val Gly Val His Gly Tyr 515 520 525

Ile Lys Gln Asp Ser Ala Tyr Leu Trp Phe Asp Thr Glu Tyr Ser 530 535 540

His Ile Gly Gly Ile Glu Arg Leu Phe Lys Asp Val Thr Val Arg Ile 545 550 555 560

Leu Asp Asn Gly Ile Gln Val Glu Trp Phe Asp Thr Phe Ser 565 570 575

Gly Asn Ala Val Lys Glu Asn Val Ala Val Asn Lys Ile Leu 585 590

Asn Ile Lys Met Pro Asn Trp Lys Ile Asp Ile Ala Phe Ile Ala 595 600 605

Val 610

<210> SEQ ID NO 7
<211> LENGTH: 787
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: A7WX729 CLOT

<400> SEQUENCE: 7

Met Thr Gln Asp Lys Glu Leu Phe Leu Trp Asp Asn Thr Ile His 1 5 10 15

Gly Ala Ile Pro Ala Phe Glu Phe Glu Ala Ala Phe Ala Phe Glu 20 25 30

Val Phe Glu His Pro Cys Pro Glu Glu Val Asp Leu Ala 35 40 45

Ile Leu Pro Asn Gly Asp Gln Gln Ile Ser Gly Phe Trp 50 55 60

Tyr Glu Gly Phe Gln Arg Val Leu Arg Asn Gly Arg Glu Ile Leu Ile 65 70

Ser Thr Leu Glu Lys Asp Trp Arg Ile Arg Ser Ala Gln Val Pro 85 90 95

Gly Glu Arg Tyr Val Thr Leu Leu Asp Arg His Arg 100 105 110

Ser Arg Pro Glu Gly Glu Leu Ser Phe Thr Val Thr Pro 115 120 125

Ser Asp Arg Gly Phe Leu Arg Val Ser Ala Arg Asp Pro Ala Tyr 130 135 140

Leu Glu Phe Ser Asp Gly Ser Pro Leu Gly Ile Gly His Asn Leu 145 150 155 160

Gly Trp Glu Trp Gly Gly Thr Asp Asn Arg Leu Gly Thr Tyr Glu 165 170 175

Asp Arg Trp Leu Ser Ser Met Ala Gln Asn Gly Ala Asn Leu Thr 180 185 190

Gln Phe Asp Phe Glu Gly Asp Gln Ile Glu Trp Thr Pro Asp 195

Asn Glu Leu Pro Phe Ser Glu Asp Trp Gly Leu Asn Glu Tyr Asn 210 215 220

Gln Gln Asn Ala Trp Lys Met Asp Arg Arg Phe Gln Thr Ala Glu Glu 225 230 235 240

Leu Gly Ile Phe Phe Arg Leu Ser Leu Phe His Trp Glu Asp Phe Asp 245 250 255

Asp Glu Thr Glu Lys Phe Pro Asp Trp Gly Trp Asn Arg Asn Pro Tyr 26 O 265 27 O His Asp Glin Asn Gly Gly Pro Ala Lys Asn Val Ser Glu Phe Phe Glu 27s 28O 285 Llys Pro Ala Cys Llys Llys Tyr Val Arg Tyr Tyr Lieu Lys Tyr Val Ala 29 O 295 3 OO Ala Arg Trp Gly Tyr Ser Pro Asn Lieu Met Ala Tyr Glu Lieu. Trp Asn 3. OS 310 315 32O Glu Ile Asp Ala Pro Glu Val Met Trp Arg Ala Gly Glu Asp Tyr Asp 3.25 330 335 Gln Glu Ala Ser Lys Val Ile Gly Trp His Ser Glu Met Gly Ser Tyr 34 O 345 35. O Lieu Lys Glin Lieu. Asp Ser Llys His Lieu Val Thir Ser Ser Phe Ala Asp 355 360 365 Ser Arg Arg Asp Lieu. Asn Lieu. Trp Glin Lieu Pro Cys Ile Asp Lieu. Thir 37 O 375 38O Thr Val His Arg Tyr Thr Tyr Phe Asin Glu Glu Tyr Gly Glin Arg Glin 385 390 395 4 OO Tyr Asp Thr Glu Gly Ala Lieu. Ser Ala Val Lieu Lys Glu Arg Phe Ser 4 OS 41O 415 Glin Val Glu Lys Pro Val Lieu Phe Gly Glu Phe Ala Leu Ser Pro Gly 42O 425 43 O Gly Asp Ile Glin Lys Asp Tyr Asp Pro Glu Gly Ile Glu Phe His Asn 435 44 O 445 Glin Lieu. Trp Ala Ser Lieu Lleu Lieu Lys Ser Lieu. Gly Thr Ala Met His 450 45.5 460 Trp. Thir Trp Gly Ser Tyr Val Asp Lys Asn Arg Lieu. Tyr Ser Lys Tyr 465 470 47s 48O Lieu Pro Val Ser Arg Phe Phe Ala Gly Glu Asp Lieu. Arg Arg Thr Val 485 490 495 Ser Phe Ser Asn Lieu. Asp Ala Val Thr Glu Arg Lieu. Lieu. Ile Lieu. Gly SOO 505 51O Lieu. Arg Llys Thir Asp Arg Ala Cys Lieu. Trp Ile Llys Lys Arg Asp Trp 515 52O 525 Gly Phe Cys Glin Ala Asn. Glu Gly Lys Ser Ser Ser Val Glu Lys Gly 53 O 535 54 O Arg Thr Ala Glu Val Pro Gly Lieu Lys Ala Gly Asp Tyr Glin Val Glu 5.45 550 555 560 Phe Tyr Asp Thr Lys Thr Gly Lys Ile Leu Glu Lys Ser Thr Ile Thr 565 st O sts Ala Ala Gly Glu Thir Lieu. Thir Lieu. Lieu. Lieu Pro Gly Phe Ser Gly Asp 58O 585 59 O

Leu Ala Val Lys Leu Lys Pro Lys Glu Lys Asp Thr Leu Trp Lys Ser 595 600 605

Ile Asp Phe Pro Arg Pro Lys Lys Ser Ser Arg Thr Glu Phe Leu Gln 610 615 620

Asp Gly Ala Ile Leu Ser Ala Gly Gly Ala Gly Phe Cys Gly Glu Lys 625 630 635 640

Glu Glu Tyr Arg Phe Val Tyr Gln Gln Ala Ser Gly Asp Phe Arg Leu 645 650 655

Ser Ala Glu Ile Arg Ser Lieu. Thir Asn Lieu. Gly Glu Arg Val Ala Ala 660 665 67 O Gly Lieu Met Val Arg Asp Ser Lieu. Glu Pro Glu Ser Gly Tyr Ile Ala 675 68O 685 Val Lieu. Lieu. His Pro Tyr Ser Lys Ala Glin Val Ile Ile Arg Arg Asp 69 O. 695 7 OO Gly Asn Thr Glu Ile Lieu Lys Glu Phe Asp Ala Gly Glu Arg Pro Cys 7 Os 71O 71s 72O Phe Gly Lieu. Asn Arg Ala Ala Gly Val Lieu. Thr Val Arg Lieu Ala Lys 72 73 O 73 Gln Gly Arg Glu Trp Glu Pro Val Phe Glin Ile Glin Val Ser Lys Glu 740 74. 7 O Lys Glu Lieu. Lieu Val Gly Lieu. Thir Ala Ala Ser Ser His Thir Ile Thr 7ss 760 765 Tyr Ile Thr Ala Glu Phe His Glin Lieu. Arg Lieu Ala Lys Ile Glu Glu 770 775 78O

Glu Ile Leu 785

<210> SEQ ID NO 8
<211> LENGTH: 787
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: B12N6O OPITP

<400> SEQUENCE: 8

Met Thr Gln Asp Lys Glu Leu Cys Phe Leu Trp Asp Asn Thr Ile His 1 5 10 15

Gly Ala Ile Pro Ala Phe Glu Lys Phe Glu Ala Ala Phe Ala Phe Glu 20 25 30

Lys Val Phe Glu His Pro Tyr Cys Pro Glu Glu Val Asp Leu Lys Ala 35 40 45

Tyr Ile Leu Lys Pro Asn Gly Asp Gln Lys Gln Ile Ser Gly Phe Trp 50 55 60

Tyr Glu Gly Phe Gln Arg Val Leu Arg Asn Gly Arg Glu Ile Leu Ile 65 70 75 80

Ser Thr Leu Glu Lys Asp Trp Arg Ile Arg Tyr Ser Ala Gln Val Pro 85 90 95

Gly Glu Tyr Arg Tyr Tyr Val Thr Leu Leu Asp Lys Lys Arg His Arg 100 105 110

Ser Tyr Arg Tyr Pro Glu Lys Gly Glu Leu Ser Phe Thr Val Thr Pro 115 120 125

Ser Asp Arg Lys Gly Phe Leu Arg Val Ser Ala Arg Asp Pro Ala Tyr 130 135 140

Leu Glu Phe Ser Asp Gly Ser Pro Tyr Leu Gly Ile Gly His Asn Leu 145 150 155 160

Cys Gly Trp Glu Trp Gly Gly Thr Asp Asn Arg Leu Gly Thr Tyr Glu 165 170 175

Tyr Asp Arg Trp Leu Ser Ser Met Ala Gln Asn Gly Ala Asn Leu Thr 180 185 190

Gln Phe Asp Phe Cys Glu Gly Asp Gln Ile Glu Trp Thr Pro Cys Asp 195 200 205

Asn Glu Leu Pro Phe Ser Glu Asp Trp Lys Gly Leu Asn Glu Tyr Asn 210 215 220

Gln Gln Asn Ala Trp Lys Met Asp Arg Arg Phe Gln Thr Ala Glu Glu 225 230 235 240

Leu Gly Ile Phe Phe Arg Leu Ser Leu Phe His Trp Glu Asp Phe Asp 245 250 255

Asp Glu Thr Glu Lys Phe Pro Asp Trp Gly Trp Asn Arg Asn Pro Tyr 260 265 270

His Asp Gln Asn Gly Gly Pro Ala Lys Asn Val Ser Glu Phe Phe Glu 275 280 285

Lys Pro Ala Cys Lys Lys Tyr Val Arg Tyr Tyr Leu Lys Tyr Val Ala 290 295 300

Ala Arg Trp Gly Tyr Ser Pro Asn Leu Met Ala Tyr Glu Leu Trp Asn 305 310 315 320

Glu Ile Asp Ala Pro Glu Val Met Trp Arg Ala Gly Glu Asp Tyr Asp 325 330 335

Gln Glu Ala Ser Lys Val Ile Gly Trp His Ser Glu Met Gly Ser Tyr 340 345 350

Leu Lys Gln Leu Asp Ser Lys His Leu Val Thr Ser Ser Phe Ala Asp 355 360 365

Ser Arg Arg Asp Leu Asn Leu Trp Gln Leu Pro Cys Ile Asp Leu Thr 370 375 380

Thr Val His Arg Tyr Thr Tyr Phe Asn Glu Glu Tyr Gly Gln Arg Gln 385 390 395 400

Tyr Asp Thr Glu Gly Ala Leu Ser Ala Val Leu Lys Glu Arg Phe Ser 405 410 415

Gln Val Glu Lys Pro Val Leu Phe Gly Glu Phe Ala Leu Ser Pro Gly 420 425 430

Gly Asp Ile Gln Lys Asp Tyr Asp Pro Glu Gly Ile Glu Phe His Asn 435 440 445

Gln Leu Trp Ala Ser Leu Leu Leu Lys Ser Leu Gly Thr Ala Met His 450 455 460

Trp Thr Trp Gly Ser Tyr Val Asp Lys Asn Arg Leu Tyr Ser Lys Tyr 465 470 475 480

Leu Pro Val Ser Arg Phe Phe Ala Gly Glu Asp Leu Arg Arg Thr Val 485 490 495

Ser Phe Ser Asn Leu Asp Ala Val Thr Glu Arg Leu Leu Ile Leu Gly 500 505 510

Leu Arg Lys Thr Asp Arg Ala Cys Leu Trp Ile Lys Lys Arg Asp Trp 515 520 525

Gly Phe Cys Gln Ala Asn Glu Gly Lys Ser Ser Ser Val Glu Lys Gly 530 535 540

Arg Thr Ala Glu Val Pro Gly Leu Lys Ala Gly Asp Tyr Gln Val Glu 545 550 555 560

Phe Tyr Asp Thr Lys Thr Gly Lys Ile Leu Glu Lys Ser Thr Ile Thr 565 570 575

Ala Ala Gly Glu Thr Leu Thr Leu Leu Leu Pro Gly Phe Ser Gly Asp 580 585 590

Leu Ala Val Lys Leu Lys Pro Lys Glu Lys Asp Thr Leu Trp Lys Ser 595 600 605

Ile Asp Phe Pro Arg Pro Lys Lys Ser Ser Arg Thr Glu Phe Leu Gln

610 615 62O Asp Gly Ala Ile Lieu. Ser Ala Gly Gly Ala Gly Phe Cys Gly Glu Lys 625 630 635 64 O Glu Glu Tyr Arg Phe Val Tyr Glin Glin Ala Ser Gly Asp Phe Arg Lieu. 645 650 655 Ser Ala Glu Ile Arg Ser Lieu. Thir Asn Lieu. Gly Glu Arg Val Ala Ala 660 665 67 O Gly Lieu Met Val Arg Asp Ser Lieu. Glu Pro Glu Ser Gly Tyr Ile Ala 675 68O 685 Val Lieu. Lieu. His Pro Tyr Ser Lys Ala Glin Val Ile Ile Arg Arg Asp 69 O. 695 7 OO Gly Asn Thr Glu Ile Lieu Lys Glu Phe Asp Ala Gly Glu Arg Pro Cys 7 Os 71O 71s 72O Phe Gly Lieu. Asn Arg Ala Ala Gly Val Lieu. Thr Val Arg Lieu Ala Lys 72 73 O 73 Gln Gly Arg Glu Trp Glu Pro Val Phe Glin Ile Glin Val Ser Lys Glu 740 74. 7 O Lys Glu Lieu. Lieu Val Gly Lieu. Thir Ala Ala Ser Ser His Thir Ile Thr 7ss 760 765 Tyr Ile Thr Ala Glu Phe His Glin Lieu. Arg Lieu Ala Lys Ile Glu Glu 770 775 78O

Glu Ile Leu 785

<210> SEQ ID NO 9
<211> LENGTH: 504
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: BOUPRO METS4

<400> SEQUENCE: 9

Met Thr Asp Asp Arg Ala Leu Asp Pro Asp Cys Ser Leu Ala Asp Gly 1 5 10 15

Pro Ser Pro Ser Pro Arg Pro Asp Pro Ala Pro Ala Arg Thr Ala Gly 20 25 30

Ala Arg Gly Gly Arg Leu Pro Trp Ile Arg Val Ala Gly Pro Gly Ile 35 40 45

Pro Tyr Phe Glu Thr Glu Thr Gly Ala Ala Trp Thr Pro Val Gly Gln 50 55 60

Asn Asp Ala Ile Ser Trp His Glu Leu Glu Gly Leu Phe Gly Arg Arg 65 70 75 80

Asp Leu Ala Ala Ala Glu Ala His Leu Arg His Leu Ala Asp His Gly 85 90 95

Val Thr Cys Leu Arg Leu Met Leu Glu Tyr Ala Gln Val Arg His Arg 100 105 110

Tyr Ile Glu Arg Pro Val Gly Arg Phe Val Pro Ala Met Val Arg Leu 115 120 125

Trp Asp Asp Leu Phe Ala Leu Cys Glu Thr Val Gly Leu Arg Ile Leu 130 135 140

Leu Thr Pro Phe Asp Thr Phe Trp Met Trp Leu His Trp His Arg His 145 150 155 160

Pro Tyr Asn Arg Arg His Gly Gly Pro Leu Ala Glu Pro Ser Arg Phe

1.65 17O 17s Lieu. Lieu. Asp Pro Glin Val Arg Glu Ala Ile Lys Asn Arg Lieu Ala Phe 18O 185 19 O Ala Val Ala Arg Trp Gly Gly Ser Gly Ala Lieu. Phe Ala Trp Asp Lieu 195 2OO 2O5 Trp Asn. Glu Ile His Pro Ala His Ala Glu Gly Ser Ala Glu Gly Phe 21 O 215 22O Ala Pro Phe Ile Ala Asp Lieu. Ser Arg His Val Arg Ala Lieu. Glu Thr 225 23 O 235 24 O Arg Lieu. Tyr Gly Arg Ala His Pro Gln Thr Val Ser Leu Phe Gly Pro 245 250 255 Glu Lieu. Gly Trp Arg Pro His Lieu. Gly Lieu. Glu Glu Pro Ile Phe Arg 26 O 265 27 O His Pro Asp Lieu. Asp Phe Ala Thr Lieu. His Ile Tyr Ala Glu Gly. Thir 27s 28O 285 Ile Asp Asp Pro Arg Asn Thr Val Glu Pro Ala Ile Ala Met Gly Arg 29 O 295 3 OO Ile Val Arg Glu Gly Lieu Ala Glin Ile Arg Asp Gly Arg Pro Phe Lieu. 3. OS 310 315 32O Asp Ser Glu. His Gly Pro Ile His Ser Phe Lys Asp Arg Arg Lieu. Thr 3.25 330 335 Lieu Pro Glu Pro Phe Asp Asp Glu Tyr Phe Arg His Met Glin Trp Ala 34 O 345 35 O His Lieu Ala Ser Gly Gly Ala Gly Gly Gly Met Arg Trp Pro Asn Arg 355 360 365 His Pro His Ser Lieu. Thir Ala Gly Met Arg Ala Ala Glin Arg Gly Lieu. 37 O 375 38O Ser Gly Phe Lieu Pro Lieu. Ile Asp Trp Arg Arg Phe Arg Arg Arg Asn 385 390 395 4 OO Lieu. Ser Gly Asp Lieu. Gly Asp Pro Gly Pro Gly Ala Ala Lieu. Phe Ala 4 OS 41O 415 Cys Gly Asp Ala Glu Glin Ala Val Ile Trp Cys Lieu. Arg Ala Asp Ser 42O 425 43 O Lieu Ala Pro Asp Gly Arg Lieu. Arg Arg Asp Ala Ala Pro Lieu. Gly Ile 435 44 O 445 Arg Lieu Ala Lieu Pro Gly Lieu. Arg Ala Gly Arg Tyr Ala Lieu. Thir Ala 450 45.5 460 Trp Asp Thr Arg Ala Gly Arg Pro Cys Gly Arg Arg Glu Val Thir Ala 465 470 47s 48O Arg Asp Gly Val Ala Thr Glu Ile Glu Pro Pro Pro Phe Val Thr Asp 485 490 495

Val Ala Leu Ala Val Arg Arg Val 500

<210> SEQ ID NO 10
<211> LENGTH: 777
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: D1N4499BACT

<400> SEQUENCE: 10

Met Arg Arg Ile Ser Ala Leu Phe Pro Thr Ile Leu Ser Leu Ala Phe

1. 5 1O 15 Lieu Pro Lieu. Lieu. Asn Ala Ala Glu Lieu. Thr Gly Val Trp Lys Ala Asp 2O 25 3O Gly Thr Asn Thr Pro His Ser Ser Pro Glu Ala Pro Gly Glu Ala Ala 35 4 O 45 Val Thr Val Arg Phe Pro Gly Ser Ala Gln Leu Tyr Arg Glu Pro Asp SO 55 6 O Arg Ala Thir Phe Arg Pro Ser Arg Glu Ala Phe Glu Ala Ala Glu Phe 65 70 7s 8O Glu Lieu. Glu Ala Arg Val Ile Thr Asp Thr Pro Asp Pro Val Arg Ala 85 90 95 Trp Lieu. Phe Phe Lys Asp Lys Asp Gly Arg Trp Tyr Glin Thir Ile Glu 1OO 105 11 O Glu Tyr Arg Lieu Ala Pro Gly Val Trp Gln Lys Lieu. Ser Ala Arg Lieu. 115 12 O 125 Asp Arg Thr Gly Ala Val Trp Arg Gly Val Gly His Thr Ala Thr Phe 13 O 135 14 O Asp Ala Met Ala Ala Thr Glu Phe Tyr Ala Gly Gly Ile Ser Val Tyr 145 150 155 160 Gly Glu Glu Lys Arg Glu Phe Thr Lieu. Glu Val Arg Asn Ala Ala Arg 1.65 17O 17s Thr Gly Lys Arg Glu Pro Gly Lys Lieu Ala Lieu. Lieu. Asp Cys His Phe 18O 185 19 O Pro Glu Glin Gly Glu Ala Asn Ala Lieu. Phe Glin Gly Arg Phe Arg Lieu. 195 2OO 2O5 Lieu. Arg Glu Phe Phe Asn Pro Phe Asp Pro Asp Glu Val Thr Val Asp 21 O 215 22O Phe Glu Ile Lys Ala Pro Asn Gly Lys Lieu. Thir Arg Lieu Pro Ala Phe 225 23 O 235 24 O Tyr Ser Arg Asp Tyr Glu Arg Arg Lieu. His His Thr Arg Glu Thir Ala 245 250 255 Thr Pro Ile Gly Glin Gly Phe Trp Glu Phe Arg Phe Thr Pro Pro Val 26 O 265 27 O Pro Gly Glu Tyr Arg Lieu. Arg Ala Val Ile Ala Asp Llys Thir Ala Arg 27s 28O 285 Glu Thr Val Thr Gly Ser Trp Llys Ser Phe Thr Ala Leu Pro Ser Arg 29 O 295 3 OO Arg Pro Gly Lieu Val Arg Ala Ser Glu Lys Asp Pro Phe Phe Glu Lieu 3. OS 310 315 32O Gly Thr Gly Glu Phe Phe Phe Pro Val Gly Lieu. Asn Ile His Thr Asn 3.25 330 335 Thir Asp Arg Arg Ser Glu Phe Gly Phe Llys Phe Gly Glin Lieu Pro Asp 34 O 345 35. O

Arg Gly Thr Phe Asp Tyr Asp Asp Tyr Leu Glu Ala Cys Gly Arg Gly 355 360 365

Gly Ile Asn Ala Val Glu Ile Trp Met Ala Gly Trp Thr Tyr Ala Ile 370 375 380

Glu His Asp Ala Thr Arg Ala Gly Asn Tyr Gly Val Gly Arg Tyr Asn 385 390 395 400

Leu Glu Ala Ala Trp Lys Leu Asp His Ile Phe Glu Gln Ala Arg Lys 405 410 415

Asn Gly Ile Tyr Lieu. Asn Lieu. Ile Lieu. Asp Asn His Gly Arg Lieu. Ser 42O 425 43 O Asp Arg Ser Asp Pro Glu Trp Glin Asp Asin Pro Ile Asn. Ser Thir Thr 435 44 O 445 Pro Tyr Ala Lys Ala Asn Gly Gly Phe Lieu Ala Asn Pro Ala Asp Phe 450 45.5 460 Phe Arg Ser Glu Ala Ala Glu Lys Asn Asp Arg Lys Arg Ala Arg Tyr 465 470 47s 48O Ile Ala Ala Arg Trp Gly Asn Ala Pro Asn Lieu Met Ala Val Glu Lieu. 485 490 495 Trp Ser Glu Val Asp Lieu. Thr Glu Asp Tyr Trp Gly Arg Tyr Asn Asp SOO 505 51O Gly Ser Ala Ile Arg Trp Ala Glu Lys Ala Ala Ala Phe Lieu. Glin Ala 515 52O 525 Asn Ser Arg Pro Asp Leu Pro Val Ser Ile His Phe Cys Ser Asp Tyr 53 O 535 54 O Asn Asn Val Arg Arg Phe Ile Llys Lieu. Phe Asp ASn Pro Ser Ile Thr 5.45 550 555 560 His Lieu Ala Gly Asp Ala Tyr Arg Ser Pro Glin Ile His Phe Val Asp 565 st O sts His Lieu. Arg Gly Tyr Glu Glin Asn Met Arg Tyr Asn Llys Pro Glin Lieu. 58O 585 59 O Ile Thr Glu Phe Gly Gly Asn Pro Glin Gly Ser Ser Glu Arg Glin Val 595 6OO 605 Lieu Ala Asp Ile His Ser Gly Lieu. Trp Ser Ser Lieu. Phe Val Arg Lieu. 610 615 62O Ala Gly Thr Pro Phe Leu Trp Trp His Asp Phe Val His Lieu. Arg Asn 625 630 635 64 O His Tyr Gln His Tyr Lieu. Gly Phe Ser Arg Tyr Lieu Ala Gly Ile Asp 645 650 655 Lieu. Arg Gly Lys Glu Arg Val Tyr Phe Thr Pro Ala Val Ala Val Pro 660 665 67 O Ala Asn Glin Glin Llys Tyr Glu Ser Lieu. Gly Lieu. Ser Lieu Pro Ala Ala 675 68O 685 Ala Tyr Gly Trp Ile Tyr Asn Arg Asn Ala Met Lieu. Glu Tyr Pro Asp 69 O. 695 7 OO Asp Pro Asn Glin Phe Pro Glu Thr Arg Pro Gly Ser Val Thr Lieu Ala 7 Os 71O 71s 72O Gly His Asn Lieu. Thr Gly Gly Val Tyr Lieu. Leu Arg Trp Phe Val Pro 72 73 O 73 Lieu. Thr Gly Glu. Cys Lieu Pro Gly Glu Lieu Lys Lieu. Asn Val Glu Ala 740 74. 7 O

Gly Lys Pro Val Thr Phe Ala Val Pro Ser Phe Arg Leu Asp Leu Ala 755 760 765

Phe Lys Leu Glu Lys Thr Glu Ala Lys 770 775

<210> SEQ ID NO 11
<211> LENGTH: 566
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: A7HFC4_ANADF

<400> SEQUENCE: 11

Met Ser Gly Val Thr Thr Arg Arg Leu His Cys Thr Gly Pro Arg Ser 1 5 10 15

Ala Ala Ala Ala Met Leu Ala Ala Ala Leu Ala Leu Gly Cys Ala Arg 20 25 30

Ala Pro Val Arg Pro Gly Ala Asp Ala Ala Gly Gly Arg Glu Ala Ala 35 40 45

Val Asn Gly Val Val Glu Leu Arg Leu Ala Val Pro Ala Gly Ala Pro 50 55 60

Val Arg Ala Glu Val Leu Ala Pro Ser Gly Ala Arg Ile His Val Pro 65 70 75 80

Ala Phe Pro Val Pro Gly Gly Trp Ala Ala Arg Phe Arg Pro Arg Glu 85 90 95

Pro Gly Arg His Arg Trp Val Ala Arg Ser Gly Glu Gly Ala Ala Ala 100 105 110

Ala Glu Val Ala Arg Gly Glu Val Met Ala Glu Asp Arg Gly Leu Ala 115 120 125

Gly Gln Val Ile Val Ser Gly Gly Thr Leu Arg Thr Glu Asp Gly Arg 130 135 140

Pro Phe Arg Pro Leu Gly Glu Asn Arg Phe Asn Val Tyr Asp Pro Thr 145 150 155 160

Trp Ser Asp Gly Leu Ser Pro Ala Asp Tyr Val Ala Arg Met Ala Ala 165 170 175

Asp Gly Met Asn Ala Leu Arg Val Phe Val Phe Thr Ala Cys Gly Arg 180 185 190

Ala Gly Thr Met Pro Asn Pro Gly Cys Leu Glu Pro Val Leu Gly Ala 195 200 205

Phe Asp Glu Ala Ala Ala Ala Arg Tyr Asp Ala Ile Phe Ala Ala Ala 210 215 220

Glu Ala His Gly Val Lys Val Val Leu Ser Val Phe Ala Ile Gly Phe 225 230 235 240

Thr Pro Gly Asp Ala Trp Lys Gly Trp Glu Glu Asn Pro Tyr Ser Ala 245 250 255

Ala Arg Gly Gly Pro Ala Ala Gly Asn Thr Asp Phe Phe Leu Asp Pro 260 265 270

Arg Ala Arg Glu Ala Ala Arg Ala Arg Leu Arg Tyr Val Leu Ala Arg 275 280 285

Trp Gly Ala Ser Pro Ala Leu Leu Ala Ile Asp Leu Leu Asn Glu Pro 290 295 300

Glu Trp Asp Gly Ala Ile Pro Glu Asp His Trp Ile Pro Trp Ala Glu 305 310 315 320

Asp Leu Ala Arg Thr Trp Arg Ala Glu Asp Pro Tyr Gly His Pro Val 325 330 335

Thr Ala Gly Pro Val Gly Leu His Trp Asn Val Glu Glu Asp Glu Arg 340 345 350

Ala Trp Trp Ala Ser Ala Ala Cys Asp Ile Val Gln Trp His Arg Tyr 355 360 365

Gly Pro Asp Val His Asp Val His Asp Leu Ala Glu Ala Leu Val Glu 370 375 380

Thr Thr Arg Asp Thr Ala Arg Tyr Gly Lys Pro Val Leu Ile Gly Glu 385 390 395 400

Phe Gly Trp Gly Gly Asp Ala Lys Pro Glu His Asp His Thr His Val 405 410 415

Gly Ile Trp Ala Ala Thr Phe Ala Gly Ala Gly Val Leu Ser His Ser 420 425 430

Ala Pro Pro Phe Thr Glu Asp Ser Asp Glu Pro Met Thr Pro Ala Arg 435 440 445

Ala Arg His Phe Arg Thr Leu Ala Ala Phe Leu Arg Arg Ala Glu Ala 450 455 460

Arg Gly Pro Leu Ala Pro Ala Pro Glu Pro Ala Val Arg Arg Ala Pro 465 470 475 480

Gly Leu Arg Ala Leu Ala Leu Gly Gly Glu Arg Ala Ala Ala Val Trp 485 490 495

Leu Leu Ala Pro Arg Pro Gly Tyr Gly Gly Arg Val Lys Gly Ala Arg 500 505 510

Leu Thr Leu Ala Gly Ile Ala Pro Gly Arg Trp Arg Val Thr Trp Val 515 520 525

Glu Asp Val Ser Gly Glu Val Ile Ala Val Glu Glu Arg Asp Ala Ser 530 535 540

Gly Pro Leu Pro Leu Asp Val Pro Pro Phe Ala Arg His Val Ala Ala 545 550 555 560

Leu Val Glu Arg Ile Glu 565

<210> SEQ ID NO 12
<211> LENGTH: 591
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: A9AYF5_HERA2

<400> SEQUENCE: 12

Met Arg Arg Trp Leu Tyr Arg Leu His Leu Trp Leu Val Leu Leu Leu 1 5 10 15

Leu Ile Ala Ala Cys Thr Gln Val Gly Glu Ser Gly Gly Asn Gln Thr 20 25 30

Leu Ser Leu Arg Thr Leu Thr Gly Asn Ala Ala Val Phe Gly Thr Ile 35 40 45

Glu Leu Ala Ile Asp Thr Thr Ile Thr Val Ala Asn Pro Tyr Asp Pro 50 55 60

Asn Gln Ile Asp Leu Met Val Ser Phe Ile Ser Ala Thr Gly Gln Ile 65 70 75 80

Tyr Arg Val Pro Ala Phe Trp Tyr Gln Asp Phe Asp Gln Leu Ser Leu 85 90 95

Gln Pro Lys Gly Asn Pro Glu Trp Arg Val Arg Phe Thr Pro Ser Glu 100 105 110

Pro Gly Ala Trp Gln Val Lys Ala Glu Leu Ala Lys Pro Ala Leu Ser 115 120 125

Ser Asp Val Ile Thr Ile Glu Val Ser Ala Asn Lys Gln Ser Pro Gly 130 135 140

Phe Val Arg Ile Asn Thr Ser Asn Pro Arg Tyr Phe Ala Arg Gln Asp 145 150 155 160

Gly Thr Phe Phe Met Pro Ile Gly Leu Asn Leu Gly Trp Ser Thr Gln 165 170 175

Gln Gly Thr Gly Ile Leu Arg Glu Tyr Glu His Trp Phe Asp Gln Leu 180 185 190

Ser Lys Asn Gly Gly Asn Ile Ala Arg Ile Trp Met Ala Ser Trp Ser 195 200 205

Phe Gly Ile Glu Trp Gln Asp Thr Gly Leu Gly Asp Tyr Ser Lys Arg 210 215 220

Met Gln Gln Ala Trp Met Leu Asp Gln Ile Phe Lys Leu Ala Glu Gln 225 230 235 240

Arg Asn Ile Thr Ile Met Leu Thr Leu Ile Asn His Gly Ala Phe Ser 245 250 255

Thr Ser Thr Asp Ser Glu Trp Ala Ser Asn Pro Tyr Asn Ala Ala Asn 260 265 270

Gly Gly Pro Ile Ala Glu Pro Arg Leu Phe Ala Thr Asp Ile Gln Ser 275 280 285

Arg Glu Val Phe Lys His Arg Val Arg Tyr Ile Ala Ala Arg Trp Ala 290 295 300

His Ser Pro Ser Leu Phe Ala Trp Glu Trp Trp Asn Glu Ala Asn Trp 305 310 315 320

Thr Pro Ile Asn Asp Ala Leu Met Gln Pro Trp Ile Ser Glu Met Thr 325 330 335

Arg His Leu Ala Gln Phe Asp Pro Tyr Gln His Leu Val Ser Thr Ser 340 345 350

Tyr Ala Ser Asn Thr Ser Thr Ser Met Trp Val Gln Pro Glu Ile Asn 355 360 365

Phe Thr Gln His His Asp Tyr Thr Gly Arg Asp Leu Gly Gln Ala Phe 370 375 380

Pro Leu Val Ile Arg Glu Leu Asn Ala Ala Ala Pro Gln Lys Pro Ala 385 390 395 400

Leu Val Ser Glu Leu Gly Tyr Ala Gly Thr Gly Arg Asp Glu Val Ile 405 410 415

Asn Arg Asp Val Trp Gln Phe His Gln Gly Leu Trp Ala Ala Pro Phe 420 425 430

Ser Gly Phe Ala Gly Ser Gly Met Tyr Trp Trp Trp Asp Thr Leu Val 435 440 445

Asp Pro Asp Asn Leu Trp Ser Glu Tyr Ser Lys Leu Ala Glu Phe Phe 450 455 460

Lys Asp Gln Asp Leu Thr Ile Tyr Asn Pro Val Val Ala Gln Ile Ser 465 470 475 480

Pro Leu Lys Ala Arg Ala Leu Ala Leu Gln Thr Lys Ser Gln Ala Leu 485 490 495

Val Trp Val Arg Ser Asn Glu Tyr Glu Pro Glu Ala Leu Thr Lys Ala 500 505 510

Tyr Glu Glu Ala Leu Lys Lys Arg Glu Phe Asn Asp Thr Trp Glu Tyr 515 520 525

Val Pro Pro Thr Tyr Ala Asp Leu Thr Leu Lys Leu Asn Gly Leu Glu 530 535 540

Ala Gly Asn Tyr Gln Ala Thr Trp Tyr Asp Pro Gln Thr Gly Thr Trp 545 550 555 560

Ser Gln Pro Thr Thr Val Thr Leu Glu Ala Asn Gln Ser Ser Ile Ala 565 570 575

Val Pro Ser Phe Asn Tyr Asp Leu Ala Leu Lys Leu Val Lys Gln 585 590

<210> SEQ ID NO 13
<211> LENGTH: 557
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: C7PTR3_CHIPD

<400> SEQUENCE: 13

Met Lys Asn Tyr Val Ala Ile Pro Cys Ala Leu Leu Met Ala 1 5 10 15

Thr Phe Thr Val Tyr Ala Asn Asp Ser Thr Lys Leu Gln Arg Ile Thr 20 25

Pro Pro Ala Ala Ala Val Asn Leu Tyr Glu Lys Ala Glu Trp Thr Ile 35 40 45

Asp Leu Thr Ala Asn Tyr Ser Asn Pro Asp Gln Arg Glu Ile 50 55 60

Leu Asp Met Cys Leu Val Ser Pro Ser Gly Lys Pro Leu Leu Leu Pro 65 70

Ala Phe Asp Gln Val Asn His His Trp Gln Ser Arg Phe Ala Pro 85 90 95

Gln Glu Thr Gly Gln Tyr Gln Tyr Phe Glu Leu Ile Ala Gly Lys 100 105 110

Asp Thr Val Gln Ser Pro Ser Val Phe Thr Val Tyr Ser Thr 115 120 125

Arg Lys Gly Phe Leu His Lys Asn Asp Leu Trp Thr Phe Arg Phe Asp 130 135 140

Asn Gly Glu Leu Phe Arg Gly Val Gly Glu Asn Val Ala Trp Glu Ser 145 150 155 160

Arg Ser Phe Glu Asp Asp Trp Thr Tyr Asp Tyr Leu Leu Pro Ser 165 175

Leu Ala His Asn Gly Ala Asn Phe Phe Arg Thr Trp Met Cys Trp 180 185 190

Asn Leu Pro Leu Glu Trp Gln Pro Arg Ser Thr Lys Arg Gln 195

Pro Ser Ala Glu Tyr Phe His Pro Gly Ala Ile Arg Arg Met Asp Gln 210 215

Leu Val Asp Met Cys Asp Ser Leu Gly Leu Tyr Phe Met Leu Thr Leu 225 230 235 240

Asp Trp His Gly His Leu Met Glu His Gly Gly Trp His Ser Ser 245 250 255

Asn Ala Asn Gly Gly Pro Ala Glu Thr Pro Thr Ala Phe Phe 260 265 270

Thr Ser Gln Gln Ala Gln Glu Lys Asn Leu Arg Ile 275 285

Ile Ala Arg Trp Gly Tyr Ser Ser Ser Ile Ala Val Trp Glu Phe Phe 290 295 300

Asn Glu Val Asp Asn Ala Ala Phe Thr Gln Gln Asp Ser Ile Leu Ile 305 310 315

Pro Leu Pro Val Ile Ala Gln Trp His Leu Glu Met Ser Arg Leu

3.25 330 335 Lys Asp Ile Asp Pro Tyr His His Leu Val Ser Thr Ser Ile Ser His 34 O 345 35. O Arg Asp Ile Ile Gly Met Asn Ala Ile Pro Tyr Ile Asp Phe Asin Glin 355 360 365 Lys His Ile Tyr Lys His Thr Glu Lys Ile Pro Gly Ile Tyr Pro Asp 37 O 375 38O Tyr Ile Glin Thr Phe Gly Llys Pro Tyr Val Val Gly Glu Phe Gly Tyr 385 390 395 4 OO Arg Trp Glu Asp Glin Asp Pro Llys Tyr Ala Thr Glu Ala Asn Tyr Asp 4 OS 41O 415 Tyr Arg Arg Gly Lieu. Trp Tyr Gly Met Phe Ser Pro Thr Pro Val Lieu. 42O 425 43 O Pro Met Ser Trp Trp Trp Glu Lieu Phe Asp Asp Gln His Met Thr Pro 435 44 O 445 Tyr Lieu. Glin Ser Val Ser Thr Ile Asn Llys Met Met Leu Glin Ala Gly 450 45.5 460 Lys Gly Glin Phe Glu Glin Lieu Pro Val Glin Ala Ala Ile Lieu. Glu Ser 465 470 47s 48O Tyr Ala Ile Lys Cys Gly Asn. Thir Ile Phe Val Tyr Ala Lieu. Asn. Asn 485 490 495 Thir Thr Lys Glin Glin Ser Ala Asp Ile Arg Val Asn. Ile Pro Ser Gly 500 505 51O Tyr Thr Lieu Gln Cys Phe His Pro Leu Lys Asn. Thir Trp Asn Llys Ser 515 52O 525 Ile Tyr Lys Arg Thr Ala Asp Gly Thr Val Glin Ile Ser Asn Thr Val 53 O 535 54 O Lieu Pro Ala Lys Glu Glu Ile Ile Lieu Val Phe Llys Pro 5.45 550 555

<210> SEQ ID NO 14
<211> LENGTH: 843
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBI5326244 deduced protein sequence

<400> SEQUENCE: 14

Met Leu Lys Lys Val His Ile Ile Ala Ile Val Val Ile Ile Ala Ile 1 5 10 15

Ala Phe Ala Leu Ile Leu Ala Arg Tyr Tyr Thr Met Gln Arg Gly Tyr 20 25 30

Glu Thr Val Thr Pro Thr Thr Pro Pro Gln Gln Thr Thr Thr Thr Glu 35 40 45

Thr Thr Pro Val Pro Thr Glu Ala Gly Thr Thr Thr Pro Ile Thr Glu 50 55 60

Ala Thr Val Thr Gln Pro Pro Gln Thr Pro Thr Thr Pro Ser Pro Gln 65 70 75 80

Thr Pro Thr Thr Pro Thr Ala Leu Pro Thr Pro Ser Pro Thr Pro Thr 85 90 95

Ala Pro Ser Ala Thr Val Thr Glu Thr Thr Ser Pro Gln Thr Pro Thr 100 105 110

Thr Thr Ile Thr Thr Glu Thr Thr Thr Thr Pro Ala Pro Gln Pro Gln

115 12 O 125 Val Val Phe Lieu Lys Lieu Pro Glu Gly Glu Glu Pro Llys Phe Gly Lieu 13 O 135 14 O Val Glu Ile Ala Phe Asn Ile Ser Gly Lieu Ser Tyr Ser Asn Pro Phe 145 150 155 160 Asp Thr Ser Asp Ile Asp Val Trp Val His Ile Glu Thr Pro Ser Gly 1.65 17O 17s Ser Arg Val Ala Val Pro Ala Phe Tyr Phe Glin Asn Tyr Thr Val Lys 18O 185 19 O Arg Lieu. Gly Pro Gly Glu Glu Ile Ile Val Arg Val Gly Arg Pro Tyr 195 2OO 2O5 Trp Lieu Ala Arg Phe Ala Pro Val Glu Glu Gly Val His Llys Phe Tyr 21 O 215 22O Val Lys Ala Val Asp Gly Arg Gly Ser Ala Val Val Ser Glu Ile Arg 225 23 O 235 24 O Glu Phe Met Val Lys Gly Val Ala Gly Arg Gly Phe Val Arg Val Asp 245 250 255 Ser Gly Lys Arg Lieu. Phe Val Phe Asp Ser Gly Glu Ser Met Phe Met 26 O 265 27 O Lieu. Gly Ile Asp Wall Ala Trp Pro Pro Asp Arg Arg Ser Ser Ile Ser 27s 28O 285 Phe Tyr Glu Gln Trp Phe Asp Llys Lieu. Asn Llys Ser Gly Ile Llys Val 29 O 295 3 OO Val Arg Ile Gly Lieu Val Pro Trp Ala Lieu. Thir Lieu. Glu Trp Ser Lys 3. OS 310 315 32O Lieu. His Tyr Tyr Ser Lieu. Asp Asp Ala Ala Arg Ile Asp Glu Ile Val 3.25 330 335 Lys Lieu Ala Glu Lys Tyr Asp Ile Tyr Ile Val Phe Val Phe Met Trp 34 O 345 35. O His Gly Glu Lieu Ala Asp Asn Trp Gly Asp ASn Pro Tyr Asn Ala Ala 355 360 365 Arg Gly Gly Pro Leu Glin Ser Pro Glu Glu Phe Trp Ser Asn Ala Val 37 O 375 38O Ala Ile Ser Ile Phe Lys Asp Llys Val Arg Tyr Ile Ile Ala Arg Trip 385 390 395 4 OO Gly Tyr Ser Thr His Ile Lieu Ala Trp Glu Lieu. Ile Asn. Glu Ala Asp 4 OS 41O 415 Lieu. Thir Thr Asn Phe Phe Ser Ala Arg Ser Ala Phe Val Ser Trp Val 42O 425 43 O Lys Glu Ile Ser Ser Tyr Ile Llys Ser Val Asp Pro Tyr Asn Arg Ile 435 44 O 445

Val Thr Val Asn Leu Ala Asp Tyr Asn Ser Glu Pro Arg Val Trp Ser 450 455 460

Val Glu Ser Ile Asp Ile Ile Asn Val His Arg Tyr Gly Pro Glu Gly 465 470 475 480

Phe Lys Asp Ile Ala Leu Ala Ile Pro Ser Ile Val Glu Gly Leu Trp 485 490 495

Asn Thr Tyr Arg Lys Pro Ile Ile Ile Thr Glu Phe Gly Val Asp Tyr 500 505 510

Arg Trp Ile Gly Tyr Pro Gly Phe Lys Gly Thr Pro Tyr Trp Ala Tyr 515 520 525

Asp Llys Ser Gly Val Gly Lieu. His Glu Gly Lieu. Trp Ser Ser Ile Phe 53 O 535 54 O Ser Leu Ser Pro Val Ser Ala Met Ser Trp Trp Trp Asp Thr Glin Ile 5.45 550 555 560 Asp Ser Tyr Asn Lieu. Trp Tyr His Tyr Lys Ala Lieu. Tyr Glu Phe Lieu 565 st O sts Llys Ser Val Asp Pro Val Arg Gly Gly Lieu. Gly Lys Ala Arg Ala Ser 58O 585 59 O Lieu Val Ile Thr Asp Val Thr Pro Ser Ser Ile Thr Lieu. Tyr Pro Leu 595 6OO 605 Ala Gly Trp Val Trp Val Ser Pro Val Arg Glu Asn Arg Lieu Val Ile 610 615 62O Arg Pro Asp Gly Ala Ile Glu Gly Arg Val Asp Lieu Lleu Ser Gly Phe 625 630 635 64 O Ile Tyr Gly Thr Cys His Ser Glin Arg Thr Lieu. Asn Pro Val Phe Thr 645 650 655 Val Met Phe Ile Asp Arg Gly Arg Val Val Lieu. His Ile Asn. Ser Val 660 665 67 O Gly Arg Gly Ser Ala Lys Lieu Val Ile Tyr Val Asn Gly Ser Lieu Ala 675 68O 685 Thr Glin Lieu. Asp Lieu Pro Asp Lys Asp Gly Lys Ser Asp Gly Ser Ala 69 O. 695 7 OO Asn Glu Tyr Asp Met Asp Val Glu Lieu. Trp Phe Glu Pro Gly Thr Tyr 7 Os 71O 71s 72O Glu Ile Lys Ile Asp Ser Glu Ala Cys Asp Trp Phe Thir Trp Asp Tyr 72 73 O 73 Ile Val Phe Glu Asn Ala Val Tyr Arg Ala Ala Lys Val Asp Lieu. Tyr 740 74. 7 O Ala Lieu Ala Asn. Ser Thr Phe Ala Met Lieu. Trp Val Arg Asn Lys Asp 7ss 760 765 Tyr Asn Trp Trp Asn Val Val Val Lieu. Asn Llys Thr Lieu. Glu Pro Ala 770 775 78O Glu Gly Val Glu Val Glu Ile Arg Gly Lieu. Glin Asp Gly Val Tyr Arg 78s 79 O 79. 8OO Val Glu Phe Trp Asp Thr Cys Arg Gly Val Val Val Lys Ser Met Glu 805 810 815 Val Glin Val Ser Asn Gly Val Ala Arg Val Pro Val Gly Ser Val Glu 82O 825 83 O Lys Asp Ile Ala Met Lys Ile Thr Arg Ala Gly 835 84 O

<210> SEQ ID NO 15
<211> LENGTH: 2516
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBIS326244 - OA

<400> SEQUENCE: 15

atgaaaaaga cagctatcgc gattgcagtg gcactggctg gtttcgctac cgttgcgcaa 60

gctgctatgc agagaggcta tgaaacagtg acacctacaa caccacctca gcaaactact 120

accacagaaa caactccagt gcctacagag gcaggtacta caacaccaat aactgaggcc 180

gttccggtgg gtagcgtaga aaaggacata gctatgaaaa tcactagggc tggcta 2516

<210> SEQ ID NO 16
<211> LENGTH: 838
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBIS326244 - OA

<4 OOs, SEQUENCE: 16 Met Lys Llys Thr Ala Ile Ala Ile Ala Val Ala Lieu Ala Gly Phe Ala 1. 5 1O 15 Thr Val Ala Glin Ala Ala Met Glin Arg Gly Tyr Glu Thr Val Thr Pro 2O 25 3O

Thir Thr Pro Pro Gin GLn. Thir Thir Thir Thr Gu Thir Thr Pro Wall Pro 35 4 O 45 Thr Glu Ala Gly Thr Thr Thr Pro Ile Thr Glu Ala Thr Val Thr Glin SO 55 6 O

Pro Pro Gin Thr Pro Thir Thr Pro Ser Pro GLn. Thir Pro Thir Thr Pro 65 70 7s 8O

Thir Ala Lieu Pro Thr Pro Ser Pro Thr Pro Thir Ala Pro Ser Ala Thr 85 90 95

Wall. Thir Gu Thir Thir Ser Pro Gin Thr Pro Thir Thir Thir Ilie Thir Thr 1OO 105 11 O Glu Thir Thr Thr Thr Pro Ala Pro Gln Pro Glin Val Val Phe Leu Lys 115 12 O 125 Lieu Pro Glu Gly Glu Glu Pro Llys Phe Gly Lieu Val Glu Ile Ala Phe 13 O 135 14 O Asn Ile Ser Gly Lieu Ser Tyr Ser Asn Pro Phe Asp Thr Ser Asp Ile 145 150 155 160 Asp Val Trp Val His Ile Glu Thr Pro Ser Gly Ser Arg Val Ala Val 1.65 17O 17s Pro Ala Phe Tyr Phe Glin Asn Tyr Thr Val Lys Arg Lieu. Gly Pro Gly 18O 185 19 O Glu Glu Ile Ile Val Arg Val Gly Arg Pro Tyr Trp Lieu Ala Arg Phe 195 2OO 2O5 Ala Pro Val Glu Glu Gly Val His Llys Phe Tyr Val Lys Ala Val Asp 21 O 215 22O Gly Arg Gly Ser Ala Val Val Ser Glu Ile Arg Glu Phe Met Val Lys 225 23 O 235 24 O Gly Val Ala Gly Arg Gly Phe Val Arg Val Asp Ser Gly Lys Arg Lieu. 245 250 255 Phe Val Phe Asp Ser Gly Glu Ser Met Phe Met Leu Gly Ile Asp Val 26 O 265 27 O

Ala Trp Pro Pro Asp Arg Arg Ser Ser Ile Ser Phe Tyr Glu Glin Trp 27s 28O 285

Phe Asp Llys Lieu. Asn Llys Ser Gly Ile Llys Val Val Arg Ile Gly Lieu. 29 O 295 3 OO Val Pro Trp Ala Lieu. Thir Lieu. Glu Trp Ser Lys Lieu. His Tyr Tyr Ser 3. OS 310 315 32O

Lieu. Asp Asp Ala Ala Arg Ile Asp Glu Ile Val Llys Lieu Ala Glu Lys 3.25 330 335 US 2015/021 1 037 A1 Jul. 30, 2015 49

- Continued Tyr Asp Ile Tyr Ile Val Phe Val Phe Met Trp His Gly Glu Lieu Ala 34 O 345 35. O Asp Asn Trp Gly Asp Asn Pro Tyr Asn Ala Ala Arg Gly Gly Pro Lieu 355 360 365 Gln Ser Pro Glu Glu Phe Trp Ser Asn Ala Val Ala Ile Ser Ile Phe 37 O 375 38O Lys Asp Llys Val Arg Tyr Ile Ile Ala Arg Trp Gly Tyr Ser Thr His 385 390 395 4 OO Ile Lieu Ala Trp Glu Lieu. Ile Asn. Glu Ala Asp Lieu. Thir Thr Asn. Phe 4 OS 41O 415 Phe Ser Ala Arg Ser Ala Phe Val Ser Trp Val Lys Glu Ile Ser Ser 42O 425 43 O Tyr Ile Llys Ser Val Asp Pro Tyr Asn Arg Ile Val Thr Val Asn Lieu. 435 44 O 445 Ala Asp Tyr Asn. Ser Glu Pro Arg Val Trp Ser Val Glu Ser Ile Asp 450 45.5 460 Ile Ile Asn. Wal His Arg Tyr Gly Pro Glu Gly Phe Lys Asp Ile Ala 465 470 47s 48O Lieu Ala Ile Pro Ser Ile Val Glu Gly Lieu. Trp Asn. Thir Tyr Arg Llys 485 490 495 Pro Ile Ile Ile Thr Glu Phe Gly Val Asp Tyr Arg Trp Ile Gly Tyr SOO 505 51O Pro Gly Phe Lys Gly Thr Pro Tyr Trp Ala Tyr Asp Llys Ser Gly Val 515 52O 525 Gly Lieu. His Glu Gly Lieu. Trp Ser Ser Ile Phe Ser Leu Ser Pro Val 53 O 535 54 O Ser Ala Met Ser Trp Trp Trp Asp Thr Glin Ile Asp Ser Tyr Asn Lieu. 5.45 550 555 560 Trp Tyr His Tyr Lys Ala Lieu. Tyr Glu Phe Leu Lys Ser Val Asp Pro 565 st O sts Val Arg Gly Gly Lieu. Gly Lys Ala Arg Ala Ser Lieu Val Ile Thr Asp 58O 585 59 O Val Thr Pro Ser Ser Ile Thr Lieu. Tyr Pro Leu Ala Gly Trp Val Trp 595 6OO 605 Val Ser Pro Val Arg Glu Asn Arg Lieu Val Ile Arg Pro Asp Gly Ala 610 615 62O Ile Glu Gly Arg Val Asp Lieu. Lieu. Ser Gly Phe Ile Tyr Gly Thr Cys 625 630 635 64 O His Ser Glin Arg Thr Lieu. Asn Pro Val Phe Thr Val Met Phe Ile Asp 645 650 655 Arg Gly Arg Val Val Lieu. His Ile Asn. Ser Val Gly Arg Gly Ser Ala 660 665 67 O

Lys Leu Val Ile Tyr Val Asn Gly Ser Leu Ala Thr Gln Leu Asp Leu 675 680 685

Pro Asp Lys Asp Gly Lys Ser Asp Gly Ser Ala Asn Glu Tyr Asp Met 690 695 700

Asp Val Glu Leu Trp Phe Glu Pro Gly Thr Tyr Glu Ile Lys Ile Asp 705 710 715 720

Ser Glu Ala Cys Asp Trp Phe Thr Trp Asp Tyr Ile Val Phe Glu Asn 725 730 735

Ala Val Tyr Arg Ala Ala Lys Val Asp Leu Tyr Ala Leu Ala Asn Ser 740 745 750

Thr Phe Ala Met Leu Trp Val Arg Asn Asp Tyr Asn Trp Trp Asn 755 760 765

Val Val Val Leu Asn Lys Thr Leu Glu Pro Ala Glu Gly Val Glu Val 770 775 780

Glu Ile Arg Gly Leu Gln Asp Gly Val Arg Val Glu Phe Trp Asp 785 790 795 800

Thr Cys Arg Gly Val Val Val Lys Ser Met Glu Val Gln Val Ser Asn 805 810 815

Gly Val Ala Arg Val Pro Val Gly Ser Val Glu Asp Ile Ala Met 820 825 830

Lys Ile Thr Arg Ala Gly 835

<210> SEQ ID NO 17
<211> LENGTH: 60
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBIS324142

<400> SEQUENCE: 17

Met Ala Met Arg Thr Gly Leu Ala Leu Gly Ile Val Ala Leu Ile Ala 1 5 10 15

Val Ile Leu Ile Ala Val Leu Leu Ala Thr Gln Gln Gln Pro Thr Pro 20 25 30

Thr Pro Ser Pro Thr Pro Thr Pro Ser Pro Thr Pro Thr Pro Thr Pro 35 40 45

Thr Pro Thr Pro Thr Pro Thr Pro Thr Pro Thr Pro 50 55 60

<210> SEQ ID NO 18
<211> LENGTH: 41
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: EBIS327647

<400> SEQUENCE: 18

Leu Asn Lys Thr Val Ile Ala Ile Ala Val Leu Leu Val Val Val Ile 1 5 10 15

Ala Ala Ala Leu Ile Tyr Val Ile Tyr Tyr Pro Thr Thr Pro Thr Thr 20 25 30

Thr Thr Pro Thr Val Thr Thr Pro Val 35 40

<210> SEQ ID NO 19
<211> LENGTH: 1258
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_003548440.1

<400> SEQUENCE: 19

Met His Arg Cys Arg Tyr Ser Ile Ser Leu Val Trp Lys Gln Leu Ser 1 5 10 15

Gly Arg Lys Leu Ala Leu Thr Lys Ser Lys Ala Ser Ala Ser Arg Leu 20 25 30

Arg Thr Arg Gln Ser Leu Trp Ala Leu Leu Ser Ser Val Leu Leu Gly 35 40 45

Phe Val Cys Ala Gly Ala Leu Gln Ala Gln Val Leu Glu Asp Ala Tyr 50 55 60

Leu Glu Ser Gly Gly Ala Val Val Phe Glu Val Glu Ser Glu Ala Ala 65 70 75 80

Val Ser Pro Trp Met Leu Asp Asn Ser Val Ala Gly Tyr Lys Gly Thr 85 90 95

Gly Tyr Phe Glu Gly Thr Ala Asp Tyr Phe Ser Thr Pro Gly Gln Gly 100 105 110

Val Val Arg Tyr Pro Ile Lys Ile Thr Thr Ser Gly Arg Tyr Gln Leu 115 120 125

Gln Trp Arg Ser Arg Ile Asn Phe Gly Thr Glu Thr Ser Glu His Asn 130 135 140

Asp Ser Trp Ala Arg Leu Thr Asp Ala Asn Gly Asn Pro Val Ser Pro 145 150 155 160

Ala Ser Asn Ser Asn Val Ala Asn Ser Gln Trp Tyr Lys Val Tyr Val 165 170 175

Gly Trp Thr Gly Trp Gln Trp Gly Ser Ser Asn Lys Asp Asn Asp Pro 180 185 190

Arg Ser Leu Ser Trp Asn Leu Thr Ala Gly Asp Tyr Tyr Tyr Val Glu 195 200 205

Ile Ser Val Arg Ser His Tyr His Ala Leu Asp Arg Ile Val Leu Trp 210 215 220

Asp His Asn Arg Leu Ala Leu Ala Asn Thr Thr Thr Gly Lys Gly Ala 225 230 235 240

Asn Asn Ser Ala Leu Asp Ala Leu Pro Val Ser Ala Ile Glu Val Gln 245 250 255

Glu Gly Pro Asp Val Glu Ile Thr Asp Pro Val His Gly Thr Thr Ile 260 265 270

Val Pro Gly Gly Thr Val Thr Phe Thr Ala Ser Ala Ser Asp Ala Gln 275 280 285

Gly Ser Val Val Ser Val Glu Phe Phe Ala Gly Thr Thr Ser Leu Gly 290 295 300

Ile Asp Thr Ser Ala Pro Phe Ser Gln Ala Trp Ser Ser Ala Ala Glu 305 310 315 320

Gly Val Tyr Glu Ile Thr Ala Leu Ala Thr Asp Asn Glu Gly Tyr Thr 325 330 335

Thr Thr Ser Ala Pro Ile Thr Leu His Val Ala Pro Ser Met Gly Ala 340 345 350

Asn Gly Thr Val Ser Gly Glu Leu Met Gln Trp His Lys Val Met Leu 355 360 365

Thr Phe Asp Gly Pro Gly Thr Ser Glu Thr Ala Thr Pro Asn Pro Phe 370 375 380

Arg Asp Tyr Arg Met Asp Val Thr Phe Thr Gly Pro Ser Ser Gln Ser 385 390 395 400

Tyr Val Val Pro Gly Tyr Tyr Ala Ala Asp Gly Asn Ser Gly Glu Thr 405 410 415

Ser Leu Gly Ser Gly Asn Arg Trp Arg Val Ala Phe Ala Pro Asp Glu 420 425 430

Ala Gly Thr Trp Asn Tyr Ser Val Ser Phe Val Thr Gly Thr Asp Ile 435 440 445

Ala Ala Asp Leu Ser Gly Gly Ala Ser Ala Gly Phe Phe Asp Gly Ala 450 455 460

Thr Gly Thr Phe Ser Val Ala Ala Ser Asp Lys Ser Gly Ala Asp Leu 465 470 475 480

Arg Ala Lys Gly Lys Leu Glu Tyr Val Gly Asp His Tyr Leu Gln Phe 485 490 495

Arg Asn Gly Glu Tyr Phe Ile Lys Gly Gly Ala Asn Ser Pro Glu Val 500 505 510

Leu Leu Glu Tyr Ser Gly Phe Asp Asn Thr Asp Ser Thr Arg Thr Tyr 515 520 525

Ser Ala His Thr Ile Asn Trp Gln Leu Gly Asp Pro Thr Trp Lys Gly 530 535 540

Gly Glu Gly Lys Gly Leu Val Gly Val Ile Asn Tyr Leu Ala Asp Leu 545 550 555 560

Gly Leu Asn Ser His Tyr Phe Leu Leu Met Asn Ser Tyr Gly Asp Gly 565 570 575

Lys Lys Ala Phe Pro Phe Leu Gly Glu Asp Asp Ile Trp Arg Tyr Asp 580 585 590

Cys Ser Lys Leu Glu Gln Trp Asp Val Leu Phe Glu His Phe Asp Arg 595 600 605

Lys Gly Met Met Met His Phe Val Met Thr Glu Gln Glu Asn Gln Gln 610 615 620

Leu Phe Glu Val Ala Asp Pro Ala Thr Val Glu Gly Gly Phe Ser Asp 625 630 635 640

Ser Arg Arg Ile Tyr Phe Arg Glu Met Val Ala Arg Phe Gly His His 645 650 655

Met Ala Ile Thr Trp Asn Ile Gly Glu Glu Asn Gly Trp Glu Lys Gln 660 665 670

Thr Arg Pro Thr Ile Tyr Ala Gly Ala Cys Ser Asp Thr Gln Arg Lys 675 680 685

Asp Phe Ser Asp His Leu Arg Ala Leu Leu Pro Tyr Glu Asp His Ile 690 695 700

Ser Ile His Asn Gly Pro Ser Ser Thr Asp Ala Ile Phe Asn Ala Leu 705 710 715 720

Val Gly His Thr Ser Phe Thr Gly Pro Ala Phe Gln Trp Asn Ile Asn 725 730 735

Thr Asn Ile Ala Ala Lys Thr Lys Gln Trp Arg Asp Ala Ser Ile Ala 740 745 750

Ser Gly His Lys Trp Val Phe Cys Met Asp Glu Pro Tyr Leu Gly Gly 755 760 765

Asn Pro Asn Asp Ala His Asp Thr Asn Arg Lys Gln Thr Leu Trp Pro 770 775 780

Ala Tyr Met Ala Gly Ala Ala Gly Val Glu Trp Tyr Ile Gly Gly Gly 785 790 795 800

Gln Asp Leu Gln Val Gln Asp Tyr Thr Leu Tyr Glu Pro Leu Trp Thr 805 810 815

Glu Met Gly Tyr Ala Val Asp Leu Leu Glu Ile Ile Pro Phe His Ala 820 825 830

Met Glu Pro Asn Asp Ala Leu Leu Thr Gly Glu Thr Gly Gly Ala Gly 835 840 845

Gln Val Leu Ala Asp Leu Gly Ala Ser Leu Ala Leu Pro Asn 850 855 860

Ala Thr Ala Ser Ala Ser Leu Asn Leu Ser Gly Gln Ser Gly Asn Phe 865 870 875 880

Asp Val Met Trp Tyr Asp Pro Arg Asn Gly Gly Asp Leu Gln Met Gly 885 890 895

Ser Val Ser Thr Val Thr Gly Gly Gly Ile Arg Ser Leu Gly Ala Ala 900 905 910

Pro Ser Ala Ser Ala Glu Asp Trp Leu Val Leu Val Phe Ala Glu Gly 915 920 925

Thr Met Pro Val Met Pro Gly Asp Pro Val Pro Pro Ala Leu Ser Tyr 930 935 940

Leu Glu Ile Trp Asn Glu Gly Phe Glu Asn Ala Asn Leu Gly Ala Thr 945 950 955 960

Ser Ala Ser Asn Ala Asp Leu Pro Gly Ala Val Phe Gly Arg Asn 965 970 975

Gly Leu Thr Ala Glu Val Val Asn Ala Pro Ala Gly Phe Ser Ser Ala 980 985 990

Ser Gly Gln Val Ile Ala Leu Ser Thr Thr Thr Asn Ala Tyr Ala Ala 995 1000 1005

Ala Lys Arg Gln Glu Ser Ala Ile Asp Leu Ser Ala Leu Ser Leu Lys 1010 1015 1020

Ala Gly Asp Thr Tyr Arg Leu Ser Phe Asp Met Tyr Ile Pro Ser Pro 1025 1030 1035 1040

Leu Ser Thr Ala Val Gly Ala Ile Ser Phe Arg Trp Arg Thr Ala Thr 1045 1050 1055

Ala Thr Gly Asn Gly Pro Thr Asp Ser Ser Gln Ala Thr Leu Ser Ala 1060 1065 1070

Gly Val His Arg Ile Glu Tyr Thr Gly Thr Phe Pro Val Ile Asn Gly 1075 1080 1085

Ser Glu Ile Leu Pro Thr Ser Val Glu Pro Phe Ile Met Phe His Gln 1090 1095 1100

Asn Gly Val Ala Ala Ser Gln His Val Tyr Leu Asp Asn Ile Leu Phe 1105 1110 1115 1120

Glu Ile Glu Ser Pro Gln Leu Ser Gly Phe Glu Lys Phe Ala Asp Asp 1125 1130 1135

Tyr Ala Leu Ile Gly Gly Lys Thr Asp Asp Asp Asp Leu Asp Gly Gln 1140 1145 1150

Thr Asn Phe Met Glu Phe Ala Thr Gly Gly Asn Pro Thr Asp Pro Ser 1155 1160 1165

Asp Ile Gly Leu Ile Arg Val Ser Phe Asp Gly Asp Gly Asn Ala Arg 1170 1175 1180

Val Ser Val Pro Gln Arg Ile Asp Gly Asn Glu Leu Gly Leu Ser Tyr 1185 1190 1195 1200

Thr Val Tyr Asn Arg Thr Ser Leu Thr Glu Gly Ser Trp Ala Glu Leu 1205 1210 1215

Ser Thr Asn Ala Ile Phe Thr Ser Ser Ile Glu Gly Thr Ala Glu Tyr 1220 1225 1230

Glu Thr Tyr Gly Tyr Arg Phe Trp Val Gly Thr Gly Ser Phe Ser Asp 1235 1240 1245

Arg Phe Phe Arg Val Glu Ile Ser Asp Asn 1250 1255

<210> SEQ ID NO 20
<211> LENGTH: 859
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_001820771.1

<400> SEQUENCE: 20

Met Met Leu Leu Arg Thr Val Val Trp Ala Gly Ala Leu Val Leu Gly 1 5 10 15

Ser Val Phe Cys Val Ser Ala Ser Gly Ala Thr Arg Asn Leu Gly Ser 20 25 30

Lys Thr Thr Phe Ser Gly Glu Gln Lys Gln Trp His Lys Val Ser Leu 35 40 45

Thr Phe Ala Gly Pro Ser Thr Ser Glu Thr Asn Ser Val Asn Pro Phe 50 55 60

Thr Asn Tyr Arg Leu Asn Val Thr Phe Lys His Ser Ala Ser Asn Arg 65 70 75 80

Thr Leu Ile Val Pro Gly Tyr Phe Ala Ala Asp Gly Asn Ala Ala Asn 85 90 95

Thr Gly Ala Val Ser Gly Asp Lys Trp Arg Val Asp Phe Thr Pro Asp 100 105 110

Ala Thr Gly Thr Trp Thr Tyr Val Ala Ser Phe Arg Thr Gly Ser Asn 115 120 125

Val Ala Ala Ser Thr Ser Ala Thr Ala Gly Thr Ala Thr Ser Phe Asn 130 135 140

Gly Glu Ser Gly Ser Phe Thr Ile Asp Pro Thr Asp Lys Thr Gly Ala 145 150 155 160

Asp Phe Arg Ala Lys Gly Arg Leu Arg Glu Val Gly Gln His Tyr Leu 165 170 175

Gln His Ala Gly Ser Lys Glu Tyr Phe Ile Lys Ser Gly Ala Gly Ser 180 185 190

Pro Glu Asn Phe Leu Ala Phe Ala Asp Phe Asp Asn Thr Ser Ala Gly 195 200 205

Lys Lys Ile Leu His His Tyr Thr Ala His Leu Ser Ala Tyr Arg Ser 210 215 220

Gly Asp Pro Thr Trp Lys Ser Gly Lys Gly Lys Ala Ile Ile Gly Ala 225 230 235 240

Leu Asn Tyr Leu Ala Ser Lys Lys Val Asn Ser Val Tyr Phe Leu Thr 245 250 255

Met Asn Ile Gly Gly Asp Gly Asp Asp Val Phe Pro Phe Val Ser Lys 260 265 270

Thr Asp Arg Thr Arg Phe Asp Val Ser Lys Leu Ala Gln Trp Glu Ile 275 280 285

Val Phe Ser His Met Asp Lys Leu Gly Ile Met Leu Asn Val Val Thr 290 295 300

Gln Glu Gln Glu Cys Asp Gln Leu Leu Asp Gly Gly Ser Leu Gly Asn 305 310 315 320

Thr Arg Lys Ile Tyr Tyr Arg Glu Leu Val Ala Arg Phe Gly His His 325 330 335

Leu Gly Val Thr Trp Asn Leu Gly Glu Glu Asn Thr Asn Thr Asp Ala 340 345 350

Gln Arg Glu Ala Phe Ala Asp Tyr Leu Asn Ala Leu Asp Pro Tyr Phe 355 360 365

Ser Leu Ile Ala Val His Thr Tyr Pro Ser Gln Arg Asp Thr Ile Tyr 370 375 380

Thr Gly His Leu Gly Ser Glu Leu Ile Ser Gly Ala Ser Leu Gln Leu 385 390 395 400

Glu Ser Pro Ser Ile Val His Glu Gln Thr Leu Lys Trp Val Lys Lys 405 410 415

Ser Ala Ala Ala Gly Ser Lys Trp Val Val Ser Val Asp Glu Leu Gly 420 425 430

Pro Ser Ser Ala Gly Val Val Pro Asp Ala Asn Asp Pro Ala His Glu 435 440 445

Thr Ile Val His Arg Val Leu Trp Gly Ser Leu Leu Ala Gly Gly Ala 450 455 460

Gly Val Glu Trp Tyr Phe Gly Tyr Asn Tyr Pro Gln Thr Asp Leu Thr 465 470 475 480

Leu Glu Asp Trp Lys Ser Arg Asp Lys Met Trp Thr Leu Thr Gln His 485 490 495

Ala Ala Gln Phe Met Arg Asp Tyr Met Pro Leu Pro Leu Val Ala Asn 500 505 510

Tyr Asp Ser Ile Thr Ser Ser Thr Ser Asp Tyr Cys Phe Gly Lys Pro 515 520 525

Gly Val Ala Tyr Ala Ile Tyr Leu Pro Gln Gly Ala Ile Thr Asn Ile 530 535 540

Thr Val Pro Ser Gly Glu Gly Tyr Thr Val His Trp Tyr Asn Pro Arg 545 550 555 560

Ala Gly Gly Ser Leu Gln Thr Gly Thr Val Lys Ser Ile Ala Gly Gly 565 570 575

Thr Ala Ala Ile Gly Arg Pro Pro Thr Gln Gln Ser Glu Asp Trp Val 580 585 590

Ala Leu Leu Arg Arg Thr Ser Gly Thr Thr Thr Gly Ala Pro Ala Pro 595 600 605

Ala Pro Thr Glu Pro Thr Ser Thr Thr Ala Val Thr Gln Leu Thr Leu 610 615 620

Val Asn Ala Ser Thr Glu Lys Asp Leu Arg Ala Leu Thr Asn Gly Ser 625 630 635 640

Thr Ile Thr Phe Gly Thr Asp Gly Lys Ala Leu Asn Val Arg Ala Thr 645 650 655

Thr Ser Gly Thr Val Gly Ser Val Ala Phe Ile Leu Asp Gly Gln Thr 660 665 670

Ile Gln Thr Glu Asn Met Ala Pro Tyr Thr Leu Ala Gly Asp Ser Asn 675 680 685

Gly Asp Tyr Ala Ser Trp Thr Pro Ser Val Gly Thr His Val Leu Lys 690 695 700

Val Val Pro Tyr Ser Gly Arg Asp Arg Thr Gly Asn Ala Gly Thr Ala 705 710 715 720

Leu Gln Val Ser Phe Thr Val Gln Ser Thr Ala Thr Glu Asp Ser Ser 725 730 735

Ser Ala Pro Val Val Ser Glu Pro Ala Ser Gly Ala Ser Val Thr Lys 740 745 750

Leu Thr Leu Ile Asn Ala Ser Thr Glu Lys Asp Leu Arg Ala Leu Thr 755 760 765

Asn Gly Ser Thr Ile Thr Phe Gly Thr Asp Gly Lys Ala Leu Asn Val 770 775 780

Arg Ala Glu Thr Ser Gly Thr Val Gly Ser Val Ala Phe Ile Leu Asp 785 790 795 800

Gly Lys Thr Leu Arg Thr Glu Asn Val Ala Pro Tyr Thr Leu Ala Gly 805 810 815

Asp Gly Thr Gly Asn Tyr Tyr Ser Trp Thr Pro Ser Val Gly Ser His 820 825 830

Thr Leu Lys Val Val Pro Tyr Ser Gly Lys Asp Arg Thr Gly Thr Ala 835 840 845

Gly Thr Ser Leu Gln Val Gly Phe Thr Val Lys 850 855

<210> SEQ ID NO 21
<211> LENGTH: 606
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_003547883.1

<400> SEQUENCE: 21

Met Lys Asn Thr Ile Arg Leu Thr Thr Leu Ala Leu Leu Ala Ala Ala 1 5 10 15

Gln Ala Ala His Ala Glu Ser Ile Ala Glu Val Ser Gly Ser Leu Arg 20 25 30

Thr Trp His Lys Val Thr Leu Ser Trp Asn Gly Pro Gln Thr Asn Glu 35 40 45

Leu Ala Thr Pro Asn Pro Phe Thr Asp Tyr Arg Leu Asp Val Arg Phe 50 55 60

Thr His Gln Gln Ser Gly Thr Ser Tyr Leu Val Pro Gly Tyr Tyr Ala 65 70 75 80

Ala Asp Gly Asp Ala Ala Asn Thr Gly Ala Asp Ser Gly Ser Val Trp 85 90 95

Arg Val His Phe Ala Pro Asp Ala Ile Gly Asn Trp Asp Tyr Ala Val 100 105 110

Ser Phe Arg Thr Gly Glu Ala Val Ala Met Ala Gln His Pro Gln Val 115 120 125

Gly Asp Gly Thr His Phe Asp Gly Asp Ser Gly Thr Leu Asn Ile Arg 130 135 140

Pro Ser Ala Gln Lys Ala Pro Asp Leu Arg Ala Lys Gly Arg Leu Gln 145 150 155 160

Tyr Val Gly Glu His Tyr Leu Lys Phe Ala Ala Ser Gly Glu Tyr Phe 165 170 175

Leu Lys Gln Gly Ala Asp Ala Pro Glu Asn Phe Leu Ser Tyr Lys Gly 180 185 190

Phe Asp Gly Asp Phe Lys Ser Asp Gly Ile Asn Asp His Leu Val Lys 195 200 205

Asp Trp Glu Pro His Val Gln Asp Trp Lys Asp Gly Asp Pro Ser Trp 210 215 220

Ala Asp Gly Gln Gly Lys Gly Ile Ile Gly Ala Val Asn Tyr Leu Ala 225 230 235 240

Ser Glu Gly Leu Asn Ala Phe Ser Phe Leu Thr Met Asn Ile Glu Gly 245 250 255

Asp Asp Arg Asn Val Phe Pro Tyr Thr Thr Tyr Lys Glu Arg Tyr Arg 260 265 270

Met Asp Cys Ser Lys Leu Ala Gln Trp Glu Val Val Phe Glu His Ala 275 280 285

Asp Ser Lys Gly Met Phe Leu His Phe Lys Thr Gln Glu Thr Glu Asn 290 295 300

Glu Cys Leu Leu Asp Asn Gly Asp Thr Gly Pro Met Arg Arg Leu Tyr 305 310 315 320

Tyr Arg Glu Leu Val Ala Arg Phe Gly His His Leu Ala Leu Asn Trp 325 330 335

Asn Leu Gly Glu Glu Asn Gly Lys Trp Asp Trp Pro Gly His Val Lys 340 345 350

Glu His Phe Gln Ser Thr Glu Gln Arg Gln Ala Met Ala Gln Trp Phe 355 360 365

Tyr Asp Asn Asp Pro Tyr Lys His His Leu Val Ile His Asn Gly Gln 370 375 380

Ser Pro Asn Asp Leu Leu Gly Asp Ala Ser Lys Leu Thr Gly Phe Ser 385 390 395 400

Leu Gln Thr Asn Leu Glu Asp Phe Ala Asn Val Pro Gly Thr Val Ala 405 410 415

Ser Trp Ile Arg Lys Ser Ala Glu Ala Gly Lys Pro Trp Ala Val Ala 420 425 430

Cys Asp Glu Pro Gly Asp Ala Ser His Ala Ile Arg Pro Asp Asp Asn 435 440 445

Ala Gly Ser Ser His Glu Asn Gly Arg Arg Asn Ala Leu Trp Gly Cys 450 455 460

Leu Met Asn Gln Gly Tyr Gly Ser Glu Tyr Tyr Phe Gly Tyr Lys Asn 465 470 475 480

Ala His Ser Asp Leu Thr Cys Asn Asp Tyr Arg Ser Arg Asp Lys Trp 485 490 495

Trp Asp Tyr Cys Arg Tyr Ala Leu Glu Phe Phe His Asn His Lys Val 500 505 510

Ala Ile Trp Glu Leu Ala Pro Ala His Lys Leu Ser Ser Asn Ser Glu 515 520 525

Ser Trp Cys Leu Ala Lys Thr Gly Glu Thr Tyr Leu Ile Tyr Ile Lys 530 535 540

Asp Gly Ala Thr Thr Asn Leu Asp Leu Ser Gly Asp Ser Gly Lys Phe 545 550 555 560

Gln Val Gln Trp Tyr Asp Thr Arg Lys Gly Gly Ser Leu Leu Ala Gly 565 570 575

Ser Glu Lys Lys Ile Lys Ala Gly Glu Ala Val Ser Ile Gly Lys Pro 580 585 590

Pro Tyr Asp Pro Asp Arg Asp Trp Leu Val Leu Val Ser Lys 595 600 605

<210> SEQ ID NO 22
<211> LENGTH: 1853
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_003547687.1

<400> SEQUENCE: 22

Met Met Lys Leu Leu Gln Leu Phe Thr Leu Cys Leu Leu Ser Met Ala 1 5 10 15

Thr Phe Ala Gln Thr Ala Leu Gly Gln Asp Thr Val Asp Leu Ser Gln 20 25 30

Leu Pro Thr Ser Leu Thr Pro Gln Thr Ser Tyr Thr Val Ser Val Pro 35 40 45

Tyr Thr Ala Ser Val Asp Arg Asp Ile Ala Val Glu Phe Trp Lys Gly 50 55 60

Gly Ala Trp Val Thr Ala Lys Thr Thr Thr Val Thr Ala Gly Ser Gly 65 70 75 80

Thr Ala Ser Val Thr Leu Thr Leu Ala Thr Ala Pro Val Glu Gly Thr 85 90 95

Asp Tyr Leu Trp Lys Ala Asn Ile Arg Pro Val Gly Thr Asp Trp Thr 100 105 110

Gln Asn Leu Asn Gly Gly Val Val Glu Asn Val Val Val Ser Leu Pro 115 120 125

Val Thr Glu Asp Thr Ile Asp Leu Thr Glu Leu Pro Thr Ser Met Pro 130 135 140

Pro Gln Ser Ser Tyr Thr Val Thr Val Pro Tyr Thr Ala Leu Glu Ser 145 150 155 160

Arg Asp Ile Ala Leu Ser Leu Tyr Lys Gly Gly Ile Trp Gln Thr Gly 165 170 175

Leu Thr Gln Thr Val Ala Ala Gly Arg His Thr Ala Ser Phe Thr Leu 180 185 190

Asn Leu Gly Ser Gln Ala Ala Glu Asp Thr Asp Tyr Glu Trp Arg Cys 195 200 205

Gly Ile Arg Pro Val Gly Ala Asp Trp Thr Gln Asn Leu Asp Ala Gly 210 215 220

Thr Ile Asp Asn Val Val Val Ser Ser Gly Ser Ser Gly Gly Gly Ser 225 230 235 240

Gly Asn Gly Ala Trp Ile Glu Ser Gly Gly Met Val Val Ile Glu Ala 245 250 255

Glu Asn Val Asp Leu Thr Ser Asp Trp Val Ala Arg Pro Ser Thr His 260 265 270

Gly Ala Ala Asn Ala Met Gly Gly Ser Leu Gly Asp Gly Trp Leu Glu 275 280 285

Trp Thr Gly Ala Gln Tyr Tyr Gly Asn Thr Gln Thr Glu Ala Gln Ala 290 295 300

Val Ala Ile Leu Thr Phe Glu Phe Glu Ile Thr Asn Pro Gly Asp Tyr 305 310 315 320

Tyr Phe Arg Trp Arg Ser Lys Gln Tyr Asn Asn Val Gly Ser Gly Asp 325 330 335

Ala Gly Asn Asp Ser Tyr Val Ser Leu Thr Ser Gly Thr Pro Val Ala 340 345 350

Gly Tyr Gln Asp Phe Gly Gln Phe His Lys Val Trp Val Gln Ser Gln 355 360 365

Gln Ala Trp Ser Trp Gln Thr Thr Phe Glu Pro His His Gly Glu His 370 375 380

Tyr Ala Asn Asn Leu Val Arg Arg His Tyr Glu Ala Gly Thr His Thr 385 390 395 400

Ile Arg Leu Ala Ala Arg Ser Pro Gly His Ala Ile Asp Arg Ile Val 405 410 415

Leu His Arg Thr Asp Val Pro Phe Asn Gln Ala Thr Phe Glu Ser Ala 420 425 430

Ala Glu Ser Glu Arg Ala Ala Gly Ile Gly Asp Thr Ile Thr Tyr Arg 435 440 445

Ala Thr Glu Asp Phe Pro Thr Leu Asn Ile Tyr Gly Thr Glu Ala Arg 450 455 460

Gly Thr Val Gln Val Asn Pro Gly Ala Gly Ala Val Asn Tyr Asp Asp 465 470 475 480

Thr Val Phe Ala Ser Ala Thr Arg Thr Phe Asp Gly Pro Thr Gly Thr 485 490 495

Tyr Asp Ile Asp Leu Thr Thr Trp Val Glu Tyr Asp Gly Glu Ser Thr 500 505 510

Tyr Arg Leu Leu Val Asn Gly Ser Gln Val Ala Ser Tyr Gln Asn Pro 515 520 525

Gln Val Thr Glu Ala Thr Asp Leu Thr Pro Asn Thr His Thr Trp Ser 530 535 540

Asn Ile Val Leu Thr Gln Gly Asp Ser Ile Thr Val Gln Ser Asn Ala 545 550 555 560

His Ser Asn Asn Ile Ile Pro Glu Ala Gly Pro Pro Asn Gly Phe Ala 565 570 575

Trp Ala Arg Gly Arg Trp Glu Gln Ile Glu Leu Thr Phe Val Ser Val 580 585 590

Asn Val Gly Ile Pro Thr Val Asp Ala Gly Pro Asp Gln Ser Val Ser 595 600 605

Thr Thr Gln Gly Ser Ala Thr Leu Asn Gly Thr Ala Ser Asp Asn Gly 610 615 620

Ser Ile Thr Asn Tyr Ala Trp Thr Gln Val Ser Gly Pro Asn Thr Ala 625 630 635 640

Thr Leu Ser Gly Gln Ser Thr Val Asp Leu Thr Ala Ser Asn Leu Ile 645 650 655

Ser Gly Thr Tyr Thr Phe Arg Leu Thr Val Thr Asp Asn Glu Ser Asn 660 665 670

Thr Ala Ser Asp Asp Ala Ile Val His Val Val Ser Thr Gly Asn Gly 675 680 685

Ala Val Ala Ile Thr Gly Asp Leu Met Gln Trp His Asn Val Ile Leu 690 695 700

Thr Met Asn Gly Pro Asn Ser Ser Glu Ser Ala Thr Pro Asn Pro Phe 705 710 715 720

Lys Asp Tyr Arg Met Asn Val Thr Phe Thr His Pro Asn Ser Gly Leu 725 730 735

Ser Tyr Thr Val Pro Gly Tyr Phe Ala Ala Asp Gly Asn Ala Gly Gln 740 745 750

Thr Gly Ala Thr Ser Gly Gly Lys Trp Arg Ala His Leu Cys Pro Asp 755 760 765

His Ala Gly Gln Trp Thr Tyr Ser Val Ser Phe Arg Ser Gly Thr Asp 770 775 780

Val Ala Val Asn Asn Ser Leu Ser Ala Gly Thr Ala Phe Ala Gly Leu 785 790 795 800

Asp Gly Lys Thr Gly Ser Phe Thr Val Val Ala Thr Asn Lys Thr Gly 805 810 815

Arg Asp His Arg Gly Lys Gly Arg Leu Gln Tyr Asp Gly Thr Arg Tyr 820 825 830

Leu Lys Phe Ala Gly Ser Gly Glu Ala Phe Leu Lys Thr Gly Ala Asp 835 840 845

Ala Pro Glu Asn Phe Leu Asn Tyr Thr Glu Phe Asp Asn Thr Tyr Thr 850 855 860

His Gly Ala Asn Tyr Leu Lys Asp Trp Ser Ala His Val Gly Asp Trp 865 870 875 880

Asn Ala Gly Asp Pro Thr Trp His Gly Thr Lys Gly Lys Gly Ile Ile 885 890 895

Gly Ala Ile Asn Tyr Leu Ala Ser Glu Gly Gln Asn Val Phe Ser Phe 900 905 910

Leu Thr Tyr Asn Ala Gly Gly Asp Ser Lys Asp Val Trp Pro Tyr Val 915 920 925

Ser His Thr Asn Pro Leu Gln Phe Asp Cys Ser Lys Leu Asp Gln Trp 930 935 940

Asp Ile Val Phe Ser His Gly Asp Lys Met Gly Met Tyr Leu His Phe 945 950 955 960

Lys Thr Gln Glu Arg Glu Asn Asp Asp Leu Asp Gly Pro Gly Ser Ala 965 970 975

Tyr Ala Leu Asp Gly Gly Asn Val Gly Thr Glu Arg Lys Leu Tyr Tyr 980 985 990

Arg Glu Leu Ile Ala Arg Phe Gly His His Leu Ala Leu Asn Trp Asn 995 1000 1005

Leu Gly Glu Glu Asn Thr Gln Ser Thr Ser Gln Arg Gln Ala Met Ala 1010 1015 1020

Gln Tyr Phe Arg Asp Thr Asp Pro Tyr Gly His Asn Ile Val Leu His 1025 1030 1035 1040

Thr Tyr Pro Gly Glu Trp Glu Gln Val Tyr Arg Pro Leu Leu Gly Ser 1045 1050 1055

Ala Ser Glu Leu Thr Gly Ala Ser Ile Gln Thr Asn Tyr Asn Thr Val 1060 1065 1070

His Ser Arg Thr Leu Gln Trp Leu Asn Glu Ser Thr Ala Ala Gly Lys 1075 1080 1085

Val Trp Val Val Ala Asn Asp Glu Gln Gly Pro Ala Ser His Ala Asn 1090 1095 1100

Pro Pro Asp Asn Gly Trp Pro Gly Tyr Thr Gly Ser Thr Thr Pro Ser 1105 1110 1115 1120

Gln Lys Gln Met Arg Trp Gln Thr Val Trp Gly Asn Tyr Met Ala Gly 1125 1130 1135

Gly Ala Gly Ile Glu Leu Tyr Ala Gly Tyr Gln Asn Pro Gln Ser Asp 1140 1145 1150

Leu Thr Leu Asp Asp Phe Arg Ser Arg Asp Arg Met Trp Asp Tyr Cys 1155 1160 1165

Arg His Ala Asn Thr Phe Phe Thr Glu His Leu Pro Phe Trp Glu Met 1170 1175 1180

Ala Asn Ala Asn Ser Leu Ile Gly Asn Thr Ser Asn Asn Asn Asp Lys 1185 1190 1195 1200

Phe Ala Lys Thr Gly Glu Tyr Tyr Ala Ile Tyr Leu Pro Asn 1205 1210 1215

Gly Gly Thr Thr Asn Leu Asn Leu Ser Gly Ala Thr Gly Thr Phe Asp 1220 1225 1230

Ile Leu Trp Tyr Asp Pro Arg Asn Gly Gly Ala Leu Gln Ala Gly Thr 1235 1240 1245

Val Ser Ser Val Ile Gly Gly Ser Asn Val Ser Val Gly Asn Ala Pro 1250 1255 1260

Ser Ser Thr Thr Asp Asp Trp Ala Ile Leu Val Val Lys Gln Gly Leu 1265 1270 1275 1280

Gly Thr Gly Leu Leu Val Asp Ala Gly Ala Ala Lys Thr Ile Ile Leu 1285 1290 1295

Pro Thr Asn Gln Val Thr Leu Asn Gly Ser Ser Ser Asp Asp Gly Thr 1300 1305 1310

Ile Thr Ser Arg Leu Trp Thr Gln Ile Ser Gly Pro Asn Thr Ala Ala 1315 1320 1325

Leu Ser Gly Gln Thr Ser Asn Thr Leu Gln Ala Ser Ser Leu Ile Ala 1330 1335 1340

Gly Ser Tyr Val Phe Arg Leu Thr Val Thr Asp Asn Asp Ser Asn Thr 1345 1350 1355 1360

Ala Tyr Asp Gln Thr Thr Val Thr Val Glu Val Asp Ser Ala Pro Ser 1365 1370 1375

Ile Thr Thr Ser Ser Leu Pro Asp Gly Thr Val Ser Ala Ser Tyr Ser 1380 1385 1390

Gln Thr Leu Ala Ala Ser Gly Gly Asn Pro Gln Leu Ala Trp Ser Ile 1395 1400 1405

Ile Glu Gly Ser Leu Pro Thr Gly Leu Ser Ile Asn Ser Ser Gly Val 1410 1415 1420

Ile Ser Gly Thr Pro Thr Ala Thr Gly Leu Ser Val Phe Lys Val Gln 1425 1430 1435 1440

Thr Gln Asp Ala Asn Gly Asp Thr Asp Asp Ala Val Phe Ser Ile Lys 1445 1450 1455

Val Val Glu Val Thr Thr Ser Thr Lys Thr Phe Asn Pro Thr Asp Asp 1460 1465 1470

Ala Phe Ile Glu Trp Ser Thr Pro Tyr Asn Thr Thr Gln Leu Lys Ile 1475 1480 1485

Glu Asn Gly Ser Arg Val Gly Tyr Met Lys Phe Asn Ile Thr Gly Ile 1490 1495 1500

Thr Thr Gln Val Glu Ser Ala Val Leu Ser Met Arg Val Ala Gly Asp 1505 1510 1515 1520

Ser Gly Asn Gly Thr Ile Arg Phe Tyr Leu Gly Ser His Asn Asn Trp 1525 1530 1535

Thr Glu Ala Thr Ile Thr Thr Ala Asn Arg Pro Ala Lys Gly Ala Gln 1540 1545 1550

Val Gly Ser Met Thr Gly Ser Phe Ser Asn Asn Thr Thr Tyr Gln Ala 1555 1560 1565

Asp Ile Thr Ser Met Leu Asn Gly Ser Gly Asp Gly Val Tyr Thr Leu 1570 1575 1580

Val Ile Glu Met Asp Ser Gly Gly Asn Asp Ala Trp Phe Ser Ser Thr 1585 1590 1595 1600

Glu Gly Ala Asn Pro Pro Ser Leu Val Val Asn Tyr Ser Asp Gly Ser 1605 1610 1615

Thr Asp Glu Ile Pro Val Ala Asn Ala Gly Ala Asp Lys Ala Ile Thr 1620 1625 1630

Leu Pro Thr Asn Gln Val Leu Ile Asn Gly Ser Gly Thr Asp Asp Gly 1635 1640 1645

Ser Ile Ser Ser Tyr Ala Trp Ser Gln Val Met Gly Pro Asn Thr Ala 1650 1655 1660

Ser Leu Ser Gly Ala Phe Ser Ala Lys Leu Ile Ala Thr Gly Leu Ile 1665 1670 1675 1680

Ala Gly Glu Tyr Ala Phe Val Leu Thr Val Thr Asp Asn Thr Ala Asn 1685 1690 1695

Glu Asp Ser Asp Met Val Ile Val Val Val Asn Pro Ala Val Gly Ser 1700 1705 1710

Gly Ser Ala Tyr Thr Asn Trp Ala Ser Asn Gln Phe Ala Gly Leu Ser 1715 1720 1725

Gly Gly Ala Thr Asn Pro Leu Ala Ala Phe Asp Ala Ser Tyr Met Gly 1730 1735 1740

Asn Gly Leu Pro Asn Gly Leu Ile Tyr Ala Met Gly Gly Asn Pro His 1745 1750 1755 1760

Glu Ala Asn Asn Asp Ile Arg Ala Met Leu Pro Glu Ala Arg Gly Asp 1765 1770 1775

Arg Val Glu Phe Thr Leu Pro Asp Ser Ile Pro Ala Gly Val Ser Val 1780 1785 1790

Arg Leu Tyr Gln Ala Ser Asp Leu Thr Ala Val Ser Pro Trp Ser Glu 1795 1800 1805

Thr His Val Arg Asn Ser Asn Gly Thr Trp Thr Pro Ser Leu Ser Ser 1810 1815 1820

Ser Ala Asn Gly Asp Gly Thr Ser Thr Phe Thr Leu Pro Leu Gly Gly 1825 1830 1835 1840

Gly Ser Thr Gly Phe Tyr Leu Leu Asp Phe Ser Ala Glu 1845 1850

<210> SEQ ID NO 23
<211> LENGTH: 648
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_001818722.1

<400> SEQUENCE: 23

Met Arg Ile Arg His Ser Ser Ile Cys Ala Leu Ala Ser Ala Ala Ile 1 5 10 15

Tyr Ala Val Phe Thr Pro Ala Ala Ala Gly Ala Ala Ala Leu Val Ala 20 25 30

Gly Lys Leu Glu Gln Trp His Lys Ile Thr Leu Ser Ile Asp Gly Pro 35 40 45

Glu Ala Arg Glu Thr Asp Thr Ser Pro Asn Pro Phe Leu Asp Tyr Arg 50 55 60

Met Asp Val Thr Phe Thr His Glu Ser Gly Ala Pro Ser Tyr Arg Val 65 70 75 80

Pro Gly Tyr Phe Ala Val Asp Gly Asn Ala Ala Glu Thr Ser Ala Phe 85 90 95

Ala Gly Arg Ile Trp Arg Ala His Leu Ala Pro Asp Lys Pro Gly Met 100 105 110

Trp Arg Tyr Ala Val Ser Phe Arg Arg Gly Pro Glu Val Ala Val Ser 115 120 125

Thr Leu Glu Ala Gly Ala Pro Val Asp Gly Cys Asp Gly Ile Ser Gly 130 135 140

Glu Phe Thr Val Val Pro Thr Asp Lys Thr Gly Arg Asp Phe Arg Ala 145 150 155 160

His Gly Arg Leu Asp Tyr Val Gly Gly Arg Tyr Leu Arg Phe Ala Gly 165 170 175

Ser Gly Glu Tyr Phe Leu Lys Val Gly Ala Asp Ser Pro Glu Asn Leu 180 185 190

Leu Gly Tyr Ser Asp Phe Asp Gly Thr Arg Ser Asn Lys Pro Gly Thr 195 200 205

Pro Ala Arg Pro Asp Glu Ala Ala Pro Pro Ser Leu Leu Lys Thr Trp 210 215 220

Gln Pro His Val Arg Asp Trp Arg Glu Gly Asp Pro Thr Trp Gln His 225 230 235 240

Arg Lys Gly Lys Gly Leu Ile Gly Ala Leu Asn Tyr Leu Ala Ser Thr 245 250 255

Gly Cys Asn Ala Phe Ser Phe Leu Thr Tyr Asn Ala Gly Gly Asp Gly 260 265 270

Asp Asp Val Trp Pro Phe Val Glu Arg Asp Asp Pro Leu His Phe Asp 275 280 285

Cys Ser Lys Leu Asp Gln Trp Gln Ile Ile Phe Asp His Ala Thr Ala 290 295 300

Leu Gly Leu His Leu His Phe Lys Leu Glu Glu Thr Glu Asn Asp Asp 305 310 315 320

Asn Arg Pro Gly Gly Asp Gly Gln Ile Gly Asp Val Pro Thr Ala Leu 325 330 335

Asp Arg Gly Lys Thr Gly Val Glu Arg Lys Leu Tyr Leu Arg Glu Leu 340 345 350

Ile Ala Arg Phe Ala His Glu Leu Ala Leu Asn Trp Asn Leu Gly Glu 355 360 365

Glu Asn Thr Leu Ser Thr Glu Gln Gln Gln Ala Met Ala Ala Phe Ile 370 375 380

Arg Asp Thr Asp Pro Tyr His His Pro Ile Val Leu His Thr Phe Pro 385 390 395 400

Asp Trp Gln Glu Arg Val Tyr Arg Pro Leu Leu Gly Asp Arg Ser Ala 405 410 415

Leu Thr Gly Val Ser Leu Gln Thr Gly Trp Glu Gln Ser His Arg Arg 420 425 430

Val Leu Gln Trp Ile Glu Glu Ser Ala Ala Ala Gly Lys Gln Trp Val 435 440 445

Val Ala His Asp Glu Gln Asn Pro His Tyr Thr Gly Val Pro Pro Asp 450 455 460

Thr Gly Trp Glu Gly Phe Asp Gly Thr Ala Arg Pro Glu Lys Tyr Ser 465 470 475 480

Arg Pro Tyr Thr Ala Asp Asp Val Arg Lys His Thr Leu Trp Gly Ser 485 490 495

Leu Leu Ala Gly Gly Ala Gly Val Glu Tyr Tyr Phe Gly Tyr Thr Leu 500 505 510

Pro Gln Asn Asp Leu Gly Ala Gln Asp Trp Arg Ser Arg Ala Gln Ser 515 520 525

Trp Lys Trp Cys Asp Leu Ala Leu Arg Phe Phe Arg Glu Asn Ala Ile 530 535 540

Pro Phe Trp Asn Met His Asn Ala Asp Glu Leu Val Gly Asn Pro Ser 545 550 555 560

His Asp Asn Ser Arg Tyr Cys Phe Ala Gln Pro Gly Glu Ile Tyr Val 565 570 575

Val Tyr Leu Pro Asn Gly Gly Ser Ala Glu Leu Asp Leu Gly Arg Gly 580 585 590

Ala Asp Gly Ala Thr Phe Gln Val Arg Trp Phe Asn Pro Arg Glu Gly 595 600 605

Gly Pro Leu Gln Ser Gly Asn Val Ser Glu Val Arg Gly Ser Gly Arg 610 615 620

Val Ser Leu Gly Glu Pro Pro Ala Asp Ala Ala Ala Asp Trp Val Val 625 630 635 640

Leu Val Ala Arg Ala Pro Arg Pro 645

<210> SEQ ID NO 24
<211> LENGTH: 1043
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: NP_869354.1

<400> SEQUENCE: 24

Met Val Ala Pro Ile Thr Pro Ser Arg Ser Pro Asn Thr Met Gln Val 1 5 10 15

Cys Arg Leu Arg Lys Phe Leu Thr Leu Gln Thr Val Phe Ala Leu Ala 20 25 30

Val Thr Ala Thr Trp Cys Val Ser Val Ala Ala Gln Lys Pro Asp Ala 35 40 45

Val Phe Thr Glu Ala Asn Gly Phe Leu Lys Val Glu Ala Glu Asp Phe 50 55 60

Ala Ser Gln Thr Asn Thr Asp Lys Arg Ala Phe Tyr Leu Thr Thr Ala 65 70 75 80

Glu Ser Ala Pro Ser Val Gln Pro Asp Gly Asp Pro Ser His Ala Ser 85 90 95

Asp Ala Ser Gly Gly Ala Tyr Leu Glu Ile Leu Pro Asp Thr Arg Arg 100 105 110

Thr His Ala Asp Lys Leu Ile His Gly Thr Asn Phe Ser Pro Gln Pro 115 120 125

Gly Lys Met Ala Val Leu Thr Tyr Arg Val Asn Val Gln Thr Pro Gly 130 135 140

Arg Tyr Tyr Val Trp Val Arg Ala Tyr Ser Thr Gly Ser Glu Asp Asn 145 150 155 160

Gly Leu His Val Gly Ile Asp Gly Thr Trp Pro Glu Ser Gly Gln Arg 165 170 175

Leu Gln Trp Cys Gln Gly Lys His Ser Trp Tyr Trp Asp Ser Lys Gln 180 185 190

Arg Thr Glu Ala Gln His Cys Gly Glu Pro Gly Lys Ile Phe Leu Asp 195 200 205

Ile His Glu Pro Gly Glu His Lys Ile His Phe Ser Met Arg Glu Asp 210 215 220

Gly Phe Glu Phe Asp Gln Trp Leu Met Thr Thr Asp Ser Ser Phe Gln 225 230 235 240

Arg Pro Pro Ala Gly Lys Ser Asn Lys Pro Lys Glu Asn Ala Thr Thr 245 250 255

Gln Val Leu Ser Leu Pro Ala Lys Glu Phe Glu Phe Lys Ser Gly Gly 260 265 270

Tyr Tyr Leu Asp Gln Gly Lys Trp Leu Ala Ile Asn Pro Asp Arg Asn 275 280 285

Gln Ser Ala Ala Ala Lys Lys Val Phe Pro Phe Pro Ser Gly Arg Tyr 290 295 300

Asp Val Thr Leu Lys Ala Val Gly Glu Asn Asp Gly Gln Ser Thr Tyr 305 310 315 320

Ser Val Ser Ala Asp Lys Glu Ser Ile Gly Ser Phe Thr Cys Pro Met 325 330 335

Ala Asp Gln Thr Phe Ala Glu Gly Lys Gln Phe His Gln Thr Phe Ala 340 345 350

Asn Val Gln Ile Thr Glu Gly Ala Glu Leu Glu Val Ala Ser Lys Ile 355 360 365

Ala Ser Ala Asp Gly Ala Glu Tyr Ser Arg Ala Arg Trp Ser Glu Leu 370 375 380

Thr Phe Thr Pro Ala Asn Glu Ala Thr Ala Lys Ala Ala Ala Asn Phe 385 390 395 400

Ala Lys Glu Asn His Leu Val Ala Ala Lys Thr Ser Ala Glu Ser Asn 405 410 415

Asp Arg Thr Gly Ser Pro Thr Lys Pro Val Ser Asp Gln Pro Leu Gln 420 425 430

Met Pro Arg Glu Lys Asp Gly Asp Gly Ser Val Gln Val Thr Gly Glu 435 440 445

Lys Arg Met Trp His Lys Val Thr Val Thr Leu Asn Gly Pro Tyr Ala 450 455 460

His Glu Gln Asp Asn Thr Pro Asn Pro Tyr Leu Asp His Arg Met Glu 465 470 475 480

Val Glu Phe Lys His Glu Ser Gly Lys Gln Tyr Leu Val Pro Gly Tyr 485 490 495

Phe Ala Ala Asp Gly Asn Ala Ala Asn Thr Ser Ala Glu Ser Gly Thr 500 505 510

Gln Trp Arg Ala His Phe Ala Pro Asp Glu Thr Gly Glu Trp Thr Tyr 515 520 525

Thr Val His Phe Ala Thr Gly Lys Asp Ala Ala Ile Asp Arg Asp Ala 530 535 540

Ser Ala Lys Thr Val Ala Ala Phe Asn Gly Lys Thr Gly Thr Phe Asn 545 550 555 560
Val Ala Lys Thr Asn Lys Ser Gly Arg Asp Phe Arg Ala His Gly Arg 565 570 575
Leu Arg Tyr Val Asn Gln Ser His Leu Gln Phe Ala Gly Thr Gly Gln 580 585 590
Tyr Phe Leu Lys Ala Gly Ala Asp Ala Pro Glu Thr Leu Leu Gly Tyr 595 600 605
Ala Glu Phe Asp Gly Thr Val Ala Gly Lys Pro Gly Lys Val Pro Leu 610 615 620
Lys Lys Tyr Glu Pro His Leu Gly Asp Trp Arg Arg Gly Asp Pro Thr 625 630 635 640
Trp Lys Asp Gly Gln Gly Lys Gly Leu Ile Gly Ala Val Asn Tyr Leu 645 650 655
Ser Ser Lys Gly Cys Asn Ala Phe Ser Phe Leu Thr Tyr Asn Ala Gly 660 665 670
Gly Asp Gly Asp Asn Val Trp Pro Phe Ile Gln Arg Asp Asp Lys Leu 675 680 685
His Tyr Asp Cys Ser Lys Leu Asp Gln Trp Gly Ile Val Phe Asp His 690 695 700
Gly Thr Glu Asn Gly Met Tyr Leu His Phe Lys Leu Gln Glu Thr Glu 705 710 715 720
Asn Asp Asp His Arg Gln Gly Gln Lys Ala Lys Gly Phe Lys Pro Glu 725 730 735
Ser Leu Asp Gly Gly Lys Leu Gly Ser Gln Arg Lys Leu Tyr Leu Arg 740 745 750
Glu Ile Ile Ala Arg Phe Gly His Asn Leu Ala Leu Asn Trp Asn Leu 755 760 765
Ala Glu Glu Thr Thr Gln Thr Thr Asp Glu His Leu Ala Met Leu Asn 770 775 780
Tyr Ile Glu Glu Met Asp Pro Tyr Gly His His Arg Val Leu His Thr 785 790 795 800
Tyr Pro Gly Glu Gln Asp Lys Lys Tyr Asp Pro Leu Leu Gly Asp Lys 805 810 815
Ser Asn Leu Thr Gly Val Ser Leu Gln Asn Ser His Ile Lys Asp Thr 820 825 830
His Trp Gln Thr Val Lys Trp Ser Glu Lys Ala Arg Glu Ala Gly Lys 835 840 845
Pro Trp Val Val Ala Phe Asp Glu Ser Gly Ser Ala Ala His Gly Gln 850 855 860
Cys Pro Asp Leu Gly Tyr Arg Gly Tyr Asp Gly Arg Asp Lys Thr Gly 865 870 875 880
Lys Met Thr Tyr Thr Gln His Glu Val Arg Lys Gln Thr Leu Trp Gly 885 890 895
Asn Phe Met Gly Gly Gly Gly Gly Val Glu Tyr Tyr Phe Gly Tyr Gln 900 905 910
Tyr Asp Glu Asn Asp Leu Gly Cys Glu Asp Trp Arg Ser Arg Asp Gln 915 920 925

Ser Trp Asp Ala Cys Arg Val Ala Ile Glu Phe Phe Gln Asn Asn Ala 930 935 940
Val Pro Phe Trp Glu Met Val Asn Ala Asp Glu Leu Val Gly Asn Glu 945 950 955 960
Lys His Asp Asn Ser Lys Tyr Cys Leu Ala Lys Ala Gly Glu Ala Tyr 965 970 975
Val Val Tyr Leu Pro Asn Gly Gly Thr Thr Ser Ile Asp Leu Ser Asp 980 985 990
Ala Asp Gly Glu Phe Gln Val His Trp Tyr Asn Ala Arg Ile Gly Gly 995 1000 1005
Asp Leu Gln Ser Gly Ser Val Lys Thr Val Ser Gly Gly Gly Ser Val 1010 1015 1020
Glu Ile Gly Gln Pro Pro Ala Asp Ala Asp Gln Asp Trp Ala Val Leu 1025 1030 1035 1040
Leu Arg Lys

<210> SEQ ID NO 25
<211> LENGTH: 1160
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_003195709.1

<400> SEQUENCE: 25
Met Ser Pro Arg Asn Leu Leu Leu Ser Leu Thr Leu Phe Val Phe Ala 1 5 10 15
Thr Ala Gly Leu Arg Ala Gln Gly Gln Val Thr Gly Glu Leu Gln Lys 20 25 30
Trp His Arg Ile Gln Ile Leu Phe Asp Gly Pro Gln Thr Ser Glu Ser 35 40 45
Ala Ser Gln Asn Pro Phe Leu Asn Tyr Arg Leu Asn Val Leu Phe Thr 50 55 60
Ala Pro Asp Gly Arg Glu Phe Thr Val Pro Gly Phe Phe Ala Ala Asp 65 70 75 80
Gly Asn Ala Ala Glu Ser Ser Ala Thr Ser Gly Asn Lys Trp Ala Val 85 90 95
Arg Phe Ser Pro Asp Gln Val Gly Thr Trp Thr Tyr Thr Ala Ser Phe 100 105 110
Arg Thr Gly Asp Glu Val Ala Ile Ser Leu Asp Pro Asn Ala Gly Thr 115 120 125
Ala Thr Gly Phe Asp Gly Ala Ser Gly Ser Phe Gln Ile Gly Leu Ser 130 135 140
Thr Lys Ser Ala Pro Asp Asn Arg Ser Lys Gly Arg Leu Glu Tyr Val 145 150 155 160
Gly Glu Arg Tyr Leu Arg Phe Arg Glu Asn Gly Thr Tyr Phe Leu Lys 165 170 175
Ala Gly Ala Asp Ser Pro Glu Asn Leu Leu Ala Tyr Ala Asp Phe Asp 180 185 190
Asn Thr Val Ala Ser Lys Thr Trp Ser Pro His Leu Gly Asp Trp Gln 195 200 205
Gln Gly Asp Ala Glu Trp Lys Asn Gly Lys Gly Arg Ala Leu Ile Gly 210 215 220

Ala Val Asn Tyr Leu Ala Ser Lys Gly Met Asn Ala Phe Ser Phe Leu 225 230 235 240
Thr Met Ser Val Ile Gly Asp Gly Lys Asp Val Trp Pro Trp Val Ser 245 250 255
Thr Thr His Ser Gly Leu Asp Glu Pro Gly Gly Gln Asp Ala Ala Asn 260 265 270
Arg Leu Arg Tyr Asp Val Ser Lys Leu Glu Gln Trp Glu Ile Leu Phe 275 280 285
Gln His Ala Asp Ser Lys Gly Met Phe Leu His Phe Lys Thr Gln Glu 290 295 300

Glu Glu Asn Asp Arg Leu Leu Asp Gly Gly Glu Leu Gly Val Gln Arg 305 310 315 320
Lys Leu Tyr Tyr Arg Glu Leu Val Ala Arg Phe Gly His His Leu Ala 325 330 335
Leu Asn Trp Asn Leu Gly Glu Glu Asn Asp Leu Tyr Asp Glu Leu Gly 340 345 350
Asp Thr Asn Asn Thr Arg Val Arg Ala Tyr Ala Ser Tyr Ile Lys Ser 355 360 365
Leu Asp Pro Tyr Asn His His Ile Val Ile His Ser Tyr Pro Asn Ser 370 375 380
Gln Ser Glu Leu Tyr Glu Pro Leu Leu Gly Asp Ser Asp Leu Thr Gly 385 390 395 400
Pro Ser Leu Gln Ile Gln Ile Asn Asn Ile His Arg Asp Val Lys Arg 405 410 415
Trp Ile Asn Asp Ser Lys Ala Ser Gly Lys Gln Trp Val Val Thr Asn 420 425 430
Asp Glu Gln Gly Asp His Thr Thr Gly Val Ala Ala Asp Ala Ser Tyr 435 440 445
Gly Gly Asp Lys Gly Ser Arg Gly Asp Asn Arg Ser Asp Val Arg His 450 455 460
Lys Thr Leu Trp Gly Thr Leu Met Ala Gly Gly Ala Gly Val Glu Tyr 465 470 475 480
Tyr Phe Gly Tyr Gln Thr Gly Val Thr Asp Leu Thr Ala Glu Asp Trp 485 490 495
Arg Ser Arg Lys Thr Lys Trp Glu Asp Ala Lys Leu Ala Leu Asp Phe 500 505 510
Phe Asn Asp Tyr Leu Pro Phe Trp Ala Met Glu Ser Arg Asp Glu Leu 515 520 525
Ile Ser Lys Ser Gly Ser Tyr Cys Phe Ala Lys Thr Gly Glu Ile Tyr 530 535 540
Val Val Tyr Ile Pro Ser Ser Gly Thr Glu Ser Leu Asn Leu Ser Gly 545 550 555 560
Val Ser Gly Thr Tyr Ser Val Arg Trp Tyr Asn Pro Arg Ser Gly Gly 565 570 575
Ser Leu Lys Gln Gly Ser Val Ala Thr Ile Asn Gly Gly Gly Val Arg 580 585 590
Asn Leu Gly Thr Ala Pro Thr Asp Thr Gly Ala Asp Trp Val Ala Leu 595 600 605
Val Glu Lys Thr Ser Asp Ser Gly Gly Asp Gly Gly Gly Thr Gly Asn 610 615 620
Cys Glu Ala Asp Phe Glu Glu Gln Asn Gly Arg Val Ile Ile Glu Ala 625 630 635 640

Glu Asn Leu Asp Leu Ala Gln Gly Trp Asn Thr Gly Asn Ser Phe Ala 645 650 655
Asp Ala Thr Gly Ser Gly Tyr Ile Val Trp Lys Gly Gly Asn Ser Phe 660 665 670
Ser Ser Pro Gly Asn Gly Thr Ile Ser Thr Ser Ile Val Ile His Thr 675 680 685
Pro Gly Thr Tyr Arg Phe Glu Trp Arg Asn Lys Val Gly His Gly Thr 690 695 700
Asn Ser Thr Glu Ala Asn Asp Ser Trp Val Arg Phe Pro Asp Ala Asp 705 710 715 720
Asp Phe Tyr Gly Glu Lys Asn Gly Ser Arg Val Tyr Pro Lys Gly Ser 725 730 735
Gly Lys Thr Pro Asn Pro Ala Gly Ala Ser Ala Asp Gly Trp Phe Lys 740 745 750
Val Tyr Leu Ser Gly Thr Thr Asp Trp Thr Trp Ser Thr Asn Thr Ser 755 760 765
Asp His Asp Ala His Gln Ile Tyr Ala Glu Phe Asp Thr Pro Gly Val 770 775 780
Tyr Thr Leu Gln Ile Ser Gly Arg Ser Asn Asp His Leu Ile Asp Arg 785 790 795 800
Ile Thr Leu Ala Leu Ala Gly Gln Ser Ala Thr Asp Leu Ser Leu Gly 805 810 815
Glu Thr Leu Cys Glu Gly Gly Ser Glu Thr Val Ala Val Thr Gly Val 820 825 830
Thr Val Thr Pro Gly Asp Ala Thr Leu Leu Ile Gly Glu Thr Leu Gln 835 840 845
Phe Thr Ala Ala Val Leu Pro Ala Asp Ala Thr Asn Lys Ser Val Ser 850 855 860
Trp Ser Ser Ser Asp Pro Ser Val Ala Ile Val Ser Gly Asn Gly Thr 865 870 875 880
Val Gln Ala Leu Ser Glu Gly Gln Val Glu Ile Thr Ala Thr Thr Ala 885 890 895
Asp Gly Asn Phe Thr His Ser Ala Leu Leu Thr Val Glu Ala Ala Asp 900 905 910
Pro Pro Gly Gly Asp Thr Gly Asp Gly Gly Ser Gly Glu Asp Pro Gly 915 920 925
Asn Asp Gly Gly Ser Gly Glu Asp Pro Gly Asn Asp Gly Gly Ser Gly 930 935 940
Glu Asp Pro Gly Asn Asp Gly Gly Ser Gly Glu Asp Pro Asp Gly Asp 945 950 955 960
Gly Ser Gly Glu Glu Pro Gly Asp Gly Ala Ala Gln Ala Ala Ile Gln 965 970 975
Ala Val Asp Val Arg Gln Thr Glu Gly Arg Pro Leu Ile Phe Glu Phe 980 985 990
Ala Leu Ser Gln Pro Val Ser Glu Arg Ile Val Leu Glu Leu Glu Phe 995 1000 1005
Val Asp Ile Thr Thr Glu Gln Ser Asp Tyr Val Val Ser Glu Thr Glu 1010 1015 1020
Leu Val Phe Glu Pro Gly Ser Gln Gln Ala Phe Leu Glu Val Arg Thr 1025 1030 1035 1040

Ile Ser Asp Leu Lys Thr Glu Glu Asp Glu Ser Phe Gln Ile Lys Val 1045 1050 1055
Val Arg Val Val Ser Gly Gln Val Thr Val Pro Asp Ile Leu Ala Thr 1060 1065 1070
Gly Thr Ile Leu Asp Asp Asp Arg Asp Met Lys Val Ser Pro Asn Pro 1075 1080 1085
Ala Thr Ser Tyr Ser Leu Val Gln Met Ser Asn Val Gln Glu Gly Thr 1090 1095 1100
Tyr Glu Leu Glu Ile Phe Ala Ala Ser Gly His Leu Met Gln Arg Glu 1105 1110 1115 1120
Thr Val Thr Ala Asp Gly Ser Gly Ile Ala Ser Val Thr Leu Ala Gly 1125 1130 1135
Met Ala Lys Gly Leu Tyr Ile Val Lys Leu Thr Gly Ile Asp Tyr Ala 1140 1145 1150
Tyr Thr Ala Lys Met Leu Val Lys 1155 1160

<210> SEQ ID NO 26
<211> LENGTH: 528
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: YP_003323724.1

<400> SEQUENCE: 26
Met Ser Ile Arg Ser Leu Pro Arg Arg Thr Val Gly Glu Trp Glu Val 1 5 10 15
Thr Ser Thr Arg Glu Tyr Glu Asn Pro Phe Val Asp Val Glu Val Ile 20 25 30
Gly Arg Phe Ile Ser Pro Ser Gly Arg Glu Trp Arg Val Pro Gly Phe 35 40 45
Tyr Asp Gly Asp Gly Val Trp Lys Val Arg Phe Asn Pro Gly Glu Glu 50 55 60
Gly Arg Trp Ala Tyr Arg Leu Glu Ser Tyr Pro Glu Asp Pro Glu Leu 65 70 75 80
Arg Ala Glu Gly Thr Phe Glu Val Leu Pro Arg Glu Ala Arg Gly Phe 85 90 95
Leu Arg Ser Val Pro Gly Gln Ala Trp Gly Phe Ile Tyr Glu Ser Gly 100 105 110
Glu Pro Val Phe Ile Leu Gly Asp Thr Val Tyr Asn Leu Phe Gly Met 115 120 125
Ala His Cys Gly Ala Asp Val Glu Ala Phe Leu Glu Arg Arg Ala Ser 130 135 140
Gln Gly Phe Asn Leu Leu Arg Val Arg Val Pro Val Ser Pro Phe His 145 150 155 160
Pro Pro Lys Gly Tyr Ser Glu Trp Gln Thr Arg Arg Thr Trp Pro Trp 165 170 175
Glu Gly Ser Glu Gln Ala Pro Val Phe Asp Arg Phe Asn Leu Glu Tyr 180 185 190
Phe Ala Thr Val Asp Arg Val Val Arg Lys Val Glu Glu Leu Gly Leu 195 200 205
Gly Leu Glu Val Ile Met Glu Ala Trp Gly Phe Glu Phe Pro Phe Asn 210 215 220
Ser Arg His Ile Phe Val Ala Glu Trp Glu Glu Leu Trp Met Arg Tyr 225 230 235 240

Leu Val Ala Arg Tyr Asp Ala Tyr Ser Cys Val Tyr Phe Trp Thr Pro 245 250 255
Met Asn Glu Tyr Glu Phe Tyr Pro Asn Gly Asp Trp His Tyr Lys Pro 260 265 270
Thr Ala Asp Arg Trp Ala Ile Arg Ile Ala Arg Trp Leu Arg Ala Asn 275 280 285
Ala Pro His Gly His Ile Val Ser Leu His Asn Gly Pro Trp Asp Pro 290 295 300
Pro Phe Ala His Arg Phe Arg Ser Asp Pro Lys Ala Ile Asp Thr Ile 305 310 315 320
Met Phe Gln Phe Trp Gly Thr Thr Gly Arg Asp Asp Ala Trp Leu Ala 325 330 335
Ala Gly Ile Glu Asp Arg Ile Ala Tyr Ser Leu Gly Gly Trp Tyr Gly 340 345 350
Thr Ala Val Phe Ala Glu Tyr Gly Tyr Glu Arg Asn Pro Ala Leu Pro 355 360 365
Leu Asn Ile Pro Gly His Glu Phe Cys Asp Pro Glu His Thr Arg Arg 370 375 380
Gly Ala Trp Arg Gly Ala Phe Cys Gly Leu Gly Val Ile His Gly Phe 385 390 395 400
Glu Asn Ser Trp Gly Pro Phe Met Val Leu Glu Glu Asp Gln Pro Gly 405 410 415
Leu Glu Tyr Leu Leu His Leu Arg Arg Phe Phe Thr Glu Val Val Pro 420 425 430
Phe His Arg Leu Leu Pro Asp Ala Ser Leu Val Val Ser Asp Ile Ser 435 440 445
Glu Gln Gly Gly Lys Pro Leu Ala Leu Ser Ser Pro Glu Arg Asp Val 450 455 460
Leu Ala Val Tyr Leu Pro Arg Gly Gly Glu Phe Lys Leu Ser Val Asn 465 470 475 480
Pro Pro Ala Asp Pro Cys Trp Tyr Asp Pro Arg Thr Gly Glu Val Leu 485 490 495
Ala Ala Glu Ala Ser Pro Ser Gly Gly Trp Val Ala Pro Gln Ser Gly 500 505 510
Pro Ala Asp Arg Pro His Asp Trp Val Trp Phe Ser Thr Ser Gly Arg 515 520 525

<210> SEQ ID NO 27
<211> LENGTH: 529
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: ZP_02918195.1

<400> SEQUENCE: 27
Met Ala Glu Tyr Lys Thr Gln Val Glu Gln His Arg Leu Phe Glu Ile 1 5 10 15
Asn Leu Thr Gly Thr Thr Glu Gly Asn Pro Tyr Gln Asp Val Thr Leu 20 25 30
Ser Ala Asp Phe Thr Asn Ala Glu Thr Gly Gln Ile Val Val Val Gly 35 40 45
Gly Phe Tyr Arg Gly Asn Gly Asn Tyr Ser Val Arg Phe Met Ala Ser 50 55 60
Ser Ala Gly Arg Trp Ala Phe Thr Thr Arg Ser Thr Asp Pro Ala Leu 65 70 75 80
Asp Gly Gln Thr Gly Val Phe Thr Val Thr Pro Ala Thr Gln Asp Asn 85 90 95
His Gly Arg Val Leu Thr Ala Thr Glu Ala Leu Ser Gly Lys Ala Arg 100 105 110
Glu Ala Tyr Gly Ser Glu Leu Lys Tyr Arg Phe Thr Tyr Glu Asp Gly