Supplemental Data

Extent and Origins of Functional Diversity in a Subfamily of Glycoside

Authors:

Evan M. Glasgow1,2, Kirk A. Vander Meulen1,2, Taichi E. Takasuka1,2,3, Christopher M. Bianchetti1,2,4, Lai

F. Bergeman1,2, Samuel Deutsch5, and Brian G. Fox1,2

Affiliations:

1Great Lakes Bioenergy Research Center, Madison, WI, USA

2Department of Biochemistry, University of Wisconsin – Madison, USA

3Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan

4Department of Chemistry, University of Wisconsin – Oshkosh, USA

5DOE Joint Genome Institute, Walnut Creek, CA, USA

A Distinct Activity Distribution for Clade 3 — Supplementary Figure S1 illustrates that activity values in Clade 3 enzymes are shifted toward higher activities for lichenan and xylan. For mannanase

activities, the assays are likely sampling from the upper end of a distribution and so limit definitive

conclusions.

Compared to the other substrates, activities on xyloglucan show an overall increase in average activities.

The histogram is also more compressed than those of lichenan and xylan, possibly due to the fact that fully soluble xyloglucan is more easily degraded than the other substrates, so enzymes with activity on xyloglucan are approaching yield saturation. Most relevant to this work, the activities indicate a rank- ordering (Clade 3 > Clade 2 > Clade 1) in median activity that matches the results for lichenan and xylan.

Thus it appears that the general catalytic efficiency exhibited by an in assays of lichenase or xylanase activities transfers to the of xyloglucan. The possibility that open binding clefts in

Clades 1 and 2 make these enzymes more generalists with regard to branched substrates remains open, as the tamarind xyloglucan used in this study represents only one possible xyloglucan structure.

Figure S1. Per-clade histograms of specific activities on lichenan, xylan, mannan, or xyloglucan. Because the xyloglucan data are not as extensive, activity data for all substrates is presented not corrected for substrate saturation. Histograms are presented in terms of fraction of the tested population. Other correlations — Table S1 reports correlations observed following activity measurements in a second

set of assays collected with additional substrates (tamarind xyloglucan, carboxymethylcellulose (CMC),

phosphoric acid swollen cellulose (PASC) and beechwood xylan). All reactions in this smaller dataset were carried out for 3 hours at 30 °C, pH 6.

Table S1. Correlation Statistics for Additional Substrates and Lichenan or Xylan Lichenan a Xylan a Substrate b Enz.c Pts.d p e ρ f slope f Pts.d p e ρ f slope f 0.35 0.0 0.04 0.0 110 90 < 10-3 82 0.7 Xyloglucan (0.08 – 0.42) (0.0 – 0.5) (-0.16 – 0.19) (-0.3 – 0.3) 0.65 0.6 0.38 0.4 64 47 < 10-6 43 0.01 CMC (0.51 – 0.76) (0.5 – 0.9) (0.12 – 0.61) (0.0 – 0.9) 0.41 0.5 -0.11 0.3 46 20 0.07 20 0.6 PASC (0.04 – 0.69) (0.1 – 1.1) (-0.48 – 0.28) (-0.4 – 1.0) 0.56 0.8 0.52 0.8 64 60 < 10-5 53 < 10-4 Xylan (0.38 – 0.71) (0.6 – 1.0) (0.30 – 0.70) (0.6 – 1.1) a: Specific activities measured on lichenan and xylan in the main text complete assay (243 enzymes) b: Substrate assayed in second experimental subset c: Number of enzymes assayed in second experimental subset d: Number of datapoints in the log-log comparison dataset (enzymes with both activities > 0) e: p-value, Spearman ρ, regression slope f: Best estimate and 90% confidence interval

The strongest correlations observed are between specific activities measured on carboxymethylcellulose

and lichenan, and between xylan and either lichenan or xylan (the latter serves as an internal validation,

since protein translations and substrate preparations were prepared entirely independently of the first set

of experiments). Correlations between activities on xyloglucan or PASC and lichenan are also detectable,

with the former characterized as rather shallow (log-log slope ≤ 0.5).

In the full dataset, we do not detect a significant subfamily-wide correlation between activities on mannan

and either lichenan or xylan. If there is such a correlation, as mentioned above, both the average activities

and histograms suggest it is stronger with lichenan (Figure S2). Figure S2. Histograms of specific activities on lichenan (A and C) or xylan (B), stacked to display frequencies for enzymes with or without mannanase activity (M+ vs M-, panels (A) and (B)) or frequencies for enzymes with or without xylanase activity (X+ vs X-, panel (B)).

Impact of CBM modules — No obvious functional difference is observed between enzymes possessing or lacking native CBM modules in Clades 2 and 3. Histograms in Figure S3 were generated by merging data from all sub-clades in 2 and 3 following subtraction of the corresponding median activity for that sub- clade. No systematic shift is observable, and a significant difference was not detected (via t-test) between the groups for either xylanase or lichenase activities. While it is possible there are sub-clades or examples in this population where enzyme function is significantly hampered by examining only the catalytic core, the data does not provide strong evidence for this. Moreover, none of the enzymes with no detectable activity were from gene constructs possessing CBM modules (see below).

Figure S3. 2D histograms plotting the number of Clade 2 and 3 enzymes in a given span of lichenase and xylanase activity, relative to the median activity in the phylogenetic group for each enzyme. For enzymes originating from genes coding for CBMs in addition to the catalytic core (left), bin sizes are 0.5 log units; for enzymes originating from genes without CBMs, bin sizes are 0.33 log units.

In addition to CBMs, genes coding for the enzymes in this study may possess many other types of

domains, such as protein localization bacterial immunoglobulin (“Big”) or bacteroidetes-associated

carbohydrate-binding often N-terminal (BACON) domains, while other GH5_4 members harbor dockerin

domains for cellulosome association. As mentioned in the main text, several GH5_4 catalytic domains are fused to other enzymatic domains. These can be other GH5_4 domains (up to four total), a GH26

domain, or a carbohydrate esterase domain, and the functions of these auxiliary enzyme domains may be

related plant biomass degradation, such as simultaneous hydrolysis of cellulose and the tightly associated

hemicelluloses, or deacetylation of polysaccharide side chains. As was the case with GH5_4 at the outset

of this work, many of these protein families remain mostly under-characterized, and the functions

assigned to them are typically based on one or a few examples. A systematic examination of GH5_4 with

the full suite of additional domains observed in nature is beyond the scope of this work but would likely

yield tremendous insight into the proteins’ true function and utility.

Compensation for Substrate Depletion and Comparison to Previous Data — One nuance for the lichenase – xylanase correlation is that some measurements (10% of xylanase data, 22% of lichenase data) were performed after substrates depletions of 10% or more, meaning those data are outside of the linear region typically employed in studies (Iakiviak, et al. 2011; McGregor, et al. 2016).

Beyond this 10% cutoff, the steady-state approximation becomes invalid and the reaction velocity decreases, so that a single-point specific activity measurement would underestimate the enzyme activity relative to determination in the steady-state regime. As described in Methods, we therefore adjusted enzymes in the main text to a specific activity corresponding to the hypothetical experimental time point corresponding to 5% substrate depletion. This adjustment only significantly impacts data collected after depletions of 10% or more (Figure S4). Figure S4. Lichenase vs xylanase correlation plot pre- and post-adjustment to hypothetical value at 5% substrate depletion. Gray squares, simple specific activity calculations; closed and open circles, adjusted data as plotted in main text Figure 4. Arrows displayed if the plot location is altered by more than 0.5 log units. Encouragingly, our results also show a consistent relationship to literature measurements throughout the entire dynamic range. Figure S5 displays pre- and post-adjusted data in comparison with previous literature measurements. Measured specific activities on lichenan and xylan are roughly 10 – 100-fold lower in our assays than in several previous studies (Fierobe, et al. 1991; Foong and Doi 1992; Xue, et al.

1992; Liu, et al. 2001; Bianchetti, et al. 2013; McGregor, et al. 2016; Meng, et al. 2017). This is attributed to the lack of shaking in the high-throughput plate assays used here, differences due to scale of reaction, and possibly uncertainties in protein concentration calculations.

Figure S5. Comparison between specific activity values in quantitative screen and measurements in previous studies

Complete phylograms for each main group with detailed annotation are displayed below in Figure S6A –

D (figures corresponding to related subfamilies, and Clades 1, 2, and 3). Bayesian MCMC probabilities

(’supports’) are listed above each node. Activity annotations from this study are included to the right of each node tip as in Figure 4 of the main text, with an added purple square similarly denoting that xyloglucanase activity was measured. As for the other activities, the size of the xyloglucanase symbol corresponds to its activity lying within the first, second or third quantile. EC numbers listed in the CAZy database are also listed. In the informational table at the right-hand side of the figure, the accession code and Pfam motif annotation for each gene are listed. Motifs are plotted to highlight features most relevant to this work: purple rectangles, GH5; yellow rectangles, other GH families; purple triangles, CBM X2 motif (a.k.a. CBM46 and the related Ig-like domain); tan triangles, other CBM motifs; brown ellipses, all other motifs. Locations of crystal structures and representative PDB accession codes are also marked between the relevant node tip and table entry.

Letters in the final column list isolation notes as provided in NCBI database: W= aquatic sample,

S=terrestrial sample; R= rumen or other host-associated sample. Aside from handful of examples, Clade 1 enzymes have been found in soil or aquatic samples. Clades 2 and 3 are heavily represented by rumen and gut symbionts; of evolutionary note, even within subclades 2C, 2D and 3B, which are dominated by free living organisms, the enzymes that are most closely related to the common ancestor come from host- associated organisms.

Closely-related subfamilies and study outgroup representatives from subfamily 2 (AAC19169.1) and subfamily 5 (AFY52522.1) were included in the experimental set and the tree-building process. In Region

I, there is a representative crystal structure corresponding to the protein with Genbank accession code

ABN52701.1. This sequence was not included in the tree-building process, but it is most closely related to

ACL75216.1 and AKP16689.1. It is marked as an added hypothetical node – table connecting line.

Fig. S6A

3.2.1.4 1A3H AAC19169.1 - 1GZJ AFY52522.1 - AGT44217.1 - 1.0 1.0 AHM60127.1 - 1.0 AKQ46720.1 - ABW39346.1 1.0 - 1.0 ABW39360.1 - ABW39353.1 - ACU63658.1 S ADY36470.1 R 1.0 1.0 3.2.1.4 ADX05717.1 R AMO13174.1 - ACZ43095.1 0.8 0.9 S 1.0 AOS43600.1 W 0.5 ACB76596.1 S AIF26760.1 - 1.0 ADB80112.1 R 0.9 1.0 AKF17200.1 1.0 R 1.0 3.2.1.- ACA61171.1 R ADB80113.1 R

1.0 AFY59930.1 W 1.0 ABF40690.1 S ADW68526.1 S 1.0 0.5 1.0 ALA20142.1 - 1.0 CUT12384.1 - 1.0 1.0 ALN63830.1 - ALN80811.1 -

BAW97238.1 - 0.6 1.0 3.2.1.4 ABE60666.1 - 0.9 BAU42234.1 - AFY83745.1 1.0 S 1.0 AFZ29990.1 W ABX04545.1 W 3.2.1.74 ABE60715.1 0.9 - 1.0 3.2.1.74 AAA50210.1 R 1.0 3.2.1.- ADX05737.1 R ABW39332.1 - 0.9 1.0 1.0 3.2.1.- ABW39345.1 - 3.2.1.- ABW39351.1 - 1.0 AOA60286.1 1.0 - 1.0 CUH93289.1 - AOA60285.1 - 3.2.1.4 AEV59734.1 1.0 1.0 - 1.0 BAJ62400.1 W AJD08472.1 -

1.0 1.0 3.2.1.4 ACH67609.1 R 3.2.1.- ACA61165.1 R ABN54006.1 1.0 0.5 R 1.0 3.2.1.4 CAC27410.1 - 1.0 3.2.1.4 AAA23220.1 - 1.0 1CEC ALX07407.1 R 0.7 ANV75146.1 -

ADU73485.1 - 1.0 AIQ42917.1 - 0.9 3.2.1.4 AAA91966.1 - 3.2.1.- AEV59739.1 1.0 - 0.6 AEB15411.1 - 0.8 1.0 3.2.1.- AEV59731.1 - CBK82603.1 R

ADU22290.1 R 1.0 1.0 1.0 3.2.1.- ADD61786.1 - CCO06210.1 R 0.6 1.0 CBK95738.1 - CBL34054.1 R 1.0 3.2.1.4 ADX05739.1 R 1.0 3.2.1.- ACA61162.1 R

1.0 AHF25829.1 R 1.0 1.0 AHF26427.1 R 1.0 3.2.1.4 AHF23845.1 R 1.0 AHF24623.1 R AHF25090.1 R 1.0 AEL25420.1 R AKP51009.1 - 1.0 CEA16167.1 -

1.0 AHW61392.1 - 1.0 1.0 AEI48971.1 W

1.0 ACT91559.1 - 1.0 1.0 AOS45469.1 W 1.0 CCG98123.1 - AKD54702.1 - 1.0 1.0 ADB39348.1 W 0.6 AEE97278.1 S 1.0 ACI19118.1 W ADQ08066.1 1.0 W 1.0 ADQ47167.1 W ACM59384.1 - 3.2.1.78 AGA35556.1 - 0.6 1.0 3W0K 1.0 ADL69420.1 W 1.0 AFK87329.1 W ABX03536.1 W 1.0 AEE17338.1 1.0 R 1.0 1.0 AEJ61710.1 W ADN01668.1 W SCM55578.1 - 1.0 AFN75855.1 W 1.0 1.0 SDJ64628.1 - ACB73410.1 S 1.0 AQG82496.1 - 0.8 ACB74275.1 S SDK46852.1 1.0 - SFM73591.1 - 1.0 SDN54077.1 - SDJ33083.1 - SEN55982.1 -

SDH07135.1 - AHG56239.1 R

ADY55909.1 W APF24128.1 - 1.0 ANF15335.1 - 1.0 AOR95284.1 - ALS18212.1 - 1.0 ALP91715.1 - 1.0 CBA64573.1 R 0.8 CBE05753.1 R AKP43506.1 - 1.0 1.0 0.6 ALP03840.1 - 1.0 CDS88444.1 - 0.9 CDS85201.1 - AQU09054.1 - 0.7 CEJ99169.1 -

0.8 CKH36269.1 - 1.0 CAJ69433.1 R

ACT59657.1 W 0.8 APG62326.1 0.9 W 0.9 AMS30920.1 - 3.2.1.4 AID57617.1 - 3.2.1.4 3NCO ABS61403.1 W 1.0 3.2.1.4 AFY97404.1 - 0.9 1.0 AMW33355.1 - 1.0 AFG35892.1 1.0 W 0.9 CAH68695.1 - ANE42311.1 - AEE95680.1 S 0.8 1.0 1.0 3.2.1.4 ACI18520.1 W ACK41602.1 W 1.0 1.0 1.0 ACI19692.1 W 0.5 ACK41954.1 W 1.0 ACI19154.1 W 3.2.1.4 ACK41955.1 W 0.8 AKI98197.1 W 1.0 ACB09420.1 W WP_004082283.1 - 1.0 AKE31377.1 - 3.2.1.4 3.2.1.78 3MMU/3AMC AAD36816.1 W AKE27633.1 - 0.7 SDT11367.1 - BAJ63901.1 W APF18720.1 W

1.0 AEW47975.1 - 1.0 1.0 AEW47949.1 - 0.8 1.0 AEW47942.1 - 1.0 AEW47944.1 - AEW47966.1 -

CAD78375.1 W 1.0 ADV60918.1 W 0.8 AKC82422.1 W SDU12473.1 - 1.0 AMO25510.1 - 0.8 0.9 AHG93393.1 - 0.9 1.0 CDN55595.1 R 0.8 CDN49507.1 R 0.5 ANP47437.1 - SDH33403.1 0.8 - APR78007.1 S 1.0 APW61403.1 - AMV36037.1 - ADU46314.1 S 0.6 1.0 AMA57428.1 R APO54584.1 1.0 1.0 - 0.8 BAR60597.1 - 1.0 AND88774.1 - CUT09034.1 - 0.6 1.0 AHY51746.1 R 1.0 1.0 AJA64257.1 - 0.7 CUU20044.1 - APG13256.1 - 1.0 BAO65808.1 - ADV27430.1 - 1.0 AGU11488.1 - 1.0 1.0 AGU10498.1 - 1.0 AIT08352.1 - 0.9 AGU10231.1 - 1.0 ADU14148.1 W 1.0 3.2.1.4 ABE60714.1 - 0.7 AQQ75170.1 - 3.2.1.4 ABZ70413.1 W

ACK72958.1 S 1.0 3.2.1.4 ADX05718.1 R 3.2.1.4 ACX75120.1 Ι R 3.2.1.4 ABD81750.1 W 1.0 ABN52701.1 1.0 3.2.1.4 4U3A: ACL75216.1 S 0.6 1.0 AKP16689.1 - ANW96981.1 W

1.0 ANQ52487.1 - 1.0 ANW96642.1 W

1.0 AJR04904.1 - 1.0 1.0 1.0 CCG00653.1 - 1.0 AOE12941.1 - 1.0 AOE06904.1 - AOE06289.1 -

ADV50845.1 W 1.0 3.2.1.4 ABD82186.1 W 1.0 AIQ45878.1 1.0 - 1.0 AIQ29009.1 - AIQ57804.1 S Fig. S6B 1.0 ACI18413.1 W ACK42859.1 W

EGX48303.1 - 1.0 1.0 CCD46753.1 - 1.0 APA10925.1 - 1.0 CBX94583.1 - 1.0 EAA59690.1 R 1.0 CAP96338.1 - 1.0 1.0 3.2.1.4 AFG25592.1 1.0 - AIO06743.1 -

AEI40033.1 - 1.0 0.9 AFC28691.1 S AFH60868.1 S

1.0 SCA85699.1 -

AJO18129.1 - 1.0 1.0 AGN36284.1 S BAL45505.1 -

1.0 CAJ70720.1 - 1.0 AKQ73025.1 R ACY72382.1 - 3.2.1.4 4YZP AAU40777.1 R 1.0 CAJ70717.1 -

AMR10287.1 -

CAH10344.1 -

AOP14982.1 -

APJ26960.1 -

BAL62675.1 - 1.0 0.9 BAK20916.1 - AIM25574.1 - 1.0 1.0 AFC32864.1 S APO46518.1 1.0 - ABX43556.1 S 1.0 1.0 BAQ28012.1 - 1.0 ADB10706.1 R ACZ98603.1 1.0 R 1.0 ADL52313.1 S BAV13065.1 - AEE96311.1 1.0 S 0.6 3.2.1.- AEV59735.1 - 0.9 AEV59723.1 - ADU29456.1 S AAE86309.1 - 1.0 1.0 1.0 3.2.1.4 4V2X BAB04322.1 - 1.0 1.0 ALL28250.1 -

3.2.1.4 ADD62401.1 - 1.0 5E09 1.0 CAD61243.1 - 3.2.1.4 CAD61244.1 -

AOZ94365.1 - 1.0 AIQ72470.1 0.8 1.0 - 1.0 AIQ22098.1 - AIQ33839.1 - 1.0 AIY09407.1 - 1.0 AHM68714.1 - AJE51689.1 - 1.0 1.0 CCI71836.1 - ADO59348.1 1.0 S 1.0 AOK90579.1 0.8 - 0.8 ADM72668.1 S APB68897.1 - 1.0 1.0 ALP36808.1 - 1.0 AHA61672.1 - 1.0 AET58208.1 S 0.7 AHZ10945.1 -

APQ61948.1 - 0.6 1.0 AHC22551.1 1.0 - 1.0 ALA44669.1 - APB73548.1 -

ALF95239.1 - 1.0 BAJ22272.1 - 1.0 1.0 ACX63618.1 W ANA81244.1 - 0.6 3.2.1.4 AAA22408.1 -

1.0 3.2.1.4 AFC68970.1 - APO45603.1 0.6 - ANF96686.1 - 1.0 AIQ54513.1 - 0.6 AIQ49083.1 - 0.6 1.0 AIQ32060.1 - AIQ60609.1 S AFH62788.1 1.0 S 0.6 0.7 AFC30482.1 S AEI42752.1 -

SDS11517.1 -

CBG75588.1 S 0.6 1.0 1.0 ACU37240.1 - SCG45534.1 - 1.0 CCK24845.1 -

1.0 1.0 1.0 ACU73736.1 S ACU73806.1 S 1.0 1.0 AJF63299.1 - SHH37621.1 - 1.0 ALV31540.1 - 1.0 ADI13041.1 S 1.0 1.0 1.0 ANS70502.1 - ADI13081.1 S

AKG46381.1 - 1.0 1.0 BAL86217.1 S SDT69013.1 0.6 - ANC31385.1 - 1.0 1.0 ACZ29221.1 - 1.0 AEE44521.1 S

ACU36714.1 - 1.0 1.0 ANZ41179.1 S ANZ41121.1 S 1.0 1.0 SCG64564.1 - SCF35569.1 - 1.0 1.0 AEB43836.1 W SCG15747.1 1.0 - SBT43216.1 S 1.0 SBT46588.1 - 1.0 1.0 ADL44291.1 S ADU06511.1 S Fig. S6C

3.2.1.4 CBL16523.1 R

AEQ16450.1 1.0 R 0.7 3.2.1.4 CAH69214.1 R 1.0 ADB10705.1 R BAQ28011.1 0.6 - 1.0 CAN91059.1 S 1.0 AGP42124.1 - CCO05659.1 R 1.0 1.0 CBK96026.1 - CBL35063.1 R

CBK97722.1 - 0.7 1.0 1.0 CBL35246.1 R CBK96866.1 1.0 - 0.9 AHF25528.1 R 0.7 CCO05195.1 R

AEY66181.1 - CBL17363.1 R 0.7 0.7 1.0 CBK96759.1 - 0.7 CBL35076.1 R 1.0 ADL51185.1 S 1.0 BAV13031.1 - 1.0 ADL52801.1 S BAV13032.1 -

ABW39344.1 - 1.0 ABW39349.1 - 1.0 1.0 ABW39334.1 - AMO13176.1 - 1.0 1.0 3.2.1.4 ADX05734.1 R 3.2.1.4 4YHE AIJ19564.1 R ADY35478.1 R 1.0 1.0 1.0 ALJ60576.1 - 0.6 3.2.1.4 ADD61911.1 - AIF26005.1 0.9 R AHF25221.1 R

1.0 1.0 3.2.1.- CAJ19140.1 R AHF24581.1 R AIF26252.1 0.7 1.0 0.9 R 1.0 AIF26090.1 R 1.0 3.2.1.4 ADX05705.1 R AIF25922.1 R 1.0 3.2.1.4 ADU86901.1 0.6 R 1.0 1.0 AGH14017.1 R AHW46443.1 R

3.2.1.4 ADX05688.1 R

1.0 1.0 3.2.1.4 ADX05729.1 R

1.0 3.2.1.4 CAJ19151.1 R 3.2.1.4 ADX05742.1 R 1.0 1.0 ADU86905.1 R

ADU86906.1 R

1.0 AFJ44729.1 - 1.0 AFJ44730.1 - 3.2.1.- CAJ19139.1 R 1.0 1.0 1.0 1.0 3.2.1.- ACA61145.1 R 3.2.1.- AIT97141.1 - 0.6

1.0 3.2.1.- CAJ19135.1 R

1.0 3.2.1.4 ADU86902.1 R 1.0 ADB80100.1 R 3.2.1.4 ABX76050.1 -

CCO05502.1 R

ADX05698.1 R 1.0 1.0 ADL50682.1 S BAV13070.1 - 0.7 AFH62790.1 S 0.6 1.0 AFC30484.1 S

AEI42754.1 -

3.2.1.4 CAA73113.1 - 1.0 0.6 1.0 3.2.1.151 2JEP AAR65335.1 - APO45372.1 - 0.5 3.2.1.151 AAR65336.1 - 0.6 ANY70776.1 - 0.7 0.6 3.2.1.151 BAE44526.1 - 0.6 1.0 AIQ53446.1 - AIQ47948.1 - AKH87057.1 - 1.0 ANS62921.1 -

0.6 AMW15165.1 - 0.6 ANN17321.1 - ANZ39721.1 S 1.0 0.6 ADJ49096.1 - 0.6 1.0 AEK46057.1 S

AGT87932.1 - 0.6 1.0 AGL18216.1 - AEV87015.1 - 1.0 SCG74880.1 - 1.0 AGZ40766.1 - 1.0 1.0 BAL90206.1 S SDS81858.1 - 3.2.1.- CAJ19146.1 R 1.0 AIF26272.1 R 1.0 ADK97059.1 R 1.0 AKU68585.1 -

AGB29444.1 R 0.9 AGH14127.1 R 1.0 1.0 ADE83642.1 R AGH14046.1 R 1.0 3.2.1.151 3ZMR ALJ47680.1 - 1.0 ZP_02065667.1 R SCV07078.1 - 0.8 AGA77434.1 W 1.0 1.0 AIE83677.1 S ACE86198.1 1.0 S ACE83841.1 S 1.0 AQT59026.1 - 1.0 0.6 0.5 AJQ93308.1 - 3.2.1.4 ABD81896.1 W

ALN90813.1 -

1.0 BAF57333.1 - 0.9 ADQ18612.1 -

ALJ60577.1 1.0 - 1.0 SCM57160.1 - 0.5 SCD22090.1 -

AEV98714.1 S 1.0 3.2.1.- 4W8A ADB44000.1 - 0.7 1.0 AIE86402.1 S 0.8 3.2.1.4 ACX74396.1 R 1.0 3.2.1.4 AAB38548.1 R 1.0 1.0 ADX05721.1 R 0.5 3.2.1.4 ACX76661.1 R ABQ03808.1 S 1.0 0.7 ACR12247.1 R 1.0 ABD79589.1 W 1.0 ACE84905.1 S AQT62398.1 - 1.0 ADU14847.1 W 1.0 BAU54040.1 1.0 - 1.0 SDT44428.1 - SDT04517.1 - 1.0 BAU54048.1 - 0.9 AEV98717.1 S 1.0 SDT04183.1 1.0 1.0 - BAU54047.1 -

BAU54049.1 - 1.0 AOM80706.1 1.0 W 1.0 SDS19204.1 - SDT44259.1 - Fig. S6D

3.2.1.4 ABX76046.1 -

CAL91972.1 R 1.0 1.0 BAC57896.2 1.0 R 1.0 BAC57893.1 R 1.0 CAL91969.1 R 1.0 CDG46722.1 - 1.0 CDG46723.1 - CAL91974.1 1.0 R 1.0 CAL91973.1 R 3.2.1.4 BAA76394.1 -

1.0 AFJ44731.1 - 1.0 WP_009104139.1 R

CBL34359.1 R 0.9 1.0 1.0 WP_005357523.1 - 1.0 ADD61973.1 - 1.0 CCO03822.1 R AOZ96726.1 R

1.0 CBK74879.1 R 1.0 ABJ52796.1 R 1.0 1.0 1.0 3.2.1.4 3.2.1.73 CAA35574.1 - 4NF7 ADL34447.1 R

1.0 3.2.1.4 ABX42426.1 S AGM71677.1 0.9 4X0V - ACZ98609.1 R 1.0

1.0 3.2.1.4 1EDG AAA23221.1 - 1.0 AEY67147.1 -

3.2.1.4 AEV59725.1 -

WP_009983134.1 - 0.7 1.0 EWM52390.1 R

1.0 1.0 3.2.1.- ADD61809.1 - CBL14187.1 R

1.0 1.0 3.2.1.4 AEV59736.1 - AJO67866.1 - 0.8 1.0 AIF26244.1 R AIF25951.1 R 1.0 AHF26313.1 R 1.0 AHF24515.1 R AHF24918.1 R 1.0 3.2.1.4 AHF24998.1 R 1.0 AHF24954.1 1.0 R 1.0 AHF26214.1 R AHF24177.1 R

3.2.1.- ACZ98591.1 R

1.0 ADZ83488.1 W 1.0 0.9 CUH91691.1 - 1.0 1.0 ADL53038.1 S 3.2.1.4 AAD39739.1 - 1.0 ADZ22510.1 S 1.0 AAK81397.1 S 1.0 AEI34646.1 S CDM70321.1 - 1.0 1.0 3.2.1.4 AAC37035.1 R CDM69884.1 - 1.0 CAZ94281.1 R 1.0 APA63131.1 - 1.0 ANH59008.1 W 1.0 0.7 WP_021778202.1 W EAQ39146.1 W

ADZ19875.1 S 1.0 1.0 AAK78801.1 S 1.0 AEI34705.1 S ADZ19876.1 S 1.0 AAK78802.1 S

AEI34706.1 S 1.0 1.0 AGF55279.1 S AQR94165.1 -

1.0 ADL52928.1 S 1.0 3.2.1.4 3.2.1.73 3.2.1.8 AAA23233.1 - 1.0 3NDY

1.0 3.2.1.4 3.2.1.73 3.2.1.8 AAA23231.1 S 2211254A R 1.0 1.0 AEV67086.1 W WP_010248927.1 - 1.0 3.1.1.72 3.2.1.4 4IM4 AAA23224.1 - 1.0 ALX08430.1 R 0.7 ADU74487.1 - ANV76179.1 -

3.1.1.72 3.1.1.- ABN52032.1 R

0.8 ADD82896.1 - 1.0 1.0 AMO13175.1 - AKI88012.1 -

1.0 ABW39339.1 - 1.0 AMO13177.1 1.0 - 1.0 AMO13172.1 - 1.0 AMO13173.1 -

ABW39348.1 - 1.0 ABW39343.1 - 0.9 ABW39338.1 - 1.0 ABW39347.1 - 1.0 1.0 0.7 ABW39333.1 - ABW39335.1 -

3.2.1.4 ADK66823.1 - 0.7 CBK96050.1 - 1.0 0.9 CBL35038.1 R WP_005355457.1 -

1.0 1.0 3.2.1.4 AAB69347.1 - 3.2.1.4 AAC05164.1 - 1.0 3.2.1.4 AAE59927.1 - 1.0 1.0 AEO51792.1 R 1.0 3.2.1.4 AAD04193.1 -

3.2.1.4 AAB69348.1 -

3.2.1.4

3.2.1.4 CAB92326.1 1.0 3.2.1.4 - 1.0 3.2.1.4

3.2.1.4 AAD43818.1 - 1.0 3.2.1.4 3AYR AAF00074.2 - 1.0 1.0 ADQ12983.1 - 1.0 3.2.1.4 AAC63094.1 R 0.8 3.2.1.4 3.2.1.4 3.2.1.73 3.2.1.73 3.2.1.4 3.2.1.4 3.2.1.73 3.2.1.73 AAC06321.1 - 3.2.1.4 3.2.1.4 3.2.1.73 3.2.1.73 ADL33471.1 R

3.2.1.151 ACZ54907.1 - 1.0 4W84 0.8 1.0 ADZ84561.1 W ACZ98607.1 R AOZ95201.1 R

1.0 1.0 ADU23111.1 R 3.2.1.4 AAA26467.1 - 1.0 AEV46039.1 R 1.0 1.0 ADU86903.1 R ADL33047.1 R

WP_002848543.1 - 1.0 1.0 3.2.1.4 BAA92146.1 - ADU21113.1 R

1.0 ADU21423.1 R 1.0 1.0 AQP41385.1 - 1.0 CBZ47490.1 R 0.7 BAK27255.1 R CBI12822.1 R

ADU86907.1 R 1.0 WP_009984467.1 - 1.0 WP_009986298.1 - 1.0 AAB19708.1 R 1.0

0.8 3.2.1.4 CAB05881.1 R 0.7 WP_019680199.1 -

AFL98798.1 W

CBL16630.1 R 0.9 0.8 1.0 CBL16772.1 R CBL17440.1 R

CCO06082.1 R 1.0 3.2.1.4 CAA38693.1 - 1.0 1.0 3.2.1.4 AAA26469.1 - ADU20585.1 R 0.6 1.0 WP_009984919.1 - EWM54675.1 R 1.0 CDE32105.1 - 1.0 1.0 AMO13178.1 - 1.0 CCZ83818.1 - 1.0 ERJ92483.1 R CDE11886.1 - 0.7 0.7 EIM57503.1 R 0.5 AOH43511.1 R 1.0 ACU73866.1 S ADR64667.1 R

3.2.1.4 BAE46390.1 3.2.1.4 - 1.0 0.9 AEX97595.1 R 1.0 1.0 3.2.1.4 BAA92430.1 - 3.2.1.4 ADU21608.1 R 0.9 1.0 AEX97596.1 R AHF24720.1 R

ANS70888.1 R 1.0 1.0 3.2.1.151 3.2.1.73 3VDH AAC97596.1 - 1.0 AGH13958.1 R ACA61144.1 R 1.0 1.0 3.2.1.- AIT97140.1 - 1.0 3.2.1.- ACA61160.1 R 1.0 AHF25871.1 R 1.0 ADU86908.1 R 0.7

1.0 3.2.1.4 ADX05703.1 R 3.2.1.4 ACA61149.1 R

ADR64668.1 R 1.0 0.8 AFN57700.1 R

0.8 1.0 3.2.1.73 3.2.1.78 ADA62505.1 - 3.2.1.4 3.2.1.73 3.2.1.78 3.2.1.8 ABB46200.1 - 0.7 1.0 ADU86909.1 R 0.9 3.2.1.4 3.2.1.8 AAC36862.1 - 1.0 1.0 AGH14098.1 R AIF25923.1 R

1.0 AEK98797.1 R 3.2.1.- ACA61140.1 R 0.6 3.2.1.4 ADR64663.1 R 0.7 0.7 ADR64666.1 R 1.0 3.2.1.- ACA61137.1 R

3.2.1.- ACA61135.1 R

0.9 AFO64637.1 R

0.9 AFQ39736.1 R 0.5 3.2.1.4 ABX76045.1 - 0.5 3.2.1.- ACA61132.1 R 0.5 AFO64636.1 R 1.0 1.0 AHB33631.1 - ADK55024.1 R References

Bianchetti CM, Brumm P, Smith RW, Dyer K, Hura GL, Rutkoski TJ, Phillips GN, Jr. 2013. Structure, dynamics, and specificity of endoglucanase D from Clostridium cellulovorans. J Mol Biol 425:4267- 4285. Fierobe HP, Gaudin C, Belaich A, Loutfi M, Faure E, Bagnara C, Baty D, Belaich JP. 1991. Characterization of endoglucanase A from Clostridium cellulolyticum. J Bacteriol 173:7956-7962. Foong FCF, Doi RH. 1992. Characterization and Comparison of Clostridium-Cellulovorans Endoglucanases-Xylanases Engb and Engd Hyperexpressed in Escherichia-Coli. Journal of Bacteriology 174:1403-1409. Iakiviak M, Mackie RI, Cann IK. 2011. Functional analyses of multiple lichenin-degrading enzymes from the rumen bacterium Ruminococcus albus 8. Appl Environ Microbiol 77:7541-7550. Liu J, Tsai C, Liu J, Cheng K, Cheng C. 2001. The catalytic domain of a Piromyces rhizinflata expressed in Escherichia coli was stabilized by the linker peptide of the enzyme. Enzyme Microb Technol 28:582-589. McGregor N, Morar M, Fenger TH, Stogios P, Lenfant N, Yin V, Xu X, Evdokimova E, Cui H, Henrissat B, et al. 2016. Structure-Function Analysis of a Mixed-linkage beta-Glucanase/Xyloglucanase from the Key Ruminal Bacteroidetes Prevotella bryantii B(1)4. J Biol Chem 291:1175-1197. Meng DD, Liu X, Dong S, Wang YF, Ma XQ, Zhou H, Wang X, Yao LS, Feng Y, Li FL. 2017. Structural insights into the substrate specificity of a glycoside family 5 lichenase from Caldicellulosiruptor sp. F32. Biochem J 474:3373-3389. Xue GP, Gobius KS, Orpin CG. 1992. A Novel Polysaccharide Hydrolase Cdna (Celd) from Neocallimastix-Patriciarum Encoding 3 Multifunctional Catalytic Domains with High Endoglucanase, Cellobiohydrolase and Xylanase Activities. Journal of General Microbiology 138:2397-2403.