Supporting Information for Kaltenbach et al., Evolution of chalcone isomerase from a non-catalytic ancestor

Table of Contents

Table S1. Protein sequences used for the structure-based alignment (Figure S1, Supplementary Data Set 1). p. 1

Figure S1. Structure-based alignment of representative ancestral and extant sequences used in this work. p. 3

Figure S2. Phylogenetic tree used for the ancestral reconstruction. p. 4

Figure S3. An advanced guide to ancestral sequence inference with no headaches (or fewer, at least): representative example of the decision-making process to determine the amino acid sequence in ambiguously inferred positions. p. 8

Table S2. Posterior probabilities of the initial ancestral reconstruction including indels and final amino acid sequences of the ancestral proteins after manual revision of ambiguously inferred positions in gap regions (Figure S3) and at the termini (Figure S1). p. 9

Table S3. Apparent midpoint denaturation temperatures (Tm) of selected variants measured using SYPRO Orange as a fluorescent probe and/or by NanoDSF (following tryptophan fluorescence). p. 16

Figure S4. Correlation between Tm values determined using SYPRO Orange and NanoDSF. p. 16

Table S4. Enzymatic activities of CHI from A. thaliana (AtCHI), inferred ancestors and variants obtained by directed evolution. p. 17

Figure S5. Spontaneous isomerization of chalconaringenin to racemic naringenin. p. 18

Figure S6. Analysis of the evolutionary conservation and position of residues to generate ancCHI*. p. 19

Table S5. Mutagenic primers used for phylogenetic library generation. p. 20

Table S6. Overview of library generation and screening throughput. p. 21

Table S7. Sequenced variants of the phylogenetic libraries. p. 21

Table S8. Mutations accumulated in the phylogenetic trajectory. p. 29

Table S9. Enzymatic activities of variants obtained by additional library screening or site-directed mutagenesis. p. 29

Table S10. Sequenced variants of the low-mutation rate phylogenetic libraries. p. 30

Table S11. Sequenced variants of the random mutagenesis libraries. p. 33

Table S12. Occurrence of phylogenetic founder mutations and epR4 mutations in the ancestral reconstruction and in extant sequences. p. 35

Table S13. Mutations accumulated in the alternative random mutagenesis trajectory. p. 35

Table S14. Data collection and refinement statistics for x-ray protein structures determined in this work. p. 36

Figure S7. X-ray crystal structures reported in this work. p. 37

Table S15. RMS values for protein-protein structural alignments based on CA alignments in PyMOL. p. 39

Figure S8. Average distance of each mutation from ancR1, ancR2, ancR3, ancR5, and ancR7 calculated as the distance from the side chain Cβ to Nε of R34. p. 40

Figure S9. Root mean square deviations (RMSD, Å) of all substrate heavy atoms from simulations of ancCC, ancR1, ancR3, ancR7, ancCHI and AtCHI Michaelis complexes. p. 41

Figure S10. Overview of simulations of the ancCHI and AtCHI Michaelis complexes, showing the distribution of different substrate binding modes in each simulation. p. 42

Figure S11. Illustration of different substrate binding modes observed in our molecular dynamics simulations using the ancR3 variant as a representative example. p. 43

Figure S12. Root-mean-square deviations (RMSD) of all product heavy atoms from simulations of ancCC and ancR1 in complex with the product observed in the non-productive binding mode. p. 44

Figure S13. NMR 15N-1H HSQC spectra of ancCC (blue), ancR1 (red), and ancR7 (purple). p. 45

Figure S14. Root-mean-square fluctuations (RMSF) of Arg34 during our substrate-free simulations of the ancCC, ancR1, ancR3 and ancR7 variants. p. 46

Figure S15. Root-mean square deviations (RMSD, Å) of all backbone heavy atoms from simulations of the different enzymes. p. 47

Table S16. Standard GAFF atom types and calculated partial charges used to describe the substrate chalconaringenin (left) and the product naringenin (right). p. 48

References p. 49

Table S1. Protein sequences used for the structure-based alignment (Figure S1, Supplementary Data Set 1).

Protein Family | Major Clade | Clade | Family | Species | Sequence Identifier[a]
CHI | Angiosperms | Basal | Proteaceae | Grevillea robusta | GRRW-56937
CHI | Angiosperms | Basal Eudicots | Papaveraceae | Eschscholzia californica | RKGT-57420
CHI | Angiosperms | Basalmost angiosperms | Austrobaileyaceae | Austrobaileya scandens | FZJL-165039
CHI | Angiosperms | Basalmost angiosperms | Ceratophyllaceae | Ceratophyllum demersum | NPND-121635
CHI | Angiosperms | Basalmost angiosperms | Nymphaeaceae | Nymphaea sp. | PZRT-6817
CHI | Angiosperms | Basalmost angiosperms | Amborellaceae | Amborella trichopoda | URDJ-33860
CHI | Angiosperms | Chloranthales | Chloranthaceae | Sarcandra glabra | OSHQ-47092
CHI | Angiosperms | Core Eudicots | Santalaceae | Daenikera sp. | BSEY-12135
CHI | Angiosperms | Core Eudicots | Basellaceae | Basella alba | CTYH-26097
CHI | Angiosperms | Core Eudicots | Crassulaceae | Kalanchoe crenato-diagremontiana | DRIL-14508
CHI | Angiosperms | Core Eudicots |  | Allionia incarnata | EGOS-1705
CHI | Angiosperms | Core Eudicots | Nyctaginaceae | Allionia incarnata | HMFE-18529[b]
CHI | Angiosperms | Core Eudicots | Molluginaceae | Mollugo pentaphylla | KJAA-23136[c]
CHI | Angiosperms | Core Eudicots | Physenaceae | Physena madagascariensis | RUUB-84668
CHI | Angiosperms | Core Eudicots | Phytolaccaceae | Hilleria latifolia | SFKQ-89198
CHI | Angiosperms | Core Eudicots | Molluginaceae | Mollugo nudicaulis | UNSW-124362
CHI | Angiosperms | Core Eudicots | Ranunculaceae | Aquilegia formosa | CHI_Afor
CHI | Angiosperms | Core Eudicots/ |  | Gentiana triflora | CHI_Gtri
CHI | Angiosperms | Core Eudicots/Asterids | Solanaceae | Nicotiana tabacum | CHI_Ntab
CHI | Angiosperms | Core Eudicots/Asterids |  | Tanacetum parthenium | DUQG-51806
CHI | Angiosperms | Core Eudicots/Asterids |  | Stylidium adnatum | FXGI-5812
CHI | Angiosperms | Core Eudicots/Asterids | Goodeniaceae | Scaevola mossambicensis | HUQC-70359
CHI | Angiosperms | Core Eudicots/Asterids |  | Chionanthus retusus | KTAR-140005
CHI | Angiosperms | Core Eudicots/Asterids |  | Kaliphora madagascariensis | KTWL-19787
CHI | Angiosperms | Core Eudicots/Asterids | Asteraceae | Matricaria matricarioides | OAGK-54163
CHI | Angiosperms | Core Eudicots/Asterids | Asteraceae | Matricaria matricarioides | OAGK-55018
CHI | Angiosperms | Core Eudicots/Asterids | Asteraceae | Flaveria trinervia | ZCUA-8524[d]
CHI | Angiosperms | Core Eudicots/Asterids | Asteraceae | Lactuca graminifolia | ZGDS-88797
CHI | Angiosperms | Core Eudicots/Asterids | Convolvulaceae | Ipomoea batatas | CHI_Ibat
CHI | Angiosperms | Core Eudicots/ | Brassicaceae | Arabidopsis thaliana | CHI_Atha
CHI | Angiosperms | Core Eudicots/Rosids | Elaeagnaceae | Elaeagnus umbellata | CHI_Eumb
CHI | Angiosperms | Core Eudicots/Rosids |  | Medicago sativa | CHI_Msat
CHI | Angiosperms | Core Eudicots/Rosids | Bataceae | Batis maritima | DZTK-39299
CHI | Angiosperms | Core Eudicots/Rosids |  | Rhamnus japonica | EILE-39733
CHI | Angiosperms | Core Eudicots/Rosids | Rhamnaceae | Rhamnus japonica | EILE-6281
CHI | Angiosperms | Core Eudicots/Rosids | Fabaceae | Astragalus membranaceus | HJMP-10325
CHI | Angiosperms | Core Eudicots/Rosids | Linaceae | Linum hirsutum | HNCF-62563
CHI | Angiosperms | Core Eudicots/Rosids | Myrtaceae | Syzygium micranthum | NEBM-18344
CHI | Angiosperms | Core Eudicots/Rosids | Coriariaceae | Coriaria nepalensis | NNGU-23905
CHI | Angiosperms | Core Eudicots/Rosids | Quillajaceae | Quillaja saponaria | OQHZ-20024
CHI | Angiosperms | Core Eudicots/Rosids | Passifloraceae | Passiflora caerulea | SIZE-3368
CHI | Angiosperms | Core Eudicots/Rosids | Onagraceae | Oenothera serrulata | SJAN-10048
CHI | Angiosperms | Core Eudicots/Rosids | Euphorbiaceae | Euphorbia mesembryanthemifolia | VPDX-44653
CHI | Angiosperms | Core Eudicots/Rosids | Rhamnaceae | Rhamnus caroliniana | WVEF-12235
CHI | Angiosperms | Core Eudicots/Rosids | Lythraceae | Punica granatum | YNUE-90967[e]
CHI | Angiosperms | Core Eudicots/Rosids | Anacardiaceae | Rhus radicans | YUOM-36840
CHI | Angiosperms | Core Eudicots/Rosids | Krameriaceae | Krameria lanceolata | ZHMB-85808
CHI | Angiosperms | Magnoliids | Piperaceae | Piper auritum | MUNP-769
CHI | Angiosperms | Magnoliids | Myristicaceae | Myristica fragrans | OBPL-45593
CHI | Angiosperms | Magnoliids | Aristolochiaceae | Aristolochia elegans | PAWA-45302
CHI | Angiosperms | Magnoliids | Winteraceae | Drimys winteri | WKSU-130380
CHI | Angiosperms | Monocots | Hemerocallidaceae | Hemerocallis spp. | BLAJ-12405
CHI | Angiosperms | Monocots | Amaryllidaceae | Allium cepa | CHI_Acep
CHI | Angiosperms | Monocots | Pandanaceae | Freycinetia multiflora | DGXS-9913
CHI | Angiosperms | Monocots | Amaryllidaceae | Rhodophiala pratensis | JDTY-16234
CHI | Angiosperms | Monocots | Asphodelaceae | Aloe vera | JVBR-9654
CHI | Angiosperms | Monocots | Velloziaceae | Talbotia elegans | SILJ-81872
CHI | Angiosperms | Monocots | Orchidaceae | Vanilla planifolia | THDM-62163
CHI | Angiosperms | Monocots | Xanthorrhoeaceae | Johnsonia pubescens | WTDE-8108
CHI | Angiosperms | Monocots/Commelinids | Cannaceae | Canna x generalis | CHI_Cgen
CHI | Angiosperms | Monocots/Commelinids | Arecaceae | Elaeis oleifera | CHI_Eole
CHI | Angiosperms | Monocots/Commelinids | Poaceae | Oryza sativa | CHI_Osat
CHI | Angiosperms | Monocots/Commelinids | Cyperaceae | Mapania palustris | XPAF-10253
CHI | Angiosperms | Monocots/Commelinids | Arecaceae | Typhonium blumei | YMES-18290
CHI | Ferns | Eusporangiate Monilophytes | Ophioglossaceae | Sceptridium dissectum | EEAQ-85524
CHI | Ferns | Eusporangiate Monilophytes | Equisetaceae | Equisetum hyemale | JVSZ-131011
CHI | Ferns | Eusporangiate Monilophytes | Marattiaceae | Angiopteris evecta | NHCM-64613
CHI | Ferns | Leptosporangiate Monilophytes | Anemiaceae | Anemia tomentosa | CQPW-971
CHI | Ferns | Leptosporangiate Monilophytes | Pteridaceae | Argyrochosma nivea | XDDT-6681
CHI | Gymnosperms | Conifers | Cupressaceae | Tetraclinis sp. | CGDN-7669
CHI | Gymnosperms | Conifers | Pinaceae | Pinus jeffreyi | CHI_Pjef
CHI | Gymnosperms | Conifers | Pinaceae | Tsuga heterophylla | GAMH-51855
CHI | Gymnosperms | Conifers | Pinaceae | Keteleeria evelyniana | JUWL-1715
CHI | Gymnosperms | Conifers | Podocarpaceae | Halocarpus bidwillii | OWFC-55127
CHI | Gymnosperms | Conifers | Sciadopityaceae | Sciadopitys verticillata | YFZK-2361
CHI | Gymnosperms | Cycadales | Cycadaceae | Cycas micholitzii | CHI_Cmic

[Table S1. continued]

Protein Family | Major Clade | Clade | Family | Species | Sequence Identifier[a]
CHI | Liverworts |  | Marchantiaceae | Marchantia polymorpha | CHI_Mpol
CHI | Liverworts |  | Sphaerocarpaceae | Sphaerocarpos texanus | HERT-2469
CHI | Liverworts |  | Conocephalaceae | Conocephalum conicum | ILBQ-1133
CHI | Liverworts |  | Ricciaceae | Ricciocarpos natans | WJLO-34965
CHI | Lycophytes |  | Selaginellaceae | Selaginella moellendorffii | CHI_Smoe
CHI | Lycophytes |  | Lycopodiaceae | Huperzia lucidula | GKAG-17869
CHI | Lycophytes |  | Selaginellaceae | Selaginella wallacei | JKAA-10268
CHI | Lycophytes |  | Selaginellaceae | Selaginella wallacei | JKAA-176172
CHI | Lycophytes |  | Selaginellaceae | Selaginella apoda | LGDQ-10652
CHI | Lycophytes |  | Lycopodiaceae | Pseudolycopodiella caroliniana | UPMJ-16326
CHI | Lycophytes |  | Lycopodiaceae | Dendrolycopodium obscurum | XNXF-68496
CHI | Lycophytes |  | Selaginellaceae | Selaginella acanthonota | ZYCD-4491
CHIL | Angiosperms | Basal Eudicots | Papaveraceae | Capnoides sempervirens | AUGV-6826
CHIL | Angiosperms | Basalmost angiosperms | Ceratophyllaceae | Ceratophyllum demersum | NPND-7861
CHIL | Angiosperms | Core Eudicots |  | Spergularia media | TJES-13499
CHIL | Angiosperms | Core Eudicots/Asterids | Apocynaceae | Asclepias curassavica | DSUV-897
CHIL | Angiosperms | Core Eudicots/Asterids | Polemoniaceae | Phlox sp. | FNEN-2227
CHIL | Angiosperms | Core Eudicots/Asterids | Rubiaceae | Psychotria ipecacuanha | JOPH-14582
CHIL | Angiosperms | Core Eudicots/Asterids | Gentianaceae | Exacum affine | KPUM-99218
CHIL | Angiosperms | Core Eudicots/Asterids |  | Anisacanthus quadrifidus | PCGJ-969
CHIL | Angiosperms | Core Eudicots/Rosids | Brassicaceae | Arabidopsis thaliana | CHIL_Atha
CHIL | Angiosperms | Core Eudicots/Rosids | Bataceae | Batis maritima | DZTK-39065
CHIL | Angiosperms | Core Eudicots/Rosids | Linaceae | Linum hirsutum | HNCF-62845
CHIL | Angiosperms | Core Eudicots/Rosids | Coriariaceae | Coriaria nepalensis | NNGU-92541
CHIL | Angiosperms | Monocots | Alliaceae | Allium sativum | GJPF-61802
CHIL | Angiosperms | Monocots | Iridaceae | Sisyrinchium angustifolium | LTZF-80836
CHIL | Angiosperms | Monocots | Orchidaceae | Vanilla planifolia | THDM-3485
CHIL | Angiosperms | Monocots/Commelinids | Zingiberaceae | Zingiber officinale | BDJQ-88594
CHIL | Angiosperms | Monocots/Commelinids | Poaceae | Saccharum officinarum | CHIL_Soff
CHIL | Angiosperms | Monocots/Commelinids | Marantaceae | Maranta leuconeura | JNUB-28237
CHIL | Ferns | Eusporangiate Monilophytes | Ophioglossaceae | Sceptridium dissectum | EEAQ-86819
CHIL | Ferns | Eusporangiate Monilophytes | Marattiaceae | Angiopteris evecta | NHCM-7052
CHIL | Ferns | Eusporangiate Monilophytes | Psilotaceae | Psilotum nudum | QVMR-15336
CHIL | Ferns | Leptosporangiate Monilophytes | Anemiaceae | Anemia tomentosa | CQPW-13199
CHIL | Ferns | Leptosporangiate Monilophytes | Cyatheaceae | Cyathea (Alsophila) spinulosa | GANB-69555
CHIL | Ferns | Leptosporangiate Monilophytes | Hypodematiaceae | Didymochlaena truncatula | RFRB-35224
CHIL | Ferns | Leptosporangiate Monilophytes | Pteridaceae | Vittaria lineata | SKYV-4913
CHIL | Gymnosperms | Conifers | Araucariaceae | Wollemia nobilis | RSCE-7418
CHIL | Gymnosperms | Cycadales | Cycadaceae | Cycas micholitzii | XZUY-5266
CHIL | Gymnosperms | Gnetales | Welwitschiaceae | Welwitschia mirabilis | TOXE-12387
CHIL | Liverworts |  | Radulaceae | Radula lindenbergiana | BNCU-85310
CHIL | Liverworts |  | Marchantiaceae | Marchantia polymorpha | CHIL_Mpol
CHIL | Liverworts |  | Frullaniaceae | Frullania spp. | CHJJ-137033
CHIL | Liverworts |  | Treubiaceae | Treubia lacunosa | FITN-89007
CHIL | Liverworts |  | Sphaerocarpaceae | Sphaerocarpos texanus | HERT-6655
CHIL | Liverworts |  | Conocephalaceae | Conocephalum conicum | ILBQ-42388
CHIL | Liverworts |  | Scapaniaceae | Scapania nemorosa | IRBN-159060
CHIL | Lycophytes |  | Selaginellaceae | Selaginella moellendorffii | CHIL_Smoe
CHIL | Lycophytes |  | Lycopodiaceae | Huperzia lucidula | GKAG-95828
CHIL | Lycophytes |  | Lycopodiaceae | Diphasiastrum digitatum | WAFT-65805
CHIL | Lycophytes |  | Lycopodiaceae | Dendrolycopodium obscurum | XNXF-68324
CHIL | Lycophytes |  | Selaginellaceae | Selaginella acanthonota | ZYCD-3775
CHIL | Mosses |  | Pottiaceae | Syntrichia ruralis | CHIL_Srur
CHIL | Mosses |  | Funariaceae | Physcomitrella patens | CHILb_Ppat
CHIL | Mosses |  | Polytrichaceae | Polytrichum commune | SZYG-4674
FAP | Angiosperms | Core Eudicots | Ranunculaceae | Aquilegia formosa | FAPb_Afor
FAP | Angiosperms | Core Eudicots/Asterids | Convolvulaceae | Ipomoea nil | FAPb_Inil
FAP | Angiosperms | Core Eudicots/Rosids | Brassicaceae | Arabidopsis thaliana | FAPb_Atha_3
FAP | Angiosperms | Core Eudicots/Rosids | Fabaceae | Glycine max | FAPb_Gmax
FAP | Angiosperms | Monocots/Commelinids | Poaceae | Oryza sativa | FAPb_Osat

[a] Sequence identifier for the structure-based alignment. The four-letter code follows the gene nomenclature of the 1KP database (ref. 1). Sequences obtained elsewhere are labeled according to the protein family and the abbreviated species name.
[b] The current four-letter codes for Allionia incarnata transcriptomes in the 1KP database are DVXD and EGOS.
[c] The current four-letter code for Mollugo pentaphylla transcriptomes in the 1KP database is HURS.
[d] The current four-letter codes for Flaveria trinervia transcriptomes in the 1KP database are HRVY and RLCS.
[e] The current four-letter codes for Punica granatum transcriptomes in the 1KP database are JROW, QEBC, YMUO and VUGF.
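
The two labeling schemes described in footnote [a] can be told apart programmatically. The following is a minimal sketch, not part of the original work: the regular expressions are assumptions inferred only from the identifiers listed in Table S1 (for example GRRW-56937, CHI_Atha, FAPb_Atha_3), and the function name `classify_identifier` is hypothetical.

```python
import re

# Assumed pattern for 1KP-style labels: four uppercase letters, a hyphen,
# a number, and an optional footnote marker such as "[b]".
ONEKP_ID = re.compile(r"^([A-Z]{4})-(\d+)(?:\[[a-z]\])?$")

# Assumed pattern for all other labels: protein family (with an optional
# lowercase paralog letter), underscore, four-letter abbreviated species
# name, and an optional numeric suffix (e.g. "FAPb_Atha_3").
NAMED_ID = re.compile(
    r"^(CHIL|CHI|FAP)([a-z]?)_([A-Z][a-z]{3})(?:_(\d+))?(?:\[[a-z]\])?$"
)

def classify_identifier(identifier):
    """Classify a Table S1 sequence identifier by its labeling scheme."""
    match = ONEKP_ID.match(identifier)
    if match:
        return {"source": "1KP", "code": match.group(1),
                "number": int(match.group(2))}
    match = NAMED_ID.match(identifier)
    if match:
        return {"source": "other", "family": match.group(1),
                "species": match.group(3)}
    return {"source": "unknown"}

print(classify_identifier("HMFE-18529[b]"))
print(classify_identifier("CHILb_Ppat"))
```

Identifiers that match neither assumed pattern are reported as "unknown" rather than guessed.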

[Figure S1: alignment image, ruler from 10 to 310. Sequences shown: MsCHI (1-222), AtCHI (1-246), ancCHI (1-220), ancCHI* (1-220), AtCHIL (1-209), ancCHIL (1-220), AtFAPb (1-287) and ancCC (1-220). Annotated regions: signal peptide, N-terminal adaptor, C-terminal adaptor. Asterisks mark individual alignment positions. The structural alignment is available in Supplementary Data Set 1.]

Figure S1. Structure-based alignment of representative ancestral and extant sequences used in this work. The five positions previously reported as important for CHI catalysis (refs. 2, 3) are labelled with a red asterisk. N- and C-terminal sequences were predicted with large ambiguity; hence, adaptor sequences (black boxes) were derived from AtCHI (dashed boxes). The treatment of ambiguously inferred loop regions is described in Figure S3 and Table S2. Note that FAPs contain an N-terminal chloroplast-transit signal peptide. At: Arabidopsis thaliana, Ms: Medicago sativa.

[Figure S2: tree image, shown as an overview followed by detail panels. A: CHI, seedless plants (liverworts, lycophytes, ferns). B: CHI, seed plants (gymnosperms, angiosperms). C: CHIL, seedless plants (mosses, liverworts, lycophytes*, ferns). D: CHIL, seed plants (gymnosperms, angiosperms). E: fatty-acid-binding proteins (FAPs). Scale bars: 0.20 and 0.10.]

Figure S2. Phylogenetic tree used for the ancestral reconstruction. The tree was calculated with MrBayes (ref. 4) and shows a consensus from a 1-million-generation run. Posterior probabilities <0.85 are shown. Genes are labeled according to the protein family (CHI, CHIL and FAP) and the plant species. Major plant clades are also labeled. The structural alignment used for tree generation can be found in Supplementary Data Set 1 (trimmed only at the termini). See also Figure S1. More information on each gene, such as 1KP accession numbers, can be found in Table S1. As expected, all genes group according to the major plant clades with high posterior probabilities, with the exception of CHIL from Treubia lacunosa, a liverwort, which is located within the lycophyte clade (indicated by the asterisk).
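
The 0.85 cutoff used for labeling can be illustrated with a short sketch. It assumes the consensus tree has been exported to Newick format with posterior probabilities written as internal-node labels; this export format, the toy tree and the function name `low_support_nodes` are all assumptions for illustration, not the authors' workflow.

```python
import re

# Assumption: support values appear directly after a closing parenthesis,
# e.g. "(A:0.1,B:0.2)0.47:0.05". Branch lengths follow a colon and are ignored.
SUPPORT = re.compile(r"\)(\d*\.?\d+)")

def low_support_nodes(newick, threshold=0.85):
    """Return internal-node support values below the threshold, in order."""
    values = [float(v) for v in SUPPORT.findall(newick)]
    return [v for v in values if v < threshold]

# Hypothetical four-taxon tree with one weakly supported clade.
toy_tree = "((A:0.1,B:0.2)0.47:0.05,(C:0.1,D:0.1)0.99:0.1);"
print(low_support_nodes(toy_tree))  # [0.47]
```

A regular expression is enough for this narrow task; a full tree parser would be needed to report which taxa belong to each weakly supported clade.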

[Figure S3: image. Panel labels:
Structure-based alignment. Consensus: 3 amino acids. AtCHI. Active site.
Trimmed alignment including ancestral sequences.
ancCHIL (N8), ancCC (N7), ancCC/FAPb ancestor (N1). Reconstruction including indels: DDDSG. Reduction to 3 amino acids: DD(S/G). Chose G due to location in helix kink: DDG.
ancCHI (N50). Reconstruction including indels: DDDEE. Reduction to 3 amino acids: DDE.]

Figure S3. An advanced guide to ancestral sequence inference with no headaches (or fewer, at least): representative example of the decision-making process to determine the amino acid sequence in ambiguously inferred positions. Due to the high level of divergence within and between the three protein families, sequence-based alignments give ambiguous and inconsistent results, i.e. different alignment programs place gaps differently. We therefore performed a structure-based alignment (ref. 5), which preferentially places gaps in loop regions and performed much better. The next challenge was that, by default, ancestral inference places an amino acid at each position in the alignment, even if that position consists mostly of gaps. In other words, the underlying model assumes that the ancestor was of maximum length and that every gap is a deletion, while in reality the opposite is often true. Therefore, we inspected all ambiguously inferred positions manually and corrected them to the best of our knowledge. Our rationale was that even if our decision-making was flawed, this should have little or no effect on ancestral protein structure and function because indels typically occur in loop regions of high divergence. In total, eleven loop regions were manually revised as illustrated in this figure. First, the structure-based alignment (trimmed only at the termini) was used to determine the consensus number of amino acids (three in the example at hand). Second, the most probable ancestral sequences were added to the trimmed alignment (that was used for generation of the phylogenetic tree and ancestral inference) and reduced to the consensus number of amino acids. In cases where a particular amino acid could not be decided on (in the example, both S and G in the third position seem plausible), additional information such as phylogeny, structural information, and chemical intuition was used to make the decision. In the above example, the structural context in a helix kink led us to choose G due to its frequent occurrence in turns. Additionally, N- and C-terminal adaptor sequences were added to all ancestors as shown in Figure S1.
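
The two steps described in the caption can be sketched in code. This is only an illustration under stated assumptions: "consensus number of amino acids" is taken here to mean the most common ungapped loop length, the extant loop strings are invented, and both function names are hypothetical. The final choice among candidates was made manually in the original work, so the sketch only enumerates options for inspection.

```python
from collections import Counter
from itertools import combinations

def consensus_loop_length(aligned_loops):
    """Most common number of non-gap residues in one aligned loop region.

    Assumption: "consensus" is interpreted as the mode; ties resolve to
    the length that occurs first in the input.
    """
    lengths = [len(seq.replace("-", "")) for seq in aligned_loops]
    return Counter(lengths).most_common(1)[0][0]

def reduction_candidates(reconstructed, target_length):
    """Order-preserving ways to shorten a reconstructed loop, deduplicated."""
    picks = combinations(range(len(reconstructed)), target_length)
    return sorted({"".join(reconstructed[i] for i in idx) for idx in picks})

# Hypothetical extant loop segments (gaps as "-"), for illustration only.
extant_loops = ["DDG--", "EDS--", "DDDEE", "D-DG-"]
n = consensus_loop_length(extant_loops)
print(n, reduction_candidates("DDDSG", n))
```

For the reconstructed loop DDDSG from the figure, the candidate list contains both DDS and DDG, the two options weighed in the caption; for DDDEE it contains DDE.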

Table S2. Posterior probabilities of the initial ancestral reconstruction including indels and final amino acid sequences of the ancestral proteins after manual revision of ambiguously inferred positions in gap regions (Figure S3) and at the termini (Figure S1).
Black/bold: N- and C-terminal adaptor sequences from A. thaliana CHI.
Blue/bold: The set of 26 mutations identified as (nearly) neutral for CHI activity that was removed from ancCHI to yield ancCHI*.
Green/bold: The set of 39 amino acids differing between ancCC and ancCHI*, on which the phylogenetic libraries were based.
Underlined: Residues previously identified as important for catalysis (ref. 3).
Pink/bold: Mutations fixed after random mutagenesis in epR4.

[Table S2. continued]

[Rotated table covering alignment positions 1-80 (including indels), corresponding to final positions 1-78. Layout: for each of ancCC, ancCHIL and ancCHI, up to three candidate amino acids per position with their posterior probabilities p(amino acid) from the ancestral reconstruction including indels, followed by the final ancestral sequences of ancCC, ancCHIL, ancCHI, ancCHI*, ancR7 and epR4. Cell values are not reproduced.]

11 [Table S2. continued] I I I I L L L F T F L V A P V E K V K S Y E E S A A K D K K R D R Q G G Q G G epR4 I I I L L L F T L V A P V E K V K P T S Y V E E A K D K N R D R K Q G Q G G M ancR7 L L L F T L L V A P V E K V K P T Y S V E T A A A K D R K N K D K Q G Q G G M ancCHI* I L L L F T L L A P V E K V K P T Y S V E T A A A D R K N K D K Q Q G Q G G M ancCHI Final ancestral sequence Final I I I I I L L L L F T K V A P V E K V K K K S Y A E S S A D R D R Q G G Q G G ancCHIL I I I I I L L L F T L V A P V E K V K S Y E E S A A K D K K R D R Q G G Q G G ancCC 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 111 110 112 113 114 115 116 117 100 101 102 103 104 105 106 107 108 109 (final) Position 0.11 0.11 0.01 0.14 0.03 0.13 0.12 0.00 0.05 0.00 0.00 0.02 0.01 0.01 0.01 0.02 0.03 0.02 0.02 0.00 0.02 0.02 0.00 0.13 0.07 0.00 0.15 0.01 0.10 0.14 0.01 0.04 0.06 0.02 0.06 0.18 0.17 0.02 0.05 p(I) p(I) p(I) p(I) p(I) p(L) p(L) p(F) p(T) p(F) p(Y) p(A) p(S) p(S) p(A) p(V) p(S) p(V) p(K) p(S) p(S) p(E) p(E) p(S) p(E) p(E) p(H) p(D) p(H) p(N) p(H) p(R) p(Q) p(Q) p(Q) p(Q) p(G) p(Q) p(Q) 0.11 0.03 0.18 0.26 0.40 0.41 0.27 0.01 0.12 0.09 0.02 0.05 0.06 0.45 0.16 0.03 0.10 0.08 0.00 0.08 0.05 0.00 0.32 0.35 0.01 0.32 0.12 0.03 0.22 0.15 0.25 0.07 0.08 0.03 0.13 0.19 0.22 0.02 0.38 ancCHI p(I) p(I) p(I) p(I) p(I) p(L) p(L) p(L) p(F) p(T) p(F) p(S) p(E) p(V) p(E) p(S) p(V) p(K) p(S) p(A) p(S) p(A) p(E) p(A) p(S) p(A) p(V) p(S) p(S) p(E) p(D) p(R) p(R) p(D) p(R) p(D) p(R) p(Q) p(M) 0.96 0.24 0.66 0.42 0.43 0.42 0.97 1.00 0.72 0.90 0.98 0.89 0.90 0.54 0.83 0.93 0.85 0.88 0.85 0.99 0.89 0.89 1.00 0.40 0.55 0.98 0.50 0.63 0.92 0.53 0.19 0.72 0.84 0.58 0.91 0.76 0.39 0.26 0.93 0.39 p(I) p(L) p(L) p(L) p(L) p(L) p(F) p(T) p(T) p(T) p(A) p(P) p(V) p(E) p(K) p(A) p(V) p(K) p(P) p(A) p(Y) p(S) p(K) p(V) p(E) p(K) p(A) p(K) p(D) p(R) p(N) p(C) p(D) p(Q) p(Q) p(G) p(Q) p(G) p(G) p(M) 0.11 0.00 0.03 0.02 0.01 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 0.01 0.01 0.00 
0.01 0.01 0.00 0.05 0.03 0.00 0.03 0.00 0.01 0.00 0.00 0.07 0.17 0.00 0.00 0.00 0.00 0.02 p(I) p(L) p(L) p(L) p(L) p(L) p(L) p(L) p(L) p(T) p(T) p(T) p(S) p(K) p(S) p(A) p(A) p(E) p(S) p(A) p(A) p(S) p(H) p(D) p(N) p(D) p(N) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(M) p(M) p(M) 0.03 0.05 0.45 0.01 0.33 0.07 0.02 0.02 0.00 0.00 0.05 0.08 0.44 0.42 0.31 0.02 0.06 0.01 0.01 0.10 0.01 0.25 0.04 0.00 0.00 0.16 0.37 0.00 0.33 0.01 0.01 0.10 0.33 0.01 0.00 0.00 0.00 0.10 ancCHIL p(I) p(I) p(I) p(I) p(I) p(F) p(T) p(F) p(T) p(F) p(T) p(Y) p(E) p(E) p(E) p(V) p(V) p(V) p(V) p(E) p(V) p(P) p(A) p(S) p(S) p(A) p(A) p(V) p(K) p(E) p(K) p(V) p(D) p(R) p(R) p(R) p(R) p(Q) 0.96 0.90 0.50 0.99 0.66 0.90 0.96 1.00 0.98 1.00 1.00 0.94 0.91 0.56 0.58 0.42 0.96 0.93 0.99 0.97 0.89 0.99 1.00 0.69 0.89 1.00 1.00 0.80 0.46 0.99 0.65 0.98 0.99 0.37 0.48 0.99 0.99 1.00 1.00 0.85 p(I) p(I) p(I) p(I) p(I) p(L) p(L) p(L) p(L) p(F) p(T) p(K) p(V) p(A) p(P) p(V) p(E) p(K) p(V) p(K) p(K) p(K) p(S) p(Y) p(A) p(E) p(S) p(S) p(A) p(D) p(C) p(R) p(D) p(R) p(Q) p(G) p(G) p(Q) p(G) p(G) 0.11 0.00 0.09 0.02 0.02 0.07 0.02 0.01 0.02 0.00 0.00 0.03 0.09 0.04 0.01 0.08 0.26 0.17 0.01 0.13 0.14 0.10 0.00 0.25 0.04 0.00 0.06 0.09 0.04 0.19 0.09 0.07 0.07 0.17 0.03 0.10 0.01 0.01 0.12 p(I) p(I) p(L) p(L) p(L) p(L) p(L) p(L) p(L) p(F) p(T) p(T) p(T) p(T) p(T) p(E) p(S) p(K) p(A) p(P) p(S) p(A) p(V) p(S) p(A) p(S) p(E) p(N) p(H) p(D) p(D) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(M) p(M) 0.11 0.02 0.31 0.45 0.13 0.33 0.09 0.02 0.10 0.03 0.02 0.05 0.45 0.47 0.23 0.30 0.18 0.14 0.23 0.21 0.23 0.00 0.33 0.13 0.05 0.15 0.08 0.25 0.06 0.22 0.19 0.09 0.09 0.30 0.32 0.10 0.05 0.02 0.25 ancCC p(I) p(I) p(I) p(I) p(I) p(L) p(F) p(T) p(F) p(Y) p(E) p(E) p(S) p(A) p(V) p(V) p(V) p(V) p(A) p(A) p(E) p(S) p(S) p(S) p(A) p(V) p(K) p(S) p(K) p(K) p(D) p(R) p(R) p(R) p(D) p(D) p(N) p(Q) p(Q) Posterior probabilities p(amino acid) in ancestral reconstruction including indels ancestral including in reconstruction acid) p(amino 
Posteriorprobabilities 0.97 0.44 0.50 0.81 0.57 0.85 0.96 1.00 0.86 0.97 0.97 0.87 0.68 0.49 0.52 0.56 0.39 0.62 0.85 0.44 0.64 0.41 1.00 0.35 0.72 0.95 0.67 0.80 0.52 0.87 0.48 0.65 0.76 0.43 0.44 0.59 0.62 0.92 0.95 0.33 p(I) p(I) p(I) p(I) p(I) p(L) p(L) p(L) p(L) p(F) p(T) p(K) p(V) p(A) p(P) p(V) p(E) p(K) p(V) p(K) p(K) p(K) p(S) p(Y) p(A) p(E) p(E) p(S) p(A) p(D) p(C) p(R) p(D) p(R) p(Q) p(G) p(G) p(Q) p(G) p(G) 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 111 110 112 113 114 115 116 117 118 119 100 101 102 103 104 105 106 107 108 109 120 Position (incl. indels) (incl.

12 [Table S2. continued] I L L L F F T L F F K Y S E A E E E S E E E K S P S V P A D R K H Q G W epR4 I L L F F F T F F F K Y S E A E E E A E E E K S P S V S A D R K K H G W ancR7 I L L F F F T F F F L K Y S E A E E E A E E E K S P P S T S A R K H G G ancCHI* I L L F F F T F F F L E K Y S E A E E E A E E K S P P S T S R K H G Q G ancCHI Final ancestral sequence Final I L L L F F T F A K Y E E E E E E A E K V E K P K S V Y P D D N H Q G W ancCHIL I L L F F F T L F F K Y S E A E E E A E E E K S P S V P A D R K H Q G W ancCC 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 (final) Position 0.11 0.04 0.06 0.07 0.09 0.04 0.05 0.17 0.00 0.02 0.00 0.03 0.04 0.21 0.00 0.06 0.13 0.03 0.01 0.06 0.00 0.07 0.00 0.03 0.10 0.08 0.08 0.07 0.00 0.00 0.00 0.04 0.19 0.02 0.01 0.08 0.00 0.09 0.05 0.07 p(I) p(L) p(L) p(L) p(L) p(L) p(F) p(T) p(T) p(F) p(T) p(S) p(A) p(V) p(K) p(K) p(K) p(E) p(A) p(S) p(S) p(A) p(Y) p(V) p(A) p(Y) p(D) p(N) p(D) p(R) p(R) p(R) p(C) p(H) p(G) p(Q) p(Q) p(Q) p(Q) p(Q) 0.11 0.11 0.11 0.07 0.06 0.33 0.14 0.10 0.38 0.18 0.24 0.02 0.04 0.08 0.22 0.01 0.18 0.36 0.05 0.09 0.05 0.08 0.01 0.04 0.22 0.29 0.24 0.22 0.00 0.00 0.00 0.04 0.19 0.10 0.25 0.10 0.09 0.17 0.39 0.14 ancCHI p(I) p(I) p(L) p(L) p(L) p(L) p(F) p(T) p(A) p(A) p(E) p(Y) p(A) p(E) p(K) p(S) p(E) p(Y) p(V) p(Y) p(Y) p(K) p(A) p(A) p(S) p(V) p(Y) p(D) p(D) p(D) p(D) p(D) p(D) p(N) p(G) p(Q) p(Q) p(Q) p(Q) p(W) 0.82 0.71 0.41 0.63 0.80 0.57 0.42 0.75 0.82 0.98 0.87 0.76 0.50 0.98 0.74 0.47 0.89 0.90 0.66 0.94 0.67 0.99 0.92 0.55 0.64 0.60 0.32 0.65 0.99 0.99 1.00 0.90 0.49 0.88 0.72 0.57 0.91 0.57 0.43 0.41 p(I) p(L) p(L) p(L) p(F) p(F) p(F) p(T) p(F) p(T) p(F) p(F) p(F) p(E) p(K) p(S) p(E) p(A) p(E) p(E) p(E) p(E) p(A) p(E) p(E) p(K) p(K) p(S) p(P) p(P) p(S) p(P) p(D) p(R) p(H) p(G) p(Q) p(G) p(W) p(W) 0.09 0.08 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.06 0.01 0.00 0.00 0.00 0.01 0.00 0.00 0.00 
0.00 0.00 0.00 0.04 0.05 0.00 0.06 0.00 0.00 0.01 0.00 0.08 0.13 0.00 0.01 0.09 0.00 0.00 0.00 p(I) p(I) p(L) p(L) p(L) p(T) p(T) p(T) p(S) p(V) p(A) p(P) p(E) p(V) p(S) p(N) p(N) p(D) p(R) p(H) p(R) p(R) p(D) p(H) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(Q) p(M) p(M) p(M) p(W) 0.11 0.24 0.10 0.01 0.00 0.00 0.03 0.00 0.01 0.07 0.00 0.00 0.12 0.03 0.00 0.00 0.01 0.12 0.02 0.02 0.05 0.05 0.00 0.01 0.10 0.12 0.32 0.03 0.00 0.02 0.00 0.09 0.36 0.02 0.18 0.12 0.04 0.00 0.00 ancCHIL p(I) p(I) p(I) p(I) p(L) p(F) p(F) p(F) p(T) p(F) p(E) p(E) p(A) p(A) p(S) p(E) p(Y) p(Y) p(K) p(S) p(S) p(Y) p(K) p(P) p(A) p(V) p(A) p(N) p(R) p(D) p(D) p(D) p(D) p(D) p(D) p(D) p(R) p(N) p(N) 0.43 0.50 0.98 0.99 0.99 0.97 0.99 0.99 0.93 1.00 1.00 0.50 0.95 1.00 0.99 0.98 0.86 0.97 0.97 0.95 0.94 1.00 0.99 0.81 0.79 0.88 0.43 0.96 0.99 0.96 1.00 0.82 0.44 0.98 0.80 0.43 0.96 1.00 1.00 1.00 p(I) p(L) p(L) p(L) p(L) p(L) p(F) p(F) p(T) p(F) p(A) p(K) p(Y) p(E) p(E) p(E) p(E) p(E) p(E) p(E) p(A) p(E) p(K) p(V) p(E) p(K) p(P) p(K) p(S) p(V) p(Y) p(P) p(D) p(D) p(N) p(H) p(Q) p(G) p(W) p(W) 0.11 0.10 0.07 0.06 0.05 0.03 0.05 0.24 0.01 0.02 0.00 0.01 0.06 0.07 0.00 0.01 0.04 0.04 0.03 0.06 0.01 0.06 0.00 0.04 0.07 0.04 0.06 0.02 0.00 0.04 0.00 0.07 0.10 0.02 0.02 0.09 0.02 0.05 0.00 0.07 p(I) p(I) p(I) p(I) p(L) p(L) p(L) p(F) p(F) p(T) p(F) p(E) p(E) p(E) p(K) p(K) p(Y) p(A) p(S) p(S) p(A) p(V) p(S) p(Y) p(V) p(N) p(N) p(D) p(R) p(R) p(G) p(Q) p(Q) p(Q) p(G) p(Q) p(Q) p(Q) p(Q) p(M) 0.11 0.11 0.17 0.09 0.12 0.23 0.04 0.27 0.26 0.12 0.39 0.07 0.05 0.09 0.01 0.04 0.18 0.21 0.12 0.15 0.09 0.07 0.01 0.41 0.14 0.23 0.42 0.29 0.09 0.00 0.04 0.00 0.10 0.35 0.07 0.21 0.32 0.17 0.01 0.07 ancCC p(I) p(I) p(I) p(L) p(L) p(F) p(T) p(T) p(T) p(F) p(E) p(A) p(E) p(S) p(K) p(V) p(V) p(Y) p(K) p(Y) p(K) p(P) p(A) p(V) p(Y) p(A) p(R) p(D) p(D) p(D) p(D) p(D) p(N) p(R) p(N) p(G) p(G) p(Q) p(Q) p(W) Posterior probabilities p(amino acid) in ancestral reconstruction including indels ancestral including in 
reconstruction acid) p(amino Posteriorprobabilities 0.46 0.56 0.71 0.56 0.85 0.65 0.30 0.84 0.53 0.92 0.93 0.58 0.72 0.98 0.94 0.68 0.70 0.84 0.55 0.88 0.74 0.99 0.53 0.61 0.67 0.50 0.38 0.87 0.99 0.88 1.00 0.80 0.44 0.90 0.73 0.47 0.65 0.65 0.99 0.63 p(I) p(L) p(L) p(L) p(L) p(F) p(F) p(F) p(T) p(F) p(F) p(A) p(K) p(Y) p(S) p(E) p(A) p(E) p(E) p(E) p(E) p(A) p(E) p(E) p(E) p(K) p(S) p(P) p(K) p(S) p(V) p(P) p(D) p(D) p(R) p(H) p(Q) p(G) p(W) p(W) 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 Position (incl. indels) (incl.

13 [Table S2. continued] I L L L T T L F S P S S V S S P E E A E A T V E A V A A A V D N N D Q G epR4 I I L L L T T L F S P S S V S S P E E A E A T E A V A A A V D N N D Q G ancR7 I I L L L T F T L F S P S S S S P E E A E A T E A A A A T D N N Q G G M ancCHI* I I I L L L T F T F S E S S S P E A E A V E A A A A T A D K N Q G G G M ancCHI Final ancestral sequence Final I L L L L T T F T L S P S S V S K P E E E K V E A V A A A Y D N N D Q G ancCHIL I L L L T T L F S P S S V S S P E E A E A T V E A V A A A V D N N D Q G ancCC 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 (final) Position 0.17 0.03 0.21 0.07 0.01 0.09 0.01 0.12 0.02 0.03 0.20 0.02 0.13 0.01 0.03 0.04 0.13 0.06 0.10 0.00 0.03 0.02 0.03 0.14 0.04 0.06 0.02 0.05 0.02 0.07 0.02 0.01 0.03 0.06 0.02 0.07 0.04 0.13 0.02 0.08 p(I) p(I) p(L) p(L) p(L) p(L) p(T) p(T) p(T) p(A) p(S) p(A) p(A) p(S) p(A) p(K) p(A) p(P) p(A) p(V) p(A) p(S) p(K) p(Y) p(S) p(S) p(V) p(V) p(V) p(N) p(D) p(D) p(D) p(R) p(D) p(Q) p(Q) p(Q) p(G) p(M) 0.11 0.19 0.21 0.26 0.08 0.02 0.21 0.03 0.27 0.17 0.30 0.24 0.04 0.13 0.02 0.16 0.05 0.26 0.10 0.00 0.09 0.28 0.08 0.21 0.25 0.10 0.42 0.06 0.06 0.10 0.09 0.02 0.08 0.37 0.02 0.16 0.06 0.33 0.03 0.09 ancCHI p(I) p(I) p(I) p(L) p(L) p(T) p(T) p(T) p(A) p(K) p(P) p(S) p(V) p(V) p(A) p(V) p(P) p(E) p(S) p(S) p(V) p(S) p(K) p(E) p(S) p(S) p(V) p(S) p(S) p(E) p(S) p(S) p(S) p(N) p(R) p(C) p(D) p(Q) p(G) p(G) 0.24 0.73 0.29 0.81 0.97 0.51 0.95 0.31 0.81 0.64 0.31 0.92 0.34 0.98 0.75 0.81 0.33 0.57 0.77 1.00 0.82 0.64 0.82 0.48 0.62 0.80 0.55 0.83 0.91 0.66 0.83 0.95 0.88 0.41 0.93 0.75 0.87 0.35 0.91 0.75 p(I) p(I) p(I) p(L) p(L) p(L) p(L) p(T) p(F) p(T) p(F) p(T) p(S) p(S) p(E) p(S) p(S) p(S) p(K) p(P) p(E) p(K) p(A) p(E) p(A) p(V) p(E) p(A) p(A) p(A) p(A) p(A) p(D) p(N) p(Q) p(G) p(G) p(G) p(G) p(M) 0.07 0.06 0.01 0.00 0.01 0.02 0.02 0.08 0.00 0.02 0.06 0.01 0.01 0.03 0.04 0.02 0.00 0.05 0.04 0.01 
0.02 0.01 0.31 0.08 0.01 0.05 0.00 0.00 0.01 0.02 0.00 0.00 0.00 0.07 0.03 0.00 0.00 0.02 0.05 0.00 p(I) p(I) p(L) p(L) p(L) p(L) p(L) p(T) p(T) p(P) p(A) p(A) p(E) p(A) p(A) p(A) p(S) p(P) p(V) p(K) p(P) p(A) p(E) p(A) p(S) p(S) p(V) p(S) p(V) p(V) p(N) p(N) p(D) p(N) p(Q) p(Q) p(Q) p(Q) p(M) p(M) 0.11 0.12 0.12 0.22 0.16 0.01 0.28 0.03 0.20 0.41 0.03 0.19 0.01 0.06 0.06 0.07 0.04 0.00 0.06 0.24 0.01 0.10 0.03 0.31 0.32 0.41 0.35 0.01 0.06 0.26 0.00 0.03 0.00 0.37 0.08 0.04 0.01 0.05 0.29 0.49 ancCHIL p(I) p(I) p(I) p(I) p(I) p(I) p(T) p(T) p(T) p(F) p(S) p(S) p(A) p(V) p(K) p(V) p(A) p(S) p(E) p(S) p(E) p(A) p(S) p(S) p(A) p(V) p(S) p(Y) p(A) p(E) p(S) p(S) p(E) p(E) p(D) p(D) p(D) p(C) p(M) p(W) 0.54 0.68 0.74 0.83 0.97 0.69 0.92 0.69 0.58 0.95 0.69 0.97 0.93 0.88 0.83 0.92 1.00 0.83 0.70 0.97 0.84 0.94 0.31 0.45 0.57 0.79 0.65 0.99 0.93 0.68 0.99 0.97 0.99 0.40 0.83 0.95 0.98 0.91 0.54 0.51 p(I) p(L) p(L) p(L) p(L) p(L) p(T) p(T) p(F) p(T) p(A) p(S) p(S) p(P) p(S) p(S) p(V) p(S) p(K) p(P) p(E) p(E) p(E) p(K) p(V) p(E) p(A) p(V) p(A) p(A) p(A) p(Y) p(D) p(D) p(N) p(N) p(D) p(Q) p(G) p(G) 0.09 0.06 0.09 0.01 0.04 0.03 0.03 0.10 0.02 0.03 0.08 0.02 0.10 0.02 0.04 0.18 0.12 0.05 0.09 0.00 0.03 0.04 0.18 0.12 0.08 0.08 0.03 0.01 0.04 0.06 0.16 0.14 0.02 0.13 0.02 0.02 0.03 0.04 0.18 0.03 p(I) p(I) p(L) p(L) p(L) p(L) p(T) p(F) p(T) p(S) p(A) p(K) p(A) p(A) p(E) p(A) p(A) p(A) p(S) p(A) p(V) p(A) p(K) p(E) p(K) p(V) p(K) p(S) p(S) p(S) p(V) p(N) p(R) p(D) p(N) p(Q) p(Q) p(G) p(M) p(M) 0.11 0.15 0.28 0.17 0.10 0.20 0.03 0.19 0.31 0.17 0.19 0.03 0.16 0.05 0.10 0.28 0.15 0.06 0.40 0.01 0.06 0.25 0.19 0.22 0.17 0.10 0.42 0.03 0.07 0.12 0.24 0.14 0.03 0.32 0.03 0.06 0.06 0.05 0.19 0.22 ancCC p(I) p(I) p(I) p(I) p(I) p(I) p(L) p(L) p(T) p(T) p(F) p(S) p(V) p(K) p(V) p(A) p(S) p(E) p(S) p(S) p(P) p(S) p(A) p(K) p(V) p(S) p(A) p(S) p(E) p(A) p(S) p(E) p(E) p(Y) p(N) p(C) p(D) p(C) p(Q) p(G) Posterior probabilities p(amino acid) in ancestral reconstruction including indels 
ancestral including in reconstruction acid) p(amino Posteriorprobabilities 0.37 0.66 0.35 0.82 0.82 0.68 0.93 0.54 0.67 0.78 0.64 0.93 0.63 0.91 0.79 0.43 0.56 0.78 0.47 0.98 0.78 0.58 0.47 0.45 0.57 0.73 0.54 0.96 0.87 0.68 0.51 0.59 0.92 0.39 0.90 0.91 0.88 0.87 0.34 0.71 p(I) p(L) p(L) p(L) p(L) p(T) p(T) p(T) p(F) p(A) p(S) p(S) p(P) p(S) p(S) p(V) p(S) p(K) p(P) p(E) p(E) p(A) p(E) p(A) p(V) p(E) p(A) p(V) p(A) p(A) p(A) p(V) p(D) p(D) p(N) p(N) p(D) p(Q) p(G) p(G) 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 Position (incl. indels) (incl.

14 [Table S2. continued] I L L L T E S V S P S K A S V A E S E E V A N K N K D K G G M epR4 I L L L T S V S P S K A S V A E S E E V A K N K N K D K G G M ancR7 I L L L T V S P S K A S V A E S E E V A K N K N K D K G G G M ancCHI* I I L L L T V S P S K A S A E S S E E V K N K N K D K G G G M ancCHI Final ancestral sequence Final I I L L L L T E S S P S A S V A E A A E E V N K N K D K G G M ancCHIL I L L L T E S V S P S K A S V A E S E E V A N K N K D K G G M ancCC 211 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 212 213 214 215 216 217 218 219 220 (final) Position 0.11 0.03 0.16 0.01 0.07 0.02 0.04 0.01 0.02 0.03 0.07 0.01 0.01 0.04 0.20 0.02 0.01 0.10 0.02 0.17 0.01 0.01 p(I) p(L) p(L) p(T) p(T) p(T) p(T) p(T) p(V) p(A) p(A) p(A) p(S) p(S) p(K) p(D) p(N) p(Q) p(Q) p(M) p(M) p(M) 0.17 0.19 0.01 0.21 0.04 0.39 0.12 0.03 0.37 0.09 0.24 0.03 0.05 0.30 0.07 0.01 0.28 0.15 0.17 0.27 0.02 0.01 ancCHI p(I) p(I) p(I) p(I) p(L) p(T) p(V) p(S) p(E) p(K) p(S) p(A) p(A) p(S) p(A) p(V) p(A) p(A) p(R) p(D) p(R) p(G) 0.77 0.43 0.98 0.53 0.90 0.56 0.87 0.95 1.00 0.53 0.81 0.74 0.96 0.86 0.43 0.90 0.98 0.46 0.67 0.79 0.33 0.96 0.97 p(I) p(I) p(L) p(L) p(L) p(L) p(T) p(K) p(V) p(S) p(P) p(S) p(K) p(A) p(S) p(A) p(E) p(S) p(S) p(N) p(G) p(G) p(G) 0.08 0.15 0.00 0.00 0.07 0.00 0.00 0.01 0.00 0.02 0.01 0.00 0.12 0.00 0.00 0.02 0.01 0.00 0.00 0.04 0.02 p(I) p(I) p(I) p(I) p(L) p(L) p(L) p(T) p(T) p(T) p(T) p(K) p(A) p(A) p(A) p(A) p(A) p(Q) p(G) p(Q) p(M) 0.17 0.25 0.01 0.00 0.09 0.43 0.08 0.03 0.00 0.10 0.49 0.00 0.18 0.00 0.00 0.02 0.28 0.03 0.01 0.05 0.05 ancCHIL p(I) p(T) p(T) p(T) p(P) p(V) p(V) p(S) p(S) p(A) p(S) p(S) p(V) p(S) p(S) p(V) p(D) p(R) p(D) p(Q) p(M) 0.55 0.58 1.00 0.98 0.99 0.83 0.56 0.91 1.00 0.97 0.99 0.86 0.50 0.99 0.70 1.00 1.00 0.96 0.71 0.97 0.98 0.89 0.92 p(I) p(I) p(L) p(L) p(L) p(L) p(L) p(T) p(E) p(S) p(S) p(P) p(S) p(A) p(S) p(V) p(A) p(E) p(A) p(A) p(N) p(G) p(G) 0.11 0.05 0.16 0.00 0.06 0.01 0.12 0.01 0.02 0.04 0.05 
0.03 0.01 0.03 0.01 0.00 0.01 0.08 0.02 0.16 0.02 0.02 p(I) p(I) p(L) p(L) p(L) p(T) p(T) p(T) p(T) p(T) p(V) p(S) p(A) p(S) p(S) p(A) p(S) p(V) p(D) p(G) p(Q) p(Q) 0.10 0.22 0.00 0.10 0.04 0.18 0.28 0.05 0.07 0.07 0.19 0.14 0.06 0.24 0.02 0.02 0.02 0.16 0.45 0.17 0.03 0.03 ancCC p(I) p(I) p(I) p(I) p(T) p(V) p(A) p(K) p(S) p(A) p(A) p(A) p(S) p(A) p(S) p(V) p(A) p(K) p(R) p(D) p(G) p(M) Posterior probabilities p(amino acid) in ancestral reconstruction including indels ancestral including in reconstruction acid) p(amino Posteriorprobabilities 0.79 0.51 1.00 0.69 0.91 0.63 0.70 0.91 1.00 0.86 0.85 0.69 0.83 0.89 0.64 0.96 0.98 0.95 0.73 0.50 0.51 0.93 0.93 p(I) p(L) p(L) p(L) p(L) p(T) p(E) p(S) p(V) p(S) p(P) p(S) p(K) p(A) p(S) p(V) p(A) p(E) p(S) p(A) p(N) p(G) p(G) 211 201 202 203 204 205 206 207 208 209 210 212 213 214 215 216 217 218 219 220 221 222 223 Position (incl. indels) (incl.

Table S3. Apparent midpoint melting temperatures (Tm) of selected variants measured using SYPRO Orange as fluorescent probe and/or by NanoDSF (following tryptophan fluorescence).

                   Tm [°C]
Variant            SYPRO Orange[a]    NanoDSF[b]
AtCHI              39.5±0.2           50.2
ancCC              81.9±0.01          /[c]
ancCHIL            79.5±0.1           /
ancCHI             n.d.[d]            87.6
ancCHI*            n.d.               81.7
ancR1 (L108V)      76.9±0.2           84.3
ancR2              69.4±0.2           77.8
ancR3              72.2±0.1           81.3
ancR4              74.6±0.1           81.3
ancR5              n.d.               81.8
ancR6              83.8±0.1           88.5
ancR7              83.6±0.1           89
epR3               83.0±0.1           /
ancCC+M36I         80.6±0.01          88.3
ancCC+I99L         82.0±0.1           89.2

[a] Measurements were performed in triplicate and the average Tm is given. [b] Because measurements were performed only once and the error of the fit is negligible, no error is given. [c] Not analyzed. [d] Tm value was not determined because no clear transition was observed.

[Figure S4 plot: Tm [°C] NanoDSF (y-axis) versus Tm [°C] SYPRO Orange (x-axis); linear fit y = 15.633 + 0.88934x, R² = 0.9931.]

Figure S4. Correlation between Tm values determined using SYPRO Orange and NanoDSF. Values were taken from Table S3 for all variants where both methods were successfully applied.
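The correlation can be sketched as an ordinary least-squares fit to the nine variants of Table S3 for which both a SYPRO Orange and a NanoDSF value are listed. This is a minimal illustration, not the fitting procedure used for the figure: the function name `linear_fit` and the dictionary `PAIRED_TM` are introduced here for convenience, and the input values are the rounded numbers from the table, so the result can only be compared approximately with the fit reported in Figure S4.

```python
# Paired Tm values in °C transcribed from Table S3: (SYPRO Orange, NanoDSF).
# Only variants with a value from both methods are included.
PAIRED_TM = {
    "AtCHI": (39.5, 50.2),
    "ancR1 (L108V)": (76.9, 84.3),
    "ancR2": (69.4, 77.8),
    "ancR3": (72.2, 81.3),
    "ancR4": (74.6, 81.3),
    "ancR6": (83.8, 88.5),
    "ancR7": (83.6, 89.0),
    "ancCC+M36I": (80.6, 88.3),
    "ancCC+I99L": (82.0, 89.2),
}


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


sypro = [pair[0] for pair in PAIRED_TM.values()]
nanodsf = [pair[1] for pair in PAIRED_TM.values()]
intercept, slope, r_squared = linear_fit(sypro, nanodsf)
print(f"y = {intercept:.3f} + {slope:.5f}x, R^2 = {r_squared:.4f}")
```

A slope below 1 combined with a positive intercept means that the NanoDSF values exceed the SYPRO Orange values over the measured range, with the difference shrinking toward higher Tm.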

Table S4. Enzymatic activities of CHI from A. thaliana (AtCHI), inferred ancestors and variants obtained by directed evolution.

                     vinitial (100 µM substrate)            Michaelis-Menten kinetics[c]
Variant              Lysate[a]          Purified[b]         kcat             KM         kcat/KM       kcat/kuncat[d]
                     [µM/min/OD600]     [µmol/min/mg]       [s⁻¹]            [µM]       [M⁻¹ s⁻¹]
AtCHI[e]             (9.7±0.5)×10⁴      (1.2±0.1)×10⁶       87±11            11±3       7.7×10⁶       8.4×10⁴
ancCC                n.d.[f]            n.d.
ancCHIL              n.d.               n.d.
ancCHI[e]            (6.7±3.0)×10⁴      (6.4±0.8)×10⁵       16±7             79±41      2.1×10⁵       1.6×10⁴
ancCHI*[e]           (3.4±0.6)×10⁴      (5.1±0.1)×10⁵       7.4±1.2          72±16      1.0×10⁵       7100
ancR1 (L108V)[e]     6.2±0.5            100±40
ancR2[e]             350±70             (1.4±0.1)×10⁴       0.13±0.02        29±7       4.4×10³       120
                                                            (0.11±0.02)      (20±6)     (5.7×10³)     110
ancR3[e]             670±90             (3.1±0.1)×10⁴       0.24±0.02        25±5       9.7×10³       230
                                                            (0.27±0.04)      (32±8)     (8.6×10³)     260
ancR4[e]             (2.3±0.2)×10³      (5.9±0.1)×10⁴       0.53±0.04        13±2       4.0×10⁴       510
ancR5[e]             (7.2±1.7)×10³      (8.1±0.1)×10⁴       0.55±0.07        10±3       5.7×10⁴       530
ancR6[e]             (1.0±0.1)×10⁴      (1.6±0.1)×10⁵       0.37±0.03        9±2        4.0×10⁴       350
ancR7[e]             (1.2±0.2)×10⁴      (2.0±0.1)×10⁵       1.4±0.1          29±6       4.7×10⁴       1300
epR1 (F133L)[e]      2.7±0.1            n.d.
epR2[e]              15±1               670±60
epR3[e]              20±2               (4.1±0.8)×10³       0.036±0.002      95±14      3.8×10²       34
                                                            (0.034±0.004)    (82±22)    (4.1×10²)     32
epR4[e]              80±6               (1.2±0.6)×10⁴       0.10±0.01        64±8       1.6×10³       95
                                                            (0.093±0.007)    (54±13)    (1.7×10³)     90

[a] Cells were grown in at least duplicate and lysates sufficiently diluted (~1-10,000-fold) to determine initial rates of chalconaringenin isomerization (in µM/min) at a substrate concentration of 100 µM, normalized to cell density (OD600), corrected for the dilution factor, and averaged. This experiment was repeated three times and the combined average determined.
[b] Purified variants were sufficiently diluted (25 nM - 50 µM) to determine the specific enzymatic activity (µmol product generated per min per mg protein) at a substrate concentration of 100 µM. Measurements were performed in triplicate, corrected for the dilution factor, and averaged.
[c] Initial rates were measured over a range of substrate concentrations (4.5 – 180 µM) and steady-state kinetic parameters determined. The Michaelis-Menten equation only holds at an excess of substrate over enzyme (or enzyme over substrate). Therefore, for the lowest-activity variants, where high enzyme concentrations (~50 µM) are necessary to detect rate enhancements above background, kinetic parameters could not be determined. For several variants of intermediate activity, the lowest substrate concentrations were in less than 5-fold excess of the enzyme. Data were re-fitted without these points and the values obtained are shown in parentheses. Note that both fits gave highly similar results, but the kinetic parameters should be interpreted with caution in these cases.
[d] The rate constant of the uncatalyzed reaction, kuncat, was determined as 2.04 × 10⁻³ s⁻¹, and the rate enhancement achieved by the enzyme, kcat/kuncat, was calculated from it.
[e] Stereospecificity was determined by chiral HPLC. All assayed variants produced (S)-naringenin.
[f] No activity was detected.
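The derived columns of Table S4 and the substrate-excess criterion of footnote [c] can be sketched as follows. This is an illustrative calculation under stated assumptions, not the analysis script used for the table: all function names are introduced here, the value of kuncat is taken from the Figure S5 caption (1.05 × 10⁻³ s⁻¹) on the assumption that it underlies the tabulated ratios, and the enzyme concentration in the usage example is hypothetical.

```python
K_UNCAT = 1.05e-3  # s^-1; value quoted in the Figure S5 caption (assumed here)


def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/KM in M^-1 s^-1 from kcat (s^-1) and KM (µM)."""
    return kcat_per_s / (km_micromolar * 1e-6)


def rate_enhancement(kcat_per_s, k_uncat_per_s=K_UNCAT):
    """Return the dimensionless rate enhancement kcat/kuncat."""
    return kcat_per_s / k_uncat_per_s


def michaelis_menten_rate(substrate_um, enzyme_um, kcat_per_s, km_um):
    """Return the initial rate v0 in µM/s for the Michaelis-Menten model."""
    return kcat_per_s * enzyme_um * substrate_um / (km_um + substrate_um)


def points_in_excess(substrate_concs_um, enzyme_um, fold=5.0):
    """Keep substrate concentrations in at least `fold` excess over enzyme."""
    return [s for s in substrate_concs_um if s >= fold * enzyme_um]


# AtCHI row of Table S4: kcat = 87 s^-1, KM = 11 µM.
print(f"kcat/KM     = {catalytic_efficiency(87, 11):.2e} M^-1 s^-1")
print(f"kcat/kuncat = {rate_enhancement(87):.2e}")

# Hypothetical enzyme concentration of 2 µM: points below 10 µM are dropped.
print(points_in_excess([4.5, 10.0, 50.0, 180.0], enzyme_um=2.0))
```

Because the tabulated kcat and KM values are rounded, ratios recomputed this way agree with the table only to within a few percent.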


[Figure S5 plot: initial rate v0 [M/s] (y-axis) versus [chalconaringenin] [M] (x-axis); linear fit y = 0.0010482x.]

Figure S5. Spontaneous isomerization of chalconaringenin to racemic naringenin. Initial rates v0 were measured in duplicate in 50 mM HEPES pH 7.5 with 5% ethanol as co-solvent and the rate constant of the uncatalyzed reaction, kuncat, was calculated as 1.05 × 10⁻³ s⁻¹.
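The relation between the fitted line and kuncat can be written out explicitly. The sketch below assumes that the spontaneous reaction is first order in chalconaringenin and that the line was fitted by least squares through the origin, as the absence of an intercept in the reported equation suggests; the symbols [S]_i and v_{0,i} for the individual data points are introduced here.

```latex
v_0 = k_{\mathrm{uncat}}\,[\mathrm{S}]
\quad\Longrightarrow\quad
k_{\mathrm{uncat}}
  = \frac{\sum_i [\mathrm{S}]_i \, v_{0,i}}{\sum_i [\mathrm{S}]_i^{2}}
  = 1.0482 \times 10^{-3}\ \mathrm{s^{-1}}
  \approx 1.05 \times 10^{-3}\ \mathrm{s^{-1}}
```

With v0 in M/s and [S] in M, the slope carries units of s⁻¹ and can be read directly as the first-order rate constant.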

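[Figure S6 graphic. Panel A: evolutionary conservation (color scale: conserved, variable, insufficient data) plotted onto A. thaliana CHI (with naringenin) and A. thaliana CHIL. Panel B: ancCC, with 26 (nearly) neutral residues (variable and/or peripheral/surface) and 39 ancestral library residues (conserved and/or near the active site).]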

Figure S6. Analysis of the evolutionary conservation and position of residues to generate ancCHI*. A. Evolutionary conservation within the CHI and CHIL families was calculated using ConSurf⁶ and plotted onto the respective crystal structures from A. thaliana (CHIL: PDB ID 4DOK, CHI: PDB ID 4DOI). Note that naringenin was modeled into the CHI structure by superposition with M. sativa CHI in complex with the product (PDB ID 1EYQ). B. Taking into account both evolutionary conservation and the location of residues in the crystal structures, the 65 residues differing between ancCC and ancCHI were divided into two sets: 26 mutations/exchanges were assessed as (nearly) neutral and removed from ancCHI to yield ancCHI*. The remaining 39 mutations were assessed as crucial for activity and included in the phylogenetic library. For the identity and exact position of these residues, see Table S2 (same color code).

Table S5. Mutagenic primers used for phylogenetic library generation.

Primer                    Amino acid
No.     Position[a]   ancCC   ancCHI*   Library[b]       Primer[c]
1       36            M       I         I                gcaggcgttcgcggcATCgaaatcgagaccatcc
2       37            E       Q         Q                gcgttcgcggcatgCAAatcgagaccatccag
3       40            T       G         G, A             ggcatggaaatcgagGSCatccagatcaaggtg
4       41            I       V         V                catggaaatcgagaccGTCcagatcaaggtgacc
5       42            Q       E         E                gaaatcgagaccatcGAGatcaaggtgaccgca
6       50            V       I         I                gtgaccgcaattggtATCtacgcagagccggaa
7       91            V       A         A                cggtggaaaagctgGCTaagattaccatcatt
8       92            K       R         R                gtggaaaagctggttCGTattaccatcattaa
9       93            I       V         V                gaaaagctggttaagGTTaccatcattaaaggc
10      95            I       M         M                ctggttaagattaccATGattaaaggcattaaa
11      96            I       L         L                gttaagattaccatcCTTaaaggcattaaaggc
12      98            G       P         P                attaccatcattaaaCCGattaaaggcagcca
13      99            I       L         L                accatcattaaaggcCTTaaaggcagccagtat
14      100           K       T         T                atcattaaaggcattACAggcagccagtatggt
15      102           S       A         A                aaaggcattaaaggcGCTcagtatggtggcgc
16      105           G       S         S                aaaggcagccagtatTCTggcgccctggaggaaa
17      107           A       K         K, T             gcagccagtatggtggcAMActggaggaaagcatcc
18      108           L       V         V                gccagtatggtggcgccGTGgaggaaagcatccgtg
19      109           E       G         G                tatggtggcgccctgGGGgaaagcatccgtgat
20      111           S       N         N                ggcgccctggaggaaAACatccgtgatcgcctg
21      112           I       T         T                gccctggaggaaagcACCcgtgatcgcctggcc
22      113           R       K         K                ctggaggaaagcatcAAAgatcgcctggccgc
23      115           R       A         A, S, T          gaaagcatccgtgatBCTctggccgccctggat
24      117           A       K         K                atccgtgatcgcctgAAGgccctggataagtat
25      120           D       G         G                cgcctggccgccctgGGTaagtatagcgaagcc
26      138           Q       K         K                ttccgtgaattcttcAAGaccaagagcctgccg
27      142           L       F         F                ttccagaccaagagcTTTccgaaaggcagcgtg
28      144           K       P         P                accaagagcctgccgCCAggcagcgtgatcttc
29      147           V       T         T, A, I          ctgccgaaaggcagcRYTatcttctttcattgg
30      152           W       L         L                gtgatcttctttcatTTGgccgagcccgagcacc
31      153           P       S         S                atcttctttcattggTCGagcccgagcaccctg
32      162           V       F         F                accctgcaaatcagcTTCagcaccgatggtagc
33      176           V       I         I                gaagccgaagcaaccATCgaaaatgccaatgtt
34      181           V       F         F                gtggaaaatgccaatTTTgccgccgcactgctg
35      187           D       G         G                gccgccgcactgctgGGCgtgtttctgggcgaa
36      188           V       T         T, A, I, M, V    cgccgcactgctggacRYStttctgggcgaaaat
37      189           F       M         M, F, I, L       cgcactgctggacgtgWTSctgggcgaaaatagc
38      192           E       K         K                gacgtgtttctgggcAAGaatagcgtctcacca
39      194           S       G         G                tttctgggcgaaaatGGCgtgagcccgagtacc

[a] Numbering according to the ancestral genes.
[b] In cases where the transition from ancCC to ancCHI* would require the simultaneous substitution of two base pairs, the primer was partially randomised to include at least one plausible bridging amino acid.
[c] Representative round 1 primers are shown with mutagenic codons in capital letters. Each primer introduces a mutation at one position. Note that after each round, the primer set was updated to avoid “erasing” mutations that had become fixed. For example, the round 1 mutation L108V was introduced by primer no. 18, but this codon is also covered by the primers introducing neighboring mutations (16-17 and 19-22; highlighted in green). Therefore, for round 2, these primers were replaced by new primers containing L108V, and primer 18 was removed from the library completely.
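The partially randomised codons in Table S5 use IUPAC ambiguity codes, and the set of amino acids each one encodes can be enumerated with a short sketch. The function name `encoded_amino_acids` is introduced here for illustration; the codon table is the standard genetic code, which is assumed to apply to the expression host.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(degenerate_codon):
    """Return the sorted amino acids encoded by a degenerate codon."""
    options = [IUPAC[base] for base in degenerate_codon.upper()]
    return sorted({CODON_TABLE["".join(c)] for c in product(*options)})


# Mutagenic codons of primers 3, 17, 36 and 37 in Table S5.
for codon in ("GSC", "AMA", "RYS", "WTS"):
    print(codon, encoded_amino_acids(codon))
```

For these four codons the enumeration returns the amino acids listed in the Library column, including the parental residue where the degenerate codon also covers it (for example V at position 188 and F at position 189).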

Table S6. Overview of library generation and screening throughput.

Library         Round   Mutagenesis method   Template          No. of clones screened
phylogenetic    1       ISOR                 ancCC             740
phylogenetic    2       ISOR                 ancR1             560
phylogenetic    3       ISOR                 ancR2             560
phylogenetic    4       ISOR                 3_1 – 5           370
phylogenetic    5       DNA shuffling        ancCC, 4_1 – 9    560
phylogenetic    6       ISOR                 ancR5             560
phylogenetic    7       DNA shuffling        6_1 – 9           560
low

Table S7. Sequenced variants of the phylogenetic libraries. (a) Round 1, (b) Round 2, (c) Round 3, (d) Round 4, (e) Round 5, (f) Round 6, (g) Round 7.

[a] For activity measurements, cells were grown and assayed in triplicate. Clarified cell lysates were sufficiently diluted to determine initial rates of naringenin formation, normalized to cell density (OD600), normalized to the respective parent variant from the previous round (regrown and assayed in parallel), and averaged. For example, Round 2 clones are normalized to ancR1, Round 3 clones to ancR2, etc. Round 1 variants are given relative to ancCC, which is inactive and corresponds to the background rate. Note that data from different tables is not directly comparable as lysates were diluted to different extents, and substrate concentrations ranged from 100 - 180 µM. [b] The 39 possible mutations from ancCC to ancCHI* are shown. In the following cases, the desired amino acid mutation was not accessible via a single base pair substitution and primers were partially randomized to include at least one additional, “bridging” amino acid: T40A/G, A107K/T, R115A/S/T, V147T/A/I, V188T/A/I/M/V and F189M/F/I/L. [c] This column lists mutations introduced inadvertently during PCR amplification.
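The normalization described in footnote [a] can be sketched as follows. This is one plausible order of operations (OD600 normalization per replicate, averaging, then division by the parent average) and is an assumption rather than the documented procedure; the function names and the numerical inputs are hypothetical.

```python
from statistics import mean


def od_normalized_rates(initial_rates, od600_values):
    """Divide each replicate's initial rate by its cell density (OD600)."""
    return [rate / od for rate, od in zip(initial_rates, od600_values)]


def relative_activity(clone_rates, clone_ods, parent_rates, parent_ods):
    """Return the clone's mean OD-normalized rate relative to its parent.

    The parent is the variant selected in the previous round, regrown and
    assayed in parallel with the clone.
    """
    clone_mean = mean(od_normalized_rates(clone_rates, clone_ods))
    parent_mean = mean(od_normalized_rates(parent_rates, parent_ods))
    return clone_mean / parent_mean


# Hypothetical triplicate measurements (arbitrary rate units).
clone = relative_activity(
    clone_rates=[2.0, 4.0, 6.0], clone_ods=[1.0, 2.0, 3.0],
    parent_rates=[1.0, 1.0, 1.0], parent_ods=[1.0, 1.0, 1.0],
)
print(f"activity relative to parent: {clone:.1f}")
```

Because each round is normalized to its own parent, values computed this way are comparable within one table but not across rounds.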

Table S7a. Sequenced variants of the phylogenetic round 1 library.

Mutations[b]
ancCC      M   E   T   I   Q   V   V   K   I   I   I   G   I   K   S   G   A   L   E
Position   36  37  40  41  42  50  91  92  93  95  96  98  99  100 102 105 107 108 109
ancCHI*    I   Q   G   V   E   I   A   R   V   M   L   P   L   T   A   S   K   V   G

Variant          Activity[a]   Mutations
1_1              1.8           + + +
1_2              1.4           + +
1_3              1.5           +
1_4              1.8           + +
1_5              2.1           +
1_6              1.6           + +
1_7              1.5           +
1_8              1.5           + + +
1_9              3.7           + +
1_10             2.2           +
1_11             1.6           + + +
1_12             2.0           + + + +
1_13             2.5           +
1_14             2.2           +
1_15             1.5
1_16             3.5           + + +
1_17             1.7           +
1_18             2.2           + +
1_19             2.2           + +
1_20 "ancR1"     2.1           +
1_21             1.8
1_22             2.0           +
1_23             1.9           +
1_24             1.3           +
1_25             2.1           +

Mutations[b] [continued]
ancCC      S   I   R   R   A   D   Q   L   K   V   W   P   V   V   V   D   V   F   E   S
Position   111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
ancCHI*    N   T   K   A   K   G   K   F   P   T   L   S   F   I   F   G   T   M   K   G

Variant          Mutations     PCR[c]
1_1              + + +         G66D
1_2              + +
1_3              +
1_4                            I11L, E12D
1_5              +             S156N
1_6
1_7              +
1_8              + + +
1_9              +
1_10             + +
1_11             + + +
1_12             + + + +       G66D
1_13             + + +
1_14             + +
1_15             + +
1_16             +
1_17
1_18
1_19             +
1_20 "ancR1"
1_21                           L130P
1_22                           V195I
1_23             +
1_24             + +
1_25             +

Table S7b. Sequenced variants of the phylogenetic round 2 library.

Mutations[b]
ancCC      M   E   T   I   Q   V   V   K   I   I   I   G   I   K   S   G   A   L   E
Position   36  37  40  41  42  50  91  92  93  95  96  98  99  100 102 105 107 108 109
ancCHI*    I   Q   G   V   E   I   A   R   V   M   L   P   L   T   A   S   K   V   G

Variant          Activity[a]   Mutations
2_1              4.3           + + +
2_2 "ancR2"      >10[d]        + + +
2_3              1.3           + + +
2_4              1.6           +
2_5              1.2           + +
2_6              5.8           + + + + +
2_7              3.1           + + +
2_8              4.5           + + +
2_9              1.8           + + + +
2_10             3.7           + + + +
2_11             3             + + +
2_12             4.7           + +
2_13             3.3           + + +
2_14             3.1           + + + +
2_15             4.8           + +
2_16             1.4           +
2_17             1.5           +
2_18             1.8           +

Mutations[b] [continued]
ancCC      S   I   R   R   A   D   Q   L   K   V   W   P   V   V   V   D   V   F   E   S
Position   111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
ancCHI*    N   T   K   A   K   G   K   F   P   T   L   S   F   I   F   G   T   M   K   G

Variant          Mutations     PCR[c]
2_1              + + +
2_2 "ancR2"      + + +
2_3              + + +
2_4
2_5              + + +
2_6              + + +         A118T
2_7              + + +
2_8              +             P143S
2_9
2_10             + + +
2_11             + + +
2_12             + + +
2_13             + + +
2_14             + + +         P197Q
2_15
2_16             +             A129V
2_17
2_18             +

[d] The increase in activity was too high to accurately determine initial rates at the given dilution of the lysate.

Table S7c. Sequenced variants of the phylogenetic round 3 library.

Mutations[b]
ancCC      M   E   T   I   Q   V   V   K   I   I   I   G   I   K   S   G   A   L   E
Position   36  37  40  41  42  50  91  92  93  95  96  98  99  100 102 105 107 108 109
ancCHI*    I   Q   G   V   E   I   A   R   V   M   L   P   L   T   A   S   K   V   G

Variant          Activity[a]   Mutations
3_1              1.2           + + + +
3_2              1.1           + + + +
3_3 "ancR3"      1.8           + + + +
3_4              1             + + + +
3_5              1.1           + + + +

Mutations[b] [continued]
ancCC      S   I   R   R   A   D   Q   L   K   V   W   P   V   V   V   D   V   F   E   S
Position   111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
ancCHI*    N   T   K   A   K   G   K   F   P   T   L   S   F   I   F   G   T   M   K   G

Variant          Mutations     PCR[c]
3_1              + +
3_2              + + +
3_3 "ancR3"      + + + +
3_4              + + +
3_5              + + +

Table S7d. Sequenced variants of the phylogenetic round 4 library.

Variant  Activity[a]  Mutations[b]
M E T I Q V V K I I I G I K S G A L E
36 37 40 41 42 50 91 92 93 95 96 98 99 100 102 105 107 108 109
I Q G V E I A R V M L P L T A S K V G
4_1 1.6 + + + + + + +
4_2 1.6 + + + + + +
4_3 "ancR4" 2.8 + + + + + +
4_4 1.8 + + + + + +
4_5 1.6 + + + + +
4_6 1.4 + + + +
4_7 1.5 + + + + + +
4_8 1.5 + + + + + + +
4_9 1.8 + + + + +

[continued]
Variant  Mutations[b]  PCR[c]
S I R R A D Q L K V W P V V V D V F E S
111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
N T K A K G K F P T L S F I F G T M K G
4_1 + + + + +
4_2 + + + + + +
4_3 "ancR4" + + + + +
4_4 + + + + +
4_5 + + + +
4_6 + + + + +
4_7 + + + + + + +
4_8 + S[d] + +
4_9 + + +
[d] A115 was mutated to the bridging amino acid Serine.

Table S7e. Sequenced variants of the phylogenetic round 5 library.

Variant  Activity[a]  Mutations[b]
M E T I Q V V K I I I G I K S G A L E
36 37 40 41 42 50 91 92 93 95 96 98 99 100 102 105 107 108 109
I Q G V E I A R V M L P L T A S K V G
5_1 0.9 + + + + + +
5_2 1 + + + + +
5_3 0.9 + + + + + +
5_4 "ancR5" 1.3 + + + + + +
5_5 1.1 + + + + + + + +
5_6 0.9 + + + + + +
5_7 1.2 + + + + + +
5_8 1 + + + + +

[continued]
Variant  Mutations[b]  PCR[c]
S I R R A D Q L K V W P V V V D V F E S
111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
N T K A K G K F P T L S F I F G T M K G
5_1 + + + + +
5_2 + + + S22G
5_3 + + + + +
5_4 "ancR5" + + + + + +
5_5 + + + +
5_6 + + + +
5_7 + + + + + +
5_8 + + + + + + +

Table S7f. Sequenced variants of the phylogenetic round 6 library.

Variant  Activity[a]  Mutations[b]
M E T I Q V V K I I I G I K S G A L E
36 37 40 41 42 50 91 92 93 95 96 98 99 100 102 105 107 108 109
I Q G V E I A R V M L P L T A S K V G
6_1 1.2 + + + + + + +
6_2 1.2 + + + + + + + +
6_3 1.3 + + + + + + +
6_4 "ancR6" 1.4 + + + + + + +
6_5 1.2 + + + + + +
6_6 1.2 + + + + + + +
6_7 1.2 + + + + + +
6_8 1.2 + + + + +
6_9 1.2 + + + + + +

[continued]
Variant  Mutations[b]  PCR[c]
S I R R A D Q L K V W P V V V D V F E S
111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
N T K A K G K F P T L S F I F G T M K G
6_1 + + + + + + + + +
6_2 + + + + + + +
6_3 + + + + + +
6_4 "ancR6" + + + + + +
6_5 + + + + + + + +
6_6 + + + + +
6_7 + + + + + +
6_8 + + + + + + + + S22G
6_9 + + + + +

Table S7g. Sequenced variants of the phylogenetic round 7 library.

Variant  Activity[a]  Mutations[b]
M E T I Q V V K I I I G I K S G A L E
36 37 40 41 42 50 91 92 93 95 96 98 99 100 102 105 107 108 109
I Q G V E I A R V M L P L T A S K V G
7_1 1.1 + + + + + + +
7_2 1.1 + + + + + + +
7_3 1.1 + + + + + + + +
7_4 "ancR7" 1.2 + + + + + +
7_5 1.1 + + + + + + + +
7_6 1.2 + + + + + +
7_7 1.1 + + + + + + +
7_8 1.1 + + + + + + +

[continued]
Variant  Mutations[b]  PCR[c]
S I R R A D Q L K V W P V V V D V F E S
111 112 113 115 117 120 138 142 144 147 152 153 162 176 181 187 188 189 192 194
N T K A K G K F P T L S F I F G T M K G
7_1 + + + + + + + +
7_2 + + + + + + +
7_3 + + + + + + + +
7_4 "ancR7" + + + + + + +
7_5 + + + + + + +
7_6 + + + + + + + S22G
7_7 + + + + + +
7_8 + + + + +

Table S8. Mutations accumulated in the phylogenetic trajectory.

                   Mutations[a]
Round  Variant     L108V  M36I  S111N  Q138K  I95M  I99L  K100T  L142F  P153S  E192K  G98P  A117K  V176I
1      ancR1       +
2      ancR2[b]    +      +     +      +
3      ancR3[b]    +      +     +      +      +     +
4      ancR4[b]    +      +     +      +      +     +     +      +
5      ancR5[b]    +      +     +      +      +     +     +      +      +      +
6      ancR6[b]    +      +     +      +      +     +     +      +      +      +      +     +
7      ancR7       +      +     +      +      +     +     +      +      +      +      +     +      +

[a] Only mutations present in the seven final variants are shown. A list of all 39 possible mutations is given in Table S5 and an overview of all variants selected in each round is given in Table S7. [b] These variants contain additional mutations that did not fixate over the evolution (ancR2: V162F, V91A, A107K. ancR3: V162F, D187G. ancR4: T40G, R113K, S194G. ancR5: T40G, R113K. ancR6: T40G). Note that T40G requires the simultaneous substitution of two base pairs. Therefore, the library included A40 as a possible bridging amino acid (see Table S5), but all selected variants contained G.
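The two-base-pair argument in note [b] can be checked against the standard genetic code. The following is a minimal sketch rather than the authors' library-design procedure: the helper names (`min_base_changes`, `bridging_residues`) are hypothetical, and the standard codon table is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def min_base_changes(aa_from, aa_to):
    """Smallest number of nucleotide substitutions that converts any codon
    of `aa_from` into any codon of `aa_to`."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


def neighbours(codon):
    """All codons that differ from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for b in BASES:
            if b != base:
                yield codon[:i] + b + codon[i + 1:]


def bridging_residues(aa_from, aa_to):
    """Amino acids encoded by a codon that is one substitution away from a
    codon of `aa_from` and one substitution away from a codon of `aa_to`."""
    targets = set(codons_for(aa_to))
    bridges = set()
    for start in codons_for(aa_from):
        for mid in neighbours(start):
            mid_aa = CODON_TABLE[mid]
            if mid_aa in (aa_from, aa_to, "*"):
                continue
            if any(n in targets for n in neighbours(mid)):
                bridges.add(mid_aa)
    return sorted(bridges)


if __name__ == "__main__":
    print(min_base_changes("T", "G"))   # Thr -> Gly
    print(bridging_residues("T", "G"))  # single-step intermediates
```

Under these assumptions Thr to Gly needs two substitutions, and Ala is one of the single-step intermediates, which is consistent with A40 being offered as a bridging residue in the library.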

Table S9. Enzymatic activities of variants obtained by additional library screening or site-directed mutagenesis.

                     v_initial (100 µM substrate)          Michaelis-Menten kinetics[c]
Variant              Lysate[a]         Purified[b]         kcat          KM        kcat/KM      kcat/kuncat[d]
                     [µM/min/OD600]    [µmol/min/mg]       [s⁻¹]         [µM]      [M⁻¹ s⁻¹]
ancCC+M36I[e]        1.4±0.6           44±30
ancCC+I99L[e]        4.0±0.6           240±40
ancCC+36/99          7.6±0.9           460±40
ancCC+36/108         13±1              250±80
ancCC+99/108         42±5              (2.7±0.2)×10³
ancCC+36/99/108      240±70            (5.7±0.4)×10³
ancCC+K100T          n.d.[f]           n.d.
ancCC+S111N          n.d.              n.d.
ancCC+36/100         2.0±0.5           50±40
ancCC+36/111         n.d.              n.d.
ancCC+99/100         4.4±0.4           270±40
ancCC+99/111         2.9±0.4           310±170
ancCC+100/108        10±1              180±30
ancCC+108/111        12±2              300±130
ancCHI*+I36M         (7.1±0.8)×10³     (2.1±0.01)×10⁵      1.9±0.1       120±20    1.7×10⁴      1900
ancCHI*+L99I         (3.1±0.4)×10³     (5.6±0.1)×10⁴       1.0±0.4       55±32     1.9×10⁴      1000
                                                           (0.91±0.30)   (45±22)   (2.0×10⁴)    880
ancCHI*+V108L        (5.3±0.1)×10³     (1.1±0.1)×10⁵       0.73±0.05     30±7      2.4×10⁴      700
epR1+M36I            6.7±0.5           190±70
epR1+I99L            16±2              490±360
epR1+L108V           4.2±0.6           n.d.
ancR3+F133L          730±20            (2.7±0.1)×10⁴       0.30±0.04     43±10     6.9×10³      290
                                                           (0.28±0.04)   (37±9)    (5.7×10³)    270
ancR7+F133L          (9.0±1.1)×10³     (1.8±0.1)×10⁵       0.83±0.03     14±2      6.1×10⁴      800
ancCHI*+F133L        (4.7±0.6)×10⁴     (5.2±0.02)×10⁵      8.1±3.2       53±29     1.5×10⁵      7800
AtCHI+F146L          (2.1±0.8)×10⁴     (6.0±1.0)×10⁵       19±7          38±22     5.1×10⁵      1.8×10⁴
ancR1-ΔR             59±3              (1.3±0.3)×10⁴
ancR7-ΔR

[a] Cells were grown in at least duplicate and lysates sufficiently diluted (~1-10,000-fold) to determine initial rates of chalconaringenin isomerization (in µM/min) at a substrate concentration of 100 µM, normalized to cell density (OD600), corrected for the dilution factor, and averaged. This experiment was repeated three times and the combined average determined.
[b] Purified variants were sufficiently diluted (25 nM - 50 µM) to determine the specific enzymatic activity (µmol product generated per min per mg protein) at a substrate concentration of 100 µM. Measurements were performed in triplicate, corrected for the dilution factor, and averaged.
[c] Initial rates were measured over a range of substrate concentrations (4.5 - 180 µM) and steady-state kinetic parameters determined. The Michaelis-Menten equation only holds at an excess of substrate over enzyme (or enzyme over substrate). Therefore, for the lowest-activity variants, where high enzyme concentrations (~50 µM) are necessary to detect rate enhancements above background, kinetic parameters could not be determined. For several variants of intermediate activity, the lowest substrate concentrations were in less than 5-fold excess of the enzyme. Data were re-fitted without these points and the values obtained are shown in parentheses. Note that both fits gave highly similar results, but the kinetic parameters should be treated with caution in these cases.
[d] The rate constant of the uncatalyzed reaction, kuncat, was determined as 1.04 × 10⁻³ s⁻¹, and the rate enhancement achieved by the enzyme, kcat/kuncat, was calculated from it.
[e] Stereospecificity was determined by chiral HPLC. All assayed variants produced (S)-naringenin.
[f] No activity was detected.
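The derived columns of Table S9 and the substrate-excess criterion of note [c] can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' fitting procedure: the function names are hypothetical, the enzyme concentration in the example is invented, and kuncat is passed in as a parameter (the value 1.04 × 10⁻³ s⁻¹ used below is the one that reproduces the tabulated kcat/kuncat ratios).

```python
def catalytic_efficiency(kcat_per_s, km_uM):
    """kcat/KM in M^-1 s^-1, from kcat in s^-1 and KM in µM."""
    return kcat_per_s / (km_uM * 1e-6)


def rate_enhancement(kcat_per_s, kuncat_per_s):
    """Ratio of the catalyzed to the uncatalyzed first-order rate constant."""
    return kcat_per_s / kuncat_per_s


def michaelis_menten_rate(kcat_per_s, km_uM, enzyme_uM, substrate_uM):
    """Initial rate v = kcat [E] [S] / (KM + [S]), in µM/s."""
    return kcat_per_s * enzyme_uM * substrate_uM / (km_uM + substrate_uM)


def points_in_substrate_excess(substrate_uM, enzyme_uM, min_fold=5.0):
    """Keep only substrate concentrations in at least `min_fold` excess
    over the enzyme, mirroring the re-fit described in note [c]."""
    return [s for s in substrate_uM if s / enzyme_uM >= min_fold]


if __name__ == "__main__":
    # kcat and KM of ancCHI*+V108L as listed in Table S9.
    kcat, km = 0.73, 30.0
    print(f"kcat/KM     = {catalytic_efficiency(kcat, km):.2e} M^-1 s^-1")
    print(f"kcat/kuncat = {rate_enhancement(kcat, 1.04e-3):.0f}")

    # Hypothetical enzyme concentration of 2 µM, for illustration only.
    series = [4.5, 9, 18, 45, 90, 180]
    print(points_in_substrate_excess(series, enzyme_uM=2.0))
```

With an assumed enzyme concentration of 2 µM, the two lowest substrate concentrations fall below the 5-fold excess and would be dropped from the re-fit.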

Table S10. Sequenced variants of the low-mutation rate phylogenetic libraries. (a) Round 1, (b) Round 2 based on ancCC+M36I, (c) Round 2 based on ancCC+I99L, (d) Round 2 based on ancCC+L108V.

[a] Activity measurements were performed as described above. [b] Enriched mutations are shown in grey boxes. [c] This column lists mutations introduced inadvertently during PCR amplification. [d] This column indicates which clones were independently identified more than once.

Table S10a. Sequenced variants of the low-mutation rate round 1 phylogenetic library.

Variant  Activity[a]  Mutations[b] / PCR[c]    Times found[d]
1        1.7          L108V                    3
2        1.3          M36I
3        1.6          I99L                     2
4        1.4          L108V W152L P153S
5        2.3          K100T L108V
6        1.4          L158P
7        1.7          A117K L108V


Table S10b. Sequenced variants of the low-mutation rate round 2 phylogenetic library based on ancCC+M36I.

Variant  Activity[a]  Mutations[b] / PCR[c]    Times found[d]
all                   M36I
1        1.4          I99L P153S S111N
2        2.2          L108V P153S
3        1.1          G66D
4        1.6          I99L A118T
5        2.5          I95M L108V
6        1.2          I99L
7        2.2          I99L V187T
8        2.2          L108V                    3
9        3.4          L108V S111N A172V
10       1.9          I99L R113K

Table S10c. Sequenced variants of the low-mutation rate round 2 phylogenetic library based on ancCC+I99L.

Variant  Activity[a]  Mutations[b] / PCR[c]    Times found[d]
all                   I99L
1        1.2          M36I                     4
2        2.3          L108V                    6
3        1.6          T40G V91A L108V
4        2.2          I41V L108V
5        2.0          L108V G187D
6        1.6          F133I
7        1.7          L108V V56M
8        1.8          M36I V147T
9        2.8          L108V L142F V181F
10       1.4          M36I L142F
11       2.2          E37Q L108V A52T
12       2            T40G L108V

Table S10d. Sequenced variants of the low-mutation rate round 2 phylogenetic library based on ancCC+L108V.

Variant  Activity[a]  Mutations[b] / PCR[c]    Times found[d]
all                   L108V
1        2.0          M36I S102A S111N
2        1.3          S111N                    7
3        1.3          M36I                     6
4        1.2          K100T                    3
5        1.2          E37Q P153S
6        1.2          M36I E109G
7        1.2          I95M F181V
8        1.4          G98P S23N
9        2.6          I99L D187G
10       1.2          I95M K100T
11       2.4          I99L                     2
12       1.2          K65N
13       2.0          M36I S111N
14       1.4          M36I P153S
15       1.3          S111N G66D
16       1.5          K100T S111N
17       1.2          S194G A67T
18       1.2          V181F A179T
19       1.5          M36I L142F P153S E12D
20       2.6          M36I L142F I28M,

Table S11. Sequenced variants of the random mutagenesis libraries. (a) epRound 1, (b) epRound 2, (c) epRound 3, (d) epRound 4.

[a] Activity measurements were performed as described above. Note that the activities of both epRound 1 and epRound 2 variants are given relative to ancCC, which is inactive and corresponds to the background rate. [b] Enriched mutations are shown in grey boxes.

Table S11a. Sequenced variants of the epRound 1 random mutagenesis library.

Variant  Activity[a]  Mutations[b]
1_1 "epR1" 1.7 F133L
1_2 1.7 F78L D187G F133L E192V
1_3 1.7 F78Y K144E F133L G145C L185M
1_4 1.5 M36V A58P
1_5 1.2 K24T
1_6 1.2 D114V F136L
1_7 1.6 F133S S208C
1_8 1.5 F133S
1_9 1.6 F133L
1_10 1.5 L185V
1_11 1.7 F133I
1_12 1.4 I160F V220E
1_13 1.8 L130P K213Q
1_14 1.6 G10C F133L Q159H E173D S202G
1_15 1.7 F133I D216Y V220E
1_16 1.6 E88K I95T
1_17 1.6 F133L E173V S208L A209S
1_18 1.7 E124V W152L V188G
1_19 1.2 A2P

Table S11b. Sequenced variants of the epRound 2 random mutagenesis library.

Variant  Activity[a]  Mutations[b]
2_1 3.7 M36V F133S L185V K213Q
2_2 "epR2" 3.0 M36V F133L
2_3 3.6 E88K I95T F127G F133I
2_4 3.1 M36V F133L
2_5 3.2 M36V F133S F136L S208C
2_6 3.6 M36V F133I S208C
2_7 3.7 M36V P54S F133I K213Q
2_8 2.8 M36A F133L
2_9 3.5 M36V A2P K65L F133S S208C
2_10 3.2 M36V R115H F133L D187G E192V

Table S11c. Sequenced variants of the epRound 3 random mutagenesis library.

Variant  Activity[a]  Mutations[b]
all M36V, F133L
3_1 "epR3" 1.2 V45M
3_2 1.3 S123R K65M I99F P155S
3_3 1.0 E12V V45M I99F A129S
3_4 1.2 V33I Y122F E124K 4126D F150I
3_5 1.2 V45M G106D G166V
3_6 1.2 H60Y F150I
3_7 1.2 I95T S163N

Table S11d. Sequenced variants of the epRound 4 random mutagenesis library.

Variant  Activity[a]  Mutations[b]
all M36V, V45M, I99F, F133L
4_1 2.2 E12V A129S P155S G166D
4_2 2.2 A129S
4_3 2.2 A129S
4_4 2.2 A129S
4_5 2.1 A129S S163N
4_6 2.2 A129S
4_7 2.3 A129S
4_8 2.4 K65M S123R P155S G166D
4_9 2.0 E12V A129S

Table S12. Occurrence of phylogenetic founder mutations and epR4 mutations in the ancestral reconstruction and in extant sequences.

Mutation  Posterior probabilities in ancestral reconstruction[a]  Occurrence in extant sequences[b]
anc ep  ancCC ancCHIL ancCHI[c]  CHI CHIL
M36 I V p(M) 0.67 p(M) 0.99 p(I) 0.49 63% L 34% I p(I) 0.11 p(I) 0.01 p(V) 0.33 13% I 33% M p(V) 0.08 p(M) 0.08 13% V 5% V 3% M
V45 / M p(V) 0.64 p(F) 0.99 p(V) 0.84 69% F 81% F p(I) 0.15 p(I) 0.11 19% V 0 M p(F) 0.12 p(L) 0.03 1% M 0 V
I99 L F p(I) 0.64 p(I) 0.89 p(L) 0.89 89% L 74% I p(V) 0.21 p(V) 0.10 p(I) 0.08 3% I 7% L p(L) 0.14 p(L) 0.01 p(V) 0.02 0 F 0 F
L108 V / p(L) 0.87 p(L) 0.99 p(V) 0.53 85% V 81% L p(I) 0.06 p(L) 0.22 1% L 7% V p(V) 0.04 p(I) 0.10
A129 / S p(A) 0.98 p(A) 1.00 p(A) 0.98 87% A 67% A p(S) 0.01 p(S) 0.01 1% S 12% S
F133 / L, I, S p(F) 0.84 p(L) 0.97 p(F) 0.90 85% F 44% L p(L) 0.12 p(F) 0.02 p(Y) 0.09 3% L 19% I p(Y) 0.03 p(M) p(L) 0.01 0 I, 2% F 0 S 0 S
[a] The highest three posterior probability values are given. Values p(amino acid)<0.01 are not shown.
[b] The occurrence of the relevant amino acids in extant sequences was calculated from the sequence alignment (Supplementary Data Set 1), which contained 88 CHI and 43 CHIL sequences. Consensus amino acids are shown in bold.
[c] ancCHI amino acids were identical to those found in ancCHI*.
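The occurrence values in note [b] are per-position residue frequencies taken from one column of the alignment. A minimal sketch of such a count is shown below; the function name and the toy column are hypothetical, and excluding gap characters from the denominator is an assumption that the source does not state.

```python
from collections import Counter


def residue_occurrence(column):
    """Percentage of each residue in one alignment column.

    `column` is an iterable of one-letter residues, one per sequence.
    Gap characters ('-') are ignored (an assumption, see lead-in).
    """
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


if __name__ == "__main__":
    # Toy column for illustration only; not taken from Supplementary Data Set 1.
    toy_column = list("LLLIM-")
    for aa, percent in residue_occurrence(toy_column).items():
        print(f"{percent:.0f}% {aa}")
```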

Table S13. Mutations accumulated in the alternative random mutagenesis trajectory.

                  Mutations[a]
Round  Variant    F133L  M36V  V45M  I99F  A129S
ep1    epR1       +
ep2    epR2       +      +
ep3    epR3       +      +     +
ep4    epR4       +      +     +     +     +

[a] Only mutations present in the four final variants are shown. An overview of all variants selected in each round is given in Table S11.

Table S14. Data collection and refinement statistics for x-ray protein structures determined in this work. Parenthesized values describe the highest resolution shell.

Data Collection ancCC ancR1 ancR2 ancR3

Space Group P212121 P212121 C2221 P1211

Cell dimensions

a,b,c (Å)    56.08, 69.45, 128    56.3, 69.12, 126.83    82.74, 219.43, 104.52    69.22, 117.69, 71.85

α,β,γ (º)    90, 90, 90    90, 90, 90    90, 90, 90    90.00, 118.63, 90.00

Resolution (Å)    33.51-1.4 (1.45-1.40)    33.34-1.5 (1.55-1.50)    75.68-2.4 (2.47-2.40)    53.99-2.0 (2.07-2.00)

Rmerge 0.082 (0.833) 0.098 (0.535) 0.095 (0.865) 0.078 (0.618)

I/σ(I) 11.5 (2.2) 12.8 (3.8) 13.8 (2.6) 11.7 (2.3)

Completeness (%) 99.8 (100.0) 99.6 (99.0) 100.0 (100.0) 100.0 (100.0)

Redundancy 6.7 (6.4) 9.8 (8.0) 10.1 (9.1) 7.2 (6.2)

Refinement

No. unique reflections 98865 79600 37657 68082

Rwork / Rfree 0.196/0.239 0.1585/0.1817 0.2094/0.2567 0.1924/0.2165

No. atoms 4003 3976 6397 6592

Protein 3511 3387 6372 6244

Ligand/Ion 38 34 4 12

Water 454 555 21 336

B-factors (Å2) 26.6 24.4 74.3 52.4

Protein 24.8 22 74.4 52.5

Ligand/Ion 45.5 28.2 88.1 35.8

Water 38.8 38.7 55.6 51.2

R.M.S. deviation

bond lengths (Å) 0.009 0.006 0.005 0.007

bond angles (º) 1.26 0.99 0.84 0.99

[Table S14. continued]

Data Collection ancR5 ancR7 ancCHI* epR4

Space Group P41212 P1 P212121 P1211

Cell dimensions

a,b,c (Å)    65.51, 65.51, 117.24    38.18, 49.57, 63.10    50.44, 65.44, 94.14    65.94, 112.67, 71.76

α,β,γ (º)    90, 90, 90    98.68, 92.02, 107.9    90, 90, 90    90, 117.37, 90

Resolution (Å)    57.19-1.51 (1.57-1.51)    62.14-1.60 (1.69-1.60)    47.07-1.9 (1.97-1.90)    58.57-1.58 (1.67-1.58)

Rmerge 0.090 (0.463) 0.059 (0.521) 0.0317 (0.3909) 0.101 (0.748)

I/σ(I) 18.9 (5.1) 10.2 (2.0) 11.04 (2.15) 8.33 (1.8)

Completeness (%) 100.0 (99.9) 95.8 (94.6) 100.00 (100.00) 99.8 (100.00)

Redundancy 16.3(12.8) 3.8 (3.9) 6.9 (7.1) 5.3 (5.2)

Refinement

No. unique reflections 40583 54777 21761 126702

Rwork / Rfree 0.1787/0.1905 0.1866/0.2233 0.1755/0.2062 0.1696/0.1933

No. atoms 2019 3652 1719 7291

Protein 1723 3242 1585 6244

Ligand/Ion 16 3 2 12

Water 280 407 132 1035

B-factors (Å2) 27.2 40.5 44.1 29.6

Protein 24.7 39.4 43.7 27.8

Ligand/Ion 82 38.7 32.4 16.2

Water 39.4 48.9 49.3 41.1

R.M.S. deviation

bond lengths (Å) 0.007 0.003 0.011 0.005

bond angles (º) 1.08 0.64 1.28 0.93

[Figure S7 panels: ancCC, ancR1, ancR2, ancR3, ancR5, ancR7, ancCHI*, epR4]

Figure S7. X-ray crystal structures reported in this work: ancCC, ancR1, ancR2, ancR3, ancR5, ancR7, ancCHI* and epR4. Naringenin bound to the active site of the ancCC and ancR1 structures is rendered as space-filling (black). Positions of substitutions that have amino acid identities which differ from ancCC are colored in red.

Table S15. RMS values for protein-protein structural alignments based on CA alignments in PyMOL. Average values with standard deviations are reported for proteins with more than 1 molecule per asymmetric unit. The number of molecules per asymmetric unit is shown in parentheses in the leftmost column under the name of the protein. Previously published CHI protein crystal structures, AtCHI (PDB ID: 4DOI) and MsCHI (PDB ID: 1EYQ), are used here for comparison. The results show that, on average, the evolved proteins used in this study have similar RMS values across all CHI structures.

                  ancCC         ancR1         ancR2         ancR3         ancR5         ancR7         ancCHI*       epR4          AtCHI (4DOI)  MsCHI (1EYQ)
ancCC (2)         0.429         0.437±0.272   0.520±0.083   0.452±0.054   0.722±0.120   0.427±0.060   0.900±0.221   0.598±0.108   0.952±0.139   1.229±0.046
ancR1 (2)                       0.403         0.520±0.080   0.451±0.065   0.729±0.089   0.450±0.058   0.899±0.220   0.577±0.104   1.071±0.105   1.167±0.061
ancR2 (4)                                     0.346±0.094   0.467±0.077   0.655±0.098   0.550±0.081   0.901±0.060   0.551±0.062   1.098±0.069   1.217±0.052
ancR3 (4)                                                   0.229±0.047   0.745±0.012   0.448±0.095   0.680±0.017   0.448±0.080   0.836±0.068   1.272±0.062
ancR5 (1)                                                                 N/A           0.577±0.081   1.006         0.672±0.059   1.272±0.021   1.044±0.052
ancR7 (2)                                                                               0.328         0.806±0.175   0.593±0.117   0.962±0.122   1.167±0.052
ancCHI* (1)                                                                                           N/A           0.879±0.350   0.823±0.023   1.182±0.028
epR4 (4)                                                                                                            0.271±0.084   1.050±0.067   1.345±0.060
AtCHI (2)                                                                                                                         0.097         1.187±0.034
MsCHI (2)                                                                                                                                       0.182
Overall Average   0.604±0.426   0.681±0.399   0.682±0.243   0.602±0.199   0.824±0.213   0.631±0.302   0.897±0.506   0.6983±0.434  0.936±0.247   1.201±0.154

[Figure S8 values: round 1, 7.5 Å; round 2, 11.2 Å; round 3, 9.6 Å; round 5, 17.1 Å; round 7, 17.2 Å]

Figure S8. Average distance of the mutations in ancR1, ancR2, ancR3, ancR5, and ancR7 from the active site, calculated as the distance from the side chain Cβ of each mutated residue to Nε of R34. Mutations that fixated later occur, on average, further away from the active site. Note: the value for ancR1 is not an average since only a single mutation was introduced.
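The per-round value in Figure S8 is a mean of point-to-point distances. The sketch below illustrates the calculation under stated assumptions: the function name and all coordinates are hypothetical placeholders, not atoms taken from the deposited structures.

```python
import math


def mean_distance_to_reference(cb_coords, reference):
    """Mean Euclidean distance (Å) from each Cβ position to one reference
    atom, here standing in for Nε of R34."""
    distances = [math.dist(tuple(cb), tuple(reference)) for cb in cb_coords]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical coordinates in Å, for illustration only.
    r34_ne = (0.0, 0.0, 0.0)
    mutated_cb = [(3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
    print(mean_distance_to_reference(mutated_cb, r34_ne))
```

For a round with a single new mutation, as for ancR1, the list holds one coordinate and the function returns that single distance.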


Figure S9. Root mean square deviations (RMSD, Å) of all substrate heavy atoms from simulations of ancCC, ancR1, ancR3, ancR7, ancCHI and AtCHI Michaelis complexes. The relevant enzyme variant is indicated on each panel, and panels to the left are from simulations with the substrate starting from the productive mode, whereas panels to the right are from simulations with the substrate starting from the non-productive mode. The starting orientations in all substrate-bound simulations were chosen to be compatible with formation of the S-enantiomer of the product. We show here that simulations initiated with the substrate bound in the productive mode show high stability and no significant changes in substrate position (left panel). The corresponding simulations from the non-productive mode show in each case high instability (right panel), with the substrate almost immediately changing its position and orientation (note the high RMSD values). In the simulations of the ancR3, ancR7 and AtCHI variants, we additionally observe reorientation of the substrate from the non-productive to the productive binding mode during the simulations (see Figs. 5d, S10 and S11 for further details). Finally, in many individual trajectories, the substrate dissociates fully from the active site. The data is presented individually for five independent 100 ns trajectories per system, with data collected every 10 ps. For clarity, RMSD data for individual replicas are plotted using the cspline smoothing function implemented in Gnuplot.
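For reference, the RMSD plotted per frame is the square root of the mean squared displacement of the selected atoms from their reference positions. The following is a minimal sketch of that definition, not the analysis tool used by the authors; it assumes the frame has already been aligned to the reference, and the coordinates are toy values.

```python
import math


def rmsd(coords, reference):
    """RMSD (same units as the input) between two equally ordered lists of
    3D coordinates. No fitting is done here; frames are assumed aligned."""
    if len(coords) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(tuple(a), tuple(b)) ** 2
                  for a, b in zip(coords, reference))
    return math.sqrt(squared / len(coords))


if __name__ == "__main__":
    # Toy two-atom example: every atom is displaced by 1 Å along z.
    start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(frame, start))
```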

[Figure S10 panels: ancCHI and AtCHI. Legend, final 50 ns binding mode: CHI-like, productive; ancCC-like, non-productive; other binding modes; dissociated. Panel titles: "Initial binding mode: CHI-like, productive" and "Initial binding mode: ancCC-like, non-productive".]

Figure S10. Overview of simulations of the ancCHI and AtCHI Michaelis complexes, showing the distribution of different substrate binding modes in each simulation. The distribution of different binding modes was obtained through RMSD-based clustering using the average linkage algorithm. The clustering was performed on the combined final 50 ns of each of five 100 ns trajectories for each variant, starting in either the productive (upper panel) or non-productive (lower panel) binding modes (i.e. a total of 250 ns simulation time per system). The clustering was performed on the heavy atoms of the substrate, after aligning all structures to the starting structure using backbone heavy atoms. From this clustering, we are able to differentiate between substrate conformations that fully dissociate from the active site, stay in a non-productive-like conformation, adopt a productive-like conformation, or bind in a totally different conformation while remaining in the active site. From this figure, it can be seen that while simulations starting from the productive binding mode stay stable throughout the trajectories, those starting from the non-productive binding mode are highly unstable, with a substantial fraction of substrate dissociations, or the substrate finding completely new (and catalytically irrelevant) binding modes in the active site (see Figure S11 for an overview of the different binding modes).
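Average-linkage clustering merges, at each step, the two clusters with the smallest mean pairwise distance between their members. The sketch below illustrates the idea on a precomputed RMSD matrix. It is not the authors' clustering tool: the stopping cutoff, the function name and the toy distances are assumptions made for illustration.

```python
def average_linkage_clusters(dist, cutoff):
    """Agglomerative clustering with average linkage.

    `dist` is a symmetric matrix (list of lists) of pairwise RMSD values.
    Clusters are merged until the smallest average inter-cluster distance
    exceeds `cutoff` (a hypothetical stopping rule).
    Returns a list of clusters, each a list of frame indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, ia, ib = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return clusters


if __name__ == "__main__":
    # Toy data: four frames placed on a line, distances are |xi - xj|.
    positions = [0.0, 0.5, 5.0, 5.4]
    matrix = [[abs(a - b) for b in positions] for a in positions]
    print(average_linkage_clusters(matrix, cutoff=2.0))
```

In an analysis like the one in the caption, each resulting cluster would then be labeled (productive, non-productive, other, dissociated) by comparing a representative frame to the reference binding modes.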

Variants ancCC, ancR1, ancR3, and ancR7 are shown in Fig. 5d. Note that AtCHI behaves similarly to ancR7 but shows no further increase in productive binding, whereas ancCHI does not fit the overall trend. The reason for this is unclear; however, these two variants are less comparable to the others due to larger differences in the protein sequence (e.g., neutral mutations).


Figure S11. Illustration of different substrate binding modes observed in our molecular dynamics simulations, using the ancR3 variant as a representative example. All simulations in this figure were initiated from the substrate in a non-productive mode. Shown here are (A) an overlay of three different substrate binding positions, and (B-D) individual representations of the substrate in a productive, non-productive and intermediary position (denoted as “other binding modes” in Fig. 5d and S10), respectively. Note that all panels are presented from exactly the same angle, to also highlight movements of the catalytic arginine in conjunction with substrate reorientation.


Figure S12. Root-mean square deviations (RMSD) of all product heavy atoms from simulations of ancCC and ancR1 in complex with the product observed in the non-productive binding mode in PDB IDs 5WKR and 5WKS. These simulations were performed to validate that our simulation protocol can correctly reproduce the stability of the product complex, for which crystallographic coordinates are available. As can be seen from this figure, the product complex is stable over the course of 100 ns of simulation time in all 5 replicas, thus increasing our confidence that the instabilities observed for the reactant complex in Figures S9 and S10 are not artefacts of the simulation protocol. The data is presented individually for five independent 100 ns trajectories per system, with data collected every 10 ps. For clarity, RMSD data for individual replicas are plotted using the cspline smoothing function implemented in Gnuplot.

[Figure S13 plot. Axes: ω1 - 15N (ppm) versus ω2 - 1H (ppm). Full spectrum: 1H from 10 to 7 ppm, 15N from 90 to 130 ppm. Zoom-in: 1H from 7.50 to 7.30 ppm, 15N from 85.0 to 86.5 ppm, with traces labeled ancR7, ancCC and ancR1.]

Figure S13. NMR 15N-1H HSQC spectra of ancCC (blue), ancR1 (red), and ancR7 (purple). The arginine side chain 15Nε-1Hε signals are well resolved, with peaks at a 15N chemical shift of about 86 ppm (zoom-in shown at the top). Unlabeled amide backbone peaks downfield of 8.5 ppm represent structured residues in the protein and were selected to calculate the average heteronuclear NOE backbone values.


Figure S14. Root-mean square fluctuations (RMSF) of Arg34 during our substrate-free simulations of the ancCC, ancR1, ancR3 and ancR7 variants. The RMSF values were calculated independently for the backbone and side chain heavy atoms, after aligning the last 50 ns of all five replicas of the simulation of ancCC, ancR1 and ancR7 to the starting structure along the backbone heavy atoms. In the case of ancR3, only four replicas were considered in the RMSF calculations, as in one of the replicas a substantial movement of the entire hairpin was observed, which artificially increased the RMSD of Arg34. The RMSF values correspond to the average mobility of each atom during our simulations, and thus provide a measure of the flexibility of a given residue. The data indicates that (1) in all cases, the Arg34 side chain is more flexible than the backbone, in good agreement with the NMR data (see main text), and (2) there are changes in the relative flexibility of both backbone and side chain across the different enzyme variants. Specifically, these RMSF values show an increase in the flexibility of the Arg34 side chain in ancR1 compared to ancCC, and a reduction in flexibility again moving towards ancR3 and ancR7, compared to the other two variants.
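The RMSF of an atom is the root-mean-square deviation of its position from its own time-averaged position. A minimal sketch of this definition follows; it is not the authors' analysis code, it assumes the frames have already been aligned to the starting structure, and the trajectory is a toy example.

```python
import math


def rmsf(frames):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples in a fixed atom order.
    Returns one RMSF value per atom, in the units of the input.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean = tuple(
            sum(frame[atom][k] for frame in frames) / n_frames
            for k in range(3)
        )
        # Mean squared fluctuation about that average position.
        msf = sum(math.dist(tuple(frame[atom]), mean) ** 2
                  for frame in frames) / n_frames
        values.append(math.sqrt(msf))
    return values


if __name__ == "__main__":
    # Toy trajectory: atom 0 oscillates by ±1 Å along x, atom 1 is fixed.
    trajectory = [
        [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
        [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    ]
    print(rmsf(trajectory))
```

Averaging the per-atom values over the backbone atoms and over the side chain atoms separately would give the two kinds of values described in the caption.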


Figure S15. Root-mean square deviations (RMSD, Å) of all backbone heavy atoms from simulations of the different enzymes: (A) in complex with the substrate bound in a productive mode, (B) in complex with the substrate bound in a non-productive mode, (C) with no substrate, or (D) in complex with the product. The relevant enzyme variants are indicated on each panel. The data is presented as averages over five independent 100 ns trajectories per system, with data collected every 10 ps.

Table S16. Standard GAFF atom types and calculated partial charges used to describe the substrate chalconaringenin (left) and the product naringenin (right).

Chalconaringenin (substrate)                  Naringenin (product)
Atom Name  Atom Type  Partial Charge          Atom Name  Atom Type  Partial Charge
C1         ce         -0.1609                 C1         ct          0.3413
C2         cf         -0.3108                 C2         ct         -0.4237
C3         c           0.8619                 C3         c           0.7866
C4         ca         -0.7954                 C4         ca         -0.5600
C5         ca          0.9591                 C5         ca          0.5963
O1         o          -0.7805                 O1         os         -0.4394
C6         ca         -0.9905                 C6         ca         -0.7322
C7         ca          0.7903                 C7         ca          0.6901
C8         ca         -0.8999                 C8         ca         -0.7062
C9         ca          0.6603                 C9         ca          0.5665
O2         o          -0.6247                 O2         o          -0.5699
O3         oh         -0.6840                 O3         oh         -0.6326
O4         oh         -0.6325                 O4         oh         -0.5584
H1         ho          0.4272                 H1         ho          0.4550
H2         ho          0.4199                 H2         ho          0.4325
H3         ha          0.2234                 H3         ha          0.2354
H4         ha          0.2426                 H4         ha          0.2558
C10        ca          0.1715                 C10        ca          0.0051
C11        ca         -0.1271                 H5         h1          0.0219
C12        ca         -0.4596                 H6         hc          0.1239
H5         ha          0.1789                 H7         hc          0.1239
C13        ca          0.5028                 C11        ca         -0.1598
H6         ha          0.1780                 C12        ca         -0.3168
C14        ca         -0.3262                 H8         ha          0.1837
O5         oh         -0.6732                 C13        ca          0.4207
C15        ca         -0.1846                 H9         ha          0.1634
H7         ha          0.1880                 C14        ca         -0.3168
H8         ha          0.1563                 O5         oh         -0.5666
H9         ho          0.4422                 C15        ca         -0.1598
H10        ha          0.1178                 H10        ha          0.1837
H11        ha          0.1298                 H11        ha          0.1634
                                              H12        ho          0.3930
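A quick consistency check on a partial-charge set is that the charges sum to the net charge of the molecule. The sketch below sums the two columns of Table S16; the variable and function names are hypothetical, and reading the substrate total as a net charge of -1 (a deprotonated chalconaringenin) is an interpretation of the numbers rather than something stated in the table.

```python
# Partial charges transcribed from Table S16, in table order.
CHALCONARINGENIN = [
    -0.1609, -0.3108, 0.8619, -0.7954, 0.9591, -0.7805, -0.9905, 0.7903,
    -0.8999, 0.6603, -0.6247, -0.6840, -0.6325, 0.4272, 0.4199, 0.2234,
    0.2426, 0.1715, -0.1271, -0.4596, 0.1789, 0.5028, 0.1780, -0.3262,
    -0.6732, -0.1846, 0.1880, 0.1563, 0.4422, 0.1178, 0.1298,
]
NARINGENIN = [
    0.3413, -0.4237, 0.7866, -0.5600, 0.5963, -0.4394, -0.7322, 0.6901,
    -0.7062, 0.5665, -0.5699, -0.6326, -0.5584, 0.4550, 0.4325, 0.2354,
    0.2558, 0.0051, 0.0219, 0.1239, 0.1239, -0.1598, -0.3168, 0.1837,
    0.4207, 0.1634, -0.3168, -0.5666, -0.1598, 0.1837, 0.1634, 0.3930,
]


def net_charge(charges):
    """Sum of the partial charges, i.e. the net molecular charge."""
    return sum(charges)


if __name__ == "__main__":
    print(f"chalconaringenin: {len(CHALCONARINGENIN)} atoms, "
          f"net charge {net_charge(CHALCONARINGENIN):+.4f}")
    print(f"naringenin:       {len(NARINGENIN)} atoms, "
          f"net charge {net_charge(NARINGENIN):+.4f}")
```

The substrate column sums to approximately -1 and the product column to approximately 0, so the tabulated charges are internally consistent.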

References

1. Matasci, N. et al., Data access for the 1,000 Plants (1KP) project. Gigascience 3, 17 (2014).
2. Jez, J.M., Bowman, M.E., Dixon, R.A., & Noel, J.P., Structure and mechanism of the evolutionarily unique plant enzyme chalcone isomerase. Nat Struct Biol 7, 786-791 (2000).
3. Ngaki, M.N. et al., Evolution of the chalcone-isomerase fold from fatty-acid binding to stereospecific catalysis. Nature 485, 530-533 (2012).
4. Ronquist, F. & Huelsenbeck, J.P., MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572-1574 (2003).
5. Armougom, F. et al., Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Res 34, W604-608 (2006).
6. Ashkenazy, H. et al., ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res 38, W529-533 (2010).
