<<

bioRxiv preprint doi: https://doi.org/10.1101/398503; this version posted August 29, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Catalytic promiscuity potentiated the divergence of new functions in cyanogenic defense metabolism

Brenden Barco*(1), Lara Zipperer (2), and Nicole K. Clay (1)

*Correspondence should be addressed to B. Barco ([email protected])

Current addresses:
(1) Department of Molecular, Cellular & Developmental Biology, Yale University, Kline Biology Tower 734, 219 Prospect St., New Haven, CT 06511
(2) Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511


Key words: enzyme promiscuity, regioselectivity, indole-3-cyanohydrin, camalexin, cytochrome P450 monooxygenase, (regulatory) neofunctionalization, indole-carbonylnitrile (ICN), substrate recognition site (SRS)

Abstract
Cytochrome P450 monooxygenases (P450s) constitute the largest metabolic enzyme family in plants, responsible for synthesizing hundreds of thousands of specialized metabolites with essential roles in chemical defenses against herbivores and pathogens (Banks et al., 2011; Nelson and Werck-Reichhart, 2011; Wurtzel and Kutchan, 2016). Substrate promiscuity has been documented to play a central role in the evolution of plant specialized metabolism (Weng et al., 2012; Leong and Last, 2017); however, most plant P450s are highly substrate-specific (Verpoorte, 2013). Here, we show the rapid inversion of primary and weak secondary (promiscuous) catalytic activities between two distinct yet evolutionarily linked multifunctional P450s, CYP71A12 and CYP71A13, based on intramolecular epistasis of two amino acid residues under positive selection in CYP71A12. Furthermore, we uncover previously undocumented catalytic activity during the inversion as well as naturally occurring amino acid substitution patterns that could have been present in evolutionary intermediates between the two enzymes. Comparative expression profiling and homology modeling reveal that natural selection acted on the promoter of CYP71A13 and the substrate-recognition elements of CYP71A12 to improve the efficiencies of their promiscuous reactions. The rise in catalytic promiscuity potentiated the divergence of new P450 enzyme functions in cyanogenic defense metabolism. [...] of promiscuous reactions is one of the core technologies underpinning the field of synthetic biology. Our results provide a more complete understanding of how natural selection uses promiscuous reactions to generate new enzymes in nature and chemical diversity in pathogen defense, and they demonstrate a novel strategy for identifying the molecular origins of these reactions in highly divergent, related enzymes.


Main
Class II P450s from a monophyletic group of CYP71, CYP736 and CYP83 families have been repeatedly recruited for the biosynthesis of cyanohydrins (α-hydroxynitriles) and related structures from amino acid-derived oximes (Fig. 1a) (Bak et al., 1998; Jørgensen et al., 2011; Nafisi et al., 2007; Nelson and Werck-Reichhart, 2011; Takos et al., 2011; Irmisch et al., 2014; Rajniak et al., 2015; Yamaguchi et al., 2014, 2016). They catalyze three consecutive reactions: an E-to-Z isomerization of the oxime substrate, oxime dehydration, and hydroxylation at either the α-, β-, γ- or benzylic carbon (Bak et al., 1998; Nafisi et al., 2007; Klein et al., 2013; Clausen et al., 2015; Knoch et al., 2016). In A. thaliana, CYP83B1 lost the ability to isomerize tryptophan (Trp)-derived indole-3-acetaldoxime (IAOx), using instead the E-oxime as its substrate in a pathway that produces the defense metabolite 4-methoxy-indol-3-yl-methylglucosinolate (4M-I3M) (Fig. 1a) (Bak et al., 2001; Bednarek et al., 2009; Clay et al., 2009; Bednarek et al., 2011; Clausen et al., 2015). By contrast, CYP71A12 and CYP71A13, which share 89.3% sequence identity, catalyze the conversion of IAOx to form α-hydroxy-indole-3-acetonitrile (α-hydroxy-IAN), a novel cyanohydrin that is the precursor to the cyanogenic defense metabolite 4-hydroxy-indole-3-carbonylnitrile (4OH-ICN) in Arabidopsis (Fig. 1a) (Nafisi et al., 2007; Klein et al., 2013; Rajniak et al., 2015). CYP71A13 alone additionally dehydrates α-hydroxy-IAN to form dehydro-IAN, precursor to the non-cyanogenic defense metabolite camalexin in Arabidopsis (Klein et al., 2013).


[Fig. 1 image. Panel (a): pathway schematics running from amino acid to (E/Z)-oxime, acetonitrile, and cyanohydrin in barley (L-Leu to epiheterodendrin and β/γ-hydroxynitrile glycosides; CYP79A8, CYP79A12, CYP71C113, CYP71L1, CYP71U5), sorghum (L-Tyr to dhurrin; CYP79A1, CYP71E1), and Arabidopsis (L-Trp, IAOx, IAN, α-OH-IAN, dehydro-IAN, camalexin, ICN, 4OH-ICN; CYP79B2, CYP79B3, CYP71A12, CYP71A13; CYP83B1 to indole glucosinolates). Panel (b): species tree of Arabidopsis thaliana, Arabidopsis lyrata, Olimarabidopsis cabulica, Capsella rubella, Crucihimalaya lasiocarpa, Boechera holboelii, Polyctenium fremontii, Erysimum chieri, Cardamine hirsuta, and Brassica rapa (Super-tribe, Lineage I, Lineage II), with occurrence of CYP71A28, CYP71A13, and CYP71A12 and bar charts of camalexin and (4OH-)ICN + products (nmol / mg DW).]

Fig. 1. (a) (left) Images courtesy of https://plants.usda.gov. (right) Schematics of the three-branched pathway for cyanohydrin metabolism in H. vulgare, S. bicolor, and A. thaliana. Amino acids are N,N-hydroxylated and decarboxylated by CYP79 enzymes to produce aldoximes. CYP71 enzymes then perform a variety of subsequent reactions, including aldoxime isomerization, aldoxime dehydration, acetonitrile oxygenation, and cyanohydrin dehydration. Diverse enzyme families then tailor these compounds into cyanogenic glycosides (e.g. epiheterodendrin and dhurrin), cyanohydrin-derived antimicrobials (e.g. camalexin and 4OH-ICN), or side products not derived from a cyanohydrin (e.g. β/γ-hydroxynitrile glycosides, indole glucosinolates). (b) (far left, top to bottom) Images of Capsella rubella, Boechera holboelii, Erysimum chieri, and Brassica rapa. (middle) Occurrence of CYP71As denoted by yellow (CYP71A28), red (CYP71A13), and blue (CYP71A12) circles. Dashed boxes indicate hypothesized presence due to the absence of genomic sequence. (right) Camalexin and ICN metabolite levels in 10-day-old seedlings inoculated with Pto avrRpm1 for 30 h. Data represent the mean ± SE of four replicates of 13 to 17 seedlings each from a compilation of three independent experiments, each containing A. thaliana as a positive control. Experiments were repeated twice, producing similar results. Thick branches represent the Super-tribe. ICA-ME, breakdown product of ICN. Abbreviations: DW, dry weight; tr., trace; n.d., not detected.


Fig. 1. Phylogenetic distribution of CYP71A12 and CYP71A13 orthologs and their physiological functions. (a) Biosynthetic pathways of cyanohydrin-derived plant defense metabolites. Images courtesy of https://plants.usda.gov. (b) Presence of CYP71A12, CYP71A13, and CYP71A28 orthologs in the Brassicaceae, inferred from shared synteny (solid circles) or relatedness to plants with sequenced orthologs (dashed circles). See Methods for lists of contigs and chromosomes used. Phylogenetic species tree (left). Levels of camalexin (middle) and (4OH-)ICN metabolites (right) in 10-day-old seedlings inoculated with Psta for 30 h. Data represent mean ± SE of four replicates of 13 to 17 seedlings each. Thick branches represent the Super-tribe. To ensure robust sampling of certain lineages, unsequenced plant species were profiled, and the presence of orthologs in these species was predicted based on the available sequences of close relatives. To ensure robust activation of defense metabolism in sampled species, 4M-I3M and related metabolites were profiled, and 4M-I3M’s signaling function was assayed (Extended Data Fig. 1a–b; Supplementary Note 1).


To determine whether the CYP71A13 dehydration reaction is indicative of a functional gain or loss, we investigated whether physiological functions of CYP71A12/A13 are conserved outside of Arabidopsis. To that end, we performed comparative phylogenetic and syntenic analyses to identify putative homologs/homeologs from available sequences of plants in the mustard family (Brassicaceae) (Supplementary Table 1). We then profiled (6-methoxy)camalexin and ICN metabolites in seedlings that were elicited with the bacterial pathogen Pseudomonas syringae pv. tomato DC3000 avrRpm1 (Psta), using liquid chromatography coupled with diode array detection and mass spectrometry (LC-DAD-MS). The Brassicaceae is split into a larger core clade of lineage I and II species and a smaller sister clade containing the genus Aethionema (Couvreur et al., 2009; Huang et al., 2015). CYP71A12 and CYP71A13 orthologs in a monophyletic group of lineage I species (hereafter referred to as the ‘Super-tribe’) likely contribute to the biosynthesis of (4OH-)ICN and (6-methoxy)camalexin in these species (Fig. 1b; Extended Data Fig. 1; Supplementary Note 1), whereas CYP71A13 orthologs in lineage II species do not synthesize (4OH-)ICN and (6-methoxy)camalexin (Fig. 1b; Extended Data Fig. 2d), indicating important functional differences between lineage I and II CYP71A13 enzymes.

CYP71A12 and CYP71A13 are paralogous enzymes that diverged from CYP71A28 through transposition and tandem gene duplications (Werck-Reichhart et al., 2002; Bak et al., 2011). Our maximum likelihood tree shows excellent support (>90%) for three distinct subclades: lineage I CYP71A12, lineage I CYP71A13, and lineage II CYP71A13 orthologs (Fig. 2a; Extended Data Fig. 2a). The absence of lineage II CYP71A12 sequences suggests either loss of CYP71A12 or non-duplication of CYP71A13 (Fig. 2a; Extended Data Fig. 2a–b). CYP71A18 is the sole nonsyntenic member of the CYP71A12/CYP71A13 clade (Fig. 2a; Extended Data Fig. 2a–b) and likely duplicated from CYP71A12 after A. thaliana speciation; it has no characterized function to date. The elongated branch length underlying the CYP71A12/CYP71A13 clade suggests accelerated evolution (Fig. 2a), which could be due to positive selection or neutral drift after escape from the evolutionary constraints imposed on the common ancestor. To examine the first scenario, we performed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection on individual codons along prespecified lineages of our tree (Fig. 2a; Extended Data Fig. 2a) (Zhang et al., 2005). As expected, a nonsynonymous-to-synonymous substitution rate ratio (dN/dS) below 1 (0.15–0.16) was observed for most residues throughout the tree (Extended Data Fig. 1c; Supplementary Table 1), indicating purifying selection. However, a dN/dS far above 1 (80.03) was observed for thirteen residues in CYP71A12 sequences against all other sequences (Fig. 2a; Extended Data Figs. 2b–c, 3a; Supplementary Table 1), indicating that these residues are conserved or under neutral selection in CYP71A13 and CYP71A28 enzymes, but are under positive selection in CYP71A12 enzymes. LRTs also accepted the branch-site model over a null hypothesis that assumes neutral evolution in the foreground lineage (P < 0.05; Supplementary Table 1), a more stringent test for positive selection. LRTs did not accept the branch-site model for positive selection on CYP71A18 (Supplementary Table 1), indicating that CYP71A18 has escaped from the selective constraints imposed on CYP71A12.
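The likelihood ratio tests described above compare the log-likelihoods of two nested models. The following is a minimal sketch of that comparison, not the authors' pipeline: it assumes the test statistic is referred to a chi-square distribution with one degree of freedom, and the log-likelihood values are hypothetical placeholders rather than values from Supplementary Table 1. The function name `lrt_pvalue` is introduced here for illustration only.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for two nested models.

    Assumes the alternative model has exactly one more free parameter,
    so the statistic is compared with a chi-square distribution (1 df).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        # The alternative model fits no better than the null model.
        return 0.0, 1.0
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-997.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

Under these assumptions, a branch would be reported as being under positive selection when the returned p-value falls below 0.05, which is the threshold quoted in the text.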


[Fig. 2 image. Panel (a): maximum likelihood tree with clades labeled lineage I CYP71A14/15 (N=7), CYP71A18, lineage I CYP71A12 (including AlCYP71A12 and non-Arabidopsis CYP71A12s, N=8), lineage I CYP71A13 (including AhCYP71A13, AlCYP71A13, and non-Arabidopsis CYP71A13s, N=5), lineage II CYP71A13 (BrCYP71A13, SiCYP71A13), Brassicaceae CYP71A28 (N=10), and CYP71A21. Panel (b): codon logos at amino acid positions 232 (Arg/Asn) and 481 (Ile/Leu). Panel (c): alignment of SRS-3 and SRS-6 (secondary structures αG and β4-1):

Enzyme        SRS-3 (position 232)   SRS-6 (position 481)
CYP71A12      ALAWIDRINGF            DLTEAFGLD
CYP71A13      ILAWIDGIRGF            DLTEAIGID
BrCYP71A13    SLAWIDRIRGF            DIAEAVGID
AaCYP71A28C   SLAWLDRIRGL            DLSEATSIE

Panel (d): homology model overlay with SRS1 to SRS6 labeled and the Arg232Asn and Ile481Leu sites marked.]

Fig. 2 (a) Phylogenetic maximum likelihood tree of full-length CYP71A family protein sequences. Bootstrap values (N=100 replicate trees) are shown in red at the nodes. Scale bar represents 0.05 nucleotide substitutions per site. Individual branches under positive selection per the “branch-site model” within Codeml (P < 0.05 based on likelihood ratio tests) are shaded in green (see Supplementary Table 1). Enzymes used for heterologous expression in Fig. 3 are in grey boxes. IDs used for alignment are found in Supplementary File 1. (b) Logos of two positively selected codons for the Lineage I CYP71A12, Lineage I CYP71A13, Lineage II CYP71A13, and CYP71A28 clades (N = 17, 9, 3, and 17 for the first codon and 12, 9, 3, and 14 for the second codon, respectively). IDs used are found in Supplementary File 2. (c) Protein alignment of the SRS-3 and SRS-6 sequences of CYP71A12, CYP71A13, BrCYP71A13, and AaCYP71A28C. The red residues indicate positions under positive selection for CYP71A12. (d) Overlay view of the Phyre 2.0-based homology model for AaCYP71A28C, CYP71A12, and CYP71A13. The SRS regions are shown in dark blue (AaCYP71A28C), blue (CYP71A12), and light blue (CYP71A13), and the heme-binding loop is shown in dark purple (AaCYP71A28C), purple (CYP71A12), and light purple (CYP71A13). CYP71A12 sites under positive selection per the “branch-site model” within Codeml are labeled and shown in dark red (AaCYP71A28C), red (CYP71A12), and light red (CYP71A13).


Fig. 2. Branch-site likelihood model test for positive selection identifies two putative substrate-binding residues within CYP71A12. (a) Phylogenetic maximum likelihood tree of full-length CYP71A12, CYP71A13 and CYP71A28 protein sequences. Bootstrap values (N=100 replicate trees) are shown in red at the nodes. Scale bar represents 0.05 nucleotide substitutions per site. Individual branches identified to be under positive selection using the branch-site model (P < 0.05, likelihood ratio tests) are shaded in green. Proteins used in this study are boxed in grey. (b) Alignment of sequence logos of amino acid positions 232 and 481 of full-length and partial sequences. (c) Alignment of the SRS-3 and SRS-6 sequences. Residues expected to fall within SRS sequences are boxed in gray. Residues under positive selection are in red. The conserved secondary structures are indicated at the top. (d) Structural overlay of homology models of CYP71A12 (medium-colored), CYP71A13 (light-colored), and AaCYP71A28C (dark-colored) shown as ribbon drawings (alignment in Extended Data Fig. 3b). Colored regions indicate the locations of the six SRSs (blue) around the heme-containing active site (purple) and of the residues in positions 232 and 481 (red, in ball and stick). See Methods for lists of sequences.


We then investigated the natural distribution pattern of Asn232 and Leu481, which are CYP71A12 residues identified to be under positive selection with posterior probabilities ≥ 0.95 (Supplementary Table 1). CYP71A28 and lineage II CYP71A13 enzymes predominantly contain Arg232 and Ile481 residues (Fig. 2b). Lineage I CYP71A13 enzymes retain Ile481 but nearly half exhibit Arg232Asn substitutions (Fig. 2b), a pattern that could have been present in evolutionary intermediates between the two P450s. While most CYP71A12 enzymes exhibit Arg232Asn and Ile481Leu substitutions, CrCYP71A12L1 retains Arg232, a pattern that could represent the first mutational step in regressive evolution. Interconverting Arg and Asn requires a two-base change; we were unable to find one-base changes (leading to Ser/His) at the corresponding position in the available sequences (n = 29; see Methods), further supporting rapid CYP71A12 evolution.

Each of the six predicted substrate recognition sites (SRS1–6) in CYP71A12s contains a single residue under positive selection (Fig. 2c; Extended Data Fig. 3a–b) (Gotoh 1992; da Fonseca et al., 2007; Seifert and Pleiss, 2009; Zawaira et al., 2011). Of these residues, only Asn232 and Leu481 can form a hydrogen bond and hydrophobic contacts with the anionic site and aromatic ring, respectively, of the bound substrate α-hydroxy-IAN. Homology models based on six P450 crystal structures in the closed conformation predict that the Arg232Asn substitution may form a new hydrogen bond with the substrate, orienting it away from the active site (Fig. 3d). Furthermore, Ile481Leu may alter the conformation of the C-terminal loop, which houses SRS-6, to cooperatively support the exclusion of the substrate from the active site and the loss of the dehydration reaction in CYP71A12 (Fig. 3d). Interestingly, a two-residue sequence gap N-terminal to SRS-6, present in CYP71A12s and CYP71A13s and not in CYP71A28s (Extended Data Fig. 3a), repositions the Ile481 sidechain so that it may form new hydrophobic interactions with the substrate (Fig. 3d) and potentiate the acquisition of the dehydration reaction in CYP71A13.
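The statement above that Arg and Asn are separated by two base changes, with Ser or His as the one-base intermediates, can be checked against the standard genetic code. The sketch below is illustrative only. It assumes the Arg codons CGT and CGC (as they appear in the Fig. 2b logos) and the Asn codons AAT and AAC; the Ile-to-Leu pair ATT to CTT is likewise an assumed codon pair, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def hamming(a: str, b: str) -> int:
    """Number of base differences between two codons."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(codon: str):
    """Yield every codon that is one base change away from `codon`."""
    for i, base in product(range(3), BASES):
        if codon[i] != base:
            yield codon[:i] + base + codon[i + 1:]


def intermediates(src: str, dst: str) -> list[str]:
    """Amino acids encoded by codons one step from both `src` and `dst`."""
    return sorted({CODE[c] for c in neighbors(src) if hamming(c, dst) == 1})


# Assumed codon pairs (see lead-in); not taken from the alignment files.
print(hamming("CGT", "AAT"), intermediates("CGT", "AAT"))
print(hamming("CGC", "AAC"), intermediates("CGC", "AAC"))
print(hamming("ATT", "CTT"), CODE["ATT"], CODE["CTT"])
```

With these codon choices, the Arg-to-Asn change takes two steps and the Ile-to-Leu change takes one. Other Arg codons (AGA, AGG) would admit different intermediates, so the result depends on the codons actually present in the sequences.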

[Fig. 3 image. Panels (a) and (b): bar charts of camalexin and ICN (or ICN + ICA-ME) levels (nmol / mg DW) for EV, CYP71A13, CYP71A12, CYP71A18, CrCYP71A28, BrCYP71A13, AaCYP71A28C, and BrCYP71A28L, and for WT, cyp71A13, cyp71A12, cyp71A18-1, cyp71A18-2, cyp71A12/13, and cyp71A12/13/18-1. Panel (c): chromatograms of camalexin (intensity, a.u.) and ICN (ion count, m/z 168.50–169.50) against retention time (min) for CYP71A12, CYP71A12 N232R, CYP71A12 L481I, CYP71A12 N232R L481I, CYP71A13, AaCYP71A28C, and empty vector.]

Fig. 3 (a-b) Transient expression in Nicotiana benthamiana leaves of the camalexin (top) and ICN (bottom) biosynthetic pathways using either CYP71A enzymes boxed in Figure 2a or site-directed reversion mutations of CYP71A12 residues in Figure 2a. CYP79B2, GGP1, and CYP71B15 were co-expressed with each CYP71A to reconstitute the camalexin pathway, and CYP79B2 and FOX1 were co-expressed with each CYP71A to reconstitute the ICN pathway. The full dataset for (b) is available as Supplementary Figure 3d. (c) LC-DAD analysis of camalexin (top) and ICN (bottom) in 10-day-old seedlings elicited with Pto avrRpm1 for 24 h. ICA-ME is an ICN breakdown product. Data represent the mean ± SE of three replicates of 13 to 17 seedlings each. Different letters denote statistically significant differences (P < 0.05, two-tailed t test). Experiments were repeated at least twice, producing similar results. Abbreviations: DW, dry weight; n.d., not detected. (c) Transient expression in Nicotiana benthamiana leaves of the camalexin (top) and ICN (bottom) biosynthetic pathways using wild-type or site-directed mutants of CYP71A12, CYP71A13, and AaCYP71A28C.

Fig. 3. Mutagenesis and epistasis potentiated CYP71A12 functionalization. (a–b) Levels of camalexin (left) and (4OH-)ICN (right) in N. benthamiana leaves (a) and 10-day-old seedlings elicited with Psta for 24 h (b). Empty vector (EV) or indicated CYP71 genes in (a) were co-expressed with CYP79B2, GGP1, and CYP71B15 to reconstitute the camalexin pathway (left), or with CYP79B2 and FOX1 to reconstitute the ICN pathway (right). Data represent mean ± SE of three replicate leaves (a) and three replicates of 13–17 seedlings (b). Different letters denote statistically significant differences (P < 0.05, two-tailed t test). (c) Fluorescence emission chromatogram of camalexin (left) and extracted ion chromatogram of ICN (right) in N. benthamiana leaves.


To investigate whether CYP71A12 and CYP71A13 are evolutionarily constrained in their catalytic activities, we first compared the reaction product profiles of representative Brassicaceae CYP71A12, CYP71A13, and CYP71A28 enzymes by reconstituting the ICN and camalexin pathways in Nicotiana benthamiana and substituting enzymes at the CYP71A-catalyzed steps. ICN and camalexin production was used to infer the production of the labile direct products α-hydroxy-IAN and dehydro-IAN (Klein et al., 2013). The substitution of A. thaliana or B. rapa CYP71A13 in the camalexin and ICN pathways yielded high amounts of camalexin and low or undetectable amounts of ICN, respectively, whereas the substitution of CYP71A12 or CYP71A18 yielded the near-opposite (Fig. 3a), indicating that they, like most enzymes, catalyze promiscuous reactions (Khersonsky and Tawfik, 2010; Copley, 2017). The substitution of CYP71A28 enzymes or the empty vector (EV) yielded trace and undetectable amounts of camalexin and ICN, respectively (Fig. 3a), indicating non-acquisition of specific physiological functions. The promiscuous camalexin reaction observed for EV in N. benthamiana is likely due to the presence of distantly related native CYP71As (Ohkawa et al., 1998). The rise and fall in efficiencies for promiscuous secondary reactions in CYP71A13 and CYP71A12 relative to BrCYP71A13 and CYP71A18 (Fig. 3a) are likely the result of changes in selective pressures on these enzymes (Fig. 2b). Consistent with the observed promiscuous reactions in vivo, camalexin and (4OH-)ICN amounts are higher in Psta-infected loss-of-function cyp71A12 and cyp71A13 single mutants, respectively, than in the cyp71A12 cyp71A13 double mutant (Fig. 3b; Extended Data Figs. 5-6, Supplementary Note 2) (Müller et al., 2015).

We next generated single and double Asn232Arg and Leu481Ile CYP71A12 revertants and compared their reaction profiles to CYP71A12 and CYP71A13. The CYP71A12 Asn232Arg reversion did not alter catalytic specificities or efficiencies (Fig. 3c; Extended Data Fig. 4b). The Leu481Ile reversion yielded intermediate amounts of both ICN and camalexin (Fig. 3c; Extended Data Fig. 4b). This previously undocumented catalytic activity may have been present in evolutionary intermediates; consistent with this, we observed a Leu481Ile substitution in at least one CYP71A12 ortholog (Fig. 2b). The CYP71A12 double revertant enzyme completely inverted the catalytic specificities to those of CYP71A13 and notably restored the dehydration reaction to dehydro-IAN (Fig. 3c; Extended Data Fig. 4b). These results indicate that the catalytic specificities and efficiencies of CYP71A12 and CYP71A13 are mutationally accessible from one another, requiring as few as three base changes. Furthermore, the regressive evolution of CYP71A12 involves an epistatic interaction between successive mutations, whereby the Asn232Arg and Leu481Ile substitutions respectively potentiate and actualize the ability to biosynthesize dehydro-IAN.

In the course of this study, we observed that the primary reactions of BrCYP71A13 and CYP71A18, while apparent via heterologous expression (Fig. 3a), could not be detected in their Psta-infected native hosts (Figs. 1b and 3b). These results suggest that BrCYP71A13 and CYP71A18, which book-end the evolutionary trajectories of CYP71A13 and CYP71A12, are not natively expressed in a pathogen-inducible manner. To test this hypothesis, we obtained datasets for all transcriptomics experiments on biotic stress in Arabidopsis and B. rapa (Toufighi et al., 2005; Klein et al., 2015; Chen et al., 2016), and examined expression patterns of all full-length CYP71A members. CYP71A18 gene expression is mostly unchanged between mock and pathogen treatments and is at least 22- to 467-fold lower than that of CYP71A12/CYP71A13/CYP83B1 for a variety of pathogen treatments (Fig. 4a). Similarly, BrCYP71A13 gene expression is 44- to 9185-fold lower than that of related CYP71CRs in B. rapa defense metabolism (Fig. 4b) (Klein et al., 2015; Chen et al., 2016). Altogether, our results suggest that changes in gene expression and substrate-recognition sites were both necessary to increase the efficiency and thus the physiological relevance of catalytic promiscuity in CYP71A13 and CYP71A12, respectively (Fig. 4c).
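The revertant results described above can be laid out as the two possible orders in which the Asn232Arg and Leu481Ile reversions might occur. The sketch below is a bookkeeping illustration only: the phenotype strings paraphrase the qualitative outcomes reported for Fig. 3c rather than measured values, and the names `PHENOTYPE`, `REVERSIONS`, and `walk` are hypothetical.

```python
from itertools import permutations

# Residues at positions (232, 481) mapped to the qualitative outcome
# described in the text. These are paraphrases, not measurements.
PHENOTYPE = {
    ("N", "L"): "wild-type CYP71A12 profile",
    ("R", "L"): "no change in specificity or efficiency",
    ("N", "I"): "intermediate amounts of both ICN and camalexin",
    ("R", "I"): "CYP71A13-like profile, dehydration to dehydro-IAN restored",
}

# Each reversion: index into the (232, 481) pair and the new residue.
REVERSIONS = {"N232R": (0, "R"), "L481I": (1, "I")}


def walk(order, start=("N", "L")):
    """Return the genotypes visited when reversions are applied in `order`."""
    state = list(start)
    path = [tuple(state)]
    for name in order:
        position, residue = REVERSIONS[name]
        state[position] = residue
        path.append(tuple(state))
    return path


for order in permutations(REVERSIONS):
    print(" then ".join(order))
    for genotype in walk(order):
        print(f"  {genotype[0]}232/{genotype[1]}481: {PHENOTYPE[genotype]}")
```

Listing both orders makes the epistasis visible: Asn232Arg has no detectable effect on its own, yet it changes the outcome once Leu481Ile is also present.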


[Fig. 4 image. Panel (a): heat map (relative signal) for CYP71A28, CYP71A18, CYP71A12, CYP71A13, CYP71A14/15, CYP71A16, CYP71A19/20, and CYP83B1. Panel (b): heat map (FPKM) for BrCYP71A28, BrCYP71A28L, Bra036897, Bra020467, BrCYP71A13, Bra019538, CYP71CR1, and CYP71CR2. Treatment labels across the heat maps: MAMPs (flg22), bacteria (Psta, Psm), B. cinerea, G. orontii, P. infestans, and P. brassicae. Panel (c): schematic relating gene expression, catalytic activities (IAOx to α-OH-IAN and dehydro-IAN), and selective pressures at positions 232 and 481 for the ancestor, BrCYP71A13, CYP71A13, an intermediate, CYP71A12, and CYP71A18.]

Fig. 4 (a) (left) CYP71A protein phylogeny adapted from Bak et al. (2011). (right) Raw microarray data of CYP71A gene expression from Toronto BAR (N=200 conditions). (b) (top) Model for regulatory and catalytic neofunctionalization of CYP71A13 and CYP71A12 from a CYP71A28 progenitor. Defense-regulated genes are denoted in black and important amino acid states at positions 232 and 481 are denoted by NI (ancestral promiscuous state), RI (present collective CYP71A13 state), and NL (present CYP71A12 state). (bottom) Model for the functions of amino acid residues 232 and 481 within CYP71A12 or CYP71A13 enzymes (denoted in box) in the context of camalexin and ICN biosynthesis.


Fig. 4. Changes in gene expression potentiated neofunctionalization of CYP71A13 and release of CYP71A18 from selective constraint. (a) Heat map of gene expression for P450 genes in Arabidopsis under various pathogen treatments. (b) Heat map of gene expression for P450 genes in B. rapa under two pathogen treatments. (c) Proposed evolutionary trajectory of CYP71A enzymes involved in cyanogenic defense metabolism in Arabidopsis.


Proteins evolve by following trajectories through ‘rugged’ sequence space, consisting of mutational steps between functional proteins and between their catalytic specificities (Smith, 1970; Kondrashov and Kondrashov, 2015; Starr and Thornton, 2016). The rise in catalytic promiscuity is likely responsible for the repeated evolution of cyanohydrins and their cyanogenic and non-cyanogenic derivatives as defense chemicals in over 2,650 higher plant species and certain arthropods, fungi and bacteria (Nahrstedt, 1985; Davis and Nahrstedt, 1987; Blumer and Haas, 2000; Zagrobelny et al., 2007, 2008; Klein et al., 2013; Caspar and Spiteller, 2015; Rajniak et al., 2015). We observed that promoter mutations and intramolecular epistasis rendered accessible the evolutionary trajectories from CYP71A12 to CYP71A13, BrCYP71A13 to CYP71A13, and CYP71A12 to CYP71A18, and the resulting increases and decreases in catalytic promiscuity. Collectively, our data provide additional insight into the evolvability of promiscuous activities for the discovery and directed evolution of novel enzymes.

METHODS
Information on Abbreviations, Plant Materials and Growth Conditions, Pathogen infection, Extraction and LC-DAD-MS Analysis of (6-methoxy)camalexin and (4OH-)ICN, Extraction and LC-DAD-FLD-MS Analysis of Glucosinolates, Identification of CYP71A13 and CYP71A12 orthologs, 4M-I3M Activity Assay and Identification of callose pathway orthologs, and Heterologous pathway reconstitution in N. benthamiana can be found in Supplementary information.

REFERENCES
1. Bak, S. et al. Cloning of three A-type cytochrome P450, CYP71E1, CYP98, and CYP99 from Sorghum bicolor (L.) Moench by a PCR approach and identification by expression in Escherichia coli of CYP71E1 as a multifunctional cytochrome P450 in the biosynthesis of the cyanogenic glucoside dhurrin. Plant Mol. Biol. 36, 393-405 (1998).


2. Bak, S. et al. CYP83B1, a cytochrome P450 at the metabolic branch point in auxin and indole glucosinolate biosynthesis in Arabidopsis. Plant Cell 13, 101-111 (2001).
3. Bak, S. et al. Cytochromes P450. Arabidopsis Book 9, e0144 (2011).
4. Banks, J.A. et al. The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 332, 960-963 (2011).
5. Bednarek, P. et al. A glucosinolate metabolism pathway in living plant cells mediates broad-spectrum antifungal defense. Science 323, 101-106 (2009).
6. Bednarek, P. et al. Conservation and clade-specific diversification of pathogen-inducible tryptophan and indole glucosinolate metabolism in Arabidopsis thaliana relatives. New Phytologist 192, 713-726 (2011).
7. Blumer, C. & Haas, D. Mechanism, regulation, and ecological role of bacterial cyanide biosynthesis. Archives of Microbiology 173, 170-177 (2000).
8. Caspar, J. & Spiteller, P. A free cyanohydrin as arms and armour of Marasmius oreades. ChemBioChem 16, 570-573 (2015).
9. Chen, J. et al. Transcriptome analysis of Brassica rapa near-isogenic lines carrying clubroot-resistant and -susceptible alleles in response to Plasmodiophora brassicae during early infection. Frontiers in Plant Science 6, 1183 (2016).
10. Clausen, M. et al. The bifurcation of the cyanogenic glucoside and glucosinolate biosynthetic pathways. Plant J. 84, 558-573 (2015).
11. Clay, N.K. et al. Glucosinolate metabolites required for an Arabidopsis innate immune response. Science 323, 95-101 (2009).
12. Copley, S.D. Shining a light on enzyme promiscuity. Curr. Opin. Struct. Biol. 47, 167-175 (2017).
13. Couvreur, T.L. et al. Molecular phylogenetics, temporal diversification, and principles of evolution in the mustard family (Brassicaceae). Mol. Biol. Evol. 27, 55-71 (2009).
14. da Fonseca, R.R., Antunes, A., Melo, A. & Ramos, M.J. Structural divergence and adaptive evolution in mammalian cytochromes P450 2C. Gene 387, 58-66 (2007).


15. Davis, R.H. & Nahrstedt, A. Biosynthesis of cyanogenic glucosides in butterflies and moths: Effective incorporation of 2-methylpropanenitrile and 2-methylbutanenitrile into linamarin and lotaustralin by Zygaena and Heliconius species (Lepidoptera). Insect Biochemistry 17, 689-693 (1987).
16. Gotoh, O. Substrate recognition sites in cytochrome P450 family 2 (CYP2) proteins inferred from comparative analyses of amino acid and coding nucleotide sequences. J. Biol. Chem. 267, 83-90 (1992).
17. Huang, C.H. et al. Resolution of Brassicaceae phylogeny using nuclear genes uncovers nested radiations and supports convergent morphological evolution. Mol. Biol. Evol. 33, 394-412 (2015).
18. Irmisch, S. et al. Herbivore-induced poplar cytochrome P450 enzymes of the CYP71 family convert aldoximes to nitriles which repel a generalist caterpillar. Plant J. 80, 1095-1107 (2014).
19. Jørgensen, K. et al. Biosynthesis of the cyanogenic glucosides linamarin and lotaustralin in cassava: isolation, biochemical characterization, and expression pattern of CYP71E7, the oxime-metabolizing cytochrome P450 enzyme. Plant Physiol. 155, 282-292 (2011).
20. Khersonsky, O. & Tawfik, D.S. Enzyme promiscuity: a mechanistic and evolutionary perspective. Annu. Rev. Biochem. 79, 471-505 (2010).
21. Klein, A.P., Anarat-Cappillino, G. & Sattely, E.S. Minimum set of cytochromes P450 for reconstituting the biosynthesis of camalexin, a major Arabidopsis antibiotic. Angew. Chemie 52, 13625-13628 (2013).
22. Klein, A.P. & Sattely, E.S. Two cytochromes P450 catalyze S-heterocyclizations in cabbage phytoalexin biosynthesis. Nat. Chem. Biol. 11, 837-839 (2015).
23. Knoch, E. et al. Biosynthesis of the leucine derived α-, β- and γ-hydroxynitrile glucosides in barley (Hordeum vulgare L.). Plant J. 88, 247-256 (2016).
24. Kondrashov, D.A. & Kondrashov, F.A. Topological features of rugged fitness landscapes in sequence space. Trends Genet. 31, 24-33 (2015).


25. Leong, B.J. & Last, R.L. Promiscuity, impersonation and accommodation: evolution of plant specialized metabolism. Curr. Opin. Struct. Biol. 47, 105-112 (2017).
26. Müller, T.M. et al. TRANSCRIPTION ACTIVATOR-LIKE EFFECTOR NUCLEASE-mediated generation and metabolic analysis of camalexin-deficient cyp71a12 cyp71a13 double knockout lines. Plant Physiol. 168, 849-858 (2015).
27. Nafisi, M. et al. Arabidopsis cytochrome P450 monooxygenase 71A13 catalyzes the conversion of indole-3-acetaldoxime in camalexin synthesis. Plant Cell 19, 2039-2052 (2007).
28. Nahrstedt, A. Cyanogenic compounds as protecting agents for organisms. Plant Systematics and Evolution 150, 35-47 (1985).
29. Nelson, D. & Werck-Reichhart, D. A P450-centric view of plant evolution. Plant J. 66, 194-211 (2011).
30. Nishimura, M.T. et al. Loss of a callose synthase results in salicylic acid-dependent resistance. Science 301, 969-972 (2003).
31. Rajniak, J., Barco, B., Clay, N.K. & Sattely, E.S. A new cyanogenic metabolite in Arabidopsis required for inducible pathogen defence. Nature 525, 376-379 (2015).
32. Seifert, A. & Pleiss, J. Identification of selectivity-determining residues in cytochrome P450 monooxygenases: a systematic analysis of the substrate recognition site 5. Proteins 74, 1028-1035 (2009).
33. Sirim, D., Widmann, M., Wagner, F. & Pleiss, J. Prediction and analysis of the modular structure of cytochrome P450 monooxygenases. BMC Struct. Biol. 10, 34 (2010).
34. Smith, J.M. Natural selection and the concept of a protein space. Nature 225, 563-564 (1970).
35. Starr, T.N. & Thornton, J.W. Epistasis in protein evolution. Protein Science 25, 1204-1218 (2016).
36. Takos, A.M. et al. Genomic clustering of cyanogenic glucoside biosynthetic genes aids their identification in Lotus japonicus and suggests the repeated evolution of this chemical defence pathway. Plant J. 68, 273-286 (2011).


37. Toufighi, K. et al. The Bio-Analytic Resource: e-northerns, expression angling, and promoter analyses. Plant J. 43, 153-163 (2005).
38. Verpoorte, R. Secondary Metabolism. In: Metabolic engineering of plant secondary metabolism (eds Verpoorte, R. & Alfermann, A.W.) Kluwer Academic Publishers, p. 1-29 (2013).
39. Weng, J.K., Philippe, R.N. & Noel, J.P. The rise of chemodiversity in plants. Science 336, 1667-1670 (2012).
40. Werck-Reichhart, D., Bak, S. & Paquette, S. Cytochromes P450. Arabidopsis Book, e0028 (2002).
41. Wurtzel, E.T. & Kutchan, T.M. Plant metabolism, the diverse chemistry set of the future. Science 353, 1232-1236 (2016).
42. Yamaguchi, T., Yamamoto, K. & Asano, Y. Identification and characterization of CYP79D16 and CYP71AN24 catalyzing the first and second steps in l-phenylalanine-derived cyanogenic glycoside biosynthesis in the Japanese apricot, Prunus mume Sieb. et Zucc. Plant Mol. Biol. 86, 215-223 (2014).
43. Yamaguchi, T., Noge, K. & Asano, Y. Cytochrome P450 CYP71AT96 catalyses the final step of herbivore-induced phenylacetonitrile biosynthesis in the giant knotweed, Fallopia sachalinensis. Plant Mol. Biol. 91, 229-239 (2016).
44. Zagrobelny, M., Olsen, C.E., Bak, S. & Møller, B.L. Intimate roles for cyanogenic glucosides in the life cycle of Zygaena filipendulae (Lepidoptera, Zygaenidae). Insect Biochem. Mol. Biol. 37, 1189-1197 (2007).
45. Zagrobelny, M., Bak, S. & Møller, B.L. Cyanogenesis in plants and arthropods. Phytochemistry 69, 1457-1468 (2008).
46. Zawaira, A. et al. An expanded, unified substrate recognition site map for mammalian cytochrome P450s: analysis of molecular interactions between 15 mammalian CYP450 isoforms and 868 substrates. Curr. Drug Metab. 12, 684-700 (2011).
47. Zhang, J., Nielsen, R. & Yang, Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22, 2472-2479 (2005).


SUPPLEMENTARY INFORMATION
Additional supplementary information and extended data are available for this paper.

ACKNOWLEDGEMENTS
We thank JL Celenza for cyp79B2 cyp79B3, E Glawischnig for (cyp71A12 cyp71A13)-1, and T Mitchell-Olds for Boechera stricta SAD12/LTM and Boechera holboelii 910/913. We thank ES Sattely for (4OH-)ICN and camalexin standards. This work was supported by T32-GM007499 (to BB) and Elsevier/Phytochemistry Young Investigator Award (to NKC).

AUTHOR CONTRIBUTIONS
BB and NKC performed 4M-I3M activity assays. BB and LZ performed indole glucosinolate profiling. BB performed all other experiments. BB and NKC interpreted the results and wrote the paper.

AUTHOR INFORMATION
The authors declare no competing financial interests. Correspondence should be addressed to B. Barco ([email protected]).
