bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Multi-omic analysis of a hyper-diverse metabolic pathway reveals 2 evolutionary routes to biological innovation 3 Gaurav D. Moghe1, Bryan J. Leong1,2, Steven Hurney1,3, A. Daniel Jones1,3, Robert L. Last1,2* 4 1Department of Biochemistry and Molecular Biology, 2Department of Plant Biology, 5 3Department of Chemistry, Michigan State University, East Lansing, USA 6 * Corresponding author: [email protected] 7

8 Abstract 9 The diversity of life on Earth is a result of continual innovations in molecular networks 10 influencing morphology and physiology. Plant specialized metabolism produces hundreds of 11 thousands of compounds, offering striking examples of these innovations. To understand how 12 this novelty is generated, we investigated the evolution of the family-specific, 13 trichome-localized acylsugar biosynthetic pathway using a combination of mass spectrometry, 14 RNA-seq, enzyme assays, RNAi and phylogenetics in non-model species. Our results reveal that 15 hundreds of acylsugars are produced across the Solanaceae family and even within a single plant, 16 revealing this phenotype to be hyper-diverse. The relatively short biosynthetic pathway 17 experienced repeated cycles of innovation over the last 100 million years that include gene 18 duplication and divergence, gene loss, evolution of substrate preference and promiscuity. This 19 study provides mechanistic insights into the emergence of plant chemical novelty, and offers a 20 template for investigating the ~300,000 non-model plant species that remain underexplored.

21 Impact statement 22 Integrative analyses of a specialized metabolic pathway across multiple non-model species 23 revealed mechanisms of emergence of chemical novelty in plant metabolism.

24 Introduction 25 Since the first proto-life forms arose on our planet some four billion years ago, the forces 26 of mutation, selection and drift have generated a world of rich biological complexity. This 27 complexity, evident at all levels of biological organization, has intrigued humans for millennia 28 (Tipton, 2008; Mayr, 1985). Plant metabolism, estimated to produce hundreds of thousands of 29 products with diverse structures across the plant kingdom (Fiehn, 2002; Afendi et al., 2012), 1

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

30 provides striking examples of this complexity. Plant metabolism is traditionally divided into 31 primary and secondary/specialized, the former referring to production of compounds essential for 32 plant development and the latter encompassing metabolites documented as important for plant 33 survival in nature and metabolites of as yet unknown functional significance (Pichersky and 34 Lewinsohn, 2011; Moghe and Last, 2015). While primary metabolism generally consists of 35 highly conserved pathways and enzymes, specialized metabolic pathways are in a state of 36 continuous innovation (Milo and Last, 2012). This dynamism produced numerous lineage- 37 specific metabolite classes such as steroidal glycoalkaloids in Solanaceae (Wink, 2003), 38 benzoxazinoid alkaloids in Poaceae (Dutartre et al., 2012), betalains in Caryophyllales 39 (Brockington et al., 2015) and glucosinolates in Brassicales (Halkier and Gershenzon, 2006). The 40 structural diversity produced by lineage-specific pathways makes them exemplary systems for 41 understanding the evolution of novelty in the living world. 42 Previous studies investigating the emergence of lineage-specific metabolite classes 43 uncovered the central role of gene duplication and diversification in this process: e.g. 44 duplications of isopropylmalate synthase (glucosinolates and acylsugars), cycloartenol synthase 45 (avenacin), tryptophan synthase (benzoxazinoids) and anthranilate synthase (acridone alkaloids) 46 (Moghe and Last, 2015). Duplications of members of enzyme families (e.g. cytochromes P450, 47 glycosyltransferases, methyltransferases, BAHD acyltransferases) also play major roles in 48 generating chemical novelty, with biosynthesis of >40,000 structurally diverse terpenoids — 49 produced partly due to genomic clustering of terpene synthases and cytochrome P450 enzymes 50 (Boutanaev et al., 2015) — as an extreme example. These duplicate genes, if retained, may 51 experience sub- or neo-functionalization via transcriptional divergence and evolution of protein- 52 protein interactions as well as via changes in substrate preference, reaction mechanism and 53 allosteric regulation to produce chemical novelty (Moghe and Last, 2015). In this study, we 54 sought to understand the molecular processes by which a novel class of metabolites — 55 acylsugars — emerged in the Solanaceae family. 56 Acylsugars are lineage-specific plant specialized metabolites detected in multiple genera 57 of the Solanaceae family including Solanum (King et al., 1990; Schilmiller et al., 2010; Ghosh et 58 al., 2014), Petunia (Kroumova and Wagner, 2003), Datura (Forkner and Hare, 2000) and 59 Nicotiana (Kroumova and Wagner, 2003; Kroumova et al., 2016). These compounds, produced 60 in the tip cell of trichomes on leaf and stem surfaces (Schilmiller et al., 2012, 2015; Ning et al., 2

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

61 2015; Fan et al., 2016a), typically consist of a sucrose or glucose core esterified to groups 62 derived from fatty acid or branched chain amino acid metabolism (Figure 1A). Multiple studies 63 demonstrated that acylsugars mediate plant-insect and plant-fungus interactions (Puterka et al., 64 2003; Simmons et al., 2004; Weinhold and Baldwin, 2011; Leckie et al., 2016; Luu et al., 2017), 65 with their structural diversity documented as functionally important in some cases (Puterka et al., 66 2003; Leckie et al., 2016). 67 Acylsugars, compared to other specialized metabolic classes such as alkaloids, 68 phenylpropanoids and glucosinolates, are biosynthetically rather simple. These compounds are 69 produced by a set of enzymes called acylsugar acyltransferases (ASATs), which catalyze 70 sequential addition of acyl chains to the sucrose molecule (Figure 1A) (Schilmiller et al., 2012, 71 2015; Fan et al., 2016a). ASATs are members of Clade III of the large and functionally diverse 72 BAHD enzyme family (St Pierre and De Luca, 2000; D’Auria, 2006) (Figure 1-Figure 73 Supplement 1). Despite their evolutionary relatedness, S. lycopersicum ASATs (SlASATs) are 74 only ~40% identical to each other at the protein level. ASATs exhibit different activities across 75 wild tomatoes due to ortholog divergence, gene duplication and neo-functionalization, leading to 76 divergence in acceptor and donor substrate repertoire of ASATs between wild tomato species 77 (Schilmiller et al., 2015; Ning et al., 2015; Fan et al., 2016a). In addition, loss of an ASAT 78 enzyme also contributed to diversification of acylsugar phenotypes between individuals of the 79 same species (Kim et al., 2012). This diversity, primarily studied in closely related wild tomato 80 species and underexplored in the broader Solanaceae family, creates opportunities for further 81 exploration of the mechanisms generating biological complexity and variation. 82 Previous studies of acylsugar biosynthesis primarily focused on closely related Solanum 83 species. In this study, we sought to understand the timeline for emergence of the ASAT activities 84 and explore the evolution of the acylsugar biosynthetic pathway since the origin of the 85 Solanaceae. Typically, significant hits obtained using BLAST searches are analyzed in a 86 phylogenetic context to understand enzyme origins (Frey et al., 1997; Ober and Hartmann, 2000; 87 Qi et al., 2004; Benderoth et al., 2006). However, ASAT orthologs can experience functional 88 diversification due to single amino acid changes and/or duplication (Schilmiller et al., 2015; Fan 89 et al., 2016a), precluding functional assignment based on sequence similarity. Thus, we inferred 90 the origins and evolution of the pathway with a bottom-up approach; starting by assessing the 91 diversity of acylsugar phenotypes across the family using mass spectrometry. Our findings 3

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

92 illustrate the varied mechanisms by which this specialized metabolic pathway evolved and have 93 implications for the study of chemical novelty in the plant kingdom.

94 Results and Discussion

95 Diversity of acylsugar profiles across the Solanaceae 96 While the Solanaceae family comprises 98 genera and >2700 species (Olmstead and 97 Bohs, 2007), there are extensive descriptions of acylsugar diversity reported for only a handful of 98 species (Severson et al., 1985; King et al., 1990; Shinozaki et al., 1991; Ghosh et al., 2014). In 99 this study, we sampled vegetative tissue surface metabolites from single of 35 Solanaceae 100 and four Convolvulaceae species. These species were sampled at the New York Botanical 101 Gardens and Michigan State University (Figure 1B; Figure 1-File Supplement 1A), and 102 acylsugar profiles were obtained using liquid chromatography-mass spectrometry (LC/MS) with 103 collision-induced dissociation (CID; see Methods) (Schilmiller et al., 2010; Ghosh et al., 2014; 104 Fan et al., 2016b). Molecular and substructure (fragment) masses obtained by LC/MS-CID were 105 used to annotate acylsugars in Solanum nigrum, Solanum quitoense, alkekengi, Physalis 106 viscosa, Iochroma cyaneum, Atropa belladonna, Nicotiana alata, Hyoscyamus niger and 107 Salpiglossis sinuata (Salpiglossis) (Figure 1B,C; Figure 1-Figure Supplement 2; Figure 1- 108 Source Data 1). Plant extracts without detectable acylsugars generally lacked glandular 109 trichomes (Fisher Exact Test p=2.3e-6) (Figure 1-File Supplement 1B). In addition, the 110 acylsugar phenotype is quite dynamic and can be affected by factors such as developmental 111 stage, environmental conditions and the specific accession sampled (Kim et al., 2012; Ning et al., 112 2015; Schilmiller et al., 2015). These factors may also influence the detection of acylsugars in 113 some species. 114 The suite of detected acylsugars exhibited substantial diversity, both in molecular and 115 fragment ion masses. Most species accumulated acylsugars with mass spectra consistent with 116 disaccharide cores — most likely sucrose — esterified with short- to medium-chain aliphatic 117 acyl groups, similar to previously characterized acylsugars in cultivated and wild tomatoes 118 (Schilmiller et al., 2010; Ghosh et al., 2014). However, S. nigrum acylsugar data suggested 119 exclusive accumulation of acylhexoses (Figure 1C), with fragmentation patterns similar to 120 previously analyzed S. pennellii acylglucoses (Schilmiller et al., 2012). Mass spectra of S. 121 quitoense acylsugars also revealed acylsugars with features distinct from any known acylsugars 4

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

122 (Figure 1-Figure Supplement 2), and structures of these will be described in detail in a separate 123 report. In total, more than 100 acylsucroses and at least 20 acylsugars of other forms were 124 annotated with number and lengths of acyl groups based on pseudomolecular and fragment ion 125 masses (Figure 1-Figure Supplement 2). Several hundred additional low-abundance isomers 126 and novel acylsugars were also detected (Figure 1-Figure Supplement 3). For example, in 127 Salpiglossis alone, >400 chromatographic peaks had m/z ratios and mass defects consistent with 128 acylsugars (For review only - Additional File 1). This diversity is notable when compared to 129 ~33 detected acylsucroses in cultivated tomato (Ghosh et al., 2014). 130 Mass spectra also revealed substantial diversity in the number and lengths of acyl chains 131 (Figure 1C,D,E, Figure 1-Figure Supplement 2). Based on negative ion mode data, the number 132 of acyl chains on the sugar cores ranged from two to six (Figure 1-Figure Supplement 2), with 133 chain lengths from 2-12 carbons. Across the Solanaceae, we found species that incorporate at 134 least one common aliphatic acyl chain in all their major acylsugars: e.g. chains of length C5 in A. 135 belladonna, C4 in P. viscosa, and C8 in P. alkekengi (Figure 1D,E; Figure 1-Figure 136 Supplement 2). Longer C10 and C12 chain-containing acylsugars were found in multiple 137 species (P. viscosa, A. belladonna, H. niger, S. nigrum and S. quitoense) (Figure 1C-E, Figure 138 1-Figure Supplement 2). Mass spectra consistent with acylsugars containing novel acyl chains 139 were also detected. While we could not differentiate between acyl chain isomers (e.g. iso-C5 140 [iC5] vs. anteiso-C5 [aiC5]) based on CID fragmentation patterns, our data reveal large acyl 141 chain diversity in acylsugars across the Solanaceae. Such diversity can result from differences in 142 the intracellular concentrations of acyl CoA pools or divergent substrate specificities of 143 individual ASATs. A previous study from our lab identified allelic variation in the enzyme 144 isopropylmalatesynthase3, which contributes to differences in abundances of iC5 or iC4 chains 145 in acylsugars of S. lycopersicum and some S. pennellii accessions (Ning et al., 2015). ASAT gene 146 duplication (Schilmiller et al., 2015), gene loss (Kim et al., 2012) and single residue changes 147 (Fan et al., 2016a) also influence chain diversity in acylsugars, illustrating the various ways by 148 which acylsugar phenotypes may be generated in the Solanaceae. 149 Published data from S. pennellii and S. habrochaites revealed differences in furanose ring 150 acylation on acylsucroses (Schilmiller et al., 2015). Ring-specific acylation patterns can be 151 evaluated using positive mode mass spectrometry and CID, which generates fragment ion masses 152 from cleavage of the glycosidic linkage. We found furanose ring acylation in almost all tested 5

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

153 species in Solanaceae (Figure 1-Figure Supplement 4A-E); however, the lengths of acyl chains 154 on the ring varied. All tri-acylsucroses analyzed using positive mode CID data bore all acyl 155 chains on one ring, likely the pyranose ring — as evidenced by neutral loss of 197 Da (hexose + 156 plus NH3) from the [M+NH4] ion — unlike S. lycopersicum, which bears one acyl chain on the 157 furanose ring. However, the substituents varied among tetra- and penta-acylsugars of different 158 species, with some (e.g.: N. alata), showing up to four chains on the same ring (Figure 1-Figure 159 Supplement 4A), and others (e.g.: H. niger, Salpiglossis) revealing spectra consistent with 160 acylation on both pyranose and furanose rings (Figure 1-Figure Supplement 4B,C). 161 These results demonstrate the substantial acylsugar diversity across the Solanaceae 162 family. To identify the enzymes that contribute to this diversity, we performed RNA-seq in four 163 phylogenetically-spaced species with interesting acylsugar profiles, namely S. nigrum, S. 164 quitoense, H. niger and Salpiglossis.

165 Transcriptomic profiling of trichomes from multiple Solanaceae species 166 Our previous studies in cultivated tomato (Schilmiller et al., 2012, 2015; Ning et al., 167 2015; Fan et al., 2016a) demonstrated that identifying genes with expression enriched in 168 stem/petiole trichomes compared to shaved stem/petiole without trichomes is a productive way 169 to find acylsugar biosynthetic enzymes. We sampled polyA RNA from these tissues from four 170 species and performed de novo read assembly (Table 1). These assemblies were used to find 171 transcripts preferentially expressed in the trichomes (referred to as “trichome-high transcripts”) 172 and to develop hypotheses regarding their functions based on homology. 173 Overall, 1888-3547 trichome-high transcripts (22-37% of all differentially expressed 174 transcripts) were identified across all four species (False Discovery Rate adjusted p<0.05, fold 175 change≥2) (Table 1). These transcripts were subjected to a detailed analysis including coding 176 sequence prediction, binning into 25,838 orthologous groups, assignment of putative functions 177 based on tomato gene annotation and Gene Ontology enrichment analysis (see Methods, Table 178 1). Analysis of the enriched categories (Fisher exact test corrected p<0.05) revealed that only 20 179 of 70 well-supported categories (≥10 transcripts) were enriched in at least three species (Table 1- 180 File Supplement 1), suggesting existence of diverse transcriptional programs in the trichomes at 181 the time of their sampling. Almost all enriched categories were related to metabolism, protein 182 modification or transport, with metabolism-related categories being dominant (Table 1-File

6

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

183 Supplement 1). These results support the notion of trichomes as “chemical factories” 184 (Schilmiller et al., 2008) and point to the metabolic diversity that might exist in trichomes across 185 the Solanaceae. 186 A major goal of this study was to define the organization of the acylsugar biosynthetic 187 pathway at the origin of the Solanaceae, prompting us to focus on Salpiglossis, whose 188 phylogenetic position is of special interest in inferring the ancestral state of the biosynthetic 189 pathway. We first validated the plant under study as Salpiglossis using a phylogeny based on 190 ndhF and trnLF sequences (Figure 2-Figure Supplement 1A,B). A previously published 191 maximum likelihood tree of 1075 Solanaceae species suggested Salpiglossis as an extant species 192 of the earliest diverging lineage in Solanaceae (Särkinen et al., 2013). However, some tree 193 reconstruction approaches show Duckeodendron and Schwenckia as emerging from the earliest 194 diverging lineages, and Salpiglossis and Petunioideae closely related to each other (Olmstead et 195 al., 2008; Särkinen et al., 2013). Thus, our further interpretations are restricted to the last 196 common ancestor of Salpiglossis-Petunia-Tomato (hereafter referred to as the Last Common 197 Ancestor [LCA]) that existed ~22-28 mya. To infer the ancestral state of the acylsugar 198 biosynthetic pathway in the LCA, we characterized the pathway in Salpiglossis using in vitro and 199 in planta approaches.

200 In vitro investigation of Salpiglossis acylsugar biosynthesis 201 The acylsugar structural diversity and phylogenetic position of Salpiglossis led us to 202 characterize the biosynthetic pathway of this species. NMR analysis of Salpiglossis acylsugars 203 revealed acylation at the R2, R3, R4 positions on the pyranose ring and R1′, R3′, R6′ positions 204 on the furanose ring (Figure 2A; Figure 2A-Table Supplement 1) (For review only – 205 Additional File 1). The acylation positions are reminiscent of Petunia axillaris (Pa) acylsucroses 206 where PaASAT1, PaASAT2, PaASAT3 and PaASAT4 acylate with aliphatic precursors at R2, 207 R4, R3 and R6 on the six-carbon pyranose ring, respectively (For review only – Additional File 208 2). Thus, we tested the hypothesis that PaASAT1,2,3 orthologs in Salpiglossis function as 209 SsASAT1,2,3 respectively. 210 Thirteen Salpiglossis trichome-high BAHD family members were found (Figure 2- 211 Figure Supplement 1C), with nine expressed in Escherichia coli (Figure 2-Table Supplement 212 2; Figure 2-File Supplements 1,2). Activities of the purified enzymes were tested using sucrose

7

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

213 or partially acylated sucroses (Fan et al., 2016a, 2016b) as acceptor substrates. Donor C2, aiC5 214 and aiC6 acyl CoA substrates were tested based on the common occurrence of these ester groups 215 in a set of 16 Salpiglossis acylsucroses purified for NMR (For review only – Additional File 1). 216 Representative NMR structures that illustrate the SsASAT positional selectivity described in the 217 results below are shown in Figure 2A. Four of the tested candidates catalyzed ASAT reactions 218 (Figure 2B-E). In the following description, we name the enzymes based on their order of 219 acylation in the Salpiglossis acylsugar biosynthetic pathway. A description is provided in Figure 220 2-Figure Supplement 2 to assist in understanding the chromatograms. 221 Salpiglossis sinuata ASAT1 (SsASAT1) generated mono-acylsugars from sucrose using 222 multiple acyl CoAs (Figure 2B, Figure 2-Figure Supplements 2,3), similar to the donor 223 substrate diversity of SlASAT1 (Fan et al., 2016a) (Figure 2-Figure Supplement 2). We infer 224 that SsASAT1 primarily acylates the R2 position on the pyranose ring. This is based on (a) the 225 S1:6(6) negative mode CID fragmentation patterns (Figure 2-Figure Supplement 3A) and (b) 226 comparisons of chromatographic migration of the mono-acylsucroses produced by SsASAT1, 227 PaASAT1, which acylates sucrose at R2 and matches the major SsASAT1 product, and 228 SlASAT1, which acylates sucrose at R4 (Figure 2-Figure Supplement 3A,B) (For review only 229 - Additional File 2). This activity is similar to P. axillaris PaASAT1 and unlike S. lycopersicum 230 SlASAT1. 231 SsASAT2 was identified by testing the ability of each of the other eight cloned enzymes 232 to acylate the mono-acylated product of SsASAT1 (S1:5 or S1:6) as acyl acceptor and using 233 aiC5 CoA as acyl donor. SsASAT2 catalyzed the formation of di-acylated sugars, which co- 234 eluted with the PaASAT2 product but not with the SlASAT2 product (Figure 2B; Figure 2- 235 Figure Supplement 3C,D). Positive mode fragmentation suggested that the acylation occurs on 236 the same ring as SsASAT1 acylation (Figure 2-Figure Supplement 3C). This enzyme failed to 237 acylate sucrose (Figure 2-Figure Supplement 4), supporting its assignment as SsASAT2. 238 SsASAT3 (Figure 2B, red chromatogram) added aiC6 to the pyranose ring of the di- 239 acylated sugar acceptor. This aiC6 acylation reaction is in concordance with the observed in 240 planta acylation pattern at the R3 position — aiC6 is present at this position in the majority of S. 241 sinuata acylsugars (For review only – Additional File 1). Surprisingly, the enzyme did not use 242 aiC5 CoA, despite the identification of S3:15(5,5,5) and likely its acylated derivates 243 [S4:20(5,5,5,5), S4:17(2,5,5,5), S5:22(2,5,5,5,5) and S5:19(2,2,5,5,5)] from S. sinuata extracts 8

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

244 (For review only – Additional File 1). SsASAT3-dependent R3 position acylation was further 245 confirmed by testing the tri-acylated product with PaASAT4, which acylates at the R6 position 246 of the pyranose ring (Figure 2-Figure Supplement 5) (For review only – Additional File 2). 247 The successful R6 acylation by PaASAT4 is consistent with the hypothesis that SsASAT3 248 acylates the R3 position. Taken together, these results suggest that the first three enzymes 249 generate acylsugars with aiC5/aiC6 at the R2 position (SsASAT1), aiC6 at the R3 position 250 (SsASAT3) and aiC5/aiC6 at the R4 position (SsASAT2). 251 We could not identify the SsASAT4 enzyme(s) that performs aiC5 and C2 acylations on 252 the R1′ and R3′ positions of tri-acylsucroses, respectively. However, we identified another 253 enzyme, which we designate SsASAT5, that showed three activities acetylating tri-, tetra- and 254 penta-acylsucroses (see Methods). SsASAT5 can perform furanose ring acetylation on tri- 255 (Figure 2-Figure Supplement 6A,B) and tetra-acylsucroses (Figure 2C,D), and both furanose 256 as well as pyranose ring acetylation on penta-acylsucroses (Figure 2-Figure Supplement 6C-F). 257 All of the products produced by SsASAT5 in vitro co-migrate with acylsugars found in plant 258 extracts, suggesting the SsASAT5 acceptor promiscuity also occurs in planta. Our observation 259 that SsASAT5 can perform pyranose ring acetylation — albeit weakly (Figure 2-Figure 260 Supplements 6D,F) — is at odds with NMR-characterized structures of a set of 16 purified 261 Salpiglossis acylsugars, which show all acetyl groups on the furanose ring. However, a previous 262 study described one pyranose R6-acylated penta-acyl sugar S5:22(2,2,6,6,6) in Salpiglossis 263 (Castillo et al., 1989), suggesting the presence of accession-specific variation in enzyme 264 function. Despite showing SsASAT4-, SsASAT5- and SsASAT6-like activities, we designate 265 this enzyme SsASAT5 because its products have both acylation patterns and co-migration 266 characteristics consistent with the most abundant penta-acylsugars from the plant (Figure 2C,D). 267 Overall, in vitro analysis revealed four enzymes that could catalyze ASAT reactions and 268 produce compounds also detected in plant extracts (Figure 2E). We further verified that these 269 enzymes are involved in acylsugar biosynthesis by testing the effects of perturbing their 270 transcript levels using Virus Induced Gene Silencing (VIGS).

271 In planta validation of acylsugar biosynthetic enzymes 272 To test the role of the in vitro identified ASATs in planta, we adapted a previously 273 described tobacco rattle virus-based VIGS procedure (Dong et al., 2007; Velásquez et al., 2009)

9

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

274 for Salpiglossis. We designed ~300 bp long gene-specific silencing constructs for transient 275 silencing of SsASAT1, SsASAT2, SsASAT3 and SsASAT5 (Figure 2-File Supplements 1,2), 276 choosing regions predicted to have a low chance of reducing expression of non-target genes (see 277 Methods). The Salpiglossis ortholog of the tomato phytoene desaturase (PDS) carotenoid 278 biosynthetic enzyme was used as positive control (Figure 3A), with transcript level decreases 279 confirmed for each candidate using qRT-PCR in one of the VIGS replicates (Figure 3B). As no 280 standard growth or VIGS protocol was available for Salpiglossis, we tested a variety of 281 conditions for agroinfiltration and plant growth, and generated at least two biological replicate 282 experiments for each construct, run at different times (Figure 3-Table Supplement 1). ASAT 283 knockdown phenotypes were consistent, regardless of variation in environmental conditions. 284 SsASAT1 VIGS revealed statistically significant reductions in acylsugar levels in at least 285 one construct across two experimental replicates (Kolmogorov-Smirnov [KS] test, p-value 286 <0.05) (Figure 3C,D; Figure 3-Figure Supplement 1A), consistent with its predicted role in 287 catalyzing the first step in acylsugar biosynthesis. SsASAT2 expression reduction also produced 288 plants with an overall decrease in acylsugars (Figure 3E,F; Figure 3-Figure Supplement 1B). 289 However, these plants also showed some additional unexpected acylsugar phenotypes, namely 290 increases in lower molecular weight acylsugars (m/z ratio: 651-700, 701-750; KS test p<0.05), 291 decreases in levels of higher molecular weight products (m/z ratio: 751-800, 801-850) (Figure 292 3E,F) as well as changes in levels of some individual acylsugars as described in Figure 3-Figure 293 Supplement 2. These results provide in planta support for the involvement of SsASAT2 in 294 acylsugar biosynthesis, and suggest the possibility of discovering additional complexity in the 295 future. 296 Silencing the SsASAT3 transcript also led to an unexpected result - significantly higher 297 accumulation of the normally very low abundance tri-acylsugars [S3:13(2,5,6); S3:14(2,6,6); 298 S3:15(5,5,5); S3:16(5,5,6); S3:18(6,6,6)] and their acetylated tetra-acylsugar derivatives (KS test 299 p<0.05), compared to empty vector control infiltrated plants (Figure 4A,B; Figure 4-Figure 300 Supplement 1). The tetra-acylsugars contained C2 or C5 acylation on the furanose ring (Figure 301 4-Figure Supplement 2). These observations are consistent with the hypothesis that di-acylated 302 sugars accumulate upon SsASAT3 knockdown and then serve as substrate for one or more other 303 enzyme (Figure 4-Figure Supplement 3). Our working hypothesis is that this inferred activity is 304 the as-yet-unidentified SsASAT4 activity; this is based on comparisons of in vitro enzyme assay 10

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

305 products and in vivo purified acylsugars from Salpiglossis plants (Figure 2A) (For review only 306 – Additional File 1). We propose that the hypothesized SsASAT4 may promiscuously acylate 307 di-acylated sugars in addition to performing C2/C5 additions on tri-acylsugars. 308 SsASAT5, based on in vitro analysis, was proposed to catalyze acetylation of tetra- to 309 penta-acylsugars. As expected, its knockdown led to accumulation of tri- and tetra-acylsugars 310 (Figure 4C,D; Figure 4-Figure Supplement 3B). The accumulating tetra-acylsugars were 311 C2/C5 furanose ring-acylated derivatives of the tri-acylsugars, suggesting presence of a 312 functional SsASAT4 enzyme and further validating the annotation of the knocked down enzyme 313 as SsASAT5 (Figure 4-Figure Supplement 3B). Thus, in summary, we identified four ASAT 314 enzymes and validated their impact on acylsugar biosynthesis in Salpiglossis trichomes using 315 VIGS. 316 Taken together, the Salpiglossis metabolites produced in vivo, combined with in vitro and 317 RNAi results, lead to the model of the Salpiglossis acylsugar biosynthetic network shown in 318 Figure 5. SsASAT1 – the first enzyme in the network – adds aiC5 or aiC6 to the sucrose R2 319 position. SsASAT2 then converts this mono-acylated sucrose to a di-acylated product, via 320 addition of aiC5 or aiC6 at the R4 position. Four possible products are thus generated by the first 321 two enzymes alone. Next, the SsASAT3 activity adds aiC6 at the R3 position, followed by one 322 or more uncharacterized enzyme(s) that adds either C2 or aiC5 at the furanose ring R1′ or R3′ 323 positions, respectively. SsASAT5 next performs acetylation at the R6′ position to produce penta- 324 acylsugars, which can then be further converted by an uncharacterized SsASAT6 to hexa- 325 acylsugars via C2 addition at the R3′ position. 326 Our results are consistent with the existence of at least two additional activities – 327 SsASAT4 and SsASAT6. These enzymes may be included in the five BAHD family candidates 328 highly expressed in both the trichome and stem (average number of reads > 500), and thus not 329 selected for our study because they did not meet the differential expression criterion. Also, the 330 fact that there are >400 detectable acylsugar-like peaks in the Salpiglossis trichome extracts 331 suggests the existence of additional ASAT activities, either promiscuous activities of 332 characterized ASATs or of other uncharacterized enzymes. Nonetheless, identification of the 333 four primary ASAT activities can help us to investigate the origins and evolution of the 334 acylsugar biosynthetic pathway over time.

11

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

335 The evolutionary origins of acylsugar biosynthesis 336 We used our analysis of SsASAT1, SsASAT2, SsASAT3 and SsASAT5 activities with 337 information about ASATs in Petunia and tomato species (Schilmiller et al., 2012, 2015; Fan et 338 al., 2016a) (For review only – Additional File 2) to infer the origins of the acylsugar 339 biosynthetic pathway. Based on BLAST searches across multiple plant genomes, ASAT-like 340 sequences are very narrowly distributed in the plant phylogeny (Figure 6-Figure Supplement 341 1). This led us to restrict our BLAST searches, which used SlASATs and SsASATs as query 342 sequences, to species in the orders , Lamiales, Boraginales and Gentianales, all in the 343 Lamiidae clade (Refulio-Rodriguez and Olmstead, 2014). Phylogenetic reconstruction was 344 performed with the protein sequences of the most informative hits obtained in these searches to 345 obtain a “gene tree”. Reconciliation of this gene tree with the phylogenetic relationships between 346 the sampled species (Figure 6B) allowed inference of the acylsugar biosynthetic pathway before 347 the emergence of the Solanaceae (Figure 6C; Figure 6-Figure Supplements 2A-C, 3A-C). 348 Three major subclades in the gene tree – highlighted in blue, red and pink – are relevant 349 to understanding the origins of the ASATs (Figure 6A). A majority of characterized ASATs 350 (blue squares in the blue subclade, Figure 6A) are clustered with Capsicum PUN1 — an 351 enzyme involved in biosynthesis of the alkaloid capsaicin — in a monophyletic group with high 352 bootstrap support (Group #1, red and blue subclades Figure 6A). Two of the most closely 353 related non-Solanales enzymes in the tree — Catharanthus roseus minovincinine-19-O- 354 acetyltransferase (MAT) and deacetylvindoline-4-O-acetyltransferase (DAT) — are also 355 involved in alkaloid biosynthesis (Magnotta et al., 2007). This suggests that the blue ASAT 356 subclade emerged from an alkaloid biosynthetic enzyme ancestor. 357 A second insight from the gene tree involves Salpiglossis SsASAT5 and tomato 358 SlASAT4, which reside outside of the blue subclade. Both enzymes catalyze C2 addition on 359 acylated sugar substrates in downstream reactions of their respective networks. Multiple 360 enzymes in this region of the phylogenetic tree (Figure 2-Figure Supplement 1C; light blue 361 clade) are involved in O-acetylation of diverse substrates e.g. indole alkaloid 16-epivellosimine 362 (Bayer et al., 2004), the phenylpropanoid benzyl alcohol (D’Auria et al., 2002) and the terpene 363 geraniol (Shalit et al., 2003). This observation is consistent with the hypothesis that O- 364 acetylation activity was present in ancestral enzymes within this region of the phylogenetic tree.

12

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

365 Gene tree reconciliation with known relationships between plant families and orders 366 (Figure 6B) was used to infer acylsugar pathway evolution in the context of plant evolution. We 367 used historical dates as described by Särkinen and co-workers (Särkinen et al., 2013) in our 368 interpretations, as opposed to a recent study that described a much earlier origination time for the 369 Solanaceae (Wilf et al., 2017). Based on known relationships, Convolvulaceae is the closest 370 sister family to Solanaceae; however, we found no putative ASAT orthologs in any searched 371 Convolvulaceae species. The closest Convolvulaceae homologs were found in the Ipomoea 372 trifida genome (Hirakawa et al., 2015) in the red subclade. This suggests that the blue and red 373 subclades arose via a duplication event before the Solanaceae-Convolvulaceae split, estimated to 374 be ~50-65 mya (Särkinen et al., 2013). The lack of ASAT orthologs in Convolvulaceae could 375 indicate that Convolvulaceae species with an orthologous acylsugar biosynthetic pathway were 376 not sampled in our analysis. Alternatively, the absence of orthologs of the blue subclade enzymes 377 in the Convolvulaceae, coupled with the lack of acylsugars in the sampled Convolvulaceae 378 species (Figure 1B), leads us to hypothesize that ASAT orthologs were likely lost early in 379 Convolvulaceae evolution. 380 The known phylogenetic relationships between species can help assign a maximum age 381 for the duplication event leading to the blue and red subclades. Database searches identified 382 homologs from species in the Boraginales (Ehretia, Heliotropium) and Gentianales (Coffea, 383 Catharanthus) orders that clustered outside Group #1, in a monophyletic group with high 384 bootstrap support (Group #2, pink branches, Figure 6A). Boraginales is sister to the Lamiales 385 order (Figure 6B), and although Boraginales putative orthologs were identified through BLAST 386 searches, no orthologs from Lamiales species were detected. The closest Lamiales (Sesamum, 387 Lavandula, Mimulus) homologs were more closely related to SsASAT5 than to the ASATs in the 388 blue subclade (purple squares, Figure 6A). The most parsimonious explanation for these 389 observations is that the duplication event that gave rise to the blue and red subclades occurred 390 before the Solanaceae-Convolvulaceae split 50-65 mya but after the Solanales- 391 Boraginales/Lamiales orders diverged 75-100 mya (Hedges et al., 2006) (Figure 6B,C). 392 Overall, these observations support a view of the origins of the ASAT1,2,3 blue subclade 393 from an alkaloid biosynthetic ancestor via a single duplication event >50mya, prior to the 394 establishment of the Solanaceae. This duplicate would have undergone further rounds of 395 duplication to generate the multiple ASATs found in the blue subclade. Thus, a logical next 13

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

396 question is ‘what was the structure of the acylsugar biosynthetic network early in the Solanaceae 397 family evolution?’ To address this, we focused on ASAT evolution across the Solanaceae family.

398 Evolution of acylsugar biosynthesis in the Solanaceae 399 ASAT enzymes in the blue subclade represent the first three steps in the acylsugar 400 biosynthetic pathway. The likely status of these three steps in the Salpiglossis-Petunia-Tomato 401 last common ancestor (LCA) was investigated using the gene tree displayed in Figure 7A. 402 Mapping the pathway enzymes on the gene tree suggests that the LCA likely had at least three 403 enzymatic activities, which we refer to as ancestral ASAT1 (aASAT1), ASAT2 (aASAT2) and 404 ASAT3 (aASAT3); these are shown in Figure 7A as red, dark blue and yellow squares, 405 respectively. We further traced the evolution of the aASAT1,2,3 orthologs in the Solanaceae 406 using existing functional data and BLAST-based searches. These results reveal that the aASAT1 407 ortholog (red squares, Figure 7A) was present until the Capsicum-Solanum split ~17 mya 408 (Särkinen et al., 2013) and was lost in the lineage leading to Solanum. This loss is evident both in 409 similarity searches and in comparisons of syntenic regions between genomes of Petunia, 410 Capsicum and tomato (Figure 7B). On the other hand, the aASAT2 (dark blue squares, Figure 411 7A) and aASAT3 orthologs (yellow squares, Figure 7A) have been present in the Solanaceae 412 species genomes at least since the last common ancestor of Salpiglossis-Petunia-Tomato ~25 413 mya, and perhaps even in the last common ancestor of the Solanaceae family. 414 One inference from this analysis is that aASAT2 orthologs switched their activity from 415 ASAT2-like acylation of mono-acylsucroses in Salpiglossis/Petunia to ASAT1-like acylation of 416 unsubstituted sucrose in cultivated tomato (Figure 7A). Interestingly, despite the switch, both 417 aASAT2 and aASAT3 orthologs in tomato continue to acylate the same pyranose-ring R4 and 418 R3 positions, respectively (Figure 7C; Figure 8). In addition, cultivated tomato has two ‘new’ 419 enzymes — SlASAT3 and SlASAT4 — which were not described in Petunia or Salpiglossis 420 acylsugar biosynthesis. 421 The functional transitions of aASAT2 and aASAT3 could have occurred via (i) functional 422 divergence between orthologs or (ii) duplication, neo-functionalization and loss of the ancestral 423 enzyme. Counter to the second hypothesis, we found no evidence of aASAT2 ortholog 424 duplication in the genomic datasets; however, we cannot exclude the possibility of recent 425 polyploidy in extant species producing duplicate genes with divergent functions. We explored

14

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

426 the more parsimonious hypothesis that functional divergence between orthologs led to the 427 aASAT2 functional switch. 428 Functional divergence between aASAT2 orthologs may have occurred by one of two 429 mechanisms. One, it is possible that aASAT2 orthologs had weak/strong sucrose acylation 430 activity prior to aASAT1 loss. Alternatively, sucrose acylation activity arose completely anew 431 after the loss of aASAT1. We sought evidence for acceptor substrate promiscuity in extant 432 species by characterizing the activity of additional orthologous aASAT2 enzymes from H. niger 433 and S. nigrum (Figure 7C) using aiC6 and nC12 as acyl CoA donors. HnASAT2 — like 434 SsASAT2 — only performed the ASAT2 reaction (acylation of mono-acylsucrose), without any 435 evidence for sucrose acylation under standard testing conditions (Figure 7C, Figure 7-Figure 436 Supplement 1A-D). However, both SnASAT1 and SlASAT1 could produce S1:6(6) and 437 S1:12(12) from sucrose. 438 Overall, these results suggest that up until Atropina-Solaneae common ancestor, the 439 aASAT2 ortholog still primarily conducted the ASAT2 activity (Figure 7C). However, at some 440 point, the aASAT2 ortholog moved in reaction space towards being ASAT1. This activity shift 441 was likely complete by the time the S.nigrum-S.lycopersicum lineage diverged (Figure 7C), 442 given that aASAT2 orthologs in both species utilize sucrose. Future detailed characterization of 443 aASAT2 orthologs from phylogenetically intermediate, pre- and post-aASAT1 loss species can 444 help pinpoint the route taken by the aASAT2 ortholog in its functional diversification as well as 445 determine whether positive selection played a role.

446 Conclusion 447 We performed experimental and computational analysis of BAHD enzymes from several 448 lineages spanning ~100 million years, and characterized the emergence and evolution of the 449 acylsugar metabolic network. We identified four biosynthetic enzymes in Salpiglossis sinuata, 450 the extant species of an early emerging Solanaceae lineage, characterized in vitro activities, 451 validated in planta roles of these ASATs and studied their emergence over 100 million years of 452 plant evolution. These results demonstrate the value of leveraging genomics, phylogenetics, 453 analytical chemistry and enzymology with a mix of model and non-model organisms to 454 understand the evolution of biological complexity.

15

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

455 Based on our findings, the acylsugar phenotype across the Solanaceae is “hyper-diverse”, 456 meaning there is large diversity within and between closely related species. Previous studies 457 indicate that all eight hydroxyls on the sucrose core are subject to acylation (Ghosh et al., 2014; 458 Schilmiller et al., 2015; Fan et al., 2016a). A simple calculation (see Supplemental Methods) 459 considering only 12 aliphatic CoA donors suggests >6000 theoretical possible structures for 460 tetra-acylsugars alone. This estimate does not even include estimates of tri-, penta-, hexa- 461 acylsugars nor does it consider non-aliphatic CoAs such as malonyl CoA, esters of other sugars 462 such as glucose or positional isomers. Although the theoretical possibilities are restricted by 463 availability of CoAs in trichome tip cells and existing ASAT activities, we still observe hundreds 464 of acylsugars across the Solanaceae, with >400 detectable acylsugar-like chromatographic peaks 465 in single plants of Salpiglossis and N. alata, including very low abundance hepta-acylsugars and 466 acylsugars containing phenylacetyl and tigloyl chains (For review only – Additional File 1). 467 This diversity may be less than among some terpenoid classes e.g. >850 diterpenes have been 468 described in different members of the Euphorbia genus alone (Vasas and Hohmann, 2014). 469 However, the diversity is significantly greater than other specialized metabolite classes restricted 470 to a single clade such as glucosinolates in Brassicales (~150 across the order) (Olsen et al., 471 2016), betalains in Caryophyllales (~75 non-glycosylated betalains across the order) (Khan and 472 Giridhar, 2015) and acridone alkaloids in Rutaceae (~100 in the family) (Roberts et al., 2010). 473 Acylsugar diversity may be higher than, or at least comparable to, resin glycosides in 474 Convolvulaceae (>250 structures across the family) (Pereda-Miranda et al., 2010) and 475 pyrrolizidine alkaloids in Boraginaceae (>225 structures across the family) (El-Shazly and Wink, 476 2014). Much of the observed acylsugar phenotypic complexity arises due to the diversity in acyl 477 chains — their types, lengths, isomeric forms and positions on the sucrose molecule. 478 Does such immense diversity within the same metabolite class serve a purpose? 479 Glucosinolate diversity was shown to influence fitness in the wild (Prasad et al., 2012) but 480 pyrrolizidine alkaloid diversity may not be so relevant (Macel et al., 2002). While acylsugar 481 diversity is ecologically relevant to some extent (Weinhold and Baldwin, 2011; Leckie et al., 482 2016; Luu et al., 2017), it is unlikely that each novel acylsugar has a specific function. The 483 diversity — perhaps occurring as an unintended consequence of ASAT promiscuity — may, in 484 some cases, also provide benefit in the form of physical defense (e.g. stickiness).

16

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

485 We studied the question of how acylsugar biosynthesis originated. The results provide 486 evidence for repeated cycles of innovation involving potentiation, actualization and refinement 487 that contributed to the origin and diversification of the acylsugar biosynthetic pathway (Blount et 488 al., 2012) (Figure 8). We propose that the ASAT enzymes catalyzing sucrose acylation first 489 emerged (or actualized) between 30-80 mya. Events prior to the emergence of this activity — 490 including existence of alkaloid biosynthetic BAHDs and the duplication event 60-80 mya that 491 gave rise to the blue and the red subclades — potentiated the emergence of the ASAT activities. 492 Presumably, acylsugars provided a fitness advantage to the ancestral plants, leading to the 493 refinement of ASAT activities over the next few million years. For acylsugar production to 494 emerge, these steps in enzyme evolution would have been accompanied by parallel innovations 495 in ASAT transcriptional regulation leading to gland cell expression as well as production of 496 precursors (i.e., acyl CoA donors and sucrose) in Type I/IV trichomes. 497 Since the origin of the Solanaceae, this pathway experienced another round of major 498 innovation (Figure 8) leading to the aASAT2 functional switch. We propose that this cycle of 499 innovation was made possible by the potentiation provided by the similarity in activities of 500 aASAT1,2 and 3, conducive to the emergence of promiscuity. Thus, the sucrose acylation 501 activity may have already emerged in aASAT2 before the aASAT1 ortholog loss. It is unclear 502 whether the loss of aASAT1 was a result of it being rendered redundant by the emergence of the 503 aASAT2 promiscuous activity or an unrelated loss event. Regardless, this loss event was 504 associated with the reorganization of acylsugar biosynthesis, and perhaps was followed by a 505 refinement in the actualized aASAT2 sucrose acylation activity. Additional sampling of 506 intermediate species can help test this model and determine the dynamics that played out 507 between aASAT1,2,3 orthologs between 20-15 my. We cannot rule out the possibility of 508 aASAT2 switching to ASAT1 after the aASAT1 ortholog loss. However, this model is less 509 parsimonious, because it needs the emergence of the ASAT1 activity to occur “just in time” 510 before the entire downstream pathway decayed or diversified through drift. In addition to this 511 second cycle of innovation, previous results from closely related Solanum species also revealed 512 more recent cycles involving gene duplication, regulatory divergence of ASATs (Schilmiller et 513 al., 2015) and neo-functionalization of duplicates (Ning et al., 2015) in influencing the 514 phenotypes produced by this pathway. These findings highlight the striking plasticity of the

17

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

515 acylsugar biosynthetic network and illustrate the many ways by which chemical novelty emerges 516 in specialized metabolism. 517 In this study, we used an integrative strategy combining high-throughput technologies 518 and traditional biochemistry. S. lycopersicum is the flagship species of the Solanaceae family, 519 and we were able to apply the insights derived with this crop as a foundation to discover novel 520 enzymes in related species. Our investigations in Salpiglossis were also aided by the availability 521 of ASATs and NMR structures of their products in Petunia, which is more closely related to 522 Salpiglossis than tomato. “Anchor species” such as Petunia, which are phylogenetically distant 523 from flagship/model species, can enable study of a different region of the phylogenetic tree of 524 the clade of interest. Development of limited genomic and functional resources in such anchor 525 species, coupled with integrative, comparative approaches can offer more efficient routes for the 526 exploration of biochemical complexity in the ~300,000 plant species estimated to exist on our 527 planet (Mora et al., 2011).

528 Methods

529 Plant acylsugar extractions and mass spectrometric analyses 530 Acylsugar extractions were carried out from plants at the New York Botanical Gardens and from 531 other sources (Figure 1-File Supplement 1). The plants sampled were at different 532 developmental stages and were growing in different environments. The extractions were carried 533 out using acetonitrile:isopropanol:water in a 3:3:2 proportion similar to previous descriptions 534 (Schilmiller et al., 2010; Ghosh et al., 2014; Fan et al., 2016a) with the exception of gently 535 shaking the tubes by hand for 1-2 minutes. All extracts were analyzed on LC/MS using 7-min, 536 22-min or 110-min LC gradients on Supelco Ascentis Express C18 (Sigma Aldrich, USA) or 537 Waters BEH amide (Waters Corporation, USA) columns (Supplemental Table 1), as described 538 previously (Schilmiller et al., 2010; Ghosh et al., 2014; Fan et al., 2016a). While the 110-min 539 method was used to minimize chromatographic overlap in support of metabolite annotation in 540 samples with diverse mixtures of acylsugars, targeted extracted ion chromatogram peak area 541 quantification was performed using the 22-min method data. The QuanLynx function in 542 MassLynx v4.1 (Waters Corporation, USA) was used to integrate extracted ion chromatograms 543 for manually selected acylsugar and internal standard peaks. Variable retention time and 544 chromatogram mass windows were used, depending on the experiment and profile complexity. 18

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

545 Peak areas were normalized to the internal standard peak area and expressed as a proportion of 546 mg of dry weight.

547 RNA-seq data analysis 548 RNA was extracted and sequenced as described in the Supplemental Methods. The mRNA-seq 549 reads were adapter-clipped and trimmed using Trimmomatic v0.32 using the parameters 550 (LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:50). The quality-trimmed reads 551 from all datasets of a species were assembled de novo into transcripts using Trinity 552 v.20140413p1 after read normalization (max_cov=50,KMER_SIZE=25,max-pct- 553 stdev=200,SS_lib_type=RF) (Grabherr et al., 2011). We tested three different kmer values 554 (k=25, 27, 31) and selected the best kmer value for each species based on contig N50 values, 555 BLASTX hits to the S. lycopersicum annotated protein sequences and CEGMA results (Parra et 556 al., 2007). A minimum kmer coverage of 2 was used to reduce the probability of erroneous or 557 low abundance kmers being assembled into transcripts. After selecting the best assembly for 558 each species, we obtained a list of transcripts differentially expressed between trichomes and 559 stem/petiole for each species using RSEM/EBSeq (Li and Dewey, 2011; Leng et al., 2013) at an 560 FDR threshold of p<0.05. The differential expression of five transcripts in S. quitoense and four 561 transcripts in Salpiglossis was confirmed using semi-quantitative RT-PCR, along with the PDS 562 as negative control (Table 1-Figure Supplement 1).

563 Purification and structure elucidation of acylsugars using NMR 564 For metabolite purification, aerial tissues of 28 Salpiglossis plants (aged 10 weeks) were 565 extracted in 1.9 L of acetonitrile:isopropanol (AcN:IPA, v/v) for ~10 mins, and ~1 L of the 566 extract was concentrated to dryness on a rotary evaporator and redissolved in 5 mL of AcN:IPA. 567 Repeated injections from this extract were made onto a Thermo Scientific Acclaim 120 C18 568 HPLC column (4.6 x 150 mm, 5 µm particle size) with automated fraction collection. HPLC 569 fractions were concentrated to dryness under vacuum centrifugation, reconstituted in AcN:IPA 570 and combined according to metabolite purity as assessed by LC/MS. Samples were dried under

571 N2 gas and reconstituted in 250 or 300 µL of deuterated NMR solvent CDCl3 (99.8 atom % D) 572 and transferred to solvent-matched Shigemi tubes for analysis. 1H, 13C, J-resolved 1H, gCOSY, 573 gHSQC, gHMBC and ROESY NMR experiments were performed at the Max T. Rogers NMR 574 Facility at Michigan State University using a Bruker Avance 900 spectrometer equipped with a 19

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

575 TCI triple resonance probe. All spectra were referenced to non-deuterated CDCl3 solvent signals

576 (δH = 7.26 and δC = 77.20 ppm).

577 Candidate gene amplification and enzyme assays 578 Enzyme assays were performed as previously described (Fan et al., 2016b) with the following 579 modifications: Enzyme assays were performed in 30 µL reactions (3µL enzyme + 3µL 1mM

580 acyl CoA + 1µL 10mM (acylated) sucrose + 23µL 50mM NH4Ac buffer pH 6.0) or 20µL 581 reactions with proportionately scaled components. Reactions that used an NMR-characterized 582 substrate were performed with 0.2µL substrate and 23.8µL buffer. Reaction products were 583 characterized using Waters Xevo G2-XS QToF LC/MS (Waters Corporation, USA) using 584 previously described protocols (Fan et al., 2016b).

585 Characterizing the SsASAT5 activity 586 Identification of SsASAT5 activity required a significant amount of the starting substrate, 587 however, standard sequential reactions used to isolate SsASAT1,2,3 activities do not produce 588 enough starting substrate for SsASAT5. Hence, we used S3:15(5,5,5) purified from a back- 589 crossed inbred line (BIL6180, (Ofner et al., 2016)) whose NMR resolved structure shows R2, 590 R3, R4 positions acylated by C5 chains (unpublished data) — same as Salpiglossis tri- 591 acylsugars. SsASAT5 could use this substrate to acylate the furanose ring (Figure 2-Figure 592 Supplement 6A,B), however, the tetra-acylsugar thus produced is not further acetylated to 593 penta-acylsugar, suggesting the C2 acylation by SsASAT5 is not the same as the C2 acylation on 594 tetra-acylsugars in vivo. SsASAT5 also acetylated the acceptors S4:21(5,5,5,6) and 595 S4:19(2,5,6,6) purified from Salpiglossis trichome extracts on the furanose ring to penta- 596 acylsugars that co-migrated with the most abundant penta-acylsugars purified from the plant 597 (Figure 2C,D). Finally, S5:23(2,5,5,5,6) and S5:21(2,2,5,6,6) purified from plant extracts were 598 enzymatically acetylated to hexa-acylsugars, which co-migrated with two low abundance hexa- 599 acylsugars from the plant (Figure 2-Figure Supplement 6C,D). Positive mode fragmentation 600 patterns suggested that SsASAT5 possessed the capacity to acetylate both pyranose (weak) and 601 furanose rings of the penta-acylsugars (Figure 2-Figure Supplement 6E,F).

20

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

602 Virus induced gene silencing 603 Primers to amplify fragments of transcripts for VIGS were chosen to minimize probability of 604 cross silencing other Salpiglossis transcripts due to sequence similarity or due to homopolymeric 605 regions (Figure 2-File Supplement 1,2). This was achieved using a custom Python script 606 (Figure 3-File Supplement 1), which divided the entire transcript of interest in silico into 607 multiple overlapping 20nt fragments, performed a BLAST against the Salpiglossis transcriptome 608 and tomato genome and flagged fragments with >95% identity and/or >50% homopolymeric 609 stretches. Contiguous transcript regions with >12 unflagged, high-quality fragments were 610 manually inspected and considered for VIGS. 1-2 300bp regions were cloned into the pTRV2- 611 LIC vector (Dong et al., 2007) and transformed into Agrobacterium tumefaciens GV3101. 612 Agroinfiltration of Salpiglossis plants was performed as described previously (Velásquez et al., 613 2009) using the prick inoculation method. Salpiglossis phytoene desaturase was used as the 614 positive control for silencing. Empty pTRV2-LIC or pTRV2 vectors were used as negative 615 controls. Each VIGS trial was done slightly differently while modifying the growth, 616 transformation and maintenance conditions of this non-model species. The experimental details, 617 including the optimal growth and VIGS conditions that give the fastest results, are noted in 618 Figure 3-Table Supplement 1.

619 Phylogenetic inference 620 All steps in the phylogenetic reconstruction were carried out using MEGA6 (Tamura et al., 621 2013). Amino acid sequences were aligned using Muscle with default parameters. Maximum 622 likelihood and/or neighbor joining (NJ) were used to generate phylogenetic trees. For maximum 623 likelihood, the best evolutionary model (JTT[Jones-Taylor-Thornton]+G+I with 5 rate 624 categories) was selected based on the Akaike Information Criterion after screening several 625 models available in the MEGA6 software, while for NJ, the default JTT model was used. 626 Support values were obtained using 1000 bootstrap replicates, however trees obtained using 100 627 bootstrap replicates also showed similar overall topologies. Trees were generated either using 628 “complete deletion” or “partial deletion with maximum 30% gaps/missing data” options for tree 629 reconstruction. Sequences <350aa (eg: Solyc10g079570) were excluded from this analysis.

21

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

630 Data release 631 All RNA-seq datasets are deposited in the NCBI Short Read Archive under the BioProject 632 PRJNA263038. Coding sequences of ASATs are provided in Figure 2-File Supplement 1 and 633 have been deposited in GenBank (KY978746-KY978750). RNA-seq transcripts and orthologous 634 group membership information has been uploaded to Dryad (provisional DOI: 635 10.5061/dryad.t7r64).

636 Acknowledgements 637 We thank Dr. Rachel Meyers, Dr. Amy Litt and the New York Botanical Garden staff for 638 permission and assistance in sample collection. We also thank the support staff at the Research 639 Technology Support Facility at MSU for transcriptome sequencing and mass spectrometry 640 assistance, and the Max T. Rogers NMR Facility for assistance with NMR data generation. We 641 acknowledge Daniel Lybrand for cloning the Solanum nigrum ASAT1 gene, and Dr. Anthony 642 Schilmiller and Dr. Pengxiang Fan for assisting with sample collection at NYBG. We also 643 acknowledge the valuable feedback received from members of the Last lab, Dr. Cornelius Barry 644 and Dr. Eran Pichersky in the research design of this manuscript. This work was funded by 645 National Science Foundation grants IOS–1025636 and IOS-PGRP-1546617 to R.L.L. and A.D.J. 646 and National Institute of General Medical Sciences of the National Institutes of Health graduate 647 training grant no. T32–GM110523 to B.L. A.D.J. acknowledges support from the USDA 648 National Institute of Food and Agriculture, Hatch project MICL-02143.

649 Competing interests 650 The authors declare no competing interests 651

652 References 653 Afendi, F.M., Okada, T., Yamazaki, M., Hirai-Morita, A., Nakamura, Y., Nakamura, K., Ikeda, 654 S., Takahashi, H., Altaf-Ul-Amin, M., Darusman, L.K., Saito, K., and Kanaya, S. (2012). 655 KNApSAcK family databases: integrated metabolite–plant species databases for 656 multifaceted plant research. Plant Cell Physiol. 53: e1–e1.

657 Bayer, A., Ma, X., and Stöckigt, J. (2004). Acetyltransfer in natural product biosynthesis-- 658 functional cloning and molecular analysis of vinorine synthase. Bioorg. Med. Chem. 12: 659 2787–2795.

22

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

660 Benderoth, M., Textor, S., Windsor, A.J., Mitchell-Olds, T., Gershenzon, J., and Kroymann, J. 661 (2006). Positive selection driving diversification in plant secondary metabolism. Proc. 662 Natl. Acad. Sci. U. S. A. 103: 9118–9123.

663 Blount, Z.D., Barrick, J.E., Davidson, C.J., and Lenski, R.E. (2012). Genomic analysis of a key 664 innovation in an experimental E. coli population. Nature 489: 513–518.

665 Boutanaev, A.M., Moses, T., Zi, J., Nelson, D.R., Mugford, S.T., Peters, R.J., and Osbourn, A. 666 (2015). Investigation of terpene diversification across multiple sequenced plant genomes. 667 Proc. Natl. Acad. Sci. U. S. A. 112: E81-88.

668 Brockington, S.F., Yang, Y., Gandia-Herrero, F., Covshoff, S., Hibberd, J.M., Sage, R.F., Wong, 669 G.K.S., Moore, M.J., and Smith, S.A. (2015). Lineage-specific gene radiations underlie 670 the evolution of novel betalain pigmentation in Caryophyllales. New Phytol. 207: 1170– 671 1180.

672 Castillo, M., Connolly, J., Ifeadike, P., Labbe, C., Rycroft, D., and Woods, N. (1989). Partially 673 acylated glucose and sucrose derivatives from Salpiglossis sinuata (Solanaceae). J. 674 Chem. Res. 12: 398–399.

675 D’Auria, J.C. (2006). Acyltransferases in plants: a good time to be BAHD. Curr. Opin. Plant 676 Biol. 9: 331–340.

677 D’Auria, J.C., Chen, F., and Pichersky, E. (2002). Characterization of an acyltransferase capable 678 of synthesizing benzylbenzoate and other volatile esters in flowers and damaged Leaves 679 of Clarkia breweri. Plant Physiol. 130: 466–476.

680 Dong, Y., Burch-Smith, T.M., Liu, Y., Mamillapalli, P., and Dinesh-Kumar, S.P. (2007). A 681 Ligation-Independent Cloning Tobacco Rattle Virus vector for high-throughput Virus- 682 Induced Gene Silencing identifies roles for NbMADS4-1 and -2 in floral development. 683 Plant Physiol. 145: 1161–1170.

684 Dutartre, L., Hilliou, F., and Feyereisen, R. (2012). Phylogenomics of the benzoxazinoid 685 biosynthetic pathway of Poaceae: gene duplications and origin of the Bx cluster. BMC 686 Evol. Biol. 12: 64.

687 El-Shazly, A. and Wink, M. (2014). Diversity of pyrrolizidine alkaloids in the Boraginaceae: 688 Structures, distribution, and biological properties. Diversity 6: 188–282.

689 Fan, P., Miller, A.M., Schilmiller, A.L., Liu, X., Ofner, I., Jones, A.D., Zamir, D., and Last, R.L. 690 (2016a). In vitro reconstruction and analysis of evolutionary variation of the tomato 691 acylsucrose metabolic network. Proc. Natl. Acad. Sci. U. S. A. 113: E239-248.

692 Fan, P., Moghe, G.D., and Last, R.L. (2016b). Comparative biochemistry and in vitro pathway 693 reconstruction as powerful partners in studies of metabolic diversity. In Methods in 694 Enzymology, S.E. O’Connor, ed, Synthetic Biology and Metabolic Engineering in Plants 695 and Microbes Part B: Metabolism in Plants. (Academic Press), pp. 1–17.

23

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

696 Fiehn, O. (2002). Metabolomics--the link between genotypes and phenotypes. Plant Mol. Biol. 697 48: 155–171.

698 Forkner, R.E. and Hare, J.D. (2000). Genetic and environmental variation in acyl glucose ester 699 production and glandular and nonglandular trichome densities in Datura wrightii. J. 700 Chem. Ecol. 26: 2801–2823.

701 Frey, M., Chomet, P., Glawischnig, E., Stettner, C., Grün, S., Winklmair, A., Eisenreich, W., 702 Bacher, A., Meeley, R.B., Briggs, S.P., Simcox, K., and Gierl, A. (1997). Analysis of a 703 chemical plant defense mechanism in grasses. Science 277: 696–699.

704 Ghosh, B., Westbrook, T.C., and Jones, A.D. (2014). Comparative structural profiling of 705 trichome specialized metabolites in tomato (Solanum lycopersicum) and S. habrochaites: 706 acylsugar profiles revealed by UHPLC/MS and NMR. Metabolomics 10: 496–507.

707 Grabherr, M.G. et al. (2011). Full-length transcriptome assembly from RNA-Seq data without a 708 reference genome. Nat. Biotechnol. 29: 644–652.

709 Halkier, B.A. and Gershenzon, J. (2006). Biology and biochemistry of glucosinolates. Annu. 710 Rev. Plant Biol. 57: 303–333.

711 Hedges, S.B., Dudley, J., and Kumar, S. (2006). TimeTree: a public knowledge-base of 712 divergence times among organisms. Bioinforma. Oxf. Engl. 22: 2971–2972.

713 Hirakawa, H. et al. (2015). Survey of genome sequences in a wild sweet potato, Ipomoea trifida 714 (H. B. K.) G. Don. DNA Res. 22: 171–179.

715 Khan, M.I. and Giridhar, P. (2015). Plant betalains: Chemistry and biochemistry. Phytochemistry 716 117: 267–295.

717 Kim, J., Kang, K., Gonzales-Vigil, E., Shi, F., Jones, A.D., Barry, C.S., and Last, R.L. (2012). 718 Striking natural diversity in glandular trichome acylsugar composition is shaped by 719 variation at the Acyltransferase2 locus in the wild tomato Solanum habrochaites. Plant 720 Physiol. 160: 1854–1870.

721 King, R.R., Calhoun, L.A., Singh, R.P., and Boucher, A. (1990). Sucrose esters associated with 722 glandular trichomes of wild Lycopersicon species. Phytochemistry 29: 2115–2118.

723 Kroumova, A.B. and Wagner, G.J. (2003). Different elongation pathways in the biosynthesis of 724 acyl groups of trichome exudate sugar esters from various solanaceous plants. Planta 216: 725 1013–1021.

726 Kroumova, A.B.M., Zaitlin, D., and Wagner, G.J. (2016). Natural variability in acyl moieties of 727 sugar esters produced by certain tobacco and other Solanaceae species. Phytochemistry 728 130: 218–227.

24

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

729 Leckie, B.M., D’Ambrosio, D.A., Chappell, T.M., Halitschke, R., De Jong, D.M., Kessler, A., 730 Kennedy, G.G., and Mutschler, M.A. (2016). Differential and synergistic functionality of 731 acylsugars in suppressing oviposition by insect herbivores. PloS One 11: e0153345.

732 Leng, N., Dawson, J.A., Thomson, J.A., Ruotti, V., Rissman, A.I., Smits, B.M.G., Haag, J.D., 733 Gould, M.N., Stewart, R.M., and Kendziorski, C. (2013). EBSeq: an empirical Bayes 734 hierarchical model for inference in RNA-seq experiments. Bioinformatics 29: 1035– 735 1043.

736 Li, B. and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq data 737 with or without a reference genome. BMC Bioinformatics 12: 323.

738 Luu, V.T., Weinhold, A., Ullah, C., Dressel, S., Schoettner, M., Gase, K., Gaquerel, E., Xu, S., 739 and Baldwin, I.T. (2017). O-acyl sugars protect a wild tobacco from both native fungal 740 pathogens and a specialist herbivore. Plant Physiol.

741 Macel, M., Klinkhamer, P.G.L., Vrieling, K., and van der Meijden, E. (2002). Diversity of 742 pyrrolizidine alkaloids in Senecio species does not affect the specialist herbivore Tyria 743 jacobaeae. Oecologia 133: 541–550.

744 Magnotta, M., Murata, J., Chen, J., and De Luca, V. (2007). Expression of deacetylvindoline-4- 745 O-acetyltransferase in Catharanthus roseus hairy roots. Phytochemistry 68: 1922–1931.

746 Mayr, E. (1985). The growth of biological thought (Harvard University Press).

747 Milo, R. and Last, R.L. (2012). Achieving diversity in the face of constraints: lessons from 748 metabolism. Science 336: 1663–1667.

749 Moghe, G.D. and Last, R.L. (2015). Something old, something new: Conserved enzymes and the 750 evolution of novelty in plant specialized metabolism. Plant Physiol. 169: 1512–1523.

751 Mora, C., Tittensor, D.P., Adl, S., Simpson, A.G.B., and Worm, B. (2011). How many species 752 are there on earth and in the ocean? PLOS Biol. 9: e1001127.

753 Ning, J., Moghe, G., Leong, B., Kim, J., Ofner, I., Wang, Z., Adams, C., Arthur, D.J., Zamir, D., 754 and Last, R.L. (2015). A feedback insensitive isopropylmalate synthase affects acylsugar 755 composition in cultivated and wild tomato. Plant Physiol.: pp.00474.2015.

756 Ober, D. and Hartmann, T. (2000). Phylogenetic origin of a secondary pathway: the case of 757 pyrrolizidine alkaloids. Plant Mol. Biol. 44: 445–450.

758 Ofner, I., Lashbrooke, J., Pleban, T., Aharoni, A., and Zamir, D. (2016). Solanum pennellii 759 backcross inbred lines (BILs) link small genomic bins with tomato traits. Plant J. Cell 760 Mol. Biol. 87: 151–160.

761 Olmstead, R.G. and Bohs, L. (2007). A summary of molecular systematic research in 762 Solanaceae: 1982-2006. Acta Hortic.: 255–268.

25

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

763 Olmstead, R.G., Bohs, L., Migid, H.A., Santiago-Valentin, E., Garcia, V.F., and Collier, S.M. 764 (2008). A molecular phylogeny of the Solanaceae. Taxon 57: 1159–1181.

765 Olsen, C.E., Huang, X.-C., Hansen, C.I.C., Cipollini, D., Ørgaard, M., Matthes, A., Geu-Flores, 766 F., Koch, M.A., and Agerbirk, N. (2016). Glucosinolate diversity within a phylogenetic 767 framework of the tribe Cardamineae (Brassicaceae) unraveled with HPLC-MS/MS 768 and NMR-based analytical distinction of 70 desulfoglucosinolates. Phytochemistry 132: 769 33–56.

770 Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core 771 genes in eukaryotic genomes. Bioinformatics 23: 1061–1067.

772 Pereda-Miranda, R., Rosas-Ramírez, D., and Castañeda-Gómez, J. (2010). Resin glycosides from 773 the morning glory family. Prog. Chem. Org. Nat. Prod. 92: 77–153.

774 Pichersky, E. and Lewinsohn, E. (2011). Convergent evolution in plant specialized metabolism. 775 Annu. Rev. Plant Biol. 62: 549–566.

776 Prasad, K.V.S.K. et al. (2012). A gain-of-function polymorphism controlling complex traits and 777 fitness in nature. Science 337: 1081–1084.

778 Puterka, G.J., Farone, W., Palmer, T., and Barrington, A. (2003). Structure-function relationships 779 affecting the insecticidal and miticidal activity of sugar esters. J. Econ. Entomol. 96: 780 636–644.

781 Qi, X., Bakht, S., Leggett, M., Maxwell, C., Melton, R., and Osbourn, A. (2004). A gene cluster 782 for secondary metabolism in oat: implications for the evolution of metabolic diversity in 783 plants. Proc. Natl. Acad. Sci. U. S. A. 101: 8233–8238.

784 Refulio-Rodriguez, N.F. and Olmstead, R.G. (2014). Phylogeny of Lamiidae. Am. J. Bot. 101: 785 287–299.

786 Roberts, M.F., Strack, D., and Wink, M. (2010). Biosynthesis of alkaloids and betalains. In 787 Annual Plant Reviews Volume 40: Biochemistry of Plant Secondary Metabolism, M. 788 Winkessor, ed (Wiley-Blackwell), pp. 20–91.

789 Särkinen, T., Bohs, L., Olmstead, R.G., and Knapp, S. (2013). A phylogenetic framework for 790 evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree. BMC Evol. 791 Biol. 13: 214.

792 Schilmiller, A., Shi, F., Kim, J., Charbonneau, A.L., Holmes, D., Daniel Jones, A., and Last, 793 R.L. (2010). Mass spectrometry screening reveals widespread diversity in trichome 794 specialized metabolites of tomato chromosomal substitution lines. Plant J. 62: 391–403.

795 Schilmiller, A.L., Charbonneau, A.L., and Last, R.L. (2012). Identification of a BAHD 796 acetyltransferase that produces protective acyl sugars in tomato trichomes. Proc. Natl. 797 Acad. Sci. U. S. A. 109: 16377–16382.

26

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

798 Schilmiller, A.L., Last, R.L., and Pichersky, E. (2008). Harnessing plant trichome biochemistry 799 for the production of useful compounds. Plant J. Cell Mol. Biol. 54: 702–711.

800 Schilmiller, A.L., Moghe, G.D., Fan, P., Ghosh, B., Ning, J., Jones, A.D., and Last, R.L. (2015). 801 Functionally divergent alleles and duplicated loci encoding an acyltransferase contribute 802 to acylsugar metabolite diversity in Solanum trichomes. Plant Cell 27: 1002–1017.

803 Severson, R.F., Arrendale, R.F., Chortyk, O.T., Green, C.R., Thome, F.A., Stewart, J.L., and 804 Johnson, A.W. (1985). Isolation and characterization of the sucrose esters of the cuticular 805 waxes of green tobacco leaf. J. Agric. Food Chem. 33: 870–875.

806 Shalit, M., Guterman, I., Volpin, H., Bar, E., Tamari, T., Menda, N., Adam, Z., Zamir, D., 807 Vainstein, A., Weiss, D., Pichersky, E., and Lewinsohn, E. (2003). Volatile ester 808 formation in roses. Identification of an acetyl-coenzyme A. geraniol/citronellol 809 acetyltransferase in developing rose petals. Plant Physiol. 131: 1868–1876.

810 Shinozaki, Y., Matsuzaki, T., Suhara, S., Tobita, T., Shigematsu, H., and Koiwai, A. (1991). 811 New types of glycolipids from the surface lipids of Nicotiana umbratica. Agric. Biol. 812 Chem. 55: 751–756.

813 Simmons, A.T., Gurr, G.M., McGrath, D., Martin, P.M., and Nicol, H.I. (2004). Entrapment of 814 Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae) on glandular trichomes of 815 Lycopersicon species. Aust. J. Entomol. 43: 196–200.

816 St Pierre, B. and De Luca, V. (2000). Evolution of acyltransferase genes: Origin and 817 diversification of the BAHD superfamily of acyltransferases involved in secondary 818 metabolism. In Recent Advances in Phytochemistry, J.T. Romeo, R. Ibrahim, L. Varin, 819 and V. De Luca, eds, Evolution of Metabolic Pathways. (Elsevier), pp. 285–315.

820 Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: Molecular 821 Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30: 2725–2729.

822 Tipton, C.M. (2008). Susruta of India, an unrecognized contributor to the history of exercise 823 physiology. J. Appl. Physiol. 104: 1553–1556.

824 Vasas, A. and Hohmann, J. (2014). Euphorbia diterpenes: isolation, structure, biological activity, 825 and synthesis (2008-2012). Chem. Rev. 114: 8579–8612.

826 Velásquez, A.C., Chakravarthy, S., and Martin, G.B. (2009). Virus-induced gene silencing 827 (VIGS) in Nicotiana benthamiana and tomato. J. Vis. Exp. JoVE.

828 Weinhold, A. and Baldwin, I.T. (2011). Trichome-derived O-acyl sugars are a first meal for 829 caterpillars that tags them for predation. Proc. Natl. Acad. Sci. U. S. A. 108: 7855–7859.

830 Wilf, P., Carvalho, M.R., Gandolfo, M.A., and Cúneo, N.R. (2017). Eocene lantern fruits from 831 Gondwanan Patagonia and the early origins of Solanaceae. Science 355: 71–75.

27

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

832 Wink, M. (2003). Evolution of secondary metabolites from an ecological and molecular 833 phylogenetic perspective. Phytochemistry 64: 3–19.

834

28

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 1

Pyranose Furanose A. ring ring C. CH3 H3C OH 6’ O Solanum SlASAT1 6 OH HO O nigrum Stem Young leaf Fruit O 1’ 5’ O 5 4’ 4 1 2’ OH H2:14(4,10) 3 2 O SlASAT2 3’ SlASAT3 H2:15(5,10) O O O O O H3:16(4,4,8) O H3:16(2,4,10) CH3 CH3 H3C H3:17(2,5,10) CH3 CH H3:17(4,5,8) SlASAT4 3 H3:18(4,5,9) Length of H3:18(4,4,10) S4:17 (2,5,5,5) Core sugar individual H3:19(4,5,10) moeity acyl chains f

(Sucrose) a e

# of carbon l # of acyl atoms on all g chains D. Atropa n acyl chains u belladonna o Y Flower bud

S3:14(4,5,5) B. S3:17(3,5,9) Solanum lycopersicum S3:18(5,5,8) Solanum betaceum S3:18(3,5,10) Solanum quitoense NYBG Solanum quitoense MSU S3:19(5,5,9) Solanum eutuberosum S3:19(3,5,11) Solanum brachyantherum S3:20(4,5,11) Solanum aethiopicum Solanum jasminoides S3:20(5,5,10) Solanum conocarpum S3:20(3,5,12) Solanum nigrum Solanum pseudocapsicum S3:21(4,5,12) Physalis viscosa S3:21(5,5,11) Physalis peruviana S3:22(5,5,12) Physalis alkekengi f

Iochroma cyaneum NYBG a e l

Iochroma cyaneum MSU g

Iochroma fuchsioides n Saracha punctata Physalis u E. o Withania somnifera viscosa Petiole Y Flower bud Capsicum chinense Capsicum annuum S3:18(4,4,10) Lycianthes quichensis S3:19(4,5,10) Lycianthes rantonnetii S4:20(4,4,4,8) Brugmansia suaveolens Datura inoxia S4:21(4,4,5,8) Mandragora autumnalis S4:21(4,4,4,9) Juanulloa mexicana Solandra sp. S4:21(4,4,5,8) Atropa belladonna S4:22(4,4,5,9) Hyoscyamus niger S4:22(4,4,4,10) Nicotiana sp. Nicotiana alata yellow S4:23(4,10,10) Nicotiana alata white S4:23(4,4,5,10) Nicotiana sylvestris Calibrachoa sp. S4:23(4,5,5,9) Petunia axillaris S4:24(4,5,5,10) Schizanthus pinnatus S4:25(4,5,6,10) Goetzia sp. Cestrum diurnum S4:25(4,4,5,12) Salpiglossis sinuata S4:26(4,5,5,12) Ipomoea batatas S4:26(4,4,8,10) Ipomoea purpurea Convolvulaceae Argyreia nervosa S4:27(4,4,9,10) Convolvulus cneorum S4:28(4,4,10,10)

Figure 1: Acylsugars in Solanaceae (A) An example acylsugar from tomato. The nomenclature of acylsugars and the ASAT enzymes responsible for acylation of specific positions are described. Carbon numbering is shown in red. Sl refers to S. lycopersicum. The phylogenetic position of the ASAT enzymes is shown in Figure Supplement 1. (B) Species sampled for acylsugar extractions. Phylogeny is based on the maximum likelihood tree of 1075 species (Särkinen et al, 2013). Species with black squares show presence of acylsugars in mass spectrometry. Species highlighted in blue were cultivated for RNA-seq. Species in red are Convolvulaceae species. Species in pink were not sampled in this study but have been extensively studied in the context of acylsugar biosynthesis (see main text). More information about these species sampled at the NYBG is provided in File Supplement 1 and Source Data 1. (C,D and E): Individual acylsugars from three representative species. Color scale ranges from no acylsugar (white) to maximum relative intensity in that species (orange). Peak areas of isomeric acylsugars were combined. S. nigrum produced acylsugars consistent with a hexose (H) core. Acylsugars identified from other species are described in Figure Supplement 2 and 3, and the raw peak intensity values obtained from different species are provided in Source Data 2. Figure Supplement 4 shows fragmentation patterns of select acylsugars under positive ionization mode. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 1-Figure Supplement 1

Solyc00g134620 Solyc00g135260 Solyc07g017320 Solyc07g014580 Solyc00g040290 Solyc00g040390

ASAT3 Solyc02g079490 Solyc11g067290 Solyc11g067340 Solyc11g067330 Solyc03g078130 Solyc07g043700 Solyc07g043710 Solyc08g075180ASAT2 Solyc12g087980 Solyc08g075210 100 95

100 52 Solyc10g079570 Solyc09g092270 100 ASAT1 96 Solyc07g043670 53 45 100 Solyc08g078030 73 36 Solyc11g069680 Solyc08g007210 100 19 24 99 Solyc04g009680 99 100 100 Solyc05g039950 Solyc05g015800 46 60 43 Solyc08g014490 Solyc08g005770 51 Solyc02g081800 100 100 Solyc08g005760 53 Solyc02g081760 75 79 Solyc02g081740 Solyc11g020640 100 22 98 Solyc02g081750 100 71 Solyc07g049670 100 19 Solyc07g006670 36 100 Solyc07g006680 Solyc07g049660 100 35 78 Solyc04g082350 Solyc01g068140 28 26 Solyc05g014330 43 93 37 89 85 Solyc06g051320 21 100 Solyc04g079720 27 47 91 Solyc01g105550 39 54 Solyc08g013830 98 97 ASAT4 100 Solyc01g105590 100 Solyc06g051130 Solyc12g088170 80 Solyc04g080720 100 61 Solyc07g008380 100 66 61 Solyc07g008390 43 29 Solyc12g010980 Solyc03g097500 100 100 100 Solyc12g096770 100 Solyc11g008630 100 73 Solyc12g096790 100

100 69 Solyc12g096800 96 62 Solyc03g117600 100 93 Solyc07g052060 100 100 Solyc12g005430 Solyc07g005760 Solyc02g081770 100 Solyc12g005440 Solyc07g015960 Solyc01g008300

Solyc11g071470 0

1

Solyc11g071480 7

2

Solyc06g074710 6

0

Solyc11g066640 g

Solyc10g008680 2

0

Solyc03g025320 c

Solyc05g052680 y

l

Solyc05g052670 Solyc08g005890

o

Solyc09g014280 Solyc12g096250

Solyc04g078660 Solyc10g055730

Solyc01g107080 Solyc02g093180 S

Figure 1- Figure Supplement 1: Tomato ASATs are members of the BAHD enzyme family (A) SlASATs shown in a phylogenetic tree with all other BAHDs in cultivated tomato genome. BAHD is an acronym for four acyltransferase enzymes: benzyl alcohol O-acetyltransferase (BEAT), anthocyanin O-hydroxycinnamoyltransferase (AHCT), N-hydroxycinnamoyl/benzoyltransferase (HCBT), deacetylvindoline 4-O-acetyltransferase (DAT) - the first discovered members of this family. Please refer to Figure 1 for the biochemical activities of SlASATs. Only sequences >200aa in length were used in this analysis. Tree was constructed using Neighbor Joining with the JTT model and 100 bootstrap replicates. Figure 1-FigureSupplement2 certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder C. bioRxiv preprint A. B. S S S S S S S S S S S S S S S S S S S S alata yellow 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 S S S S S S S S S S S S S S S S S S S S S S S S S belladona Nicotiana : : : : : : : : : : : : : : : : : : : : S S S S S S S S S S S S alkekengi 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 1 4 1 4 2 4 2 4 2 4 2 4 1 4 1 4 1 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 Physalis Atropa : : : : : : : : : : : : : : : : : 8 : 9 : 0 : 1 : 3 : 4 : 8 : 9 : 9 0 1 2 3 4 3 3 4 4 5 8 : : : : : : : : : : : : 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( 4 7 8 2 8 2 3 9 4 6 6 9 7 0 7 0 8 0 8 9 1 0 1 1 2 2 3 2 3 5 4 5 4 4 5 5 5 7 6 8 6 2 7 2 8 2 2 2 2 2 2 2 2 2 2 2 2 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( , ( , ( , ( , ( , ( , ( , ( , ( , , , , , , , , , , , , 4 3 5 3 5 3 4 5 3 4 5 5 5 6 8 8 8 8 8 5 4 5 5 4 5 6 4 2 6 5 6 2 8 5 2 4 5 2 4 2 6 5 2 6 5 6 6 4 4 4 4 5 4 4 6 5 6 , , , , , , , , , , , , , , , , , , , , , , , , , , , , 5 , , , 5 , , , 5 , , 5 , , 5 , , , 5 , , 5 , 5 , , 5 , 5 , 5 , 5 , , , , , , , , 8 8 8 8 8 8 8 5 5 5 6 8 8 8 8 6 6 8 8 5 8 5 4 5 6 5 6 6 6 6 8 6 8 8 8 6 5 6 4 6 5 6 6 6 6 , , , , , , , , , , , , 5 9 8 1 9 1 1 1 1 1 1 1 ) ) ) ) ) ) , , , , , , , , ) , ) ) ) , ) , , , , , , , , , , , , , , , , , , , , , , , , doi: 8 7 8 8 8 8 8 8 6 6 6 8 8 6 8 8 8 6 8 8 8 8 8 9 6 6 6 8 6 8 8 6 8 8 ) ) ) 0 ) 1 1 0 2 2 1 2 ) ) ) ) ) ) ) ) , , ) , ) ) , ) , ) ) , ) ) ) ) , , , , , , , , ) , , ) ) ) ) ) ) ) 8 8 8 8 8 8 6 8 8 8 8 8 8 8 8 8 ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) Young leaf https://doi.org/10.1101/136937 Leaf Calyx Flower bud

Flower bud E. S S S S S S S S S S S S S S S S 3 3 3 3 3 3 3 3 3 3 3 4

Stem 4 4 4 4 Iochroma cyaneum D. : : : : : : : : : : : : : : : : acylsugar-like peaks plant in the extract. UA: Unidentified acylsugar, H:Hexose (orange square). The described acylsugars are only asubset of allthedetectable ranges from noacylsugars (white square) to thespecies-specificmaximum peak area pared to thetotal normalized acylsugar peak area inthegiven tissue.The color scale same acylsugar. Colors represent %normalized peakarea ofagiven acylsugar com- standard peakarea weight, anddry andwere summedacross different isomers ofthe different species. Figure 1-FigureSupplement 2:Normalized andintegrated acylsugarpeakareasin 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 4 5 6 7 8 9 0 1 1 2 4 5 5 6 7 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( S S S S S S S S S S S S S S S S S S 4 4 5 5 4 5 4 5 4 5 5 4 5 4 4 5 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 , , , , , , , , , , , , , , , , : : : 4 5 : 5 : 5 : 5 : 5 5 : 5 : 5 : 5 5 : 5 : 5 : : 4 : 5 : 5 : : Physalis 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 viscosa , , , , , , , , , , , , , , , , F. 8 9 0 1 1 1 2 2 3 3 3 4 5 5 6 6 7 8 5 5 5 6 8 8 1 1 1 1 1 5 5 5 5 5 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ) ) ) ) ) ) 0 0 2 1 2 , , , , , 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 ) ) ) ) ) , , , , , , , , , , , , , , , , , , 4 5 4 4 4 4 4 4 1 0 4 0 5 2 5 2 5 2 4 5 4 4 4 quitoense , , , , , , , , 0 , , , , , , , , , ) ) ) ) ) UA- UA- UA- UA- UA- UA- UA- UA- UA- UA- UA- UA- Solanum 1 1 4 5 4 4 5 4 5 5 5 6 5 5 8 9 1 , 0 0 , , , , , , 1 , , , , , , , , 0 8 8 9 1 9 1 1 9 1 1 1 1 1 1 MSU ) ) 0 , ) ) ) 0 ) 0 0 ) 0 0 2 2 0 0 1 ) 1 1 1 9 8 7 6 5 4 3 2 1 ) ) ) ) ) ) ) ) ) 0

Petiole ; 2 1 0 ) this versionpostedMay11,2017. a Emerging leaf PetioleCC-BY 4.0Internationallicense Peduncle (A-J): Peak area ofeachacylsugar was normalized usingtheinternal Leaf Young leaf

unopened flower Flower bud Petiole

Stem

G. H. S S S S S S S S S S S S S S S S S S S S S S S S Hyoscyamus 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 : : : : : : : : : : : : : : : : : : : : : H2 H2 : H3: H3 : H3 H3 : H3 H3 H3 Solanum 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 nigrum 7 8 9 9 0 0 1 1 2 3 5 6 7 8 9 0 0 0 1 1 2 3 3 4 niger : : : : : : : : 1 1 1 1 1 1 1 1 1 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( 4 5 4 5 4 5 4 5 5 5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 5 6 7 6 7 8 8 9 , , , , , , , , , , , , , , , , , , , , , , ( ( , ( ( , ( ( ( ( ( 5 5 5 5 5 5 5 5 5 5 4 4 5 4 4 4 4 5 5 4 5 5 4 5 4 5 2 2 4 4 4 4 4 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 1 1 4 5 4 5 5 4 5 8 8 1 9 1 1 1 1 1 1 4 5 5 4 5 5 4 5 5 5 5 5 5 5 0 0 , , , , , , , ) ) 0 ) 1 0 2 1 2 3 , , , , , , , , , , , , , , 1 1 8 8 9 1 1 . 5 5 5 8 8 9 1 8 9 1 1 1 1 1 ) ) ) ) 0 0 ) ) ) ) ) ) 0 ) 0 ) ) ) ) ) ) ) 0 ) ) 0 0 1 2 2 The copyrightholderforthispreprint(whichwasnot ) ) ) ) ) ) ) ) ) ) Young leaf Stem Calyx Young leaf Stem Fruit I. J. S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S 3 3 S 6 3 4 6 4 S 4 6 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 Salpiglossis alata white : : : : : : : : : : 3 : 3 : 3 : : 3 : 3 : 3 : 3 : : 4 : 4 : 4 : 4 : 4 : 4 : 4 : 4 4 : 5 : 5 : : 5 : 5 5 5 1 1 2 1 1 2 1 1 2 1 1 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 Nicotiana : : : : : : : : : : : : : : : : : : : : : : 6 6 5 8 6 6 7 7 7 7 8 9 0 0 1 1 1 2 2 3 4 9 0 0 1 sinuata 2 3 3 4 5 6 1 1 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( 5 2 2 6 2 2 2 2 2 2 2 2 3 2 4 3 5 5 4 5 6 2 2 2 2 2 2 2 2 2 2 8 9 0 0 1 3 4 8 8 9 9 0 1 2 3 4 3 3 4 4 5 8 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ( ( 6 6 ( 5 2 ( 6 3 ( 2 5 ( 4 2 3 ( 4 ( 5 ( 5 6 ( 5 ( 6 ( 5 ( 5 6 ( 6 ( 6 ( 2 ( 2 ( 2 2 ( 3 ( 5 ( 4 ( 5 5 5 5 6 4 5 7 8 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 6 6 5 5 6 5 5 5 5 5 6 6 6 6 6 6 6 5 6 6 6 6 4 5 4 5 5 5 5 5 6 , , , , , , , , , , , , , , , , , , , , , , , ) , , ) , , , , , , , , , , , , , , , , , , , , , 5 , 6 , 6 , 8 , 8 , 8 8 8 3 5 4 5 5 4 5 6 2 2 2 2 2 2 6 6 5 6 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 6 6 6 5 6 6 6 , , , , , , , , , , , , , , , , , , ) , , ) , ) , ) , ) , ) ) , ) ) , ) ) ) ) ) ) ) , , , , , , , , , 6 5 6 6 8 8 8 8 8 8 6 8 6 8 6 5 5 6 5 6 5 6 6 6 8 6 8 6 8 6 5 4 5 6 8 ) , , , ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) , , , , , , , , , , , , , , , 6 6 6 8 8 7 8 8 8 8 8 8 6 6 8 8 8 8 ) ) ) ) ) ) ) ) ) ) ) ) , , , , , , 8 8 8 8 8 8 ) ) ) ) ) ) Petiole Leaf Young leaf Flower bud Old leaf Stem Fruit w/ calyx bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under Figure 1-Figure Supplement 3 aCC-BY 4.0 International license.

Solanum 1: TOF MS ESI- 100 BPI nigrum 2.92e6

%

0 40 100

1: TOF MS ESI- 100 Physalis BPI alkekengi 7.19e6

%

0 40 100

100 1: TOF MS ESI- Physalis BPI viscosa 8.48e6

%

0 30 100 100 Iochroma 1: TOF MS ESI- BPI cyaneum 6.97e6

%

0 30 100 1: TOF MS ESI- 100 Atropa BPI belladonna 6.87e6 Relative abundance

%

0 30 100

100 Hyoscyamus 1: TOF MS ESI- BPI niger 8.48e6

%

0 40 100

100 Nicotiana 1: TOF MS ESI- BPI alata yellow 4.70e6

%

0 40 100

100 1: TOF MS ESI- Salpiglossis BPI sinuata 1.35e7 %

0 40 Retention time 100

Figure 1-Figure Supplement 3: The complexity of acylsugar phenotype across multiple acylsugar producing species. Undiluted trichome extracts collected from plants at the NYBG were run on a 110-min gradient on a C18 column as described in the Methods (Supplemental Table 1). Peaks consistent with acylsug- ar masses and fragmentation patterns eluted between 30/40-min and 100-min for all species. Most of the peaks shown in chromatograms above have m/z and fragmentation patterns consistent with being acylsugars with aliphatic acyl chains. The most abundant peaks were identified, and have been described in Figure Supplement 2. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under Figure 1-Figure Supplement 4 aCC-BY 4.0 International license.

A. Nicotiana alata white B. Hyoscyamus niger 2: TOF MS ES+ 2: TOF MS ES+

513.29)+ 205.07 3.52e5 4.33e5 100 8 100 , 6 ,

169.05 )

5 710.38 , 8 , 2

109.03 6 -Furanose , Tetra-acylsugar Tetra-acylsugar 5

, 752.42 % % 2,5,5,12)

2 139.04 514.30 -NH3 ( 1 2 Furanosyl (2)+ 85.06 : 313.16 97.03 4:24 ( 253.10 369.18 4 515.30 206.07 513.33 S

170.05 S

Pyranosyl ( 411.26 285.13 370.19 0 m/z 0 m/z -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 2: TOF MS ES+ 2: TOF MS ES+ 205.07 313.16 100 9.41e5 100 7.84e5 ) 8 8 , , 513.33 5

109.03 , 5,5,12)+ 710.41 )+ Penta-acylsugar (

8 -Furanose Tri-acylsugar )+ 2, 2 % % 8 , 2 5,5,12) , Furanosyl (

4 ( 85.06

5 -NH3

145.05 , 780.42 2 411.26 183.17 514.33 2

5 : 211.09 Pyranosyl ( 127.04

206.07 3:22 ( 97.03 S 57.07 295.15 397.21 541.33 515.33

229.10 S 0 m/z 0 Pyranosyl m/z -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

Salpiglossis sinuata Physalis viscosa C. D. 2: TOF MS ES+ 205.06 100 6.35e3 2: TOF MS ES+ 247.11 100 1.61e5 Tetra-acylsugar ,5,6,6) % 2 )+ 682.36 181.08 2 4,4,5,8) Furanosyl ( Tetra-acylsugar % 139.04 327.18 19 (

710.36 21 (

443.27 S4 : Furanosyl (5)+ 341.18 S4 : 0 m/z 85.06 -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 429.23 2: TOF MS ES+ 0 m/z 247.08 -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 100 2.05e4 169.05 Penta-acylsugar

724.37 ,2,5,6,6) % 2,2)+ 2 Furanosyl ( 327.18 21 (

443.26 S5 : 0 m/z -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 2: TOF MS ES+ 331.13 100 5.76e3

Hexa-acylsugar ,2,5,5,6,6) ,2,5)+ % 2 327.18 2 109.03 Furanosyl (

253.12 26 ( 808.44 211.08 443.27 S6 : 0 m/z -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

E. Iochroma cyaneum

2: TOF MS ES+ 780.48 100 317.16 1.59e4 4,5,5,12)

% Tetra-acylsugar 26 (

299.16 4,5)+ 429.29 Furanosyl ( 181.09 S4 : 0 m/z -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

2: TOF MS ES+ 682.40 100 1.43e4 -Furanose 5,5,10)+ 5,5,10) Pyranosyl ( 3 Tri-acylsugar % 485.31 -NH 313.17 20 ( 486.31 1346.78 211.10 168.07 383.25 1348.77 0 S3 : m/z -0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

Figure 1-Figure Supplement 4: Positive mode CID mass spectra of select representative acylsug- ars from five species. (A-E) Each spectrum was generated at elevated collision energy to generate fragment ions. Selected fragment ions obtained at the same retention time as the assigned pseudomo- lecular ion of the acylsugar are annotated. The pseudomolecular ion masses are masses of the adducts of the depicted acylsugars ([M+NH4]+). The mass difference between the pseudomolecular ion (intact acylsugar) and the fragment ion provides information about the number of acyl carbon atoms on each of the pyranose and furanose rings. We interpreted the losses and the fragment ions as involving the pyranose vs. furanose ring based on knowledge about their fragmentation patterns derived from previous studies. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Table 1: RNA-seq data statistics

Item S. nigrum S. quitoense H. niger S. sinuata Original read pairs 81,314,841 85,374,110 86,161,659 80,302,734

Filtered read pairs 73346531 76734781 76819022 71129160 (% original) (90.2%) (89.9%) (89.2%) (88.6%) Normalized read 17301238 15350023 20779972 17905057 pairs (23.6%) (20.0%) (27.1%) (25.2%) (% normalized) Total transcript 160,583 124,958 189,711 149,136 isoforms Longest isoforms 78,020 72,426 96,379 77,970 (% total) (49%) (58%) (51%) (52.3%) With >10 reads 32,105 32,044 38,252 32,798

With predicted 23,224 22,289 26,262 23,570 peptide>50aa Differentially 10,386 (32%) 12,194 (38%) 9007 (24%) 7091 (22%) expressed1,2 (% >10 reads) Trichome high2 2292 (22.1%) 3547 (29.1%) 3321 (36.8%) 1888 (26.6%) (% differentially expressed) 1 Differentially expressed genes at p<0.05 (corrected for multiple testing); 2 p<0.05, fold change>2 2 Some differentially expressed transcripts were confirmed by RT-PCR, as shown in Figure Supplement 1.

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Table 1-Figure Supplement 1

A. 1kb plus

28X

35X

PDS control

1kb plus B.

28X

35X

PDS control

Table 1-Figure Supplement 1: Confirmation of differential expression results from RNA-seq using semi-quantitative RT-PCR (A, B) Verification of trichome high expression of certain transcripts using semi-quantitative RT-PCR in S.quitoense (A) and Salpiglossis (B). The comparisons between Trichome (T) and Shaved stem (S) tissues involved using the same amount of starting cDNA, stopping the reactions after 28 or 35 cycles of PCR and running them on an agarose gel. Phytoene desaturase (PDS) which was not estimated to be differentially expressed based on RSEM/EBSeq analysis was used as control. Figure 2

A.

O O

OH OH O OH OH O O O OH OH O O O O HO O O O O HO O O O O O O O O O O OH OH OH OH O O O O O O O O O OH O O O O O O O O O O O O O O O O O

S4:19 (2,5,6,6) S4:21 (5,5,5,6) S5:21 (2,2,5,6,6) S6:25 (2,2,5,5,5,6)

OH O OH HO O O O OH O 2.44 OH HO O B. 1.65 O 1.38 TOF MS ES- 100 OH OH HO OH O O OH O HO HO O OH S2:11(5,6) O O O OH OH O HO O OH O O O O O S1:6(6) S3:17(5,6,6) Relative abundance % Time 0 0.20 0.40 0.60 0.80 1.00 1.20 1.40 1.60 1.80 2.00 2.20 2.40 2.60 2.80 3.00

ASAT5 ASAT5 S5:21(2,2,5,6,6) D. C. 100 1: TOF MS ES- (product) [F2:4(2,2)-H]+ + 4.00;779.37 100 SsASAT5 enzyme reaction 3.41;751.34 4 SsASAT5 enzyme reaction 779.37 0.1000Da 9.92e5 in vitro 100 2: TOF MS ES+ in vitro 2.34e5 S5:23(2,5,5,5,6) OH O OH + (product) HO O 247.08 -NH O O % 4 OH bioRxiv preprintO doi: https://doi.org/10.1101/136937; this version -F2:4(2,2)posted May 11, 2017. The copyright holder for this preprint (which was not O O S4:19(2,5,6,6) certified by peerO O review)O is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under O (substrate) 443.2649 724.37 m/z [P3:17(5,6,6)-H]+ 0 S5:21(2,2,5,6,6).NH 100 aCC-BY200 300 40 04.0500 International600 700 license.

2.96 Relative abundance % Relative abundance % 709.33 0 0

3.41;751.34 100 3.99;779.37 100 S5:21(2,2,5,6,6) S5:23(2,5,5,5,6) 1: TOF MS ES- O plant extract S5:23(2,5,5,5,6) 779.37 0.1000Da plant extract OH 1: TOF MS ES- 1.83e6 O O 709.33+751.34 0.1000Da HO O O O 3.97e6 OH O O O O O O O

2.96 Relative abundance % Relative abundance % 709.33 Time Time 0 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50 ) 6 , 5 , ) ) 5 6 6 , , ) , 5 5 6 5 , , ) , , 2 5 6 6 5 , , , , , e 2 ) 5 5 5 2 s ( ( ( ( ( 6 0 ( 1 1 7 8 o r 2 6 2 1 1 1 : : : : : : E. c 5 u 1 4 2 3 4 S S S S S S S Transcript ID Inferred activity CoA donors utilized c40486_g1 SsASAT1 aiC5, aiC6 c49208_g1 SsASAT2 aiC5, aiC6 c36223_g2 SsASAT3 aiC6 c48744_g2 SsASAT5 C2 c49704_g1 c57127_g2 c57127_g3 c58191_g3 c52698_g1

C2+aiC5 C2+aiC5 +aiC6 CoA CoA

Figure 2: In vitro validation of Salpiglossis ASAT candidates. (A) NMR derived structures of three Salpiglossis acylsugars. NMR resonances used to interpret the first three structures are described in Table Supplement 1. We verified the plant under study as Salpiglossis using genetic markers. These results are shown in Figure Supplement 1A,B. (B) Results of enzyme assays for SsASAT1 (black), SsASAT2 (blue) and SsASAT3 (red). Numbers above the peaks represent the retention times of the individual compounds (see Methods), whose predicted structures are shown alongside. Additional validation of these in vitro results is described in Figure Supplements 2-5. (C,D) The SsASAT5 reactions, whose products have the same retention time as in planta compounds. Inset in panel C shows positive mode fragmentation and predicted acyl chains on pyranose [P] and furanose [F] rings. SsASAT5 also performs additional acylation activities as shown in Figure Supplement 6. (E) Testing various ASAT candidates with different acceptor (top) and donor (bottom) substates. Red indicates no activity seen by LC/MS, dark blue indicates a likely true activity, which results in a product usable by the next enzyme and/or a product that co-migrates with the most abundant expected compound. Light blue color indicates that the enzyme can acylate a given substrate, but the product cannot be used by the next enzyme or does not co-migrate with the most abundant expected compound. The relationships of the enzymes with each other are shown in Figure Supplement 1C. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 2 - Table Supplement 1: NMR chemical shifts for four acylsugars purified from Salpiglossis plants

S4:19(2,5,6,6) S4:21 (5,5,5,6) S5:21(2,2,5,6,6) S6:25(2,2,5,5,5,6) Carbon # (group) 1H (, ppm) 13C (, ppm) 1H (, ppm) 13C (, ppm) 1H (, ppm) 13C (, ppm) 1H (, ppm) 13C (, ppm) 1 (CH) 5.64 (d, J = 3.7 Hz) 89.8 5.87 (d, J = 4.0 Hz) 88.8 5.62 (d, J = 3.6 Hz) 89.5 5.70 (d, J = 3.6 Hz) 89.4 4.87 (dd, J = 10.4, 3.7 4.78 (dd, J = 10.3, 4.1 4.90 (dd, J = 10.4, 3.7 4.87 (dd, J = 10.4, 3.7 2 (CH) 70.5 71.1 70.5 70.4 Hz) Hz) Hz) Hz) -1(CO) 172.8 177.6 172.3d 175.9 2.32 (dd, J = 15.7, 5.7 2.30 (dd, J = 15.5, 5.8 2.40 (sextet, J = 7.0 -2(CH ) Hz), 2.06 (dd, J = 41.0 40.8 Hz), 2.05 (dd, J = 41.0 40.7 2 Hz) 15.7, 8.3 Hz) 2.41 (sextet, J = 7.0 Hz) 15.5, 8.3 Hz) -3(CH) 1.82 (m) 31.6 1.14 (d, J = 7.0 Hz) 1.82 (m) 31.6 1.12 (d, J = 7.0 Hz) 16.1

-3(CH3) 16.2

-4(CH3) 0.88 (d, J = 6.7 Hz) 19.4 1.64 (m), 1.45 (m) 26.7 0.88 (d, J = 6.7 Hz) 19.3 1.63 (m), 1.44 (m) 26.8

-5(CH2) 1.34 (m), 1.20 (m) 29.3 0.87 (t, J = 7.4 Hz) 11.6 1.33 (m), 1.20 (m) 29.4 0.86 (t, J = 7.4 Hz) 11.7

a 5.58 (dd, J = 10.6, 9.3 d 5.57 (dd, J = 10.7, 9.2 -6(CH ) 0.88 (t, J = 7.4 Hz) 11.4 0.87 (t, J = 7.4 Hz) 11.4 68.9 3 Hz) Hz) 5.53 (dd, J = 10.7, 9.1 5.55 (dd, J = 10.7, 9.2 3(CH) 68.9 68.8 69.0 172.2 Hz) Hz) 2.25 (dd, J = 15.8, 5.4 2.24 (dd, J = 15.8, 5.4 -1(CO) 172.3b Hz), 2.02 (dd, J = 15.8, 172.3d Hz), 2.01 (dd, J = 15.8, 41.1g 8.5 Hz) 172.2 8.6 Hz) 2.23 (dd, J = 15.5, 5.6 2.23 (dd, J = 15.6, 5.5 -2(CH2) Hz), 2.02 (dd, J = 41.2 Hz), 2.02 (dd, J = 41.3 1.78 (m) 31.5 15.5, 8.4 Hz) 1.79 (m) 41.1c 15.5, 8.4 Hz) -3(CH) 1.79 (m) 31.6 0.89 (d, J = 6.7 Hz) 31.5 1.79 (m) 31.7 0.88 (d, J = 6.7 Hz) 19.5

-4(CH3) 0.88 (d, J = 6.7 Hz) 19.4 1.31 (m), 1.19 (m) 19.5 0.87 (d, J = 6.7 Hz) 19.4 1.30 (m), 1.18 (m) 29.4

-5(CH2) 1.31 (m), 1.18 (m) 29.4 0.86 (t, J = 7.4 Hz) 29.4 1.30 (m), 1.17 (m) 29.4 0.85 (t, J = 7.4 Hz) 11.4 4.93 (dd, J = 10.7, 9.3 4.97 (dd, J = 10.3, 9.2 -6(CH ) 0.85 (t, J = 7.4 Hz) 11.4a 0.85 (t, J = 7.4 Hz) 11.4e 69.0 3 Hz) 11.4 Hz) 4.92 (dd, J = 10.7, 9.2 4.97 (dd, J = 10.5, 9.2 4(CH) 68.6 68.9 176.5 Hz) 68.6 Hz) 2.36 (sextet, J = 7.0 -1(CO) 176.1 176.4 41.1g 2.36 (sextet, J = 7.0 Hz) 176.2 Hz) 2.36 (sextet, J = 7.0 2.35 (sextet, J = 7.0 -2(CH) 41.1 41.1 1.10 (d, J = 7.0 Hz) 16.5 Hz) 1.10 (d, J = 7.0 Hz) 41.1c Hz)

-3(CH3) 1.10 (d, J = 7.0 Hz) 16.5 1.66 (m), 1.44 (m) 16.4 1.10 (d, J = 7.0 Hz) 16.5 1.66 (m), 1.45 (m) 26.7

-4(CH2) 1.67 (m), 1.44 (m) 26.6 0.90 (t, J = 7.4 Hz) 26.6 1.66 (m), 1.44 (m) 26.7 0.90 (t, J = 7.4 Hz) 11.8 4.15 (ddd, J = 10.6, 6.2, 4.07 (ddd, J = 10.2, -5(CH ) 0.90 (t, J = 7.4 Hz) 11.8 0.90 (t, J = 7.4 Hz) 11.8 71.2 3 2.5 Hz) 11.8 5.1, 2.2 Hz) 3.63 (dd, J = 12.4, 2.4 3.66 (dd, J = 12.8, 2.3 4.14 (ddd, J = 10.2, 4.05 (ddd, J = 10.3, 5(CH) 71.9 Hz), 3.60 (dd, J = 12.4, 61.5 71.1 Hz), 3.59 (dd, J = 12.8, 61.7 5.5, 2.8 Hz) 5.2, 2.1 Hz) 6.5 Hz) 5.1 Hz) bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

3.66 (dd, J = 12.8, 2.3 6 (CH2) 3.61 (m) 61.9 88.8 Hz), 3.58 (dd, J = 61.6 5.70 (d, J = 3.6 Hz) 89.4 5.87 (d, J = 4.0 Hz) 12.8, 5.3 Hz)

3.66 (d, J = 12.3 Hz), 4.13 (d, J = 11.6 Hz), 3.63 (d, J = 12.3 Hz), f 4.14 (d, J = 11.8 Hz), 1′(CH ) 64.6 64.1 64.3 64.1 2 3.54 (d, J = 12.3 Hz) 4.06 (d, J = 11.6 Hz) 3.55 (d, J = 12.3 Hz) 4.04 (d, J = 11.8 Hz) -1(CO) 175.9 175.8

c 2.44 (sextet, J = 7.0 g -2(CH) 41.1 41.1 2.41 (sextet, J = 7.0 Hz) Hz)

-3(CH3) 1.15 (d, J = 7.0 Hz) 16.7 1.17 (d, J = 7.0 Hz) 16.7

-4(CH2) 1.68 (m), 1.49 (m) 26.9 1.70 (m), 1.51 (m) 26.9

-5(CH3) 0.91 (t, J = 7.4 Hz) 11.7 0.93 (t, J = 7.4 Hz) 11.7 2′(C) 104.2 103.7 104.7 103.0 3′(CH) 5.17 (d, J = 7.8 Hz) 80.1 4.17 (d, J = 8.9 Hz) 78.4 5.20 (d, J = 7.6 Hz) 79.3 5.23 (d, J = 8.1 Hz) 79.0 -1(CO) 172.3b 4.31 (t, J = 8.9 Hz) 172.0 171.5

-2(CH3) 2.26 (s) 21.0 3.71 (dt, J = 9.2, 2.3 Hz) 2.22 (s) 21.0 2.20 (s) 20.9 3.88 (dd, J = 13.4, 2.4 4′(CH) 4.60 (t, J = 7.8 Hz) 71.5 Hz), 3.71 (dd, J = 13.4, 72.6 4.35 (t, J = 7.6 Hz) 73.9 4.35 (t, J = 8.1 Hz) 73.5 2.3 Hz) 4.13 (d, J = 11.6 Hz), 4.08 (ddd, J = 9.6, 4.04 (ddd, J = 8.2, 5.7, 5′(CH) 3.89 (m) 82.4 81.0 80.4 80.2 4.06 (d, J = 11.6 Hz) 6.1, 3.6 Hz) 3.6 Hz) 4.41 (dd, J = 12.1, 6.2 4.40 (dd, J = 12.2, 5.6 f 6′(CH2) 3.89 (m), 3.71 (m) 59.9 59.7 Hz), 4.28 (dd, J = 64.3 Hz), 4.30 (dd, J = 12.2, 63.8 12.1, 3.6 Hz) 3.5 Hz) -1(CO) 171.7 171.7

-2(CH3) 2.12 (s) 21.0 2.13 (s) 21.0

aTwo 13C signals not resolved in 2D spectra (11.37, 11.39 ppm) bTwo 13C signals not resolved in 2D spectra (172.27, 172.29 ppm) cThree 13C signals not resolved in 2D spectra (41.08, 41.10, 41.13 ppm) dTwo 13C signals not resolved in 2D spectra (172.34, 172.35 ppm) eTwo 13C signals overlapping fTwo 13C signals overlapping gThree 13C signals not resolved in 2D spectra (41.10, 41.12, 41.14 ppm)

Figure 2-FigureSupplement1 A.

tation. Nodes withoutanyvalueshave<70 bootstrapsupport. The bluecladerepresentsClade III. all siteswith<70% coveragewerediscarded. Onlybootstrapvalues >70aredisplayedinthe figureforeaseofrepresen - squares) andproteinsequences ofcandidateSalpiglossisenzymes(redsquares). 1000bootstrapswereperformed,and default parametersinMEGA6, withBAHDenzymesdescribedinD’Auria(2006), SlASAT proteinsequences(blue noted. Tree was constructedusingNJapproachwith1000bootstrap replicates.(C)Neighborjoiningtreeobtainedusing Hyoscyamus DNA (thisstudy)andothersequencesdownloadedfrom NCBI. Accession numbersofNCBIsequences are enzymes. (A,B)Phylogenybasedon thendhF(A)andtrnLFspacer(B)sequenceamplified fromSalpiglossisand Figure 2-FigureSupplement 1:PhylogeneticpositionsofSalpiglossis,Hyoscyamus andSalpiglossiscandidate

Goetzia elegans elegans Goetzia AY206746.1 certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint

Hyoscyamus niger

ndhF

Duckeodendron cestroides cestroides Duckeodendron HQ216090.1 AY206743

ndhF ndhF ndhF C.

doi: 90 https://doi.org/10.1101/136937 HN 100 ndhF (this study)

Cucumis CmAAT(1-2)

SS c49383 g1 c49383 SS 78 Lupinus HMT/HLT Lupinus Musa BanAAT Arabidopsis CHAT SS

Taxus DBNTBT MpAAT1 Malus ndhF

Clarkia CbBEBT Clarkia

Taxus BAPT U08921.1

Vitis AMAT Vitis Cucumis CmAAT3 Cucumis 100 (this study)

Taxus DBAT

Solanum lycopersicum Solanum 38

ndF 54

Petunia BPBT Petunia

SS c49704 g1 c49704 SS Taxus TAT NtBEBT Nicotiana Taxus DBBT Salpiglossis sinuata

Hordeum ACT Arabidopsis CER2 Arabidopsis

Dianthus HCBT

U08926.1 Petunia axillaris Petunia ndhF

Cestrum diurnum U08928.1 ;

100

Zea Glossy2 Zea this versionpostedMay11,2017.

75 99

a

100

98 100

ndhF CC-BY 4.0Internationallicense

KU376255.1 Nicotiana HQT g1 c52698 SS

Avena AsHHT1 ndhF

80 99

Arabidopsis AtHCT 92 100

Nicotiana NtHCT 99 89

99

99 93 99 0.2

Chrysanthemum Dm3MAT1 Chrysanthemum 100

99 96

C h r y s a n t h e m u m

D m 3 M A T 2

100 100 SlASAT2

99

Pericallis Sc3MaT Pericallis SS c36223 g1

Dahlia Dv3MAT Dahlia

100

100 97 SlASAT3

98 98

Glandularia Vh3MAT1 Glandularia 76 SS c40486 g1

AY206760.1 86 100

Lamium Lp3MAT1 Lamium Goetzia elegans Goetzia B. SlASAT1 .

89

The copyrightholderforthispreprint(whichwasnot

Nicotiana NtMAT1 Nicotiana 100 Cestrum elegans

SS c49208 g1 trnLF 96

100 100 SS c58191 g3 Gentiana Gt5AT Gentiana

100 Capsicum Pun1 KP756701.1

Perilla Pf3AT Perilla Catharanthus MAT trnLF Salvia Ss5MaT2 Salvia 100

Catharanthus CAT

Perilla Pf5MaT Perilla

Papaver SalAT Papaver

Clarkia CbBEAT Clarkia 100

SlASAT4 SS c48744 g2 Salvia Ss5MaT1 Salvia SS c57127 g3 SS c57127 g2

Rosa RhAAT1 Cucumis CmAAT4

Fragaria VAAT Fragaria SAAT Fragaria 0.00 5 Solanum lycopersicum Salpiglossis sinuata

trnLF trnLF 57 (this study) AY098703.1 40 90

Salpiglossis sinuata Hyoscyamus senecionis Hyoscyamus

KU295782.1 KM200028.1

trnLF trnLF bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 2-Table Supplement 2: Trichome preferentially expressed BAHD enzymes

Candidate Homolog Trichome Trichome Stem Stem Status (Trinity 1 (# of 2 (# of 1 (# of 2 (# of assembler reads) reads) reads) reads) given names) c40486_g1 PaASAT1 3856 1000 46 10 SsASAT1 c49208_g1 Sl_ASAT1/PaASAT2 5310 1851 39 7 SsASAT2 c36223_g2 Sl_ASAT2/PaASAT3 5987 1792 51 13 SsASAT3 c48744_g1 Solyc01g008330 5045 940 29 7 SsASAT5 c49704_g1 Solyc08g078030 3817 1200 56 17 NAD c58191_g3 Solyc02g081740 3360 1086 30 14 NAD c57127_g2 Solyc05g014330 25699 5624 112 25 NAD c57127_g3 Solyc05g014330 6094 2073 44 15 NAD c52698_g1 Solyc09g092270 7940 4810 528 436 NAD c49383_g1 Solyc08g005760 15113 7133 108 31 Not tested c50879_g1 Solyc12g087980 2849 1162 458 235 No DFGWG/HxxxxD  Abandoned c41764_g1 Solyc06g051320 465 231 90 31 Not cloned (not highly (ASAT4 clade) differentially expressed) c42979_g2 NA 445 157 12 5 Not cloned (not highly differentially expressed)

NAD: No Activity Detected in vitro bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 2-File Supplement 1 Sequences identified in this study. The transcript identifier as per the assembler Trinity nomenclature is noted in the part after the bar (|). Green and yellow highlighted sequences are the two regions targeted for VIGS in Salpiglossis.

>SsASAT1|c40486_g1 ATGGTTGCCTCAGCACTAGTTTCTTTGTCGAAGAAAATTATCAGACCTTTCTCTCCAACCCCTTTTTCCGAAAGAATT TACAAGCTTTCCTTTATTGATCAATTCAATAGTACACAATATATCCCCTTAGTCTTCTTCTATCCCAAGAACAAGGGA AATATTGAACCAAGTGATATGTGTAAGGTTTTTGAGAATTCCCTCTCAAAAACCTTAGCTGCTTATTACCCTTTTGCA GGAACATTAAGGGACAATATATGTGTTGAGTGTAACGATATAGGTGCTGATTTCTTCAAGGCTCGTTTCGATTGTC CAATGTCTGAAATTCTCAAAAGTCATAATAGAGATGTCAAAGAAAGTGTATATCCCAAGGATATACCATGGAGTGA TGCATCAAACAAAAAGTTGGTCACGGTTCAATTTAACCAATTTGATTGTGGAGGAATAGCTCTAAGTGCATGTATA TCGCACAAGATTGGAGACTTTTGTACGGTGATTAACTTTTTTCACGATTGGGCTGCAATTGCTCGTGATTCTAATGG AAAAGTATGTCCTCAATTTATTGGATCATCCATCTTCCCACCTACCAATGAACCTGTGAATGAGCCTCCTCGTATAA AATCATGTGTAACGAAAAAGTTAGTTTTTTCAAACCATACATTAAAATCTTTCATTACTGAATCATCATCACTAGAA GTTAAAGAACCAACTCGGGTAGAAATACTCACATCACTTCTTTATAAATGTGGTATGAAAGCGGTATTGGAAAATA ATTCTTCTAGTTCTTCAATTTTCAACCCGTCTATTCTGTTCCAAACCGTGAATTTAAGGCCTTTTATACCTTTGCCAGA GAATACTGCTGGAAATTTTAGTTCTTCCCTTTTTGTACCGACATATATTGAAGAAGAAACAAAGTTATCTACATTGG TTAGTCAACTAAGAAAGGGAAAAGAAGAGGTCATTAATAACTTCAAGAAATGTGAAGGGGGTCAAGATTTGGTG TCAGCAACAAAAGGACCATTTCAAGAAATAAGAAAGTTGTACAAGGAGGTGGATTTTCAACTGTATAGGTGTAGT AGTTTGGCTAATTATCCAATATATGATGTAGACTTTGGATGGGGTAGGCCAAAACAAATAACTATGGCGGATTCTC CATTGAGAAATACTTTCTCTTTGTTTGACGACAACACTGGGAATTATATAGAAGCACTGGTAGCTTTGGATGATGA AATTACTATGTCTACGTTTGAGAGAGAGATGGAGCAACTTCTTGAATCTCAACTTCCAACTGAAGAAAATAAAACG GAAGCTAGCGCTCATTGTTGA

>SsASAT2|c49208_g1 ATGGCTGCTTCAAGGCTTGTTTTGCTTTCCAAAAAGATGATTAAGCCTTCTTCTCCAACTCCAATTTCACATAGAATT CACAAGCTCTCACTTATGGACCAAATGGGAACTCACTCCTACGTTCCGTTTTCGTTCTTTTACCCGAAACAAGACGC TGCAAGCTCGCTCGAACCAACAAAGGTATCTCGAATTCTTGAAAAATCCCTTTCGAAAGTTCTAACATCATATTATC CGTTAGCAGGGCGCGTAAGAGACAATTCATTTGTCGAATGTAATGACATGGGAGTCGATTTCTCTCAAGTCCAAAT AGATTGTCCAATGTCCAGAATTTTCGATGTCCCTCGTGCTGAGATGGACAATTTAATATTTCCTAAAGATCCTTGGA TCCCGTCTACGGACAGTTTAGTCGTAGCACAACTCAACCATTTCGAATGTGGTGGCGTTGTACTAAGCACATGCAT TTCCCACAAGGTTGCCGACGGATACAGTATCGCGAAATTCCTCAGGGACTGGGCTATTGTCGCGCGTGATTCAGG AGAGAAACCATCTCCCCTGTTTAATGGAGCGTCGATACTTAAACCAACCAACAATTCTTCAACTCAAGTGGTTGATC CCTCTTTATATAAAACAACATCGAAAAGGTACCATTTTTCTGCCTCCAAGTTAAAAGCTCTTAAGGCCATGATTTCA GCTGATTCTGGAAACCAGATTGTTCCTACAGCCGTGGAAGCCGTTACTGCGTTGCTATTCAAAAGCATCAATGCAC GAAATTTCAAGCCATCGATGTTGATGCAAACGGTAAATCTACGTGGAAAAAACAACGACGCGCTACTTCCAGCAG ACTTGGTCGGGAACGTTATTTTTCCTTTCGTCGTATCAGCAGCGAATGAGGACGAGCTGAATTTGCAAAGACTCGT TAGCGAGCTTAGAGAAGGAAAAGAGAAGATACACAATACGCTTGAATGTATTAAATCGGAAGAGTTACTCTGTTC AAAGGTATGTGAAATAGCTAGAGAGATAAACAAGCGAACCACGAGGGATGATTTTTCTATGTACAGATTTACTAG TGTAAGGAAATTCCCTTTGAACGATATCGATTTCGGATGGGGAAGGCCAAGAAGAGCGGATTTTGCTACTGCGCC GATCAATCTGATTTTCTTGATGGATGATCAAAGTGGGGATGGAGTTGAAGTGCTATTAAACTTGGGAGAAGAAGA GATGTCTACATTTCAAAGTAATGATGAGCTTCTTCAATTTGCTTCTCCGAGCTGA

>SsASAT3|c36223_g2 bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

ATGGGTGGTGTATCAAACCTTGTGTCATCAAGTTGCAAAAAAATAATCAAACCTTACTTTCCTACTCCAACTTCACT TCGTTGTTGCAAGCTCTCTTACCTCGATCAAATAGTCGGTCGCATTTATGCGCCTATGGCCTTTTTTTACCCTAAGCG TTCCGATATTATTAAACCAAGTGTTAATATATCCCAACATCTGCAAAATTCCCTTTCTAAAGTTTTGACACATTATTA TCCATATGCGGGAAAGTTGAATGACAATGTCTCTGTTGATTGTAATGACAGTGGAGTAGAGTTCTTCACTACCAAA ATTAATTGTCCAATGTCTGAAATCGTCAACAACCCTTATGCTGATAAACAAGATTTAGTCTTTCCCAAAGGAATTGC TTGTACACCTTCATACGAGGGTAGCCTTGCAGTTATTCAACTTAGTCATTTTGATTGCGGAGGATTTGCAATCAGTG CATGTGTGACACACAAAATTGGAGATGCATATACAATAGGCAATTTCATTAACGATTGGGCCACTATAACACGTGA CCCAAGTTCAAAAATATTACCCGGTCCTCAATTTAACGGAGCCTCTTTTTTCCCACCTTCAAACGACCTACCAAATGA ATCCAATTTTATTGTGCCAAAAAGTAAAGAGTGTTTATCCAGAAGTTTTGATTTCTCCCAATCTAAATTAGCTTCTTT AAAATCCAGGGTCATAAATAGTTTACCAGAGGTAAAAAATCTAACTGACACTGAGATTGTGTCAGCATTTATTTAC CAACGAGCCATGGCTACAAAAAATGCTGTTGTAGGCTCAATTCAACCATCGTTACTCATCCAAGCTGCAAACTTGA GGCCACCAGTGCCAAAAAGCACTATGGGAAACATTGGTTCCTTCTTTTCTGTACAAACAACAGAAGAGAATGAAAT TGATCTACCAAGTCTTGTTGGTAAACTACGAAAGGCGAAAGAAGAATATCGACAAAAATACAAACATGCTAAAAC AAATGAACTTTTACCTATAACAATAGGACTATATAGAGATGCAGATGAATTTTTGGTAAACAATTGTTATGACATTT ATAGGTTTAGCAGTATTAAAAAATTTCCCTTGTATGATGTAGACTTTGGATGGGGAAAGCCTGAGAGGGTTAGTCT ACCAAGTAATTGTCCAGTTAGAAACTTCTTTGCCTTGGTTGATAATAAAGCTAGGGATGGAATGGAAGTAATAATG TACATGAAGGAACAAGATATTTTGGCATCTCGAGCAAGGATGAAGAGCTCCTAG

>SsASAT5|c48744_g1 ATGGAACTAATCTCCCAAGAATTCATCAAACCTGTTTCTCCAACCCCCGATCACCTCAAATTCTATGAGTATTCCATT TTGGATCAGCTCATTGGTCCATCCCTTTACACCCCTGTACTTCTATACTATCCATCTCCACCAGATAATAATAATAAC AAAAAACCGGAGGCTATCAATACATCAAGATTAAAACACTTGAAGAAATCCTTGTCACAAATTCTTGCTCACTATTA CCCTTTGGCAGGAAGACTTCGAAACGACAACACTGGCGTCGATTGCAACGATGAAGGCGTTCCATTTCTCGAGGC ATTTGTGCACAATCATCGCTTGCAAGACATTCTCGATGGAAAAAGAGTAGTTACAGAATCTTTGGTTCCACTCACCA ATTACGAGTCAGTATTTCCATCAAACACGTTGCTTCTTGTTCAAGTTACCTTGTTTGAATGTGGAGGAATGGCTGTT GGTATTTCAGCTACACATAAAATATTAGATGCTCGTTCTTTGATCACCTTCTTAACTGATTGGGCAGCATTGACGCG CCAAGACGATCCCAATTTTACACTCACGCAACTTGTCCCCCTTTCCAAAATTATACCACCAGCTAATGGATTGCCACC CTCAATTCCGGTTGAAGGCTTTATCTCAACGGAGCCATGTGTTAGGAATATTTTTGTATTCAGACCTTGTAGCATTG CTGATCTGAAGATTAAGGCAGCCAGCGAGAACATACTGAGGCCTTCGCGCGTTGAAGTAGTGACAAGTACTATCT GGATGTGCCTAATGGAAAACTCCAAAAGGCCGTGTTTGATAACACACATGGTAAACTTGAGGAAGCGAGTTGACC CTCCAATGCATGATCACCATCTTGGAAACTTCATTGGGATGGTTGTAGCACATAATGATGAAAATCATATAGCTAA TATGGTGGCTTCATTACGAAAAGGGATTTCCGAGTTTGAAAAGAAGTGTCTGAAGGGGGAACGGGAGGCCTTGG GGGCTAGCATAGTAAACCACGCAACAGAATCAATAAATTACTTGGTTAGCAGAAAGGATGCAGATTTGTACAAAT TCAATAGCTGGTGCGGTTTTCCATTTTATGATGTGGATTTTGGGTGGGGAAAGCCAGTTTTGTCTAGCACAGCTGA AGGGAAATCCAAGAATTCTATAAAATTGATAGATACAAAGGATGGTGGAGTGGAAGCTCTTGTTTGTCTGAGTGA AGAAGATATGAAAGTGTTTGAGAAGAATACAGAGCTGCTAACTTATGCTACTCTGAAACCCAATTCGCATGTGTAG

>SsPDS|c55490_g2 ATGCCCCAAATTGGACTTGTTTCTGCTGTTACCTTGAGGGTTCAAGGTAATTCACTCTATCTTTGGAGTTCGAGGTC TGCTACTGAAAGTCACGTTGGTCGCGCACAAAGGAATTCGTTATGTTTTGGTGGTAGCGACTCCATGGGCCATAAA TTAAAAATTCATACTCCCCATGCCACGGCTAGAAGATTGGCAAATGGCTTCCATCCTTTAAAGGTAGTTTGCATTGA TTATCCAAGACCAGAGCTAGACAATACAGTTAACTATTTGGAGGCTGCATTGTTATCATCATCATTTCGTACATCTC CTCGCCCAACCAAACCATTGGAGGTTGTTATTGCTGGTGCAGGTTTGGGTGGTTTGTCTACAGCAAAATATTTGGC AGATGCTGGTCACAAGCCGATATTGCTCGAGGCAAGAGATGTCCTAGGTGGAAAGGTAGCTGCATGGAAAGATG ACGATGGAGATTGGTATGAGACTGGGTTGCACATATTCTGTAAGTTTAACTCCTCAATAATACTGTCATTGCAATTT CTTTTGAGATCTTTTTTGTCCATTAGACAGATAGTTATCCCTGTTTGTCTTTTGTCTTTGCAAATACCAATTATTGTCA GTCAATATGTATTATACATTGCTTCTCATTTTCATCTGTTAATTTCCTGTCGTGACTCATACAAGTTGGTACTTCATCT CTTTTAAGTTGGGGCTTACCCAAATATTCAGAACCTGTTTGGAGAATTAGGGATTAATGATCGATTACAGTGGAAG bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

GAACATTCAATGATATTTGCAATGCCTAATAAGCCAGGGGAATTCAGCCGCTTTGATTTTCCCGAAGCTTTGCCTGC ACCATTAAATGGAATTTTGGCCATCCTAAAGAACAATGAAATGCTAACATGGCCAGAGAAAGTCAAATTTGCAATT GGACTCTTGCCGGCAATGCTTGGAGGGCAATCTTATGTGGAAGCTCAAGACGGGTTAAGTGTTAAGGATTGGATG AGAAAGCAAGGTGTGCCTGATAGGGTGACAGATGAGGTGTTCATTGCCATGTCAAAGGCACTTAACTTCATAAAT CCTGACGAGCTATCGATGCAGTGCATCTTGATCGCTTTGAACAGATTCCTTCAGGAGAAACATGGTTCAAAAATGG CCTTCTTAGATGGTAATCCTCCTGAGAGACTTTGCAAGCCAATTGTTGAACATATCGAGTCAAAAGGTGGCCAAGT CAGACTAAACTCACGAATAAAAAAGATTGAGCTAAATGAGGATGGAAGTGTGAAGTGTTTTATACTGAACAATGG TAGTACAATTGAAGGAGATGCATTTGTGTTTGCAACTCCAGTGGATATCTTCAAGCTTCTATTGCCTGAAGAGTGG AAAGGGATCCCATATTTCCAAAAGTTGGAGAAGTTAGTCGGAGTTCCTGTGATTAATGTCCATATATGGTTTGACA GAAAATTGAAGAACACGTCTGATAATCTGCTCTTCAGCAGAAGCCCACTACTCAGTGTGTACGCTGACATGTCTGT CACATGTAAGGAATACTACAATCCCAATCAGTCGATGTTGGAATTGGTTTTTGCACCAGCAGAAGAATGGATATCT CGCAGTGACTCAGAAATTATTGATGCTACGATGAAGGAACTAGCAAAACTTTTTCCTGATGAAATTTCAGCAGATC AGAGCAAAGCAAAAATATTGAAGTATCATGTTGTCAAAACTCCAAGGTCTGTTTATAAAACAGTTCCAGGTTGTGA ACCCTGTCGGCCGTTGCAAAGATCCCCTATAGAGGGTTTTTACTTAGCCGGTGACTACACGAAACAGAAATACTTG GCTTCAATGGAAGGTGCTGTCTTATCAGGAAAGCTTTGTACGCAAGCTATTGTACAGGATTACGAGTTACTTGATG CCCGGGGCCAGAGGAAGCTGGCAGAAGCAAGCCTAGTTTAG – 3’UTR – CGGAGTGGAGCTACAATTAGTGTTTGTACACAGCATATATACAAGAGAACCAAATACACAGTGTTACATAATTGAA GGGGCAAGCTCCTACCACAAACGTCAAAAAAAGATGCTTCAAGCTTAACCTTCTTTAATCAGAAATTGAAATGTAA GCACATACCATGTTATTTAATCAGAAATTGAAATGTAAGC

>SS_c52698_g1 ATGGTGTCTTCAAAACCCGAAGCTGGTCTAATCTACAACATCAAATTATCCTCAGTTGGACCAGCAAGGGTAACAG GACAAGATGTGGTTTATGAGCCAAGCAACATGGATTTAGCCATGAAACTACATTATTTAAGAGGGATTTATTATTT TGAAAACCAAGCATTTCAAGATTTCAACATATATAAAATTAAAGAGCCAATGTTTTCTTGGTTGAATCATTTTTATAT GACATGTGGCAGGCTTAGAAGGGCAGAATCAGGGGGGCCTTATATAAAATGCAATGATTGTGGAGTTAGGTTTAT TGAAGCACAATGTGATAAAACTTTGGATGAATGGCTTGAAATGAAAAATTATAATAATTCTCTTGAGAAGTTACTT GTTTCTAATCAAGTTCTTGGTCCTGAATTGGCTTTCTCCCCTCTTGTCCTCATACAGCATACTAAATTCAAATGTGGT GGAATTTCATTGGGCTTAAGTTGGGCTCATGTACTTGGAGACATATTCTCAGCAGCTGAATTTATGAACCTACTGG GAGAAGTGGTTAGTGGTTACAAACCAACCCGGCCCATTAACTTGGCCCATTCATTGACCAAAGCAAATTCAACCCA AATCCTACAAAAGATTGTGGAGGATCCAATTTCCATAAGACGGGTCGACCCGGTTGAAGATCATTGGATTGTTAAA AATAGTTGCAAGATGGAGCTATTTTCATTCCATGTCACTGCCTCCAAATTGGGCCAGTTGCAATCAAGAGTGGGCC ATCAAGGCCCATTTGAATCGCTATGTGCAGTTATCTGGCAGTCCATTTCAAGAATTAGAGATGGGCCTGAGCCACA AGTTGTGACCATTTGTAAAAAAGGTGAGGGAAAAAAAGAGGGCCTTGTGGGAAACACTCAAATCATTGGTGCACT AAAGTTGGAATATTCAATTAGAGAAGCCAATCCTAGTGAACTAGCAAGGTTAGTTAAAAATGAGATCATCGACGA GCGATTAAAAATAGACGAAGCTGTGGAAAAAGAACATGGAGTATCGGATCTTGTCGTTTATGGAGCAAATTTGAC TTTTGTGAACTTAGAAAGTGTTGATTTCTATGGACTTGATTGGAAGGGACACAAGCCAATGAATGTAAGTTACATA ATCGACGGAGTTGGTGATGCTGGGACCGCTATGGTGTTTCGGGGGCCCAATGATATTACCAAGGAGGGTGATGA AGGAAGAACTGTGACAATGATTTTGCCCGAGGATGAGATAATGGCATTGAAAGTTGAGCTAAACAAAGAATGGTC TATTGCTTGTATTTAA

>SS_c49704_g1 ATGTCAAATTGCAATGGAAATGGAGTTTCACATTTTGGCTCCAATAACTTGCAAATTGAAGCAATCCAAACAGTGA TACCAATAAAGCCAACTAAGCCAAGGTTGTCCCGGCGAATCGCCGTGGCTGATCAAAATGGAAATTTTCTCCAGAG GCGTTTTCACGCGGTCCTTTGCTATAACAAGGCCTCAGAGGAGGATTCAGGGTGGATCGTTGCTGGTCGGATCAA AGAGTCACTTGGAAGGGCACTTGTTGAAAATCCATTACTTGCTGGTAGACTTAAAAAAGGAGAAAATGGGGATTG TGGAGAGTTTGAAATTGTATCGAATGATTCTGGTGTTAGATTGGTTGAGGCTATAATGCCAATGAAGTTGGCAGAT TTTCTTGATCTCAAGGATAAGAAAATTGCAGAAGCTGAGCTTGTCTTTTGGGAGGATGTTCATGAACCAAATCCTC AGTTTTCTCCTCTTTTTTATGTCCAGGTGACAAATTTCAAGTGTGGAAGATACTCAATTGGGATAAGTTGCAGCCTT bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

TTTCTAGCAGATCCTTATGCCATGACTAGTTTCTTGAATAAATGGTTCAAAATTCACAATAATATGGTATCAGAATC AGACACACCCAAAATTCCAACATTTTTCCTTCCAAGTTTTAGGAAAATCGGTTGTTCACCAACTTTATCAATGAGCTC CACTACATGCCACCAAGTTAACGAAACGCTAATCTTTAAAATACCCGCGAAGATCTTGAATTTAAAAGATGACATG CACTTGAATCTTGCAGCAAAATGTGTTGGGGAAGCAGAAGACAAACTTGGTAAGAAGTCGTCGTCAAAATTCTGT TTGTTTGTTAAAGATACCTCGGAGGATGTTAAAGTGGAAACTTATTCTCGAGAAGGGATTTTGCCAGAAATATTTG GTTCTATCAATAGTGGATTAGTTTCAACCAGCTTCGATGATTTGGAAGCTGATAACATAAGGTTTAATGAAGAAAA TAAGGCTGTCTTCTTTTCATGTTGGATTAACTCAGGAAATGATGATGATCTTGTGCTGATTACTCCATCTGTTGGTG AAGGTGACTCTGAAATGAAGGTTATAGTAACTGTTAGATATTAA

>SS_c49383_g1 ATGGGATCATTGGTTGCCCTCTTTTGTTAGTTCAGGTGACCCGTTTTAGATGTGGGGGATTTAGTGTTGGGTTCAG ACTTAATCACACAATGATGGATGTATATGGTATGAAATTGTTTTTAAACGCGTTAACCGAATTAATGGGAGGAGCT GTTACACCTTCTATATTACCTGTATGGAAAAGGGATCTCCTAAGTGCTAGATCATCACCACGCATTACATGCACACA CCATGAGTTTGATGAGTACTCCTCAAGGTATAATAATACAATTGCATGGTTAGACAAGAAGTTGGTCCAACAGTCT TTCTTTTTTGGAGACGAAGAGGTGGAAGCCATTCGAAATCAGTTACTAGACGATTGTCCAAATAGTACTAGTACAA AATTCGAGTTAGTAGCTGCATTTTTATGGAAGTATCGTACAATTGCTCTTGCTCCGCATCCTGAAGAGATTGTTCAT CTCACTTACCTTATCAATGCACGCGGAAAATCATCGCTAAACCAATTCCAACTACCACGCGGGTACTATGGGAATG CGTTCGTTTCTCCAGCAGCAACATCAGAAGCAGGTTTGCTATGTTCAAGTCCATTAGCATATGCACTTGAATTGGTT AAGAAAGTTAAAGATCACGCAAACGAAGAATACATTAGATCAGTGGCTGATTTGATGGTGATTAAAGGGCGACCT GAGTTGAAGCAATCTTGGAATTTTATTATCTCAGATAATAGATTTGTTGGATTTGATGAAGTTGATTTTGGATGGG GAAAGCCCATGCTCGGAGGGGTTCCAAAAGCTTTATCTTTTATCAGCCATTGTGTACCTCTTTCAGATAACAAGGG GGGAAAAGGTATTCTCATAGCGATAAGTTTGCCTCCACTGGCCATGGAAAAATTTCAAGCGATTGTCTACAAGGTG ACTTCCAAAAAATTGTCCAAGGCATCCCCATAA

>SS_c57127_g2 ATGGCTGCCAAAGTGGAGATGATATCCAAAGAAATGATCAAACCATCAACCCCAACACCTCCTTCCCTTAGAACCC ACAAATTATCACTTCTCGATCAAATTGCACCCCCGGTTTTTCTCCCTCTAATTTTCTTCTACCAATATGAAGTACTTGA CAATAACGACCGCACCAGAAAGTTACAGTCCTTGAAAAATTCTCTATCCGATGCTTTAACCCGATTCTACCCATTAG CTGGAACACTCAATATTAATAATAGTACCGTTGATTGCAATGATACTGGAGCAGAGTTTATTGAAGCTCAAGTTCA TGGTTACACCCTCTCACAAGTTATGGAAAATCCAAATATTGAGGAATTAGCACAATTCCTCCCAATTCATGAAGCTT GTGGCATAGGGAATCATGATGTCCTTTTGGCAATCAAAGTTAATTTGTTTGATTGTGAAGGGATCGCAATTGGTGT GTGCATGTCACACAAAGTTGGCGATGGCGTGTCCCTTGTGACATTCATCAATTCATGGGCAGCCATTGCCCGAGGT GACACTGAAATTGTGCAGCCTAATTTCAACTTGGCAAGTCTTTTCCCATCAGTAGACTCGTTTAATTCAATTCACAAT TCATCCATAGGAATAACTAACGAAAAACTTCTGATAAAAAGATTTGTTTTCGATAAGGAAAAGATTGATGCTCTCA AGAAATCGACTTCTATGGCCTCAGGATCAGGAGTGGAGGATCCCACTCGTGTGGAAGCCCTCTCGGCCTTCCTATG GAAACATTTCATAGAGGCTTCGAAGTTGAAAATAGACTCCAAAAAAACGTTTGCTACAATTCATATAGTCAATATG AGGCCAAGAATGAACCCAACCTTACCAGACCACTTTTTTGGGAACCTTTGGACAGTTGCATTAGCATTAAACACAA TTAACTCAGAAACAAATAAATTAATGAAAACTACTAGTGATGATGATTTGGTGTACCAGTTGAGAAGTGCAATAAG GAGAATCAATGGTGACTATATAAATATGGTACAAAATAAAGAAGAGTTTCTGAAACATATAGGTAAGTTAGTAGA GTTATTTTCAAAGGGAGAAGTTGAGTTTTCATGTTATTTTACGAGCTGGAGTAAGTTTCCAGTGTATGAAGTGGAT TTTGGATGGGGAAAACCAAGCAGGGTATGCACTACCACTTTGCCTTACAAGAATATGATCTTCTTCTTGCCTACAA AATGTGGAGAGGGAATAGAGGCATATGTTAACATGCCCAATGAAGATGTCTTTTCCCAAGCTTTGACATAA

>Ss_c57127_g3 ATGGCTGCCAAAGTGGAAATGATATCCAAAGAAATGATCAAACCATCATTGCCAACACCTCCTTCCCTTAGAACCC ACAAATTATCACTTCCTGATCAAATTGCACCCCCAGTTTTCATCCCTTTAATTTTCTTCTACCAATATGAAACAGTTGA CAATATCGACCGTGCCAGAAAGTTACACTCCTTGAAAAATTCTCTATCTGATGCTTTAACTCGATTCTACCCATTGG CTGGAACACTTAATATTAATAGCAGTACCATTAATTGCAATGATATTGGAGTAGAGTTTATGGAAGCTCAAGTTCA bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

TGATTACAGCCTCTCACAAGTTATGGAAAATCCAAAAATTGAGGAATTAACACAATTTCTCCCAATTGAAGCTTGTG GCATGGAGAACCATAATGTCCTTTTAGCAATCAAAGTTAATTTTTTTGATTGTGGAGGGATTGCAATTGGGGTGTG CATGTCTCATAAAGTTGGTGATGGCTTGTCCATAGTAACATTCATCAATACATGGGCAGCCATAGCCCGGGGTGAC ACCGAAAAAATTGTGCAGCCAAATTTCAACTTGGCAAGTCTTTTCCCACCAGTAGACTTGTCTAGTTCAAGTTACAA TTCATCTATAGGGATAACTAACAAAAAACTTATGATAAGAAGATCTGTTTTCGATAAGAAACAGATTGATGCTCTC AAGAAATCGGCTTCTATGGCCTCAGGATCACGAGTGGCGGATCCCACTCGTGTGGAAGCCGTCTCGACCTTCCTAT GGAAACATCTCATAAAGACTTCAAAGTCGAAAATAAACTCCAACAAAATGTTTGCTGCAATTCATATAGTTAATAT GAGGCCAAGAATGAGCCCAACCTTACCTGACCACTTTTTTGGGAACCTTTGGACAACAGCACTAGCATTAATCACA AATAATTCAGAAACTAGTGATGATGAGTTGGTGTACGAGCTGATAGGTGCAATAAGAAAAATCAACAGTGAATAT ATAAATATGTTACAAAATGGAGAAGAATTGTTGAAGCATGCGGGGAAGTTAGTTGAGTTATTATCAAAGGGAGAA GTTGAGTTTTGTTGTTTTACGAGCTGGTGTAGGTTTCCAGTGTATGAAGCGGACTTCGGATGGGGAAAACCAATAA GGGTATGCACTACCACTTTGCCTTACAAGAATTGTATCTTCCTCTTGCCTACAAAATGTGGAGAGGGAATAGAGGC ATATGTTAACATGCTAAACCATGATGTCCTTTACGAGCTCTGA

>HnASAT1|HN_c58659_g1_i1 ATGGCTGCCTCAGCTTTACTTTCTTTATCCAAAAAAATCATAAAACCATTTTCTCCAACACCATTTTCTGAAAGAATT TACAAGCTTTCTTTCATTGATCAATTCAATACTACACAATATAACCCCCTCGCCTTCTTCTATCCCAAAAACAAGGGA GTACCCTCAATTGATCCAAATGATATGTGTAAGGTTATAGAGAATTCACTTTCAAAAGCCTTAGCTGCTTATTACCC TTTTGCTGGAACATTGCGAGACAATATTTATATCGAGTGTAACGATATAGGTGCTGATTTTTACAAGGCTCGATTCG ATTGTCCCATGTCTGAAATTCTCAAAAGTCAGGATAGAAATGTCAAAGAAATAGTGTATCCCAAAGATGTGCCATG GAACATTGTTACACCTAGCAGAAAGTTGGTCGTGGTTCAGTTTAACCAATTTGATTGTGGAGGAATTGCTCTAAGT GCATGCGTTTCACACAAGATTGAAGATATGCATACATTTTATAAGTTTATGCATGATTGGGCTGCAATATCGCTCTG TTCTAACGTAAATATATGTCCTCAATTTATTGGATCGTCAGTTTTCCCACCTACAAATGAAGCTGTGAATGAACCAC CTCGCGAACAATGTGTAACAAAGAGATTACTTTTTTCAAACCATGCATTGAAATCTCTCCTCCCAGGATCATCAGAA GTGAAAAATCCAACTCGGGTTGAAATACTGACAGCACTTCTTTATAAATGTAGTATGAGGGCGAATTCTAGTGGAT TGTTCAAGCCATCAATGTTGTTTCAAACTATCAATTTAAGACCTATTATACCTTTGCCAGATAATACTCCTGGAAATT TCAGTTCTTCCCTTTTTGTACCAACATATACTGAAGAAGAAATGAAGTTATCAAGATTGGTTAGTGAGCTAAGGAA AGGAAAAGAAAAATATTTTGATGATTACAGAAAGTGTAAAGAAGGTCATCAAGATATGGTTTCTACAACAACGAG ACCGTATCAAGAAATAAGGGCTCTGTTCAAGGACAATGATTTTGATCTTTATAGGTGTAGTAGTTTGGTGAATTAT GGGTGGCATGGTTTAGACTTTGGATGGGGTATGCCTAATAGAGTAAGTATGGCAGACGTGAAACTGAGAAATATT TTCATGCTGTTCGATAACAACACAGAGGATCATGTAGAAGCACAGGTATCTTTTGACAAAGAAAGTAAAATGTCG GCGTTCTTACGAGAGATTGAGCAAGTTCTTG

>HnASAT2|HN_c61400_g1_i3 ATGGCTGCTCCAAGACTTGTTTTACTTTCCAAAAAGATCATTAAGCCTTCTTCTCCTACACCACTTTCACATAGAATT CAAAAGCTCTCTCTTATGGATCAAATGGGGACTCACTCCTACAGTCCATTTTCTTTCTTCTACCCCAAACAGGACACT GCAAGCTCGCCAGAACCAACAAAGGTATTCGAAATTCTTGAAAAATCCCTTTCCAAAGTCCTAACGGCTTATTATCC GTTTGCTGGACGCATAAGAGATAACTCCTTGATTGAATGTAATGACATGGGTGTCGAACTCTCTCACGTTCGAATT GATTGTCCAATGTCCACTATCTTCAATCACCCTCATACTGATATCGATAATTTGATCTTTCCAGACGATCCTTGGTTC CCATCCACGGAAAGTTTAGTCGTAGCTCAACTTAGTCATTTCAAATGTGGTGGCGTAGTGCTCGGTGCGTCCTTCTC CCACAAGGTCAGTGATGGGCTGAGTACGATTAAATTCCTACGGGACTATGCAATGGTTGCGCGTAATTCAGAAGC AAAACCTTCTCCCCTGTTTACTGGTGCGTCAATTTTTCAACCAATCAAATTTTCTTCATCGTCTCCCGTTATTGTTGAT CCTCCTCGAAAACTAAATGCATCGAAAAGGTACCATTTTTCAACTTCCAAGCTAAAAGCTCTAAAGGCCTTTGTTTC AGCTGATTCCAAAAGCCAAATTCTTCCAACAACTGTGGAAGCTGTAACTGCATTCCTTTGCAAATGCGTTAACACGC CAACTTTTAAGCCATCATTATTGGTGCAGGCAGTTAACCTACGTGGAACAAATAATGATGCACTCGTTCCAGCAGA CTTGTTCGGGAACGCCGTACTTCCTTTCGCTGTATCAGCAGCGAATAAGGAAGAGATAAATTTGCAAAGACTAGTT GGTGAGCTTAGAAAAGGAAAAGAGAAGATCCAAGATACGCTCAAATATGTTAAATCAGAAGAGTTGCTATGTTCA ACGGTGTCTGAAATAGCTAGAGAGATGAACGAGCAGACCTCAAGCAACGATATGCCTATGTACAGATTTACTAGT bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

TTAAGGAAATTCCCATTACATGACATAAATTTTGGATGGGGGAGGCCAAGAAGAGTGGATATGGCTACTTATCCG GTGAATATGTTTGTCTTGATGGATAACCTAAGTGGGGACGGAGTTGAAGTGCTCGTAAACTTGGAAGAAGGAGA GATGTCTGCATTTGAAAGTAACAACGAGCTTCATCAGTTTGCTTCTCCATTCTCGGGACTCTAG bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

First line describes the MS method and the ionization Figure 2-Figure Supplement 2 method used. ES- : Negative ionization mode ES+: Positive ionization mode Second line describes the m/z ratio (569.28) requested for the Extracted Ion Chromatogram with the m/z mass window (0.1 Da) 1: TOF MS ES- 100 569.28 0.1000Da nC12 1.49e6 % 0 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- Third line describes the The Y-axis is scaled total peak area of the 100 nC10 541.25 0.1000Da by the highest peak 6.36e5 requested ion in the intensity in the % Extracted Ion selected retention 0 Chromatogram. time window. 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- 100 nC6 485.21 0.1000Da 2.71e6 % 0 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- 100 485.21 0.1000Da aiC6 1.55e6 % 0 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- 100 iC6 485.21 0.1000Da

Relative abundance % 1.56e6 % 0 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- 100 aiC5 471.21 0.1000Da 7.03e4 % 0 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- 100 471.21 0.1000Da iC5 1.43e6 % 0 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 1: TOF MS ES- 100 457.2 0.1000Da iC4 862 2.50 % 458.04 0 Time 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 2.10 2.20 2.30 2.40 2.50 Retention time on the column

Figure 2-Figure Supplement 2: SsASAT1 reactions with different acyl CoA substrates. Shown are extracted ion chromatograms of SsASAT1 reactions using sucrose as the substrate and various acyl CoAs as donors. No product was found with acetyl CoA as donor. LC/MS analyses were conducted using a C18 column in the negative ionization mode as described in the Methods. Descriptions of different parts of the chromatogram that aid in understanding the chromatogram are highlighted in red. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 2-Figure Supplement 3

341.11 A. 100 2: TOF MS ES- SsASAT1+ 3.94e5 1.34 Sucrose+

% aiC6 -aiC6 S1:6(6) 1.31 Sucrose S1:6(6).HCOO- 323.10 342.11 179.06 439.18 343.11 485.19 1.31 0 100 100 150 200 250 300 350 400 450 1: TOF MS ES- m/z C18 COLUMN 471.17 0.0400Da Sucrose+aiC5 14min method ASAT1 2.31e5 + % SlASAT1 PaASAT1 0 SsASAT1 0.60 0.80 1.00 1.20 1.40 1.60 1.80 2.00 2.20 2.40 2.60 2.80 3.00 Relative abundance % 4.01 B. 4.13

4.12 100 1: TOF MS ES- 471.17 0.0400Da 5.61e3 BEH-amide COLUMN Sucrose+aiC5 9min method ASAT1 4.02 + SlASAT1 PaASAT1 0 Relative abundance % SsASAT1 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00

542.23 100 S1:6(6)+ 1: TOF MS ES+ C. SsASAT2+ 345.16 2.98e5 1.70 aiC5 -Furanose % 3 242.26 -NH S2:11(5,6).NH4+ 1.59 186.21 1: TOF MS ES- 197.10 363.17 C18 COLUMN 1.59 Pyranosyl (5,6)+ 555.2 0.1000Da 100 0 S1:5+aiC5 7min method 50 100 150 200 250 300 350 400 450 500 550 3.16e4 m/z ASAT2 + SlASAT2 PaASAT2 0 SsASAT2 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 Relative abundance %

D. 2.74 1: TOF MS ES- 2.19 555.2 0.1000Da BEH amide COLUMN 2.19 100 S1:5+aiC5 9min method ASAT2 + SlASAT2 PaASAT2 0 SsASAT2 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 Relative abundance %

Figure 2-Figure Supplement 3: Comparative analyses of LC/MS retention times of enzyme reaction products. (A,B) Comparison between S1:5(5) produced by SsASAT1, PaASAT1 and SlASAT1 on the C18 (A) and the BEH amide (B) column. The dashed line connect the peaks aligned based on their retention times, and shows that the SsASAT1 product peaks align precisely with the PaASAT1 product peaks with two different chromatographic methods (A,B), suggesting structural identity. (C,D) Comparisons between SsASAT2, PaASAT2 and SlASAT2 enzymes similar to those shown above, suggest that SsASAT2 and PaASAT2 acylate the same positions on the sucrose molecule. Inset in (A) shows negative ion mode fragmen- tation of S1:6(6) product of the SsASAT1 reaction using aiC6 CoA as donor. Inset in (C) shows positive ion mode fragmenta- tion of S2:11(5,6) product of SsASAT2 using S1:6(6) produced by SsASAT1 as substrate. Inset in (C) shows that both aiC5 and aiC6 are added on the same (pyranose) ring. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 2-Figure Supplement 4

1: TOF MS ES- 100 457.25+471.25+485.25 0.2000Da 8.04e3

SsASAT2+sucrose+ (iC4+iC5+aiC5+iC6+aiC6)

6.06;325.1835

Non-specific peaks at wrong 6.09;311.2951

% retention times

5.81 2.29 477.3412 458.0395 6.97 1.48 5.74;116.9286 325.1857 424.5787 3.10 437.1518 5.21;116.9288 6.21;403.3047

0 Time 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50

Figure 2-Figure Supplement 4: SsASAT2 does not acylate sucrose. Extracted ion chromatogram of masses corresponding to S1:4(4), S1:5(5) and S1:6(6). No peaks corresponding to these products are seen. Additional reactions with individual CoAs -- aiC6 CoA and nC12 CoA -- are shown in Figure 7-Figure Supple- ment 1. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 2-Figure Supplement 5

A. 1: TOF MS ES-

100 100 527.3209 751.37 0.1000Da 4: TOF MSMS 724.46 ES+ S3:17(5,5,6,6) - 5.15e3 2.31e3 Unsubstituted furanose ring O 3.77 S3:17(5,6,6 - {SsASAT3})+

O O Petunia ASAT4+aiC5 O % O +

O O O O O O % O OH HO O O O OH O Relative intensity O OH 0 m/z O O 50 250 450 650 850 O

0 1.00 2.00 3.00 4.00 5.00 6.00 B. 1: T OF MS ES- 100 751.37 0.1000Da 963

S3:17(5,6,6 - {SsASAT3})+ no Enzyme+aiC5 [Negative control] % Relative intensity

0 1.00 2.00 3.00 4.00 5.00 6.00

C. 3.18 1: TOF MS ES- 100 723.35 0.1000 Da 4.82e5

S3:15-P(5,5,5 - {BIL6180})+

O Petunia ASAT4+aiC5 [Positive control] O O OH HO O O % O OH O

O OH O O O Relative intensity

0 Time 1.00 2.00 3.00 4.00 5.00 6.00

Figure 2-Figure Supplement 5: SsASAT3 acylates at the R3 position. An indirect assess- ment of SsASAT3 positional specificity using Petunia ASAT4, which acylates at the pyranose R6 position (A) The chromatogram shows that Petunia ASAT4, which is known to acylate at the R6 position, can acylate the tri-acylated sugar S3:17(5,6,6) produced by SsASAT3 using the SsASAT2 product S2:10(5,5) . The inset shows positive mode data suggesting all four chains are on the same ring. Given R6 is the only free hydroxyl group available on the pyranose ring in S3:17(5,6,6), and since SsASAT1 and SsASAT2 acylate R2 and R4 positions, the only position SsASAT3 can acylate without inhibiting PaASAT4 is R3. (B) Negative control with no enzyme (C) Positive control with S3:15-P(5,5,5) purified from BIL6180, with R2, R3 and R4 positions substituted by C5 chains based on NMR data (unpublished data). bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 2-Figure Supplement 6

A. B. 1: TOF MS ES- 2.49;681.2982 100 681.297 0.0300Da SsASAT5+ 9.45e5 S3:15(5,5,5)*+ C2

4.99 681.2963 % 100 2: TOF MS ES+ 7.66e3 S4:17 (2,5,5,5) Non-specific peak 313.1646 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50 % 1: TOF MS ES- 2.49;681.2979 100 681.297 0.0300Da 8.72e5 437.1919 F1:2(2) 2.21 Plant extract 205.0702 331.1297 P3:15(5,5,5) 681.2976

415.2271 S4:17(2,5,5,5) 455.2220 409.1768 617.2736 654.3260 0 m/z 200 220 240 260 280 300 320 340 360 380 400 420 440 460 480 500 520 540 560 580 600 620 640 %

0 Time 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50

O O OH O O O O O O OH O O H O O C. O D. O 1: TOF MS ES- 10.26 100 793.35 0.0300Da 793.35 3.46e4

1: TOF MS ES- 11.41 100 821.39 0.1000Da SsASAT5+ 821.39 385 S5:21(2,2,5,6,6)+ SsASAT5+ C2 S5:23(2,5,5,5,6)+ O % OH C2 % O O HO O O O

OH S6:25 (2,2,5,5,5,6) O O O O O

S6:23 (2,2,2,5,6,6) O 0 O 10.60 10.80 11.00 11.20 11.40 11.60 11.80 1: TOF MS ES- 11.31 100 821.39 0.1000Da 10.42 821.39 2.73e6 116.93 0 S6:25(2,2,5,5,5,6) 9.50 10.00 10.50 11.00 11.50 12.00 O

OH O -NMR O O O O O 1: TOF MS ES- O 10.21 % OH 100 793.35 0.0300Da O 793.35 8.64e5 O O O O O 11.02 O 821.38 Plant extract EIC 0 10.60 10.80 11.00 11.20 11.40 11.60 11.80 1: TOF MS ES- 11.31 100 821.39 0.1000Da 821.38 2.91e6

% Plant extract EIC 11.43 It is possible that S6:25(2,2,5,5,5,6) is 821.38 only a small fraction of this peak, since CID fragmentation patterns (not shown) reveal low abundance % derivative fragment ions.

11.79 821.38

0 Time 9.50 10.00 10.50 11.00 11.50 12.00 0 Time 10.60 10.80 11.00 11.20 11.40 11.60 11.80 F. E. 2: TOF MS ES+ 184.07 833 2: TOF MS ES+ 100 289.13 100 1.06e3 O O O O O O O O O O + C + O C OH 758.57 O O

O % % F2:7(2,5) F3:6(2,2,2)

289.09 748.35: S6:23 (2,2,2,5,6,6) 776.38: S6:25 (2,2,5,5,5,6) 0 m/z 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 0 m/z 100 150 200 250 300 350 400 450 500 550 600 650 700 750

Figure 2-Figure Supplement 6: SsASAT5 putative secondary activities. (A) SsASAT5 can transfer C2 to S3:15 (5,5,5) isolated from Solanum pennellii-derived Backcross Inbred Line BIL6180 (unpublished data; Ofner et al, 2016), which has the three acyl chains on the R2, R3 and R4 pyranose (abbr: P) ring, same as Salpiglossis. (B) The C2 addition likely occurs on the furanose (abbr: F) ring, as seen by the presence of the m/z 205.07 in the positive-ion mode elevated collision energy mass spectrum. (C,D) SsASAT5 can transfer C2 to penta-acylated sucroses, and the product co-migrates with an acylsugar of the same molecular mass produced in the plant. Co-migrating peaks are aligned by the dashed line. Central panel in (D) shows that the S6 sugar produced by SsASAT5 is not the same as the most abundant acylsugar produced by the plant purified for NMR, which has two C2s on the furanose ring. Shown are extracted ion LC/MS chro- matograms for the expected masses. (E,F) The most abundant fragment ions (m/z: 289.09, 289.13) are consistent with C2 addition occurring on the furanose and pyranose rings, respectively. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under Figure 3 aCC-BY 4.0 International license.

A. B. 1 .0

s 0 .8 ion v rol plants 0 .6

ctor con t 0 .4 v e Mean fold expres s

empt y 0 .2 SsPDS-1 SsPDS-2

0 .0 T1 A S A s

Internal SsASAT2 SsASAT3 SsASAT5 S C. standard Acylsugars 1: TOF MS ES- 100 BPI D. Uninoculated 4.38e6 (n=5) 2.06 % 179.07

0 400 1.00 2.00 3.00 4.00 5.00 6.00 7.00 1: TOF MS ES- 100 BPI Empty vector 4.41e6 2.05

% (n=8) 179.07 300

0 1.00 2.00 3.00 4.00 5.00 6.00 7.00 *

n=5 2.06 1: TOF MS ES- 179.07 100 BPI 200 * SsASAT1-1 1.67e6 ed acylsugar peak area (n=7) % Relative abundane (%) mali z r 0

1.00 2.00 3.00 4.00 5.00 6.00 7.00 100

1: TOF MS ES- 100 BPI otal no SsASAT1-2 4.08e6 T (n=7) 2.06 0

% 179.07

Time 0 1.00 2.00 3.00 4.00 5.00 6.00 7.00 TRV2−LIC SsASAT1-1 SsASAT1-2 SsASAT2-1 Internal Uninoculated standard Acylsugars

E. 100 1: TOF MS ES- F. Uninoculated 2.05 BPI (n=11) 179.07 3.65e6 % m/z 651-700 701-750 751-800 801-850 0 * 1.00 2.00 3.00 4.00 5.00 6.00 7.00 *

100 1: TOF MS ES- 15

Empty vector 2.06 BPI 20 (n=8) 179.07 3.65e6 100 % 100 80 15 10 *

0 80 1.00 2.00 3.00 4.00 5.00 6.00 7.00 * 60 * 10

100 60 1: TOF MS ES- SsASAT2-1 2.05 BPI 40

179.07 5 3.65e6 40 % (n=7) 5 * 20 Relative abundance % 0 20 1.00 2.00 3.00 4.00 5.00 6.00 7.00 0 0 0 0 100

SsASAT2-2 2.05 1: TOF MS ES- Total normalized acylsugar peak area 179.07 BPI

% (n=10) 3.65e6 TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC SsASAT2-1 SsASAT2-1 SsASAT2-1 SsASAT2-2 SsASAT2-2 SsASAT2-2 SsASAT2-1 SsASAT2-2 Uninoculated Uninoculated 0 Time Uninoculated Uninoculated 1.00 2.00 3.00 4.00 5.00 6.00 7.00

Figure 3: In vivo validation of SsASAT1 and SsASAT2 candidates. (A) Two representative plants with the phytoene desaturase gene silenced using VIGS shown with 18% reflectance gray card. SsPDS-1 and SsPDS-2 have two different regions of the SsPDS transcript targeted for silencing. (B) qPCR results of SsASAT VIGS lines. Relative fold change in ASAT transcript abundance in VIGS knockdown plants compared to empty vector plants. Error bars indicate standard error obtained using three technical replicates. Expression level of the phytoene desaturase (PDS) gene was used as the reference control. File Supplement 1 is the Python script used for picking the most optimal fragments and Source Data 1 includes values obtained from qPCR analysis. (C,D) SsASAT1 knockdown using two different constructs (SsASAT1-1, SsASAT1-2) shows reduction in acylsugar levels. The SsASAT1-1 phenotype is more prominent than the ASAT1-2 phenotype, being significantly lower (p=0.05; KS test). One construct for SsASAT2 (SsASAT2-1) also showed significant decrease in acylsugar levels (p=0.03; KS test). Note that the Y-axis total ion intensity in (C) is different for each chromatogram. (E,F) SsASAT2 knockdown leads to drops in levels of higher molecular weight acylsugars. In (C-F), number of plants used for statistical analysis is noted. Source Data 2 describes normalized peak areas from VIGS plants used for making these inferences. Results of the second set of biological replicate experiments performed at a different time under a different set of conditions -- as described in Table Supplement 1 -- is shown in Figure Supplement 1. Figure Supplement 2 is a more detailed analysis of the SsASAT2 knock- down phenotype showing the individual acylsugar levels under the experimental conditions. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 3 – Table Supplement 1: VIGS plants growth conditions

# Date started Date extracted # of weeks Genes tested Empty Conditions 1 post- vector inoculation control 1 June 8 Aug 13 10-11 wks SsASAT3 TRV2 4 wk old seedlings used for inoculation in peat pellets. After 1 month in growth chamber 1, these were transferred to 2 Sure Mix+1/2 sand in chamber 2. 2 Aug 24 Oct 21 8-9 wks SsASAT1 TRV2 3 wk old seedlings with 2 true leaves grown in growth chamber 1. After 1 month, these were transferred to 2 Sure Mix+1/2 sand in growth chamber 2. 3 Feb 21 Apr 15 7-8 wks SsASAT2 TRV2-LIC 2 wk old seedlings with 2 true leaves grown in SsASAT5 growth chamber 2 on RediEarth 4 Feb 21 Mar 15 3-4 wks SsASAT1, TRV2-LIC 3 wk old seedlings with 4 true leaves grown in SsASAT2 growth chamber 2 on RediEarth2 SsASAT3, SsASAT5 5 Apr 7 May 4 3-4 wks SsASAT1, TRV2-LIC 3 wk old seedlings with 4 true leaves grown in SsASAT2 growth chamber on RediEarth SsASAT5 6 Apr 15 May 9 3-4 wks SsASAT2, TRV2-LIC 3 wk old seedlings with 4 true leaves grown in SsASAT5 growth chamber 2 on RediEarth 7 July 9 Sept 3 7-8 wks SsASAT1, TRV2-LIC 4 wk old seedlings used for inoculation in peat SsASAT2 pellets. After 1 month in growth chamber 2, these were transferred to 2 Sure Mix+1/2 sand in the same chamber. 1 In all cases, growth chamber 1 conditions included 16 hours daylight, with day/night temperatures being ~22C. Growth chamber 2 had 16 hours daylight with 22/12 as day/night temperatures. Relative humidity in chamber 2 was constant at 50%. 2 These conditions seem to be the most optimal. Plants at this stage are more robust and can withstand Agro infiltration without significant negative effects. Final sampling can be completed relatively quickly, in just 3-4 weeks after infiltration. However, certain phenotypes (eg: SsASAT1 knockdown) are better visible under a different set of conditions. certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder Figure 3-FigureSupplement1 bioRxiv preprint the right sideofeach chromatogram. of analyzedplants isnotedon plants showing depicted phenotype outofthe total number in thesetwoknockdowns aredifferent, asseen bytheco-eluting profiles. causes substantial changesinSalpiglossis acylsugar The accumulating acylsugars knockdown showsreduction ofacylsugarlevels.(B)SsASAT2 andSsASAT5 knockdown SsASAT5 transcriptsinadistinctreplicationofVIGS experiments.(A)SsASAT1 Figure 3-FigureSupplement 1:Resultsofknockdown ofSsASAT1, SsASAT2 and 100 100 100 100 B. 0 0 0 % % % 0 % SsASAT5-2 Uninoculated SsASAT2-1 TRV2-LIC doi: 3.00 3.00 3.00 3.00 1237.51 1237.46 https://doi.org/10.1101/136937 1237.48 3.64 3.65 3.68 4.00 4.00 4.00 4.00 5.17;667.21 5.51;693.26 5.63;667.26 5.78;729.22 5.00 5.00 5.00 5.00 695.22 5.30 5.81;721.23 6.02;695.22 695.27 695.26 6.00 6.00 6.00 5.99 5.98 6.00 6.94;737.26 6.96;737.24 723.26 6.35 6.69;709.22 7.04;737.25 A. 7.76;751.25 7.00 7.00 7.00 7.00 Total normalized acylsugar peak area 723.29 7.28 ; 7.68;751.26 7.72;751.23 this versionpostedMay11,2017. a 7.87;737.25 CC-BY 4.0Internationallicense 751.28

7.80 0 500 1000 1500 8.00 8.00 8.00 8.00 Uninoculated 751.33 8.69 9.20;779.29 9.24;779.27 9.00 9.00 9.00 9.00 779.29 779.32 9.19

9.29 TRV2 765.31 p=0.08 9.56 10.00 10.00 10.00 10.00 793.31 793.32 10.08 10.10 SsASAT1−1 10.01;793.32 821.34 10.05;793.29 p=0.001 10.34 10.39;821.32 10.42;779.32 10.44;821.31 SsASAT1−2 11.00 11.00 11.00 11.00 835.35 1 1.20 . 1 1.40;821.37 The copyrightholderforthispreprint(whichwasnot m/z ratios.Number of 12.00 12.00 12.00 12.00 835.35 12.22 10/10 plants 8/8 plants 5/6 plants 5/5 plants 13.00 13.00 13.00 13.00 14.00 14.00 14.00 14.00 1: 1: 1: 1: T T T T O O O O F F F F MS MS MS MS 9.86e4 8.22e4 9.99e4 1.02e5 15.00 15.00 15.00 15.00 ES- ES- ES- ES- BPI BPI BPI BPI Time stem) andisannotatedasanionchannel. Thus, thereasonforthisunusualphenotypeisstillunclear. of uninfectedSalpiglossis(~360readsintrichome;~550shaved otides isnotpreferentiallyexpressedinthetrichomes addition toSsASAT2. However, transcriptwith100%identitytothesilencing fragmentover18nucle thesingleSalpiglossis explanationforthisresultisthatanothertranscriptwassilencedin decreased significantlyinthesilencedlines.Onepossible S5:20(2,2,5,5,6)] increasedsignificantly, oftheseacylsugars[eg: S4:21(5,5,5,6);S5:25(2,5,6,6,6)] whileC5 derivatives S3:17(5,6,6)andS3:18(6,6,6)[e.g.:S4:18(2,5,5,6); These resultsshowthatC2acylatedderivativesofS3:16(5,5,6), the names inredshowsignificantdecreasescomparedwith TRV2-LIC control. parisons, respectively. *:p<0.05;**:p<0.01usingtheKStest. Acylsugar namesingreenshowsignificantincreasesandthe changesinSsASAT2-1_TRV2-LICexperiment. Blueandredasterisksindicatesignificant andSsASAT2-2_TRV2-LIC com- theLC/MSextractedionchromatograms,acrossallplantsin summed normalizedacylsugarpeakareas,obtained using Figure 3-FigureSupplement2:Individualacylsugar levelsinSsASAT2 VIGSreplicate1. Figure 3-FigureSupplement2 certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint

Normalized acylsugar peak area Normalized acylsugar peak area Normalized acylsugar peak area Normalized acylsugar peak area Normalized acylsugar peak area

0 10 25 0 10 0 4 8 0.00 0.06 0.00 0.15 S5.24..2.5.5.6.6. S4.18..2.5.5.6..2 S4.21..5.5.5.6. S4.16..2.2.6.6. S4.14..2.3.4.5. ** ** ** * ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 doi: ** ** * ASAT2−2 ASAT2−2 * ASAT2−2 ASAT2−2 ASAT2−2 https://doi.org/10.1101/136937 TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC

Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated

0.0 1.5 0 30 60 0 10 25 0.0 0.8 0.0 0.4 S5.21..2.2.5.6.6. S5.25..2.5.6.6.6. S4.19..2.5.6.6. S4.17..2.5.5.5. ** ** * ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 S3.14..2.6.6. ** ** ASAT2−2 ASAT2−2 ASAT2−2 ASAT2−2 ASAT2−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC

Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated ; this versionpostedMay11,2017. a CC-BY 4.0Internationallicense 0.0 1.5 0 10 20 0 10 0.0 0.4 0.0 0.2 0.4 S5.25..2.5.6.6.6..1 S4.22..5.5.6.6. S4.20..2.6.6.6. ** S4.14..2.2.5.5. ** ** S3.18..6.6.6.. ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 ** ** ASAT2−2 ASAT2−2 ASAT2−2 ASAT2−2 ASAT2−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC

Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated

0 4 8 0 10 0 5 15 0.0 0.8 0.0 0.3 S6.25..2.2.5.5.5.6. S5.22..2.3.5.6.6. S4.18..2.5.5.6. S4.15..2.2.5.6. ** * Ambiguous *

ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 . The copyrightholderforthispreprint(whichwasnot ** * ASAT2−2 ASAT2−2 ASAT2−2 ASAT2−2 ASAT2−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC

Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated

0 2 4 0 20 40 0 30 60 0 4 8 0.0 0.4 S6.26..2.2.5.5.6.6. S5.20..2.2.5.5.6. S4.18..2.5.5.6..1 S5.23..2.4.5.6.6. ** The boxplotsdenotethe ** * * ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 ASAT2−1 S3.16..5.5.6. ** ** ** ASAT2−2 ASAT2−2 * ASAT2−2 ASAT2−2 ASAT2−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC

Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated - bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 4

A. B.

100

Uninoculated 1: TOF MS ES- 2.05 (n=12) 179.07 BPI 3.65e6

% 40 Uninoculated 35 TRV2-LIC 0 1.00 2.00 3.00 4.00 5.00 6.00 7.00

1: TOF MS ES- 30 SsASAT3-1 100 BPI Empty vector 3.65e6 2.06 25 (n=9) 179.07 % 20

0 15 1.00 2.00 3.00 4.00 5.00 6.00 7.00 Relative abundane (%) Novel acylsugars 1: TOF MS ES- 10 100 BPI 3.65e6 SsASAT3-1 (n=7) 5 % Average normalized acylsugar peak area 0 0.75 387.11 Individual acylsugar peaks from VIGS plants, low to high retention time

0 Time 1.00 2.00 3.00 4.00 5.00 6.00 7.00

C. D. Acylsugars 1: TOF MS ES- 100 TIC Uninoculated 2.25e5 (n=12) % TRI TETRA PENTA HEXA * *

0 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 200 1: TOF MS ES- 12 100 TIC Empty 2.51e5

3 # vector 60 10 150 % (n=9) 8 0

8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 2 1: TOF MS ES- 40 # 100 100 TIC 6 1.71e5 SsASAT5-1 4

% (n=12) 1 20 50 Relative abundane (%) 2 0 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 1: TOF MS ES- TIC 0 100 0 0 0

1.57e5 Total normalized acylsugar peak area SsASAT5-2

% (n=11)

14.51 116.93 SsASAT5-1 SsASAT5-2 SsASAT5-1 SsASAT5-2

0 Time SsASAT5-1 SsASAT5-2 SsASAT5-1 SsASAT5-2 Empty vector Empty vector Empty vector Empty vector Uninoculated Uninoculated Uninoculated 8.00 9.00 10.00 11.00 12.00 13.00 14.00 15.00 Uninoculated

Figure 4: VIGS phenotypes of SsASAT3 and SsASAT5 knockdown plants. In each sub-figure, the left hand panel shows a representative chromatographic phenotype while the right hand panel shows distributions of the aggregated peak areas of all plants of the tested genotypes. (A,B) SsASAT3 VIGS knockdown experiment using a single targeting fragment SsASAT3-1 resulted in appearance of novel acylsugar peaks whose levels are significantly higher (p<0.05, KS test) vs. control. Error bars indicate standard error. Individual acylsugar peak areas are shown in Figure Supplement 1, while positive mode fragmentation patterns of the novel acylsugars are shown in Figure Supplement 2. (C) SsASAT5-1 and SsASAT5-2 chromatograms are from individual plants with two different regions of the SsASAT5 transcripts targeted for silencing. (D) Distributions of the aggregated peak areas of all plants of the tested genotypes. The boxplots show that SsASAT5 knockdown leads to a significant (*: KS test p<0.05; #: KS test 0.05

0 10 25 5 15 0 2 4 0 3 6 0 3 6 doi: S5.24..2.5.5.6.6. S4.18..2.5.5.6..1 S3.13..2.5.6. S4.22..5.5.6.6. S4.14..2.2.5.5.

ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 https://doi.org/10.1101/136937

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC * ** **

Uninoculated Uninoculated Uninoculated * Uninoculated Uninoculated

0.0 1.5 0 10 20 0 3 6 0 10 25 0 2 S5.25..2.5.6.6.6. S5.20..2.2.5.5.6. S4.18..2.5.5.6..2 S4.15..2.2.5.6. Unidentified ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC ** ** ; * this versionpostedMay11,2017. Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated a CC-BY 4.0Internationallicense Highly abundantnovelacylsugars 0.0 1.5 S5.25..2.5.6.6.6..1 5 20 5 15 0 10 20 0 10 S5.21..2.2.5.6.6. S4.16..2.2.6.6. S4.19..2.5.6.6. S3.16..5.5.6 ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC ** Uninoculated Uninoculated Uninoculated Uninoculated Uninoculated **

0 4 8 S6.25..2.2.5.5.5.6. 2 6 12 0 2 4 0 3 6 0.0 0.8 S5.22..2.3.5.6.6. S4.17..2.5.5.5. S4.20..2.6.6.6. S3.18..6.6.6. . ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 The copyrightholderforthispreprint(whichwasnot

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC ** ** Uninoculated Uninoculated Uninoculated * Uninoculated Uninoculated

0 2 4 S6.26..2.2.5.5.6.6. 0 20 40 5 15 0 2 4 0 4 8 S5.23..2.4.5.6.6. S4.18..2.5.5.6. S4.21..5.5.5.6. S3.14..2.6.6. ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2 ASAT3−2

TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC TRV2−LIC ** Uninoculated Uninoculated Uninoculated Uninoculated ** Uninoculated bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 4-Figure Supplement 2

A.

247.08 100 + 2: TOF MS ES+ 4 R6’:C2 2.00e6 O R6’ NH OH O . R4 O HO O O O OH O

HO O O F2:4(2,2) O O + R3’:C2 -NH4 R2 R3’ % -P2:11(5,6) 169.05 S4:15(2,2,5,6)

626.30

631.26 248.09 345.19 127.04 615.28 170.05 663.19 109.03 P2:11(5,6) 0 m/z 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 B. + 2: TOF MS ES+ 247.08 4 2.01e6 100 R6’:C2 O R6’ NH

OH . O R4 O HO O O O OH O O HO O O O R3’:C2 R2 R3’ -NH+ % 4 F2:4(2,2)

169.05 -P2:12(6,6) S4:16(2,2,6,6) 640.32

645.27 127.04 248.09 359.21 629.30 650.75 109.03 170.05 P2:12(6,6) 0 m/z 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 C. 2: TOF MS ES+ 289.13 5.83e5 100 + 4

R6’:C2 NH . R1’:C5 O R6’ O OH R1’ O R4 O O O O O OH O

HO OH

O F2:7(2,5) + O % -NH4 R2 -P2:11(5,6) S4:18(2,5,5,6)

668.35 673.30

187.06 290.13 345.19 657.33 109.03 674.31 127.04 0 P2:11(5,6) m/z 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900

Figure 4-Figure Supplement 2: Positive mode fragmentation patterns of novel acylsugars found in SsASAT3 VIGS knockdown plants. In (A,B,C), mass spectra obtained at elevated collision energies of three different novel acylsucrose peaks are shown. We make the assumption that fragment ions and the most abun- dant pseudomolecular ions appearing at the same retention time are associated. In all three cases, assuming that the original mono- and di-acylations occurred on the pyranose (P) ring, we can infer that the subsequent C2 and C5 additions giving rise to the tri- and tetra-acylated sugars occurred on the furanose (F) ring. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 4-Figure Supplement 3

OH R4:aiC5/aiC6 OH O OH OH O R4 R4 OH HO O OH OH HO HO O O OH O O O A. HO O O OH O HO OH O O OH O HO O OH O OH O H O OH O HO OH O R3:aiC6 O OH O O SsASAT3 O HO SsASAT1 O SsASAT2 OH R3 R2:aiC5/aiC6 R2 R2 R2 SsASAT4? SsASAT4?

R1’:aiC5 OH O O R1’ R4 OH OH HO O O R4 OH O O O O OH O O O OH O O HO O O HO OH O O O R3’ R2 R3’:C2 R2

SsASAT5 SsASAT5 R6’:C2 O R6’ R6’:C2 OH O O O R6’ R4 O HO R4 O OH O O O R4 OH OH HO O O O O O OH HO O O O OH O HO O O R2 R3’

R2

OH R4:aiC5/aiC6 OH O OH OH O R4 R4 OH HO O OH OH HO B. HO O O OH O O O HO O O OH O HO OH O O OH O HO O OH O OH O H O OH O HO OH O R3:aiC6 O OH O O SsASAT3 O HO SsASAT1 O SsASAT2 OH R3 R2:aiC5/aiC6 R2 R2 R2

? ?

R1’:aiC5 OH O R1’ O OH R4 OH HO O OH O R4 O O O O O OH O O OH O OH O O OH O O O O O O O R3 R3’ R3 R2 R2 R3’:C2

SsASAT5 SsASAT5

O R6’:C2 R6’ O OH R6’ R6’:C2 O O O OH R1’ R4 HO O O O R4 O O O O O OH O O OH O OH O O O O OH O O O O R3 O R2 R3’ R3 R2

Figure 4-Figure Supplement 3: Hypothesized routes of metabolite flow in VIGS knockdown plants. (A) SsASAT3 knockdown and (B) SsASAT5 knockdown. The perturbed pathway steps are denoted by a blurred arrow and two red lines. All acylation positions are hypothesized based on in vitro data and knowledge of the enzyme activities bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under OH Figure 5 aCC-BY 4.0 InternationalOH license. HO O O HO OH O OH HO OH Sucrose SsASAT1

OH OH HO O O HO OH O OH HO O O R2:aiC5/aiC6 R2 Mono-acylsugar SsASAT2

R1’:aiC5 O OH R1’ R4:aiC5/aiC6 OH OH O O O O OH R4 OH R4 R4 HO O O HO O O O O ? O ? O O O OH OH OH O O predicted based on O predicted based on O HO OH SsASAT3 OH SsASAT3 HO O HO O O O VIGS results O VIGS results O O R3’ R2 R2 R2 R3:C2 Tri-acylsugar* Di-acylsugar Tri-acylsugar* SsASAT5 SsASAT3 SsASAT5 predicted based on O predicted based on R6’ SsASAT3 SsASAT3 OH OH VIGS results O O R6’:C2 VIGS results R4 OH O HO R4 HO R6’:C2 O O O O O R6’:C2 R6’ O O O SsASAT5 O R6’ R1’ OH OH OH O O OH O R4 O predicted based on O O O O OH O OH R4 O O in vitro activity O HO O O R3:aiC6 O O O O O O OH O OH O R3 R3 O R2 R2 HO OH O O HO O O Tri-acylsugar Tetra-acylsugar* O O R2 R3’ R2 Tetra-acylsugar* ? ? Tetra-acylsugar*

R1’:aiC5 OH O R1’ O OH R4 OH HO O OH O R4 O O O O O OH O O OH O O O O OH O O O O O O O R3 R3’ R3’:C2 R3 R2 R2 Tetra-acylsugar Tetra-acylsugar

SsASAT5 SsASAT5

R6:C2 O R6’:C2 R6’:C2 R6 O O R6’ R6’ O R1’:C2 O O O R6’ O R6’ R1’ R1’ O OH OH OH R6’ O O O O O R4 R4 O O O O O R4 R4 O O HO O O O O O O O O SsASAT5 O O O OH OH OH SsASAT5 OH O O O O predicted based on predicted based on O OH O OH O O O O in vitro activity O O in vitro activity O O O O O O O O O O O O R3 R3 R3 R3’ R3 R2 R2 R2 R2 R3’ Hexa-acylsugar* Penta-acylsugar Penta-acylsugar Hexa-acylsugar* ?

O R6’ O OH R1’ O R4 O O O O O OH O O O O O O O R3 R3’ R2 R3’:C2 Hexa-acylsugar Figure 5: Model for the Salpiglossis acylsugar biosynthetic pathway. The question marks indicate unidentified enzymes. The blue colored acyl chains are positioned on the sucrose molecule based on results of positive mode fragmentation characteristics, co-elution assays, and comparisons with purified acylsugars. Main activities are shown in solid arrows and potential alternate activities - where acylation positions and enzymatic activities are hypothesized based on in vitro and in vivo findings - are shown in dashed arrows. An asterisk (*) next to the acylsugar names indicates no NMR structure is available for the acylsugar in Salpiglossis or Petunia, and the acyl chain positions are postulated based on their fragmentation patterns in positive and/or negative mode, and on hypothesized enzyme activities as described in the main text. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 6

A. B.

50-65 mya 90 SlASAT1 Solanaceae 100 90 PaASAT2 SsASAT2 92 S o Solyc07g043670

16 PaASAT4 75-100 mya l ana l Solyc08g075210 Convolvulaceae (Ipomoea) 13 Capsicum annuum LOC107855820

25 SsASAT1 e

100 s 64 PaASAT1 Hydroleaceae 46 SlASAT2 SsASAT3 Montiniaceae 100 Sphenocleaceae 100 PaASAT3 Solyc07g043710 L

#1 Solyc07g043700 a 75 92 SlASAT3 Lamiaceae (Lavandula) m i

93 99 Solyc11g067290 Pedaliaceae (Sesamum) a

Solyc11g067340 l 100 e

100 Solyc11g067330 Phyrmaceae (Erythranthe) s 47 Solyc08g014490 80 Solyc05g039950 Solyc11g069680 Boraginales 100 Ipomoea trifida sc000393.1.g4 85 82 96 Ipomoea trifida sc0004275.1.g2 Ipomoea trifida sc0011602.1.g2 Boraginaceae #2 Solyc02g081800 84 (Ehretia, Heliotropium) Capsicum pun1 Gentianales 96 Solyc02g081760 94 100 Solyc02g081750 100 Rubiaceae (Coffea) 93 Solyc02g081740 Ehretia acuminata EMAL-2005913 Apocyanaceae 100 Heliotropium karwinsky NIGS-2110309 (Catharanthus, Rauwolfia) Coffea CDP19225.1 Catharanthus MAT 73 100 Catharanthus DAT Coffea CDP12986.1 C. 100 Solyc12g096790 Speciation 44 Solyc12g096770 events (50-65 mya) 54 Solyc12g096800 Solyc12g005430 Solanaceae enzymes 88 Solyc12g005440 100 Duplication 100 68 Solyc02g081770 Solyc12g010980 event Convolvulaceae enzymes 98 Solyc07g008390 100 100 Solyc07g008380 Solanaceae enzymes Solyc12g088170 Coffea CDO97411.1 Solyc01g105590 Convolvulaceae enzymes 99 SlASAT4 75-100 100 54 Solyc01g105550 mya Solanaceae enzymes 99 SsASAT5 69 LavandulaAAT 99 Sesamum indicum VSlike LOC105172243 Convolvulaceae enzymes 43 Rauwolfia serpentina ACT 100 Solyc04g082350 99 Sesamum indicum VSlike LOC105179234 Lamiales enzymes 57 Sesamum indicum VSlike LOC105179198 Solyc06g051320 53 100 Sesamum indicum VSlike LOC105156785 100 Sesamum indicum VSlike LOC105156787 49 Solyc05g014330 100 Solyc07g006680 Boraginales enzymes 59 Solyc07g006670 99 Erythranthe guttatus VSlike LOC105951654 100 Sesamum indicum VSlike LOC105166350 Gentianales enzymes

Figure 6: The origins of acylsugar biosynthesis. (A) Gene tree showing the characterized ASATs (blue squares) and other BAHDs in Clade III of the BAHD family of enzymes (D’Auria, 2006). Only Lamiid species are included in this tree, given significantly similar ASAT sequences were not detected elsewhere in the plant kingdom, as shown in Figure Supplement 1. The blue sub-clade is where most ASAT activities lie, while the red monophyletic sub-clade with high bootstrap support does not contain ASAT activities. Pink sub-clade includes enzymes present in outgroup species beyond Solanales. The purple squares highlight the closest homologs in the Lamiales order. The robustness of this topology was also tested using additional tree reconstruction approaches showed in Figure Supplements 2 and 3. Source Data 1 is the alignment file in the MEGA mas format used for making this tree. (B) Known relationships between different families and orders included in the study, based on Refulio-Rodriguez and Olmstead, 2014. The times indicated are based on a range of studies as described in the main text. (C) Reconciled evolutionary history of the blue ASAT sub-clade based on (A) and (B). Grey color indicates loss or lack of BLAST hits in the analyzed sequence datasets from that lineage. Figure 6-FigureSupplement1 (% IDT)valuesforthetopBLASTPhitsineachspecies areshown. (BLASTP) betweenSlASATsequencesandproteomes ofeachspeciesinthePhytozomedatabase.The%identity Figure 6-Supplement1:Thephylogenetic contextoftomatoASATs.ResultsproteinBLAST certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint doi: https://doi.org/10.1101/136937 Solanum Physcomitrella Selaginella Brachypodium Oryza Panicum Setaria Zea mays Sorghum Aquilegia Mimulus Nicotiana Solanum Vitis Eucalyptus Citrus Citrus Theobroma Gossypium Carica Thellungiella Brassica Capsella Arabidopsis Arabidopsis thaliana Fragaria Malus Prunus Cucumis Glycine Phaseolus Medicago truncatula Populus trichocarpa Linum Ricinus Manihot vinifera sativa clementina sinensis domestica papaya usitatissimum % IDT of best persica italica communis max vesca esculenta rubella virgatum guttatus rapa sativus tuberosum match lycopersicum bicolor coerulea benthamiana ; vulgaris moellendorffi cacao grandis this versionpostedMay11,2017. raimondii a lyrata halophila CC-BY 4.0Internationallicense 1 distachyon 5 0 patens 0 0 0 i

SlASAT1 Solyc12g006330 SlASAT2 Solyc04g012020 SlASAT3 Solyc11g067270 SlASAT4 Solyc01g105580 . The copyrightholderforthispreprint(whichwasnot bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 6-Figure Supplement 2

A. B. C.

81 100 SlASAT1 55 SlASAT1 89 PaASAT2 100 100 SlASAT3 89 PaASAT2 39 SsASAT2 Solyc07g043670 69 SsASAT2 100 Solyc11g067290 17 Solyc07g043670 PaASAT4 14 Solyc08g075210 PaASAT4 99 Solyc11g067340 Solyc08g075210 Capsicum annuum LOC107855820 100 Solyc11g067330 78 44 SsASAT1 Capsicum annuum LOC107855820 56 100 34 SsASAT1 50 PaASAT1 33 100 Solyc07g043700 100 SsASAT3 63 PaASAT1 100 100 23 Solyc07g043710 PaASAT3 100 SsASAT3 PaASAT3 SlASAT2 SlASAT2 SlASAT2 Solyc07g043710 28 22 Solyc07g043700 Solyc07g043710 SsASAT3 100 8 Solyc07g043700 100 83 73 SlASAT3 80 71 96 SlASAT3 99 PaASAT3 93 Solyc11g067290 Solyc11g067340 86 Solyc11g067290 Solyc08g075210 100 60 Solyc11g067330 100 Solyc11g067340 100 Solyc08g014490 100 Solyc11g067330 58 PaASAT1 55 57 Solyc11g069680 78 Jaltomato procumbens TR3592 46 100 Capsicum annuum LOC107855820 Solyc05g039950 Solyc08g014490 Solyc11g069680 Solyc05g039950 SsASAT1 100 IT sc000393.1.g4 100 Ipomoea trifida sc000393.1.g4 45 89 70 69 85 ITK sc0004275.1.g2 77 Ipomoea trifida sc0004275.1.g2 63 PaASAT4 ITK sc0011602.1.g2 Ipomoea trifida sc0011602.1.g2 Solyc07g043670 Solyc02g081800 86 Solyc02g081800 64 65 100 Capsicum pun1 Capsicum pun1 SsASAT2 48 97 Lycianthes dejecta 226320280 64 89 Solyc02g081760 95 81 100 Solyc02g081760 Solyc02g081750 100 SlASAT1 100 Solyc02g081750 100 Solyc02g081740 100 100 61 PaASAT2 76 98 Solyc02g081740 Ehretia acuminata EMAL-2005913 Ehretia acuminata EMAL-2005913 94 Heliotropium karwinsky NIGS-2110309 100 Solyc08g014490 100 100 Heliotropium karwinsky NIGS-2110309 Coffea canephora CDP19225.1 57 Coffea CDP19225.1 Catharanthus MAT Jaltomato procumbens TR3592 Catharanthus MAT 56 Catharanthus DAT 76 100 Solyc05g039950 100 Catharanthus DAT Coffea canephora CDP12986.1 Coffea CDP12986.1 50 SsASAT5 Solyc11g069680 100 Solyc02g081770 21 Lavandula AAT 100 85 IT sc000393.1.g4 46 Solyc12g005430 Sesamum indicum VSlike LOC105172243 84 Solyc12g005440 Rauwolfia serpentina ACT 60 ITK sc0011602.1.g2 43 Solyc12g096770 57 100 93 Solyc04g082350 38 100 Solyc12g096790 Sesamum indicum VSlike LOC105179234 Solyc02g081800 Solyc12g096800 41 72 100 Sesamum indicum VSlike LOC105179198 100 Capsicum pun1 Solyc07g008380 100 Solyc07g006680 93 Solyc07g008390 99 100 Solyc07g006670 93 Lycianthes dejecta 226320280 98 Solyc12g010980 18 Erythranthe guttatus VSlike LOC105951654 Solyc12g088170 49 Solyc02g081760 Coffea CDO97411.1 100 Sesamum indicum VSlike LOC105166350 100 75 Solyc06g051320 Solyc01g105590 100 Solyc02g081750 98 Solyc01g105550 Solyc05g014330 15 100 75 Sesamum indicum VSlike LOC105156785 94 Solyc02g081740 49 SlASAT4 100 100 Sindicum VSlike LOC105179234 100 Sesamum indicum VSlike LOC105156787 81 75 84 SlASAT4 Ehretia acuminata EMAL-2005913 Solyc04g082350 100 Sindicum VSlike LOC105179198 99 Solyc01g105550 100 Heliotropium karwinsky NIGS-2110309 100 Solyc01g105590 100 Sindicum VSlike LOC105156787 Coffea canephora CDO97411.1 Coffea CDP19225.1 46 Sindicum VSlike LOC105156785 Solyc12g088170 16 Solyc05g014330 100 Catharanthus MAT Solyc06g051320 98 Solyc07g008390 97 100 Solyc07g008380 Coffea CDP12986.1 Sindicum VSlike LOC105166350 99 72 16 Eguttatus VSlike LOC105951654 100 Solyc12g005440 Solyc07g006670 Solyc02g081770 SsASAT4 95 100 100 Solyc07g006680 Solyc12g005430 SlASAT4 SsASAT5 82 Solyc12g010980 67 Rserpentina ACT Solyc12g096800 100 Solyc12g096800 20 Sindicum VSlike LOC105172243 71 Solyc12g096790 100 64 100 Solyc02g081770 62 Lavandula AAT 100 Solyc12g096770

Figure 6-Figure Supplement 2: Robustness of the phylogenetic relationships. (A) A maximum likelihood tree obtained using the best model (JTT+G+I+F with 5 rate categories) as per maximum likelihood based search of multiple models. Sites with <70% coverage were deleted. (B) NJ tree using mostly the sequences in the pink+blue+red clade, with 100% site cover- age and JTT model (C) NJ tree of all the sequences obtained using JTT. Only sites with 100% coverage were used for this analysis. Please see Fig. 6A for explanation of the various branch and label colors. Bootstrap support values are a result of 100 iterations. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 InternationalB. license. Figure 6-Figure Supplement 3 NCBI - Within Solanaceae

100 Capsicum annuum acylsugar acyltransferase 3-like (LOC107855820) CDS

36 SsASAT1

SsASAT2 27 100 Solyc12g006330

SsASAT3 A. 1kp tree - Within Solanaceae 100 Solyc04g012020

100 Capsicum annuum acylsugar acyltransferase 3-like (LOC107847146) CDS 100 Solyc12g006330 99 SlASAT1 Solyc11g067270 100 PaASAT2 Atropa belladonna BOLZ-2171211 Solyc05g039950.1.1 92 52 14 69 SsASAT2 Solyc07g043670 100 Solanum tuberosum acylsugar acyltransferase 3-like (LOC107061326) CDS 53 PaASAT4 30 48 PaASAT1 Solyc10g079570.1.1 SsASAT1 61 Solanum melongena cultivar Ichiban acyltransferase (AT3-1) partial CDS 14 100 Atropa belladonna BOLZ-2029346 Atropa belladonna BOLZ-2029345 100 98 100 Solyc04g012020 Solyc02g081800.1.1 33 SlASAT2 Solyc01g105580 100 PaASAT3 SsASAT3 100 100 SsASAT5 Solyc08g075210 Solyc07g043710 27 Solyc07g043700 64 100 Solyc11g067330 47 98 Solyc11g067340 100 Solyc11g067290 Solyc11g067270 84 100 SlASAT3 99 Solyc10g079570 C. Petunia+Convolvulaceae Nicotiana sylvestris MKZR-2110551 100 Solyc08g014490 70 Peaxi162Scf00636g00042.1 75 Solanum dulcamara GHLP-2057362 36 Peaxi162Scf00176g00086.1 99 73 77 Solyc11g069680 63 Peaxi162Scf00342g00001.1 Solyc05g039950 100 Peaxi162Scf01143g00008.1 Peaxi162Scf00176g00093.1 52 Solyc02g081800 100 Capsicum pun1 52 Peaxi162Scf00176g00104.1 100 Solyc02g081760 51 Solyc11g067330 100 SlASAT3 Solyc02g081750 41 100 SlASAT2 100 Solyc02g081740 30 Peaxi162Scf00133g01113.1 100 Heliotropium karwinsky NIGS-2110309 100 100 SsASAT3 Heliotropium convolvulaceum OUER-2027258 100 Peaxi162Scf00798g00002.1 100 Ehretia acuminata EMAL-2005912 SsASAT1 26 100 Ehretia acuminata EMAL-2005913 25 Peaxi162Scf00198g00710.1 SsASAT2 Catharanthus MAT 79 100 Catharanthus DAT 100 SlASAT1 SsASAT5 Peaxi162Scf00498g00021.1 49 Peaxi162Scf00139g01113.1 Solyc04g082350 99 Peaxi162Scf00188g00072.1 Solyc06g051320 100 100 31 Peaxi162Scf00188g00713.1 100 Solyc07g006680 100 39 68 Peaxi162Scf00479g00081.1 100 39 Solyc07g006670 Peaxi162Scf00005g04415.1 Solyc05g014330 Peaxi162Scf00548g00221.1 33 CDP15030.1 100 Itrk sc0004275.1 g000002.1 42 Solyc01g105580 Itr sc000393.1 g00004.1 94 51 100 SlASAT4 Itrk sc0011602.1 g000002.1 77 Solyc02g081740 100 Solyc02g081800 100 Peaxi162Scf00321g00072.1 100 Peaxi162Scf00026g00305.1 Peaxi162Scf00169g00014.1 71 70 100 Peaxi162Scf00169g00223.1 Solyc08g014490 58 Peaxi162Scf00075g01237.1 100 100 Peaxi162Scf00075g01225.1 98 Peaxi162Scf00169g00073.1 100 Peaxi162Scf00444g00533.1 71 Peaxi162Scf00444g00619.1 100 Peaxi162Scf00724g00072.1 Peaxi162Scf00347g00714.1 Figure 6-Figure Supplement 3: Additional related 66 99 Peaxi162Scf01276g00003.1 100 94 Peaxi162Scf00588g00051.1 100 sequences do not affect our inferences regarding ASAT Peaxi162Scf00188g00119.1 clade emergence. Hits obtained using various BLAST 100 Peaxi162Scf00175g01023.1 Peaxi162Scf00175g01139.1 99 Peaxi162Scf00175g00112.1 strategies (A,B) and using Petunia Clade III BAHDs (C) are 78 53 100 Peaxi162Scf00175g01035.1 shown in a phylogenetic context with ASATs and other Ehretia acuminata EMAL-2005912 100 Heliotropium karwinsky NIGS-2110309 relevant homologs. The novel hits that were not included in Catharanthus DAT 100 Catharanthus MAT the final gene tree in Fig. 6 are shown by a green square. All Cuscuta pentagona c22877 g1 i2 SlASAT4 60 trees were generated using NJ, using the JTT model with 99 CDP15030.1 SsASAT5 100 bootstraps. All sites with <80% coverage were elimi- Ipomoea purpurea c27025 g2 i1 nated. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Figure 7

A. Solanum lycopersicum C. OH Solanum betaceum OH HO Solanum quitoense NYBG O OH O OH HO HO Solanum quitoense MSU OH O O O Solanum eutuberosum HO OH OH Solanum brachyantherum HO OH O Solanum aethiopicum SlASAT1 (aASAT2 ortholog) HO OH Solanum jasminoides Sucrose OH Solanum conocarpum Sucrose Solanum nigrum OH OH Solanum pseudocapsicum HO O Physalis viscosa O HO OH Physalis peruviana O Physalis alkekengi OH OH HO O OH Iochroma cyaneum NYBG HO O Iochroma cyaneum MSU O O R2 HO Iochroma fuchsoides OH Saracha punctata Mono-acylsucrose SnASAT1 O (aASAT2 ortholog) OH Withania somnifera HO OH Capsicum chinense Sucrose Capsicum annuum Lycianthes quichensis Lycianthes rantonnetii Brugmansia suaveolens Datura inoxia OH OH Mandragora autumnalis HO O Juanulloa mexicana O HO Solandra sp. OH HnASAT2 O Atropa belladonna OH Hyoscyamus niger (aASAT2 ortholog) HO O Nicotiana sp. Nicotiana alata yellow O R2 Nicotiana alata white Mono-acylsucrose Nicotiana sylvestris Calibrachoa sp. Petunia axillaris Schizanthus sp. OH OH HO Goetzia sp. O O Cestrum diurnum HO OH Salpiglossis sinuata SsASAT2 O OH Ipomoea batatas (aASAT2 ortholog) HO O Ipomoea purpurea Convolvulaceae O Argyreia nervosa R2 Convolvulus cneorum Mono-acylsucrose B.

Solanum lycopersicum genome Deletion

Capsicum annuum genome Inversion

Petunia axillaris genome PaASAT1

Figure 7: The evolution of acylsugar biosynthesis in Solanaceae. (A) ASAT activities overlaid on the Solanaceae phylo- genetic relationships. Each colored square represents a single ASAT, starting from ASAT1 and moving sequentially down the pathway to ASAT5 (left to right, sequentially). Homologs are represented by the same color. Squares not contained within a box are experimentally validated activities. A solid black box indicates that a highly identical transcript exists in the RNA-seq dataset and is trichome-high. A dashed box indicates that, based on a BLAST search, the sequence exists in the genome for the contained enzymes. In vitro validated S. nigrum and H. niger ASAT activities, including their phylogenetic positions, are shown in Figure Supplement 1. Species selected for RNA-seq in this study are colored blue, previously studied species are highlighted in pink and Convolvulaceae species are in red. See Source Data 1 for results of the BLAST analysis. (B) Ortholo- gous genomic regions between three species harboring aASAT1 orthologs. Each gene in the region is shown by a colored block. Orthologous genes are represented by the same color. The PaASAT1 gene (red box) has two homologous sequences in the Capsicum syntenic region, but one of them is truncated. aASAT1 ortholog is not seen in the syntenic region in tomato. Genes used to make this figure are described in Source Data 2. (C) Substrate utilization of aASAT2 orthologs from multiple species is described based on the activities presented in Figure Supplement 1. Hyoscyamus niger (Hn), Salpiglossis sinuata (Ss), Solanum nigrum (Sn) and Solanum lycopersicum (Sl). The hydroxyl group highlighted in red shows the predicted position of acylation by the respective aASAT2 ortholog. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under Figure 7-Figure Supplement 1 aCC-BY 4.0 International license.

A. B. C.

Sucrose+aiC6 CoA+ Sucrose+nC12 CoA+ 1: TOF MS ES- SlASAT2 1: TOF MS ES- 100 100 100 569.27 0.0500Da SsASAT2 485.19 0.0500Da SsASAT2 71 SN c65670 g1 219 23 % % 100 HN c61998 g1 0 0 37 SsASAT3 1: TOF MS ES- 1: TOF MS ES- 100 Not S1:6 485.19 0.0500Da 100 HnASAT2 569.27 0.0500Da SlASAT3 HnASAT2 84.8 71 % % 100 HN c46360 g1 93 SN c71009 g1 0 0

1.39 1: TOF MS ES- SsASAT1 100 2.00 1: TOF MS ES- 100 485.19 0.0500Da 100 SsASAT1 SsASAT1 3.17e5 569.27 0.0500Da HN c58659 g1 (HnASAT1) 9.05e5 % %

36 PaASAT4 0 0

1: TOF MS ES- SsASAT2 1.39 1: TOF MS ES- 89 100 485.19 0.0500Da 100 2.01 HnASAT1 HnASAT1 569.27 0.0500Da HN c61400 g1 (HnASAT2) 347 100 1.72e3 % % 82 SlASAT1 0 0 100 SN c63608 g1 (SnASAT1) 1: TOF MS ES- 100 1.41 100 2.09 1: TOF MS ES- 485.19 0.0500Da SsASAT5 SnASAT1 SnASAT1 569.27 0.0500Da 100 8.85e4 2.00 9.83e5 % HN c40105 g1 %

100 SlASAT4 0 0 1.44 2.09 SN c60145 g1 100 1: TOF MS ES- 100 1: TOF MS ES- 100 SlASAT1 485.19 0.0500Da SlASAT1 569.27 0.0500Da 1.40 1.77e6 HN c56915 g1 3.32e6 % 75 % 2.04 98 SN c53868 g2 Time 0 0 0.50 1.00 1.50 2.00 2.50 3.00

1: TOF MS ES- D. 100 Product 555.2 0.1000Da 5.67e4 HnASAT2+(iC4+iC5+aiC5)+S1:5(5) *

% *S1:5(5) produced using [HnASAT1+sucrose+(iC5+aiC5)] S2:10(5,5)

0 0.50 1.00 1.50 2.00 2.50 3.00 3.50 4.00 4.50 5.00 5.50 6.00 6.50

Figure 7-Figure Supplement 1: Some aASAT2 orthologs cannot catalyze sucrose to mono-acylsucrose conversion. (A) Neighbor-joining tree made using Salpiglossis and tomato ASATs and their best hits in S. nigrum (Sn) and H. niger (Hn) transcriptome database. Tree was made using default parameters in MEGA6 for alignment and phylogeny, and identifies putative aASAT2 orthologs in S. nigrum and H. niger. (B-D)Shown are LC/MS extracted ion chromatograms for S1:6(6) [B], S1:12(12) [C] and S2:10(5,5) [D]. All reactions were run on a C18 LC column. Results show that Hn, Ss, Sn and SlASAT1s can catalyze sucrose to mono-acylsucrose conversion using aiC6 and nC12 CoA. However, aASAT2 orthologs in Hn and Ss are not able to catalyze this reaction. (D) HnASAT2 is able to catalyze di-acylsucrose formation using the HnASAT1 product. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not Figurecertified 8 by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. SlASAT1 SlASAT3

OH O OH HO O O Present O OH Cultivated tomato O O Refinement of O O

aASAT2 O O Solanum section Lycopersicon SlASAT2 orthologs to (in S. lycopersicum) emerges become ASAT1 10 mya } aASAT1 ortholog loss aASAT2 and aASAT3 orthologs transition to Solanum-Capsicum lineage } perform ASAT1 and ASAT2 activities, respectively estimated split 20 mya

SsASAT3 OH SsASAT1 O O H HO } O O O Salpiglossis lineage OH O OH emerges O O O } O 30 mya

SsASAT2 Solanaceae crown (in Salpiglossis) node time estimate

40 mya Duplications give rise to aASAT1,2,3 and other BAHDs in the blue subclade (aASAT: ancestral ASAT) 50 mya

Speciation that would eventually lead to Solanaceae and

Convolvulaceae families

} 60 mya }

70 mya ASAT1,2,3 clade emerges from an alkaloid biosynthetic ancestor } } 80 mya

Lamiids diverge into Gentianales, Solanales, Lamiales, Boraginales 90 mya orders

100 mya Figure 8: Proposed model for evolution of acylsugar biosynthesis over 100 million years. Events in plant evolution are noted on the left and the acylsugar biosynthetic pathway developments are noted on the right of the timeline. Tri-acylsugars produced by Salpiglossis and tomato ASAT1,2,3 enzymes are shown with the chain color corresponding to the color of the enzyme that adds the chain. Images obtained from Wikimedia Commons under Creative Commons CC BY-SA 3.0 license. We note that Salpiglossis and SsASAT activities should not be considered as “ancestral” and Tomato and SlASAT activities as “modern”, since all of them exist today. We use the word “lineages” (eg: Salpiglossis lineage) to refer to the branch leading up to the currently existing (extant) species. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Supplemental Table 1: Description of the LC gradients used in this study

7min -C18 Flow (mL/min) %A %B Curve Initial 0.3 95 5 Initial A: Water+0.1% Formic acid ( -ve mode) 1 0.3 40 60 6 A: 10mM ammonium formate in water, pH 2.8 (+ve mode) 5 0.3 0 100 6 B: Acetonitrile 6 0.3 0 100 6 6.01 0.3 95 5 6 7 0.3 95 5 6

22min -C18 Flow (mL/min) %A %B Curve Initial 0.3 90 10 Initial A: Water+0.1% Formic acid ( -ve mode) 7 0.3 40 60 6 A: 10mM ammonium formate in water, pH 2.8 (+ve mode) 15 0.3 0 100 6 B: Acetonitrile 20 0.3 0 100 6 20.01 0.3 90 10 6 22 0.3 90 10 6

110min - C18 Flow (mL/min) %A %B Curve Initial 0.3 99 1 Initial A: Water+0.1% Formic acid ( -ve mode) 1 0.3 99 1 6 A: 10mM ammonium formate in water, pH 2.8 (+ve mode) 100 0.3 20 80 6 B: Acetonitrile 101 0.3 0 100 6 105 0.3 0 100 6 106 0.3 99 1 6 110 0.3 99 1 6

9min -BEH Flow (mL/min) %A %B Curve A: Water+0.1% Formic acid Initial 0.3 5 95 Initial B: Acetonitrile 1 0.3 5 95 6 6 0.3 40 60 6 7 0.3 95 5 6 7.01 0.3 5 95 6 9 0.3 5 95 6

bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Supplemental Methods

Extraction and sequencing of RNA Total RNA was extracted from 4-5 week old (S. nigrum, S. quitoense) or 7-8 week old plants (H. niger, Salpiglossis) using Qiagen RNEasy kit (Qiagen, Valencia, California) with on-column DNA digestion. In addition to trichome RNA, total RNA was extracted from shaved stems of S. nigrum and Salpiglossis, and shaved petioles of S. quitoense and H. niger. The quality of extracted RNA was determined using Qubit (Thermo Fisher Scientific, USA) and Bioanalyzer (Agilent Technologies, Palo Alto, California). Total RNA from all 16 samples (4 species x 2 tissues x 2 biological replicates) was sequenced using Illumina HiSeq 2500 (Illumina, USA) in two lanes (8X multiplexing per lane). Libraries were prepared using the Illumina TruSeq Stranded mRNA Library preparation kit LT, sequencing carried out using Rapid SBS Reagents in a 2x100bp paired end format, base calling done by Illumina Real Time Analysis (RTA) v1.18.61 and the output of RTA was demultiplexed and converted to FastQ format by Illumina Bcl2fastq v1.8.4.

Confirmation of Salpiglossis sinuata We confirmed the plant to be Salpiglossis using the chloroplast ndhF and trnLF marker based phylogenies (Figure 2-Figure Supplement 1A,B). Specifically, we amplified these regions using locus-specific primers, sequenced the amplicons and assembled the contigs using Muscle (Edgar, 2004). A neighbor joining tree including ndhF and trnLF sequences from NCBI Genbank was used to confirm the identity of the plant under investigation.

Prediction of protein sequences and orthologous group assignments The protein sequences corresponding to the longest isoform of all expressed transcripts (read count>10 in at least one dataset in a given species) were obtained using TransDecoder (Haas et al., 2013) and GeneWise v2.1.20c (Birney et al., 2004). Only the protein sequences of transcripts with ≥10 reads as defined by RSEM were used for constructing orthologous groups using OrthoMCL v5 (Li et al., 2003). We defined orthologous relationships between S. lycopersicum, S. nigrum, S. quitoense, H. niger, N. benthamiana, Salpiglossis and Coffea canephora (outgroup) using an inflation index of 1.5. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Gene Ontology enrichment analyses We transferred the tomato gene ontology assignments to the homologs from other species in the same orthologous group. GO enrichment analysis was performed using a custom R script, and enriched categories were obtained using Fisher Exact Test and correction for multiple testing based on Q-value (Storey, 2002).

qRT-PCR Primers to specific regions of the targeted transcript were designed with amplicons between 100- 200 bp using Primer3 (Untergasser et al., 2012). The regions selected for amplification did not overlap with the region targeted in the VIGS analysis. Primer sets for Salpiglossis orthologs of PDS and elongation factor alpha (EF1a) were used as controls. We used 1 µg of total RNA from a single VIGS plant with an acylsugar phenotype to generate cDNA using the Thermo Fisher Superscript II RT kit. An initial amplification and visualization on a 1% agarose gel was performed to ensure that the primers yielded an amplicon with the predicted size and did not show visible levels of primer dimers. We first tested multiple primer sets per gene and selected primers within 85-115% efficiency range using a dilution series of cDNA from uninoculated plants. These primers were used for the final qRT-PCR reaction. The Ct values for the transcripts (on 1x template) were measured in triplicate, which were averaged for the analysis. Both uninoculated and empty vector controls were measured with all primer sets for ΔΔCt calculations.

Similarity searches using BLAST Similarity searches against the 1kp and NCBI nr databases were performed using TBLASTN, using SlASAT1, SlASAT3, SsASAT1 and SsASAT2 protein sequences as queries. For the 1kp database, TBLASTN was performed against all Asterid sequences, and the best non-Solanaceae sequences were analyzed using phylogenetic tree reconstruction (see below; Figure 6-Source Data 1). The BLAST at NCBI was performed against several specific groups of species, namely (i) orders Gentianales+Boraginales+Lamiales+Solanales-(family Solanaceae), (ii) Only Convolvulaceae, (iii) Solanaceae-(sub-family Solanoideae), (iv) Solanoideae-(genera Solanum+Capsicum), (v) Solanoideae-(genus Solanum), (vi) Solanum-(section Lycopersicon+species tuberosum). The top 10 hits from each search were manually curated and analyzed in a phylogenetic context with SlASATs and SsASATs (Figure 6-Figure Supplement 2,3). We used a similar approach to search the annotated peptide sequences in C. canephora bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

(Denoeud et al., 2014) and Ipomoea trifida (Hirakawa et al., 2015) sequence databases. Sequences that provided additional insights into the evolution of the ASAT clade were integrated into the final gene tree shown in Figure 6A.

Synteny analysis Synteny between the Petunia, Capsicum and tomato genomes was determined as previously described using MCScanX (Moghe et al., 2014; Wang et al., 2012). Regions corresponding to PaASAT1 and its best matching homologs were used to make Figure 7B.

Estimating the number of acylsugars This and previous studies provide evidence for a number of acyl chains esterified to sucrose, classified into short (C2, C3, C4, iC5, aiC5, iC6, aiC6, C8), long (nC10, iC10, nC11, nC12) and non-aliphatic (malonyl). We only focused on aliphatic chains for this estimate. We typically see one long chain and 3-4 short chains on the sugar molecule across different Solanaceae species. In our calculation, we assumed that each acyl chain has the same probability of being incorporated into the acylsugar, with the core being a sucrose core. Under these assumptions, there are 8 acyl chains that could be incorporated at three positions on a tetra-acylsugar and 12 acyl chains on the fourth position. This gives a theoretical estimate of 8*8*8*12=6144 acylsucroses that could be produced with the above acyl chains.

References

Birney, E., Clamp, M., and Durbin, R. (2004). GeneWise and Genomewise. Genome Res. 14: 988–995.

Denoeud, F. et al. (2014). The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345: 1181–1184.

Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797.

Haas, B.J. et al. (2013). De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nat. Protoc. 8. bioRxiv preprint doi: https://doi.org/10.1101/136937; this version posted May 11, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Hirakawa, H. et al. (2015). Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don. DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes 22: 171–179.

Li, L., Stoeckert, C.J., and Roos, D.S. (2003). OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13: 2178–2189.

Moghe, G.D., Hufnagel, D.E., Tang, H., Xiao, Y., Dworkin, I., Town, C.D., Conner, J.K., and Shiu, S.-H. (2014). Consequences of whole-genome triplication as revealed by comparative genomic analyses of the wild radish Raphanus raphanistrum and three other Brassicaceae species. Plant Cell 26: 1925–1937.

Storey, J.D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. B 64: 479–498.

Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., and Rozen, S.G. (2012). Primer3—new capabilities and interfaces. Nucleic Acids Res. 40: e115.

Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T., Jin, H., Marler, B., Guo, H., Kissinger, J.C., and Paterson, A.H. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40: e49.