Accepted Manuscript

Positive selection of digestive Cys proteases in herbivorous Coleoptera

Juan Vorster, Asieh Rasoolizadeh, Marie-Claire Goulet, Conrad Cloutier, Frank Sainsbury, Dominique Michaud

PII: S0965-1748(15)30033-3 DOI: 10.1016/j.ibmb.2015.07.017 Reference: IB 2748

To appear in: Insect Biochemistry and Molecular

Received Date: 23 July 2014 Revised Date: 22 July 2015 Accepted Date: 31 July 2015

Please cite this article as: Vorster, J., Rasoolizadeh, A., Goulet, M.-C., Cloutier, C., Sainsbury, F., Michaud, D., Positive selection of digestive Cys proteases in herbivorous Coleoptera, Insect Biochemistry and Molecular Biology (2015), doi: 10.1016/j.ibmb.2015.07.017.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. ACCEPTED MANUSCRIPT

IntD4 StCYS1 154 D4-74 D4-95

V84 D4-22

I1 V2 I47 MANUSCRIPTF10 Q45 K76

D4- D4- 146 154

ACCEPTED ACCEPTED MANUSCRIPT

1 Positive selection of digestive Cys proteases in herbivorous Coleoptera

2

3 Juan Vorster a,b†, Asieh Rasoolizadeh a†, Marie-Claire Goulet a, Conrad Cloutier c, Frank Sainsbury a,d,

4 Dominique Michaud a*

5

6 a Département de phytologie, CRIV–Biotechnologie, Université Laval, Québec QC, Canada G1V 0A6.

7 b Department of Plant and Soil Science, Forestry and Agricultural Biotechnology Institute, University of

8 Pretoria, Pretoria, South Africa.

9 c Département de biologie, Université Laval, Québec QC, Canada G1V 0A6.

10 d The University of Queensland, Australian Institute for Bioengineering and Nanotechnology, Centre for

11 Biomolecular Engineering, St. Lucia, Queensland 4072, Australia.

12 13 † Equal contributors to this paper. MANUSCRIPT 14 * Corresponding author. Département de phytologie, Université Laval, Pavillon Envirotron, 2480 boul.

15 Hochelaga, Québec QC, Canada G1V 0A6. Tel.: +1 418 656 2131, ext. 5076.

16 E-mail address: [email protected] (D. Michaud).

17

18 Abbreviations: Acn, acetonitrile; Cys, cysteine; Int, intestain; Ldp30, 30-kDa cystatin-sensitive Cys

19 protease of Colorado potato beetle ( Leptinotarsa decemlineata ); MCA, methylcoumarin; RCR, relative

20 leaf consumption rate; RGR, relative growth rate; Ser, serine; SlCYS8, eighth domain of tomato

21 (Solanum lycopersicumACCEPTED) multicystatin; StCYS1, first domain of potato ( Solanum tuberosum )

22 multicystatin.

23

ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 ABSTRACT

2 Positive selection is thought to contribute to the functional diversification of insect-inducible protease

3 inhibitors in plants in response to selective pressures exerted by the digestive proteases of their

4 herbivorous enemies. Here we assessed whether a reciprocal evolutionary process takes place on the

5 insect side, and whether ingestion of a positively selected plant inhibitor may translate into a

6 measurable rebalancing of midgut proteases in vivo . Midgut Cys proteases of herbivorous Coleoptera,

7 including the major pest Colorado potato beetle ( Leptinotarsa decemlineata ), were first compared using

8 a codon-based evolutionary model to look for the occurrence of hypervariable, positively selected amino

9 acid sites among the tested sequences. Hypervariable sites were found, distributed within –or close to–

10 amino acid regions interacting with Cys-type inhibitors of the plant cystatin family. A close

11 examination of L. decemlineata sequences indicated a link between their assignment to protease

12 functional families and amino acid identity at positively selected sites. A function-diversifying role for 13 positive selection was further suggested empirically by MANUSCRIPTin vitro protease assays and a shotgun proteomic 14 analysis of L. decemlineata Cys proteases showing a differential rebalancing of protease functional family

15 complements in larvae fed single variants of a model cystatin mutated at positively selected amino acid

16 sites. These data confirm overall the occurrence of hypervariable, positively selected amino acid sites in

17 herbivorous Coleoptera digestive Cys proteases. They also support the idea of an adaptive role for

18 positive selection, useful to generate functionally diverse proteases in insect ingesting

19 functionally diverse, rapidly evolving dietary cystatins.

20 21 Keywords: PositiveACCEPTED selection, Herbivorous Coleoptera, Digestive cysteine proteases, Plant cystatins, 22 Plant–insect interactions, Coevolutionary arms race, Leptinotarsa decemlineata Say

Page 2 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 1. Introduction

2 Protease::inhibitor interactions in plant–insect systems are the result of a long coevolutionary arms

3 race triggering the continuous diversification of digestive proteases and plant protease

4 inhibitors (Christeller, 2005; Lopes et al., 2004; Zhu-Salzman and Zeng, 2015). On the one side,

5 herbivorous insects have evolved a range of strategies to cope with dietary protease inhibitors, typically

6 involving the secretion of complex midgut protease complements, the overexpression of inhibitor-

7 sensitive proteases to outnumber the ingested inhibitors, the up-regulation of protease isoforms weakly

8 sensitive to inhibition, and degradation of the plant inhibitors with non-target proteases (Broadway,

9 2000; Zhu-Salzman and Zeng, 2015). On the other side, plants express a range of protease inhibitors

10 upon wounding or insect attack, encoded by gene families responsive to various developmental and

11 environmental stimuli (Ryan, 1990). Protease inhibitor diversity in plants is illustrated by the occurrence

12 of serine (Ser) protease inhibitor gene families in plant genomes (Barta et al., 2000; Kong and 13 Ranganathan, 2008; Li et al., 2011) matching the occurrenceMANUSCRIPT of trypsin and chymotrypsin gene families in 14 their lepidopteran insect predators (Srinivasan et al., 2006). Similarly, plants express an array of stress-

15 inducible cysteine (Cys) protease inhibitors, the so-called cystatins (Benchabane et al., 2010), matching

16 complex sets of digestive Cys proteases in herbivorous Coleoptera (Gruden et al., 2004; Sainsbury et al.,

17 2012a; Tribolium Genome Sequencing Consortium, 2008).

18 Several evolutionary processes have shaped the organization of protease inhibitor complements in

19 biological systems, notably involving gene duplication followed by positive selection of non-synonymous

20 mutations at functionally relevant amino acid sites (Christeller, 2005). Protease inhibitor gene families 21 derived from multipleACCEPTED gene duplications is a common feature of plant genomes, as exemplified by the 22 proteinase inhibitor II family of Solanaceae (Barta et al., 2002; Kong and Ranganathan, 2008; Mishra et

23 al., 2012), the mustard trypsin inhibitor family of Cruciferae (Clauss and Mitchell-Olds, 2004), the Kunitz

24 trypsin inhibitors of Populus and Solanum species (Neiman et al., 2009; Speranskaya et al., 2012) or the

Page 3 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 multigene cystatin complex of higher plants (Girard et al., 2007; Martinez et al., 2005; Massonneau et al.,

2 2005; Tan et al., 2014). Inhibitor variants within these families often show sequence hypervariability at

3 specific amino acid positions, presumably indicative of positive selection and functional diversification

4 towards herbivore digestive proteases (Ingvarsson, 2005; Kiggundu et al., 2006; Kong and

5 Ranganathan, 2008; Li et al., 2011; Neiman et al., 2009; Talyzina and Ingvarsson, 2006).

6 An example of this is potato multicystatin, an eight-domain cystatin known to protect endogenous

7 storage in potato tubers (Green et al., 2013; Weeda et al., 2009) which is also upregulated in

8 leaves upon insect attack (Bouchard et al., 2003). The eight domains of this protein likely were the result

9 of multiple gene duplications in close Solanum ancestor(s) (Benchabane et al., 2010). Their primary

10 sequence includes hypervariable, rapidly evolving amino acid sites giving the protein a range of

11 inhibitory specificities towards plant and insect Cys proteases (Goulet et al., 2008; Kiggundu et al., 2006).

12 Here we assessed whether a similar evolutionary process involving positive selection is taking place on 13 the insect side, using digestive Cys proteases of herbivorous MANUSCRIPT Coleoptera as a model. We also conducted 14 enzymology and functional proteomics work on midgut Cys proteases of the coleopteran herbivore

15 Colorado potato beetle ( Leptinotarsa decemlineata ) to establish a link between the eventual positive

16 selection of Cys proteases in coleopteran herbivores and the response of these insects to the functionally

17 diverse, rapidly evolving cystatins of their plant hosts.

18

19 2. Methods 20 2.1 Sequence variabilityACCEPTED inferences 21 Hyperviariable, positively selected amino acids were searched for in the coding sequences of 25

22 midgut Cys protease genes from the coleopteran herbivores cowpea weevil ( Callosobruchus maculatus ),

23 Western corn rootworm ( Diabrotica virgifera virgifera ) and L. decemlineata (Table S1, Fig. S1) using the

Page 4 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 codon-based mechanistic-empirical combination (MEC) evolutionary model of Doron-Faigenboim and

2 Pupko (2006). Ka/Ks ratios –or ω values– were calculated online using the Selecton server for the

3 identification of codon site-specific positive selection and purifying selection, v. 2.4

4 (http://selecton.tau.ac.il) (Stern et al., 2007), after uploading the codon-aligned sequences of (Fig. S2),

5 and an unrooted phylogenetic tree for (Fig. S3), the 25 proteases. A codon sequence alignment was first

6 produced using the Multiple Sequence Comparison by Log-Expectation (MUSCLE) tool (Edgar, 2004), and

7 then edited manually based on the protein alignment (Fig. S1) to eliminate indel mismatches between

8 the two sequence datasets. An unrooted phylogenetic tree was inferred from the resulting sequences by

9 the neighbor-joining distance method of Saitou and Nei (1987), after generating a sequence similarity

10 matrix using Kimura’s two-parameter model (Kimura, 2003). Because no null model is nested in the MEC

11 model, an Akaike’s Information Criteria (AIC c) score calculated with this model was compared with the

12 AIC c score obtained under the M8a null model of Swanson et al. (2003), which assumes no positive

13 selection in the tested sequences. The MEC model represents a better fit to the data when the calculated MANUSCRIPT 14 AIC c score is lower than the AIC c score calculated for the M8a model. In such a case, a confidence interval

15 between the 5 th and 95 th percentiles is estimated for amino acid positions showing a ω value greater

16 than 1. When the lower bound of the confidence interval is greater than 1, the inference of positive

17 selection at this position is considered statistically significant.

18

19 2.2 Intestain::cystatin interactions

20 Cys protease::cystatin docking simulations were run using as an example the midgut Cys protease of

21 L. decemlineata IntestainACCEPTED D4 (IntD4) (GenBank Accession No. EF154436) (Sainsbury et al., 2012a) and the

22 first inhibitory domain of potato multicystatin StCYS1 (GenBank L16450) (Kiggundu et al., 2006), after

23 building structure homology models for the two proteins. Twenty tentative models were built for each

Page 5 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 protein using Modeller, v. 9.7 (Eswar et al., 2006), with the crystal structure of human cathepsin L (PDB

2 1SC8; http://www.wwpdb.org/) as a template for IntD4 and the NMR structure of oryzacystatin I (PDB

3 1EQK) as a template for StCYS1. Stereochemical quality of the models was assessed by comparison to the

4 template structures using the MetaMQAPII algorithm (https://genesilico.pl/toolkit/unimod?method=

5 MetaMQAPII), and the best models were selected for further analysis. Intestain::cystatin interactions

6 were simulated using the Z-Dock algorithm of Discovery Studio, v. 3.0 (Accelrys Software), to generate

7 2,000 tentative poses of the resulting complex (Chen et al., 2003). Poses with the highest Z-scores were

8 compared to the solved crystal structure of human cathepsin L interacting with human stefin A (PDB

9 3KSE) to validate the relative binding position and orientation of the two proteins in the predicted

10 complex. Five complexes were chosen and refined through energy minimization using the R-Dock

11 algorithm (Li et al., 2003). The top ranking model was selected for complex visualization and inference of

12 protease and cystatin interacting residues at the binding interface (Sainsbury et al., 2012a). 13 MANUSCRIPT 14 2.3 Recombinant cystatins

15 Wild-type tomato cystatin SlCYS8 and two single variants, P2I and T6R, exhibiting different inhibitory

16 spectra against insect and plant Cys proteases were expressed in –and purified from– BL21 E. coli cells

17 using the glutathione S-transferase gene fusion system (GE Healthcare) as described earlier (Goulet et al.,

18 2008). Biotinylated versions of the cystatins were also produced, which included a 17-amino acid ‘AviTag’

19 peptide (Beckett et al., 1999) at the C-terminus for subsequent immobilization on avidin-embedded resin

20 (see 2.7 Mass spectrometry , below). AviTagged wild-type SlCYS8 was produced using a recently devised

21 GoldenGate cloningACCEPTED construct (Sainsbury et al., 2012a). AviTag constructs for P2I and T6R were devised

22 by site-directed mutagenesis of the AviTagged SlCYS8 template, using the Quickchange mutagenesis kit

23 (Stratagene) and appropriate DNA primers to drive the selected mutations (Table S2). The AviTagged

Page 6 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 cystatins were biotinylated in vivo by expression in AVB101 E. coli cells coexpressing a biotin ligase

2 (Avidity LLC), with 50 µM D-biotin added in the growth medium.

3

4 2.4 Insect feeding assay

5 An insect feeding assay was carried out to assess the impact of wild-type SlCYS8, P2I and T6R on L.

6 decemlineata performance, and to obtain midgut protein (protease) samples from cystatin-fed insects

7 for physiological compensation analyses. Synchronized 4 th instars were used for the assay, derived from

8 a laboratory colony collected on potato plants at Laval U experimental station near Québec City QC,

9 Canada. The upper face of the 5 th and 6 th leaves (down from the apex) of greenhouse-grown, 30-cm

10 potato plants cv. Kennebec was painted with 1 ml of 6% (w/v) gelatin in water containing 1 mg of

11 purified cystatin (to give ~0.2 mg inhibitor/cm 2); or with 1 ml of 6% (w/v) gelatin in water (no-inhibitor

12 control). The experimental setup was kept at 20˚C and 65% relative humidity in a PGw36 growth MANUSCRIPT 13 chamber (Conviron), under a 16 h:8 h day/night photoperiod. Three plants were distributed randomly in 14 the growth chamber for each diet (inhibitor) treatment, and two larvae were assigned to each plant, i.e.

15 one on the 5 th leaf and one on the 6 th leaf. Individual larvae (n=6) and leaves (n=6) were monitored after

16 24 h (1 d) and 72 h (3 d) to estimate larval relative growth rates (RGR) and relative consumption rates

17 (RCR) (Cloutier et al., 1999), based on the following equations:

-1 18 RGR (d ) = [W f – W0] / [W 0 x time];

2 -1 -1 19 and RCR (mm .mg .d ) = Leaf area consumed / [W 0 x time]

20 where W f is theACCEPTED weight of individual larvae after one or three days (mg), W 0 the mean fresh weight 21 of each group of six larvae at time 0 (mg), and time the number of days after starting the assay.

22 Individual larvae were weighed using an MT5 microbalance (Mettler Toledo). Foliage consumption

23 corresponded to total leaf surface eaten (mm 2), as estimated using a 1-cm 2 paper disc guide (Brunelle et

Page 7 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 al., 2004). Whole larvae were collected and dissected at the end of the 3-d assay, and their midgut

2 frozen at –80˚C until use for enzymology and proteomic analyses.

3

4 2.5 Insect midgut proteins

5 Midgut proteins for protease assays and proteomic profiling were extracted in 100 mM citrate

6 phosphate extraction buffer, pH 6.0, containing 10% (v/v) ethylene glycol. Snap-frozen midguts were first

7 ground to a fine powder in liquid nitrogen, and then kept for 10 min on ice in three volumes (w/v) of

8 extraction buffer. Protein mixtures were clarified by centrifugation at 15,000 g for 10 min at 4˚C, and the

9 supernatant used as source material for further analysis. Protein content in the extracts was adjusted to

10 2 µg/µl by the addition of extraction buffer, after assaying soluble proteins according to Bradford (1976)

11 with bovine serum albumin (Sigma-Aldrich) as a protein standard. 12 MANUSCRIPT 13 2.6 Protease assays

14 Protease activities were determined by the monitoring of fluorigenic peptide hydrolysis progress

15 curves as described earlier (Goulet et al., 2008). Cathepsin B-like activity was measured at pH 6.5 in 100

16 mM sodium phosphate containing 10 mM L-cysteine, with Z–Arg-Arg–methylcoumarin (MCA) (Sigma-

17 Aldrich) as a substrate. Cathepsin L-like activity was measured in the same buffer with Z–Phe-Arg–MCA

18 (Sigma-Aldrich) as a substrate. Cathepsin D-like activity was measured at pH 3.0 in 50 mM citrate

19 phosphate with the substrate MOCAc-Gly–Lys–Pro–Ile–Leu–Phe–Phe–Arg–Leu–Lys(Dnp)-D-Arg-NH 2 20 (Peptides International).ACCEPTED Trypsin-like activity was measured in 100 mM sodium phosphate buffer, pH 7.5, 21 with the substrate Z–Arg–MCA (Sigma-Aldrich). Fluorescence was detected using a Fluostar Galaxy

22 reader (BMG Labtech), with excitation and emission filters of 360 and 450 nm, respectively, for the MCA

23 substrates; or of 340 and 400 nm, respectively, for the MOCAc substrate.

Page 8 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 2.7 Mass spectrometry

2 L. decemlineata midgut Cys proteases, the so-called ‘intestains’ (Gruden et al., 2003), were

3 characterized by liquid chromatography (LC)–tandem mass spectrometry (MS/MS) analysis of the

4 corresponding protein bands in Coomassie blue-stained polyacrylamide slab gels. For total intestains in

5 midgut extracts of cystatin-fed larvae, 30 µg of midgut protein was resolved by 12% (w/v) SDS-PAGE, and

6 gel slices encompassing protein bands in the ~25–32-kDa interval were carefully excised for further

7 analysis. For cystatin-sensitive intestains of potato-fed insects, target enzymes captured with the

8 AviTagged cystatins were resolved and excised the same way prior to MS/MS analysis. Intestain capture

9 was carried out as recently described, after binding biotinylated wild-type SlCYS8, P2I or T6R to TetraLink

10 avidin resin (Promega) (Sainsbury et al., 2012a). The gel slices were destained, reduced in 10 mM

11 dithiothreitol, alkylated in 55 mM iodoacetamide, and hydrolyzed for 1 h at 58˚C with 125 nM

12 TrypsinGold (Promega) using the MassPrep Workstation (Waters-Micromass) robot (Havlis et al., 2003). 13 Peptides in the gel matrix were extracted in 2% (v/v) MANUSCRIPTacetonitrile (Acn):1% (v/v) formic acid, and then 14 washed several times in 50% (v/v) Acn:1.0% (v/v) formic acid. The extracts were pooled, vacuum

15 centrifuged and resuspended in 7 µl of 0.1% (v/v) formic acid, from which 2 µl was taken for LC-MS/MS

16 analysis. The peptides were resolved by online reversed-phase nanoscale capillary LC, and then

17 submitted to electrospray MS. A Thermo Surveyor MS pump was used, connected to an LTQ linear ion

18 trap mass spectrometer equipped with a nanoelectrospray ion source (ThermoFisher). Peptide

19 separation was performed on a PicoFrit column (NewObjective) packed with Jupiter 5µ C18 300 A bulk

20 packing (Phenomenex), at 200 nl/min [obtained by flow splitting] over 30 min along a linear gradient 21 going from 2 to ACCEPTED 50% (v/v) Acn:0.1% (v/v) formic acid. MS/MS data were acquired under a data- 22 dependent acquisition mode, using the Xcalibur software, v. 2.0 (Thermo Scientific). The seven most

23 intense ions in the 400–2,000 m/z range were selected for collisional induced fragmentation, with the

Page 9 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 dynamic exclusion function enabled, an exclusion duration of 30 s, and the relative collisional

2 fragmentation energy set at 35.

3

4 2.8 Protein identification

5 MS peaklists generated from the Xcalibur raw data using Mascot Distiller, v. 2.3 (Matrix Science)

6 were analyzed with Mascot 2.3.02 (Matrix Science) (Perkins et al., 1999) set up to search the Uniref100

7 database (http://uniprot.org/uniref/) (Suzek et al., 2007). Search parameters for protein matching were

8 as follows: fragment ion mass tolerance of 0.5 Da; parent ion tolerance of 2.0 Da; iodoacetamide

9 derivatives Cys residues as fixed modification; oxidized Met residues as variable modification; and a

10 maximum of two missed trypsin cleavages allowed. MS/MS based peptide and protein identifications

11 were validated with Scaffold, v. 3.6.1 (Proteome Software). Peptide identifications were accepted if they

12 could be established at greater than 95% probability as specified by the Peptide Prophet algorithm MANUSCRIPT 13 (Keller et al., 2002). Protein identifications were accepted if they included at least two identified unique 14 peptides and could be established at greater than 95% probability using the Protein Prophet algorithm

15 (Nesvizhskii et al., 2003). Proteins that contained similar peptides and could not be differentiated based

16 on MS/MS spectra were grouped to satisfy the principle of parsimony.

17

18 2.9 Spectral count analyses

19 A quantitative analysis of MS spectra was done using spectral count sampling statistics (Zhang et al., 20 2006), on those countsACCEPTED corresponding to peptides that were specific to an intestain functional family 21 (Sainsbury et al., 2012a). Spectra obtained from the individual bands were combined for each repetition,

22 and only those belonging to peptides specific to an intestain family were included in the quantitative

23 analysis. Spectral counts were normalized across repetitions, for which the coefficient of variance was

Page 10 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 found to be less than 20%. Differential abundances of captured intestains for each SlCYS8 variant were

2 discriminated statistically with a significance threshold of 5%, taking into account spectral count mean

3 values greater than 4 for at least one treatment (Old et al., 2005).

4

5 3. Results

6 3.1 The digestive Cys proteases of herbivorous Coleoptera are positively selected

7 Sequence hypervariability was searched for among the 25 DNA coding sequences of herbivorous

8 Coleoptera digestive Cys proteases (Fig. S2), including 14 L. decemlineata intestains (Gruden et al., 2004),

9 to look for the occurrence of positively selected amino acids in the protease isoforms. Calculations were

10 made using the MEC model of Doron-Faigenboim and Pupko (2006), which allows for the identification

11 of hypervariable, positively selected codons in the tested sequences. This model integrates empirical

12 amino acid replacement probabilities with theoretical assumptions on codon frequencies and gene

selection driving forces to calculate K /K ratios (or MANUSCRIPTω values), where K refers to nonsynonymous 13 a s a

14 substitution rates, and Ks to synonymous (silent) substitution rates. It is generally assumed that ω values

15 lower than 1 indicate negative [or purifying] selection pointing to an essential structural or functional

16 role for the amino acid. It is assumed by contrast that ω values greater than 1 indicate positive selection

17 likely associated with functional diversification. An unrooted phylogenetic tree was first generated for

18 the 25 proteases (Fig. S3), and then used to calculate ω values for 200 codons found in all gene

19 sequences (Fig. 1). A likelihood ratio test was performed to compare the MEC model with the M8a null

20 model in fitting the sequence data. AIC c scores of 8179.7 and 8280.2 were calculated for the two models,

21 respectively, indicatingACCEPTED a better fit to the data with the MEC model and confirming the occurrence of

22 positively selected amino acid sites in the tested sequences. ω values lower than 1 at p<0.05 were

23 calculated for 43 codons (Fig. 1), several of them matching the conserved amino acid sites involved in

24 catalytic activity (see Sainsbury et al., 2012a). Five codons –i.e. codons 22, 74, 95, 146 and 154 based on

Page 11 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 mature intestain D4 [IntD4] numbering– exhibited ω values greater than 1 at p<0.05, indicative of

2 hypervariable amino acid sites possibly associated with functional diversity towards protein substrates

3 and/or pseudosubstrate inhibitors. Tests performed with other tree topologies, such as star trees (Yang

4 et al., 2000) or trees inferred by the Fitch–Margoliash method (Felsenstein, 1997), led to similar

5 conclusions, with minor differences resulting from varying resolution power in the detection of positive

6 selection or the identification of positively selected sites (not shown).

7

8 3.2 A reciprocal positive selection pathway for plant cystatins and Coleoptera Cys proteases?

9 A closer look was taken at the L. decemlineata intestain sequences to establish a possible link

10 between amino acid motifs at the active site, susceptibility to cystatin inhibition and amino acid identity

11 at positively selected sites. Thirty-six intestain sequences are found in the NCBI database

12 (http://ncbi.nlm.nih.gov/), categorized into six functional families –the IntA, IntB, IntC, IntD, IntE and IntF 13 families– based on primary sequence similarity, conse MANUSCRIPTrved motifs in functionally relevant regions, and 14 differential susceptibility to competitive pseudosubstrate inhibitors (Sainsbury et al., 2012a). As inferred

15 from the sequence alignments, amino acid variation at positively selected sites ranged from 17 to 35%

16 among the 36 intestains, implicating five to eight different residues depending on the position (Table 1).

17 Amino acid preferences at these sites matched the six intestain functional families, albeit not excluding

18 some interfamily overlap at specific codon sites. For instance, a glutamate (Glu) or a lysine (Lys) residue

19 was found at position 74 in intestains A, while an alanine (Ala) was found in intestatins D and F, a

20 phenylalanine (Phe) in intestains B, an isoleucine (Ile) in intestains C, and an asparagine (Asn) in

21 intestains E. Likewise,ACCEPTED a glutamine (Gln) was found at position 154 in intestains A and C, a Phe in

22 intestains B, an arginine (Arg) in intestains E, and a glycine (Gly) in intestains D and F.

23 A function-diversifying role for positively selected amino acids was further suggested by

24 intestatin::cystatin docking simulations using IntD4 (Sainsbury et al., 2012a) interacting with potato

Page 12 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 cystatin StCYS1 (Kiggundu et al, 2006) as an example. Structure homology models were first produced for

2 the two proteins using the solved structures of human cathepsin L (Ljunggren et al., 2007) and

3 oryzacystatin I (Nagata et al., 2000) as templates for IntD4 and StCYS1, respectively. IntD4::StCYS1

4 interactions were then simulated using the Z-Dock algorithm of Chen et al. (2003) to produce a working

5 model for the resulting complex, by homology to the solved structure of cathepsin L in complex with

6 human stefin A (Protein Data Bank [PDB] No. 3KSE) (Fig. 2). As expected given the competitive inhibitory

7 mode of action of plant cystatins (Benchabane et al., 2010), IntD4 amino acid strings predicted to

8 interact with StCYS1 (Fig. 2A) closely matched the protein substrate binding sites identified earlier for

9 this enzyme (see Sainsbury et al., 2012a). The five residues under strong positive selection –e.g. residues

10 G154 [D4-154] and Y146 [D4-146] on Fig. 2C– were located in the same regions, within or close to the

11 sequence motifs interacting with StCYS1 (Fig. 2B,C). This distribution pattern paralleled the distribution

12 pattern of positively selected residues in plant cystatins, also located within, or close to, the inhibitory

13 motifs in contact with target proteases (Fig. 2A) (Kiggundu et al., 2006). MANUSCRIPT 14

15 3.3 Altered digestive protease profiles in L. decemlineata fed a positively selected cystatin

16 An insect diet assay was performed with single variants of an insect-inducible plant cystatin

17 presenting alternative residues at positively selected amino acid sites to test for the possible onset of

18 differential, intestain family-specific compensatory responses to a rapidly evolving cystatin in L.

19 decemlineata (Table 2). We previously engineered single mutants of tomato cystatin SlCYS8 exhibiting

20 differential affinities for the intestains of 4th instars (Goulet et al., 2008). P2I, a single variant of SlCYS8

21 bearing an isoleucineACCEPTED in place of a proline (Pro; P) at positively selected codon 2 (or amino acid site 2),

22 was shown for instance to efficiently inhibit Z–Phe–Arg–MCA- and Z–Arg–Arg–MCA-hydrolyzing

23 intestains compared to the wild-type cystatin. T6R, a single variant bearing an arginine in place of a

24 threonine (Thr; T) at positively selected codon 6, was shown also to exhibit an altered inhibitory profile

Page 13 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 against the intestains, notably presenting a broad inhibitory effect against Z–Arg–Arg–MCA-hydrolyzing

2 intestains compared to SlCYS8 and P2I. L. decemlineata larvae were fed wild-type SlCYS8, P2I or T6R to

3 determine whether the ingestion of cystatin variants presenting alternative residues at positively

4 selected sites would translate into a measurable rebalancing of digestive protease activities, including

5 intestain activities, in the midgut lumen. Potato leaves coated with either inhibitors or with a no-

6 inhibitor gelatin control were provided to 4th instars over 72 h, a period of time sufficient for L.

7 decemlineata larvae to adjust their midgut protease complement to dietary protease inhibitors (Brunelle

8 et al., 2005). Short-term effects on growth rate or foliage intake were observed for some cystatin

9 variants after 24 h (ANOVA ; p<0.05) but the effects were less pronounced after 72 h (p>0.05) (Table 2), in

10 accordance with the reported ability of this insect to readily overcome the primary inhibitory effect of

11 native or recombinant protease inhibitors (Brunelle et al., 2004; Cloutier et al., 1999, 2000; Gruden et al.,

12 1998). 13 Protease assays were conducted with fluorigenic MANUSCRIPT peptide substrates to monitor compensatory 14 adjustments at the protease class level, and to determine whether ingestion of the cystatin functional

15 variants had a differential impact on overall Cys protease (intestain) activities in the midgut (Fig. 3). Cys

16 (cathepsin L- and cathepsin B-like), Asp (cathepsin D-like) and Ser (trypsin-like) protease activities were

17 comparable in control and wild-type SlCYS8-fed larvae after 72 h (post-ANOVA Tukey’s HSD test; p>0.05),

18 suggesting a compensatory response to SlCYS8 restoring basic protease activities and leaf consumption

19 rate (see Table 2). By comparison, cathepsin D activity in midgut extracts of T6R-fed insects was about

20 twice the activity measured in control extracts (p<0.05), indicating an Asp protease-driven compensatory 21 process as reportedACCEPTED previously for the Coleoptera C. maculatus fed soyacystatin N (Zhu-Salzman et al., 22 2003). Similar to T6R, P2I triggered a significant increase of cathepsin D activity in the midgut (p<0.05).

23 Trypsin, cathepsin L and cathepsin B activities were also increased in P2I-fed larvae (p<0.05), indicative of

Page 14 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 a broad compensatory response upon P2I ingestion involving the expression of non-Cys (Ser and Asp)

2 proteases and the overexpression of Cys (intestain) proteases.

3

4 3.4 Differential intestain profiles in L. decemlineata fed functional variants of SlCYS8

5 A shotgun proteomic analysis was conducted to compare intestain profiles in control and cystatin-

6 fed larvae, and to determine whether increased Cys protease activities upon P2I ingestion was

7 associated with a general inducing effect on the intestains, or, instead, with the targeted induction of

8 specific intestain families (Fig. 4). Affinity chromatrography with rice cystatin I as a ligand was shown

9 previously to allow for the enrichment of a cystatin-sensitive Cys protease, Ldp30, detected as a 30-kDa

10 band on Coomassie blue-stained gels following SDS-PAGE (Visal-Shah et al., 2001). Ldp30 and two

11 smaller bands of 27 and 25 kDa were shown recently to actually include isoforms of the different

12 functional intestain families representing, together, the whole complement of midgut intestains MANUSCRIPT 13 (Sainsbury et al., 2012a). Here we digested the three intestain bands with trypsin and submitted the 14 resulting peptides to MS/MS analysis. The relative importance of each intestain family in midgut extracts

15 was determined by the counting of MS/MS peptide spectra, assuming a positive correlation between the

16 number of peptides specific to a given functional family and the abundance of this family in the midgut

17 lumen (Zhang et al., 2006). Sixty-three unique intestain peptides were detected following MS/MS that

18 could each be assigned to a single functional family (Table S3). Intestain members of the six families

19 were detected in control and cystatin-fed larvae after 72 h, with intestains B and D being the most

20 abundant regardless of the inhibitor provided (Fig. 4A). In agreement with the above-described protease

21 assays showing increasedACCEPTED cathepsin L and cathepsin B activities upon P2I ingestion, the total number of

22 intestain peptides detected in midgut extracts of P2I-fed larvae was about twice the numbers recorded

23 in control, wild-type SlCYS8-fed or T6R-fed larvae (ANOVA /Tukey’s HSD; P<0.05) (Fig. 4B). Intestains B, and

24 to a lesser extent intestains A and E, were found at higher levels in extracts of P2I-fed larvae, two to

Page 15 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 three times the amounts detected in control, SlCYS8-fed or T6R-fed insects ( P<0.05) (Fig. 5C). By

2 comparison, intestains C, D and F showed unaltered levels in cystatin-fed larvae compared to the

3 controls (P>0.05), thereby suggesting intestain family-specific compensatory responses to single variants

4 of positively selected SlCYS8 in the herbivore midgut.

5 Two common strategies for L. decemlineata and other Coleoptera to counteract the inhibitory

6 effects of dietary cystatins consist: (i) of overexpressing cystatin-sensitive Cys proteases to outnumber

7 the ingested inhibitors; and (ii) of expressing alternative Cys proteases weakly sensitive to these proteins

8 (Benchabane et al., 2010). We used a recently developed cystatin activity-based proteomics procedure

9 to compare the basic inhibitory profiles of wild-type SlCYS8, P2I and T6R towards intestains of potato-fed

10 insects, to establish whether the increased abundance of intestains B, for example in P2I-fed larvae, was

11 a way for the insect simply to outnumber the inhibitory cystatin molecules, or, instead, to sustain dietary

12 protein digestion using alternative Cys proteases insensitive to inhibition (Fig. 5). The procedure consists 13 of using biotinylated versions of the cystatin variant sMANUSCRIPT expressed in Escherichia coli and bound to an 14 avidin-linked matrix to capture target proteases in the midgut extracts (Sainsbury et al., 2012a). Intestain

15 functional families are then quantified by tandem MS following SDS-PAGE, Coomassie blue staining and

16 tryptic digestion as described above for the shotgun proteomic analysis. Twenty unique intestain

17 peptides were detected following MS/MS, that could each be assigned to a single functional family

18 (Table S4). As inferred from spectral counts, total numbers of intestain peptides captured with

19 biotinylated P2I in midgut extracts were higher than the corresponding numbers obtained with SlCYS8–

20 or T6R–biotin fusions (ANOVA /Tukey’s HSD; P<0.05) (Fig. 5A). As shown by a number of intestain B 21 peptides for P2I–biotinACCEPTED two to three times greater than for SlCYS8– or T6R–biotin (ANOVA /Tukey’s HSD; 22 P<0.05), intestains B accounted for most of the observed difference. Six different unique peptides were

23 captured with T6R compared to only three with wild-type SlCYS8, which suggested a slightly different

24 inhibitory spectrum for the two inhibitor variants despite a similar number of peptides retrieved overall

Page 16 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 (Fig. 5B). No intestains A, E or F were detected following MS/MS, likely due to very low peptide numbers

2 for these enzymes in source extracts ( see Fig. 4) and to the reported insensitivity of some intestains, such

3 as intestains A, to cystatin inhibition (Gruden et al., 2003; Sainsbury et al., 2012a).

4

5 4. Discussion

6 A number of studies have reported the occurrence of positively selected extracellular proteases in

7 insect , including two Ser proteases in the seminal fluid and two others in the immune system

8 of Drosophila (Jiggins and Kim, 2007; Wong et al., 2007); digestive trypsins in the

9 gambiae (Wu et al., 2009); venom cathepsins B in soldiers of the social aphid Tuberaphis

10 styraci (Kutsukake et al., 2004); and Cys cathepsins likely involved in phloem sap protein digestion in the

11 pea aphid Acyrthosiphon pisum (Rispe et al., 2008). Here we provide evidence for the positive selection 12 of herbivorous Coleoptera midgut Cys proteases, and MANUSCRIPTpresent a range of modeling and empirical data 13 supporting the idea of a coevolutionary pathway for these enzymes and their cystatin interacting 14 partners expressed in plant tissues upon herbivory. We previously established a link between the

15 occurrence of positively selected amino acid sites in plant cystatins and their inhibitory spectrum against

16 exogenous Cys proteases (Goulet et al., 2008; Kiggundu et al. 2006). We now propose that herbivorous

17 Coleoptera have acquired on their side the ability to readily, and specifically, adapt their digestive

18 protease complement to the rapidly evolving plant cystatins. As illustrated with L. decemlineata

19 intestains and functional variants of tomato cystatin SlCYS8 bearing alternative residues at positively

20 selected amino acid sites, these insects present a dual compensatory response to plant cystatins,

21 involving: (1) an overallACCEPTED adjustment or restoration of the midgut protease complement, specific to the

22 primary inhibitory effect of each cystatin variant in the midgut lumen; and (2) the expression of Cys

23 protease isoforms that exhibit alternative residues at positively selected amino acid sites (this study) and

Page 17 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 primary sequence heterogeneity in polypeptide motifs interacting with protein substrates and

2 pseudosubstrate inhibitors (Sainsbury et al., 2012a).

3 The occurrence of positively selected amino acid sites in the tested enzymes could be expected

4 given the large number of gene duplicates for Cys proteases in Coleoptera genomes (Gruden et al., 2004;

5 Sainsbury et al. 2012a; Tribolium Genome Sequencing Consotrium, 2008) and the role attributed to

6 protein isoform functional diversification in preventing loss of gene duplicates over evolutionary

7 timescales (Conant and Wolfe, 2008). Positively selected sites could also be expected from a

8 physiological standpoint, considering the generally accepted assumption that a coevolutionary arms race

9 is taking place at the molecular scale between insect midgut proteases and their plant inhibitors

10 (Lopes et al., 2004). The use of complex protease systems for protein digestion in insect herbivores, the

11 overexpression of inhibitor-sensitive proteases to outnumber the ingested inhibitors, and the up-

12 regulation of inhibitor-insensitive protease isoforms following inhibitor intake constitute together a 13 multicomponent counteracting strategy against protea MANUSCRIPTse inhibitors (Zhu-Salzman and Zeng, 2015) that 14 implicates, by definition, an array of possible protease isoforms. As exemplified here with tomato SlCYS8,

15 single amino acid substitutions in a positively selected cystatin are sufficient not only to influence the

16 basic inhibitory efficiency of this inhibitor against target proteases (Goulet et al., 2008) but also to trigger

17 specific adjustments of the herbivore midgut protease complement. Cystatin ingestion resulted, in the

18 present case, in a rapid restoration of the original protease activities in wild-type SlCYS8-fed insects; the

19 overexpression of Asp proteases and maintenance of the original intestain complement in insects fed the

20 T6R variant; or a broader compensatory response involving increased secretion of Asp proteases, Ser 21 proteases and CysACCEPTED protease isoforms sensitive (intestains B and E) or insensitive (intestains A) to the 22 ingested cystatin in P2I-fed insects.

23 Such a striking ability of L. decemlineata larvae to readily detect –and specifically respond to– the

24 ‘inhibitory signatures’ of plant cystatin variants likely explains unsuccessful attempts over the years to

Page 18 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 establish the potential of these proteins in coleopteran pest control (Cloutier et al., 1999, 2000; Koo et

2 al., 2008; Nogueira et al., 2012; Zhu-Salzman et al., 2003; Oppert et al., 2014; Zhu-Salzman and Zeng,

3 2015). It also supports the relevance of combining inhibitors active against different sets of proteases in

4 the pest midgut arsenal, in such a way as to broaden the range of susceptible proteases and minimize

5 compensation by insensitive proteases following inhibitor intake (Oppert et al., 2005; Sainsbury et al.,

6 2012b). Transgene stacking approaches for the simultaneous expression of two or more inhibitors have

7 resulted in effective insect resistance in different plants, hardly achievable using the same inhibitors

8 expressed alone (reviewed in Sainsbury et al., 2012b). Likewise, fusion proteins integrating coding

9 sequences of Asp, Ser and/or Cys protease inhibitors in a single hybrid protein showed improved

10 potential against target herbivores compared to the inhibitors alone (Sainsbury et al., 2012b). An

11 example of this was provided with L. decemlineata larvae suffering important growth delays when fed a

12 translational fusion protein made of corn cystatin II and tomato cathepsin D inhibitor, a broad-spectrum

13 inhibitor of Asp and Ser proteases (Brunelle et al., 2005). MANUSCRIPT 14 On the other hand, broadening the inhibitory range of protease inhibitors to different protease

15 classes raises concerns of specificity and compatibility in an environmental context where extracellular

16 proteolysis is operating at most, if not all, trophic levels (Schlüter et al., 2010). Ser protease inhibitor

17 expression in edible crops may also raise concerns on food safety and quality given the antidigestive and

18 potentially allergenic effects of these proteins in humans (Gilani et al, 2005). By comparison, no Cys

19 proteases are found in the human gut and the innocuity of recombinant cystatins in transgenic crops has

20 been confirmed (Atkinson et al., 2004). In this context, an ecologically viable way of using protease 21 inhibitors in insectACCEPTED pest control might be to identify candidate inhibitors, such as cystatins and other 22 inhibitors of Cys proteases, that allow together for an adequate plant protective effect via the effective

23 inhibition of a minimal, but sufficient set of target proteases in the pest midgut (Oppert et al., 2014).

24 Important lethal effects on L. decemlineata larvae were reported earlier for the low-molecular-weight

Page 19 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 inhibitor of Cys proteases trans -epoxysuccinyl-L-leucylamido-(4-guanidino) butane (E-64) (Bolter and

2 Latoszek-Green, 1997), which unlike cystatins can inhibit the whole complement of midgut intestains

3 (Goulet et al., 2008). Work is underway to determine whether E-64-like detrimental effects could be

4 observed against the same insect by combining plant cystatin variants with the newly described

5 mushroom macrocypins (Sabotic et al., 2009), which were shown recently to inhibit the cystatin-

6 insensitive intestains A (Smid et al., 2013) and hence to represent a promising complement to plant

7 cystatins for reproducing the broad inhibitory effect of E-64.

8

9 Acknowledgements

10 We thank Ann-Julie Rhéaume for helpful advice on proteomic data analysis. This work was

11 supported by Discovery and Discovery Acceleration Supplement grants from the Natural Science and

12 Engineering Research Council of Canada to DM. MANUSCRIPT 13

14 Appendix A. Supplementary data

15 Supplementary data related to this article can be cound at http://dx.doi.org/10.1016/j.ibmb.------.

16

17 References

18 Atkinson, H.J., Johnston, K.A., Robbins, M., 2004. Prima facie evidence that a phytocystatin for transgenic 19 plant resistanceACCEPTED to is not a toxic risk in the human diet. J. Nutr. 134, 431–434. 20 Barta, E., Pintar, A., Pongor, S., 2002. Repeats with variations: accelerated of the Pin2 family of

21 proteinase inhibitors. Trends Genet. 18, 600–603.

Page 20 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Beckett, D., Kovaleva, E., Schatz, P.J., 1999. A minimal peptide substrate in biotin holoenzyme

2 synthetase-catalyzed biotinylation. Prot. Sci. 8, 921–929.

3 Benchabane, M., Schlüter, U., Vorster, J., Goulet, M.-C., Michaud, D., 2010. Plant cystatins. Biochimie 92,

4 1657–1666.

5 Bolter, C.J., Latoszek-Green, M., 1997. Effect of chronic ingestion of the cysteine proteinase inhibitor, E-

6 64, on Colorado potato beetle gut proteinases. Entomol. Exp. Appl. 83, 295–303.

7 Bouchard, É., Cloutier, C., Michaud, D., 2003. Oryzacystatin I expressed in transgenic potato induces

8 digestive compensation in an insect natural predator via its herbivorous prey feeding on the plant.

9 Mol. Ecol. 12, 2439–2446.

10 Bradford, M.M., 1976. A rapid and sensitive method for the quantitation of microgram quantities of

11 protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254.

12 Broadway, R.M., 2000. The of insects to protease inhibitors, in: Michaud, D. (Ed.), 13 Recombinant protease inhibitors in plants. Landes BioMANUSCRIPTscience, Georgetown, Texas, pp. 80–88. 14 Brunelle, F., Girard, C., Cloutier, C., Michaud, D., 2005. A hybrid, broad-spectrum inhibitor of Colorado

15 potato beetle aspartate and cysteine proteinases. Arch. Insect Physiol. Biochem. 60, 20–31.

16 Brunelle, F., Cloutier, C., Michaud, D., 2004. Colorado potato beetles compensate for tomato cathepsin D

17 inhibitor expressed in transgenic potato. Arch. Insect Physiol. Biochem. 55, 103–113.

18 Chen, R., Li, L., Weng, Z., 2003. ZDOCK: an initial-stage protein-docking algorithm. Proteins 52, 80–87.

19 Christeller, J.T., 2005. Evolutionary mechanisms acting on proteinase inhibitor variability. FEBS J. 272,

20 5710–5722. 21 Clauss, M.J., Mitchell-Olds,ACCEPTED T., 2004. Functional divergence in tandemly duplicated Arabidopsis thaliana 22 trypsin inhibitor genes. Genetics 166, 1419–1436.

23 Cloutier, C., Jean, C., Fournier, M., Yelle, S., Michaud, D., 2000. Adult Colorado potato beetles,

24 Leptinotarsa decemlineata compensate for nutritional stress on oryzacystatin I transgenic potato

Page 21 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 plants by hypertrophic behavior and over-production of insensitive proteases. Arch. Insect Physiol.

2 Biochem. 44, 69–81.

3 Cloutier, C., Fournier, M., Jean, C., Yelle, S., Michaud, D., 1999. Growth compensation and faster

4 development of Colorado potato beetle (Coleoptera: Chrysomelidae) feeding on potato foliage

5 expressing oryzacystatin I. Arch. Insect Biochem. Physiol. 40, 69–79.

6 Conant, G. C. & Wolfe, K. H., 2008. Turning a hobby into a job: How duplicated genes find new functions.

7 Nat. Rev. Genet. 9, 938–950.

8 Doron-Faigenboim, A., Pupko, T., 2006. A combined empirical and mechanistic codon model. Mol. Biol.

9 Evol. 24, 388–397.

10 Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high-throughput. Nucl.

11 Acids Res. 32, 1792–1797.

12 Eswar, N., Marti-Renom, M.A., Webb, B., Madhusudhan, M.S., Eramian, D., Shen, M., Pieper, U., Sali, A., 13 2006. Comparative protein structure modeling MANUSCRIPT with MODELLER. In: Current Protocols in 14 Bioinformatics, Supplement 15. John Wiley & Sons, New York, pp. 5.6.1–5.6.30.

15 Felsenstein, J., 1997. An alternative least-squares approach to inferring phylogenies from pairwise

16 distances. Syst. Biol. 46, 101–111.

17 Gilani, G.S., Cockell, K.A., Sepher, E., 2005. Effects of antinutritional factors on protein digestibility and

18 amino acid availability in foods. J. AOAC Int. 88, 967–987.

19 Girard, C., Rivard, D., Kiggundu, A., Kunert, K., Gleddie, S.C., Cloutier, C., Michaud, D., 2007. A

20 multicomponent, elicitor-inducible cystatin complex in tomato, Solanum lycopersicum . New Phytol. 21 173, 841–851. ACCEPTED 22 Goulet, M.-C., Dallaire, C., Vaillancourt, L.-P., Khalf, M., Badri, A. M., Preradov, A., Duceppe, M.-O.,

23 Goulet, C., Cloutier, C., Michaud, D., 2008. Tailoring the specificity of a plant cystatin toward

Page 22 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 herbivorous insect digestive cysteine proteases by single mutations at positively selected amino acid

2 sites. Plant Physiol. 146, 1010–1019.

3 Green, A.R., Nissen, M.S., Kumar, G.N.M., Knowles, N.R., Kang, C., 2013. Characterization of Solanum

4 tuberosum multicystatin and the significance of core domains. Plant Cell 25, 5043–5052.

5 Gruden, K., Kuipers, A.G.J., Guncar, G., Slapar, N., Strukelj, B., Jongsma, M.A., 2004. Molecular basis of

6 Colorado potato beetle adaptation to potato plant defence at the level of digestive cysteine

7 proteinases. Insect Biochem. Mol. Biol. 34, 365–375.

8 Gruden, K., Popovic, T., Cimerman, N., Krizaj, I., Strukelj, B., 2003. Diverse enzymatic specificities of

9 digestive proteases, ‘intestains’, enable Colorado potato beetle larvae to counteract the potato

10 defence mechanism. Biol. Chem. 384, 305–310.

11 Gruden, K., Strukelj, B., Popovic, T., Lenarcic, B., Bevec, T., Brzin, J., Kregar, I., Herzog-Velikonja, J.,

12 Stiekema, W.J., Bosch, D., Jongsma, M.A., 1998. The cysteine protease activity of Colorado potato 13 beetle ( Leptinotarsa decemlineata Say) guts, whichMANUSCRIPT is insensitive to potato protease inhibitors, is 14 inhibited by thyroglobulin type-I domain inhibitors. Insect Biochem. Mol. Biol. 28, 549–560.

15 Havlis, J., Thomas, H., Sebela, M., Shevchenko, A., 2003. Fast-response proteomics by accelerated in-gel

16 digestion of proteins. Anal. Chem. 75, 1300–1306.

17 Ingvarsson, P.K., 2005. Molecular population genetics of herbivore-induced protease inhibitor genes in

18 European aspen ( Populus tremula L., Salicaceae). Mol. Biol. Evol. 22, 1802–1812.

19 Jiggins, F.M., Kim, K.W., 2007. A screen for immunity genes evolving under positive selection in

20 Drosophila . J. Evol. Biol. 20, 965–970. 21 Keller, A., Nesvizhskii,ACCEPTED A.I., Kolker, E., Aebersold, R., 2002. Empirical statistical model to estimate the 22 accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392.

Page 23 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Kiggundu, A., Goulet, M.-C., Goulet, C., Dubuc, J.-F., Rivard, D., Benchabane, M., Pépin, G., Van der Vyver,

2 C., Kunert, K., Michaud, D., 2006. Modulating the proteinase inhibitory profile of a plant cystatin by

3 single mutations at positively selected amino acid sites. Plant J. 48, 403–413.

4 Kimura, M., 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.

5 Kong, L. & Ranganathan, S., 2008. Tandem duplication, circular permutation, molecular adaptation: how

6 Solanaceae resist pests via inhibitors. BMC Bioinform. 9(Suppl 1), S22.

7 Koo, Y.D., Ahn, J.-E., Salzman, R.A., Moon, J., Chi, Y.H., Yun, D.-J., Lee, S.Y., Koiwa, H., Zhu-Salzman, K.,

8 2008. Functional expression of an insect cathepsin B-like counter-defence protein. Insect Mol. Biol. 17,

9 235–245.

10 Kutsukake, M., Shibao, H., Nikoh, N., Morioka, M., Tamura, T., Hoshino, T., Ohgiya, S., Fukatsu, T., 2004.

11 Venomous protease of aphid soldier for colony defense. Proc. Natl. Acad. Sci. U.S.A. 101, 11338–

12 11343. 13 Li, L., Chen, R., Weng, Z., 2003. RDOCK: refinement of rigid-bodyMANUSCRIPT protein docking predictions. Proteins 53, 14 693–707.

15 Li, X.-Q., Zhang, T., Donnelly, D., 2011. Selective loss of cysteine residues and disulphide bonds in a

16 potato proteinase II family. PLoS One 4, e18615.

17 Ljunggren, A., Redzynia, I., Alvarez-Fernandez, M., Abrahamson, M., Mort, J.S., Krupa, J.C., Jaskolski, M.,

18 Bujacz, G., 2007. Crystal structure of the parasite protease inhibitor chagasin in complex with a host

19 target cysteine protease. J. Mol. Biol. 371, 137–153.

20 Lopes, A.R., Juliano, M.A., Juliano, L., Terra, W.R., 2004. Coevolution of insect trypsins and inhibitors. 21 Arch. Insect Biochem.ACCEPTED Physiol. 55, 140–152. 22 Martinez, M., Abraham, Z., Carbonero, P., Diaz, I., 2005. Comparative phylogenetic analysis of cystatin

23 gene families from arabidopsis, rice and barley. Mol. Gen. Genom. 273, 423–432.

Page 24 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Massonneau, A., Condamine, P., Wisniewski, J.-P., Zivy, M.. Rogowsky, P.M., 2005. Maize cystatins

2 respond to developmental cues, cold stress and drought. Biochim. Biophys. Acta 1729, 186–199.

3 Mishra, M., Mahajan, N., Tamhane, V.A., Kulkarni, M.J., Baldwin, I.T., Gupta, V.S., Giri, A., 2012. Stress

4 inducible proteinase inhibitor diversity in Capsicum annuum . BMC Plant Biol. 12, 217.

5 Nagata, K., Kudo, N., Abe, K., Arai, S., Tanokura, M., 2000. Three-dimensional solution structure of

6 oryzacystatin I, a cysteine proteinase inhibitor of the rice, Oryza sativa L. japonica. Biochemistry 39,

7 14753–14760.

8 Neiman, M., Olson, M.S., Tiffin, P., 2009. Selective histories of poplar protease inhibitors: elevated

9 polymorphism, purifying selection, and positive selection driving divergence of recent duplicates.

10 New Phytol. 183, 740–750.

11 Nesvizhskii, A.I., Keller, A., Kolker, E., Aebersold, R., 2003. A statistical model for identifying proteins by

12 tandem mass spectrometry. Anal. Chem. 75, 4646–4658. 13 Nogueira, F.C.S., Silva, C.P., Alexandre, D., Sumuels, MANUSCRIPT R.I., Soares, E.L., Aragao, F.J.L., Palmisano, G., 14 Domont, G.B., Roepstorff, P., Campos, F.A.P., 2012. Global proteome changes in larvae of

15 Callosobruchus maculatus (Coleoptera: Chrysomelidae: Bruchinae) following ingestion of a cysteine

16 proteinase inhibitor. Proteomics 12, 2704–2715.

17 Old, W.M., Meyer-Arendt, K., Aveline-Wolf, L., Pierce, K.G., Mendoza, A., Sevinsky, J.R., Resing, K.A., Ahn,

18 N.G., 2005. Comparison of label-free methods for quantifying human proteins by shotgun proteomics.

19 Mol. Cell. Proteom. 4, 1487–1502.

20 Oppert, B., Rasoolizadeh, A., Michaud, D., 2014. The Coleopteran gut and targets for pest control, in: 21 Hoffmann, K. (Ed.),ACCEPTED Insect molecular biology and ecology. CRC Press, Boca Raton, Florida, pp. 291–317. 22 Oppert, B., Morgan, T.D., Hartzer, K., Kramer, K.J., 2005. Compensatory proteolytic responses to dietary

23 proteinase inhibitors in the red flour beetle, Tribolium castaneum (Coleoptera: Tenebrionidae). Comp.

24 Biochem. Physiol. C 140, 53–58.

Page 25 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Perkins, D.N., Pappin, D.J.C., Creasy, D.M., Cottrell, J.S., 1999. Probability-based protein identification by

2 searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567.

3 Rispe, C., Kutsukake, M., Doublet, V., Hudaverdian, S., Legeai, F., Simon, J.-C., Tagu, D., Fukatsu, T., 2008.

4 Large gene family expansion and variable selective pressures for cathepsin B in aphids. Mol. Biol. Evol.

5 25, 5–17.

6 Ryan, C.A., 1990. Protease inhibitors in plants: genes for improving defenses against insects and

7 pathogens. Annu. Rev. Phytopathol. 28, 425–449.

8 Sabotic, J., Popovic, T., Puizdar, V., Brzin, J., 2009. Macrocypins, a family of cysteine protease inhibitors

9 from the basidiomycete Macrolepiota procera . FEBS J. 276, 4334–4345.

10 Sainsbury, F., Rhéaume, A.-J., Goulet, M.-C., Vorster, J., Michaud, D., 2012a. Discrimination of

11 differentially inhibited cysteine proteases by activity-based profiling using cystatin variants with

12 tailored specificities. J. Proteome Res. 11, 5983–5993. 13 Sainsbury, F., Benchabane, M., Goulet, M.-C., MichauMANUSCRIPTd, D., 2012b. Multimodal protein constructs for 14 herbivore insect control. Toxins 4, 455–475.

15 Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic

16 trees. Mol. Biol. Evol. 4, 406–425.

17 Schlüter, U., Benchabane, M., Munger, A., Kiggundu, A., Vorster, J., Goulet, M.-C., Cloutier, C., Michaud,

18 D., 2010. Recombinant protease inhibitors for herbivore pest control: a multitrophic perspective. J.

19 Exp. Bot. 61, 4169–4183.

20 Smid, I., Gruden, K., Gasparic, M. B., Koruza, K., Petek, M., Pohleven, J., Brzin, J., Kos, J., Zel, J., Sabotic, J., 21 2013. InhibitionACCEPTED of the growth of Colorado potato beetle larvae by macrocypins, protease inhibitors 22 from the Parasol Mushroom. J. Agric. Food Chem. 61, 12499–12509.

Page 26 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Speranskaya, A.S., Krinitsina, A.A., Kudryavtseva, A.V., Poltronieri, P., Santino, A., Oparina. N.Y., Dmitriev,

2 A.A., Belenikin, M.S., Guseva, M.A., Shevelev, A.B., 2012. Impact of recombination on polymorphism

3 of genes encoding Kunitz-type protease inhibitors in the genus Solanum . Biochimie 94, 1687–1696.

4 Srinivasan, A., Giri, A.P., Gupta, V.S., 2006. Structural and functional diversities in lepidopteran serine

5 proteases. Cell. Mol. Biol. Lett. 11, 132–154.

6 Stern, A., Doron-Faigenboim, A., Erez, E., Martz, E., Bacharach, E., Pupko, T., 2007. Selecton 2007:

7 advanced models for detecting positive and purifying selection using a Bayesian inference approach.

8 Nucl. Acids Res. 35, W506–W511.

9 Suzek, B.E., Huang, H., McGarvey, P., Mazumder, R., Wu, C.H., 2007. UniRef: comprehensive and non-

10 redundant UniProt reference clusters. Bioinformatics 23, 1282–1288.

11 Swanson, W.J., Nielsen, R., Yang, Q., 2003. Pervasive adaptive evolution in mammalian fertilization

12 proteins. Mol. Biol. Evol. 20, 18–20. 13 Tan, Y., Wang, S., Liang, D., Li, M., Ma, F., 2014. Genome-wideMANUSCRIPT identification and expression profiling of 14 the cystatin gene family in apple ( Malus x domestica Borkh.). Plant Physiol. Biochem. 79, 88–97.

15 Tribolium Genome Sequencing Consortium., 2008. The genome of the model beetle and pest Tribolium

16 castaneum . Nature 452, 949–955.

17 Talyzina, N.M., Ingvarsson, P.K., 2006. Molecular evolution of a small gene family of wound inducible

18 Kunitz trypsin inhibitors in Populus . J. Mol. Evol. 63, 108–119.

19 Visal-Shah, S., Vrain, T.C., Yelle, S., Nguyen-Quoc, B., Michaud, D., 2001. An electroblotting, two-step

20 procedure for the detection of proteinases and the study of proteinase/inhibitor complexes in 21 gelatin-containingACCEPTED polyacrylamide gels. Electrophoresis 22, 2646–2652. 22 Weeda, S.M., Kumar, G.N.M., Knowles, N.R., 2009. Developmentally linked changes in proteases and

23 protease inhibitors suggest a role for potato multicystatin in regulating protein content of potato

24 tubers. Planta 230, 73–84.

Page 27 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Wong, A., Turchin, M.C., Wolfner, M.F., Aquadro, C.F., 2007. Evidence for positive selection on

2 Drosophila melanogaster seminal fluid protease homologs. Mol. Biol. Evol. 25, 497–506.

3 Wu, D.-D., Wang, G.-D., Irwin, D.M., Zhang, Y.-P., 2009. A profound role for the expansion of trypsin-like

4 serine protease family in the evolution of hematophagy in mosquito. Mol. Biol. Evol. 26, 2333–2341.

5 Yang, Z., Nielsen, R., Goldman, N., Pedersen A.-M.K., 2000. Codon-substitution models for

6 heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449.

7 Zhang, B., VerBerkmoes, N.C., Langston, M.A., Uberbacher, E., Hettich, R.L., Samatova, N.F., 2006.

8 Detecting differential and correlated protein expression in label-free shotgun proteomics. J.

9 Proteome Res. 5, 2909–2918.

10 Zhu-Salzman, K., Zeng, R., 2015. Insect response to plant defensive protease inhibitors. Annu. Rev.

11 Entomol. 60, 233–252.

12 Zhu-Salzman, K., Koiwa, H., Salzman, R. A., Shade, R.E., Ahn, J.-E., 2003. Cowpea bruchid Callosobruchus 13 maculatus uses a three-component strategy to overcomeMANUSCRIPT a plant defensive cysteine protease 14 inhibitor. Insect Mol. Biol. 12, 135–145.

15

ACCEPTED

Page 28 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Table 1

2 Amino acid variation and preferred residues at positively selected sites among Leptinotarsa

3 decemlineata intestain families.

4

5

6 Amino acid site a

7 ______

8 22 74 95 146 154

9

10

11 Amino acid variation (%) b 8/23 (34.8) 6/21 (28.6) 5/21 (23.8) 7/28 (25.0) 5/29 (17.2)

12 Preferred residues c

13 . Intestains A Asn, Glu Glu, Lys Gln Gly Gln 14 . Intestains B Glu, Gly Phe MANUSCRIPT Ile, Thr Phe, Ser, Pro Phe 15 . Intestains C Ser, Gln Ile Arg Ala Gln

16 . Intestains D Pro Ala Lys Tyr Gly

17 . Intestains E Lys Asn Arg Gly Arg

18 . Intestains F Asp Ala Ile Phe Gly

19

20 a Amino acid numbering based on mature IntD4 protease sequence.

21 b Refers to the number of different amino acids at the given amino acid site over the number of sequences

22 available for this same site in the GenBank/NCBI database (http://ncbi.nlm.nih.gov/).

23 c Preferred residuesACCEPTED within intestain functional families, based on the 6-family classification of Sainsbury et al.,

24 2012a.

25

Page 29 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Table 2

2 Relative growth rate and relative leaf consumption rate of L. decemlineata 4th instars fed potato leaves

3 painted with wild-type SlCYS8, P2I, T6R or a control, no-inhibitor solution. a

4

5 RGR (d-1) b RCR (mm 2.mg -1.d -1) b

6 ______

7 24 h 72 h 24 h 72 h

8

9 Control treatment 0.31 ± 0.04 a 0.42 ± 0.04 3.25 ± 0.41 a 6.47 ± 0.58

10 SlCYS8 (wild-type) 0.45 ± 0.07 ab 0.47 ± 0.05 7.72 ± 0.56 b 6.05 ± 0.55

11 P2I variant 0.42 ± 0.04 ab 0.60 ± 0.04 7.09 ± 1.29 ab 7.44 ± 0.75

12 T6R variant 0.79 ± 0.14 b 0.69 ± 0.10 6.81 ± 0.83 ab 9.69 ± 1.06

13

a 14 Data are expressed as mean values from six biological replicatesMANUSCRIPT (n=6 larvae) ± SE, 24 h (1 d) or 72 h (3 d) 15 after initiating the bioassay.

b 16 Numbers with the same letter are not significantly different (post-ANOVA Tukey’s HSD test; α=0.05). No

17 significant effect was observed at α=0.05 among inhibitor treatments after 72 h, for either RGR (P=0.089)

18 and RCR (P=0.074).

19

ACCEPTED

Page 30 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Figure legends

2 Fig. 1. Positively selected amino acids in herbivorous Coleoptera digestive Cys proteases. Yellow shading

3 and asterisks indicate reliable positive selection inference (i.e. ω>1) in the 95% statistical confidence

4 interval. Purple shading indicates negative [purifying] selection inference (i.e. ω<1) at p<0.05. Grey

5 shading indicates amino acid strings predicted to interact with cystatins (see Sainsbury et al., 2012a).

6 Protease codon sequences (Fig. S2) were compared under the MEC evolutionary model of Doron-

7 Faigenboim and Pupko (2006) using the Selecton servers for the identification of site-specific positive

8 selection and purifying selection (http://selecton.tau.ac.il) (Stern et al. 2007). Data are expressed using

9 mature Intestain D4 numbering (see Fig. S1 for primary sequence). The 18 N-terminal residues were not

10 included in the analysis due to missing information for some sequences. FOR COLOUR REPRODUCTION

11 ON THE WEB AND IN PRINT.

12 13 Fig. 2. In silico protease::inhibitor docking simulation MANUSCRIPT for Intestain D4 (IntD4) interacting with potato 14 cystatin StCYS1. (A) Primary sequence of the two proteins showing the position of positively selected

15 amino acids in IntD4 (Fig. 1, this study) and StCYS1 (from Kiggundu et al., 2006). Asterisks indicate

16 positively selected residues (in red characters). Sequence motifs shaded in green (IntD4) or in blue

17 (StCYS1) indicate residues or regions that interact with the protein partner (see Sainsbury et al., 2012a).

18 (B) Tertiary structure model for the IntD4::StCYS1 complex. Positively selected residues are highlighted in

19 red. Numbered residues on IntD4 (D4-#) and StCYS1 identify amino acid sites under positive selection. (C)

20 Closer view of the IntD4::StCYS1 complex model showing the interacting amino acid residues. Residues in 21 red correspond toACCEPTED positively selected amino acids. Areas in green (IntD4) or in blue (StCYS1) highlight 22 amino acids physically involved in the enzyme–inhibitor interaction (see panel A). FOR COLOUR

23 REPRODUCTION ON THE WEB AND IN PRINT.

Page 31 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

1 Fig. 3. Cathepsin B-, cathepsin L-, cathepsin D- and trypsin-like protease activities in midgut extracts of L.

2 decemlineata instars fed wild-type SlCYS8-, P2I- or T6R-coated potato leaves. Control insects (Ctrl) were

3 fed potato leaves coated with a gelatin, no-inhibitor solution. Each bar is the mean of six independent

4 (insect replicate) values ± SE . Bars with the same letter are not significantly different (post-ANOVA Tukey’s

5 HSD; α=0.05).

6

7 Fig. 4. Intestain peptide MS/MS spectral counts in midgut extracts of L. decemlineata instars fed wild-

8 type SlCYS8-, P2I- or T6R-coated potato leaves. Control insects (Ctrl) were fed potato leaves coated with

9 a gelatin, no-inhibitor solution. (A) Relative distribution of intestain family-specific peptides among

10 intestain functional families A to F (see Table S3 for details on MS/MS peptides). (B) Total intestain

11 peptide spectral counts in midgut extracts. (C) Total spectral counts for each intestain family.

12 Distribution rates for each intestain family in (A) represent mean relative percentages in midgut extracts 13 from three independent (biological) replicates. Each barMANUSCRIPT in (B) and (C) is the mean of three independent 14 (insect replicate) values ± SE . Bars with the same letter are not significantly different (post-ANOVA Tukey’s

15 HSD; α=0.05).

16

17 Fig. 5. Spectral counts for intestain peptides captured with biotinylated versions of wild-type SlCYS8, P2I

18 or T6R in midgut extracts of potato-fed (naive) L. decemlineata larvae. (A) Total and intestain B-specific

19 spectral counts. Total counts correspond to the sum of peptide spectra counted for intestains B, C and D

20 (see Table S4). Each bar is the mean of three independent (insect replicate) values ± SE . Bars with the 21 same letter are notACCEPTED significantly different (post-ANOVA Tukey’s HSD; α=0.05). (B) Relative distribution of 22 intestain B unique peptides captured with wild-type SlCYS8, P2I or T6R. Data are presented for each

23 cystatin variant as the relative count number for each unique peptide, over the total number of captured

24 intestain B peptides. Correspondence between numbers and peptide sequences is given in Table S4.

Page 32 ACCEPTED MANUSCRIPT

10 20 30 40 50 60 70 80 90 100 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| EVPDSIDWTEKGAVLEVKDQNPCGSCWAFSATGALKGQNAILNNVKISLSEQQLLDCSAAYGNGNCKEGGDMSAAFDYVRDYGIQSEKSYPYIRKQTECQ * * * 110 120 130 140 150 160 170 180 190 200 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| YDASKTILKIKGYKNVTTSEEGLRKAVGTIGPISIAMNSDPLQLYYSGTISGKGCSHDLDHGVLVVGYGKASQWSGETKFWRVKNSWGKIWGENGYFRIK * * 210 MANUSCRIPT ....|....|....|... RDANNLCGIADDPTYPVL

ACCEPTED ACCEPTED MANUSCRIPT

A Intestain D4 10 20 30 40 50 60 70 80 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| EVPDSIDWTEKGAVLEVKDQNPCGSCWAFSATGALKGQNAILNNVKISLSEQQLLDCSAAYGNGNCKEGGDMSAAFDYVR * * 90 100 110 120 130 140 150 160 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| DYGIQSEKSYPYIRKQTECQYDASKTILKIKGYKNVTTSEEGLRKAVGTIGPISIAMNSDPLQLYYSGTISGKGCSHDLD * * * 170 180 190 200 210 ....|....|....|....|....|....|....|....|....|....|....|... HGVLVVGYGKASQWSGETKFWRVKNSWGKIWGENGYFRIKRDANNLCGIADDPTYPVL

StCYS1 10 20 30 40 50 60 70 80 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| IVGGLVDVPFENKVEFDDLARFAVQDYNQKNDSSLEFKKVLNVKQQIVAGIMYYITFEATEGGNKKEYEAKILLRKWEDL ** * * * * * 90 ....|....|.... KKVVGFKLVGDDST * B C StCYS1 StCYS1 MANUSCRIPT F10

V84 Q45

V84 Q45 I47 D4- D4-95 K76 22 V2 I47 D4-154 D4-154 V2 K76 I1 I1 154 D4-146 D4-74 D4-146 ACCEPTED

Intestain D4 Intestain D4 ACCEPTED MANUSCRIPT

b 80 Cathepsin B b 200 Cathepsin L 90 Cathepsin D c 60 Trypsin b b 60 150 45 a 60 a a 40 a 100 ab 30 a a a 30 a 20 a 50 15 a

Activity (units/sec) 0 0 MANUSCRIPT0 0 Ctrl WT P2I T6R Ctrl WT P2I T6R Ctrl WT P2I T6R Ctrl WT P2I T6R

+ SlCYS8 + SlCYS8 + SlCYS8 + SlCYS8

ACCEPTED ACCEPTED MANUSCRIPT

A B 150 Int (Total) b Intestain families distribution (%) 125 0 20 40 60 80 100 100 a a Ctrl 75 a WT 50 P2I 25

T6R (#) Spectral counts 8 0 A B C D E F Ctrl WT P2I T6R + SlCYS8 C b 10 IntA 30 IntB b 20 IntC 25 8 15 ab 20 a 6 15 a 10 4 a a a 10 MANUSCRIPT 5 2 5 Spectral counts (#) Spectral counts 0 (#) Spectral counts 0 (#) Spectral counts 0 Ctrl WT P2I T6R Ctrl WT P2I T6R Ctrl WT P2I T6R + SlCYS8 + SlCYS8 + SlCYS8 80 12 12 IntD IntE b IntF 60 9 9 a 40 6 a a 6 20 3 3

Spectral counts (#) Spectral counts 0 (#) Spectral counts 0 (#) Spectral counts 0 Ctrl WT P2I T6R Ctrl WT P2I T6R Ctrl WT P2I T6R + SlCYS8ACCEPTED+ SlCYS8 + SlCYS8 ACCEPTED MANUSCRIPT

A 30 b 25 Total IntB b 20 20 15 a 10 ab 10 a a 5

Spectral counts (#) Spectral counts 0 0 WT P2I T6R WT P2I T6R B Peptides distribution MANUSCRIPT (%) 0 20 40 60 80 100

WT 1 2 3

P2I 1 2 3 4 5 6 7 8 9 10 111213

T6R 1 2 3 5 7 10

ACCEPTED ACCEPTED MANUSCRIPT

Editor – IBMB

Highlights for Vorster et al., 2015

1. Positive selection contributes to the diversification of plant protease inhibitors. 2. We assessed whether a reciprocal evolutionary process is acting on the insect side. 3. Midgut Cys proteases of herbivorous Coleoptera were found to be positively selected. 4. A functional impact of positive selection on the proteases was empirically confirmed. 5. This suggests a coevolutionary route for the insect proteases and the plant inhibitors.

MANUSCRIPT

ACCEPTED

ACCEPTED MANUSCRIPT

Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

Table S1 Herbivorous Coleoptera digestive Cys protease sequences submitted to the MEC model of Doron- Faigenboim and Pupko (2006)

Insect species Protease name GenBank Accession

Diabrotica virgifera virgifera CAL1 AF190653.1 Callosobruchus maculatus CPA1 AF544834.1 CPA9 AF544835.1 CPA11 AF544836.1 CPA13 AF544837.1 CPA15 AF544838.1 CPA16 AF544839.1 CPA21 AF544840.1 CPA23 AF544841.1 CPA27 AF544843.1 CPB1 AF544844.1 Leptinotarsa decemlineata Intestain A1 (IntA1) AY159368.1 IntA2 MANUSCRIPT AY159369.1 IntA24 AY528226.1 IntA26 AY528227.1 IntA27 AY528228.1 IntB2 AY159372.1 IntB3 AY159373.1 IntB11 AY528229.1 IntB12 AY528230.1 IntD4 EF154436.1 IntD5 EF154437.1 IntD6 EF154438.1 IntE2 EF154432.1 IntE3 EF154433.1

Doron-Faigenboim, A.ACCEPTED, Pupko, T., 2006. A combined empirical and mechanistic codon model. Mol. Biol. Evol. 24, 388–397.

Page 1 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

Table S2 Primer sequences for Quickchange site-directed mutagenesis of AviTagged SlCYS8 a

Name Sequence Notes

SlCYS8-P2I-F CCTTGGTCTCACCGAATATTGGGGGCATTACC Forward primer | P2I mutation

SlCYS8-P2I-R GGTAATGCCCCCAATATTCGGTGAGACCAAGG Reverse primer | P2I mutation

SlCYS8-T6R-F CTGGGGGCATTCGTAATGTTCCATTCC Forward primer | T6R mutation

SlCYS8-T6R-R GGAATGGAACATTACGAATGCCCCCAG Reverse primer | T6R mutation

a Mutated nucleotides highlighted in bold characters; AviTagged SlCYS8 template construct for mutagenesis from Sainsbury et al., 2012.

Sainsbury, F., Rhéaume, A.-J., Goulet, M.-C., Vorster, J., Michaud, D., 2012. Discrimination of differentially inhibited cysteine proteases by activity-based profiling using cystatin variants with tailored MANUSCRIPTspecificities. J. Proteome Res. 11, 5983–5993.

ACCEPTED

Page 1 ACCEPTED MANUSCRIPT

Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

Table S3 Unique intestain peptides detected by MS/MS in midgut extracts of Leptinotarsa decemlineata larvae fed potato leaves painted with wild-type SlCYS8, P2I or T6R a

Family Unique peptides Spectral counts b Accession numbers

------Total Ctrl WT P2I T6R

Q6QRP9, Q6QRP8, Q8I888, A NSWGTTWGEDGYFR 17 5 2 9 1 Q8I889 Q6QRP8, Q6QRP9, KLLADEDELKK 13 5 2 5 1 Q6QRQ0, Q8I888, Q8I889 Q6QRP8, Q6QRP9, LLADEDELKK 7 3 0 3 1 Q6QRQ0, Q8I888, Q8I889 Q6QRP9, Q6QRP8, Q8I888, VKNSWGTTWGEDGYFR 7 1 1 5 0 Q8I889 Q8I886, Q6QRP7, Q8I884, B GIEADSSYPYK 29 6 6 12 5 Q6QRP6 Q8I886, Q6QRP7, Q8I885, GIDTPCQYDAK 27 7 6 8 6 Q6QRP6 Q6QRP6, Q6QRP7, Q8I884, DANNLCGIADK 24 7 5 6 6 Q8I885 Q6QRP6, Q6QRP7, Q8I884, NSWGKDWGEQGYFR 23 5 3 11 4 Q8I885 Q6QRP6, Q6QRP7, Q8I884, VKNSWGKDWGEQGYFR 14 6 1 6 1 Q8I885 Q8I886, Q6QRP7, Q8I885, GIDTPCQYDAKK 13 5 0 6 2 MANUSCRIPTQ6QRP6 YQGGCGSCWAFSATGALEGQNAIVNNVK 13 2 1 8 2 Q6QRP6, Q6QRP7, Q8I883 Q6QRP6, Q6QRP7, Q8I884, DWGEQGYFR 12 2 3 6 1 Q8I885 Q8I886, Q6QRP6, Q8I884, NVSISEEELKK 8 2 0 3 3 Q8I885 Q6QRP6, Q6QRP7, Q8I886, SATGALEGQNAIVNNVK 5 1 1 2 1 Q8I884, Q8I885, Q8I883 GIEADSSYPYKGIDTPCQYDAK 5 0 0 5 0 Q8I886, Q6QRP7, Q6QRP6 Q6QRP6, Q6QRP7, Q8I884, DANNLCGIADKASYPIL 3 1 2 0 0 Q8I885 Q6QRP6, Q6QRP7, Q8I884, RDANNLCGIADK 3 0 0 3 0 Q8I885 Q6QRP6, Q6QRP7, Q8I884, IPLSEQQLLDCSKPYGNDDCEHGGLMSFAFDYVLDK 2 2 0 0 0 Q8I885 GIEADSSYPYKGIDTPCQYDAKK 1 1 0 0 0 Q8I886, Q6QRP7, Q6QRP6 C NADNQCGIATRACCEPTED 34 7 8 11 8 Q8I879, Q6QRP5, Q8I882 NTEGDLTHAVLVTGYGSQDGKDYWIVK 13 3 3 5 2 Q8I879, Q6QRP5, Q8I882

KNDEIDLQK 13 3 3 6 1 Q6QRP5, Q8I882, Q8I881

LISLSEQQLVDCVK 13 3 1 5 4 Q6QRP5, Q8I882, Q8I881

Page 1 ACCEPTED MANUSCRIPT

Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

QGAVTEVK 10 1 1 6 2 Q6QRP5, Q8I882

NSWGAEYGMDGYLR 9 2 2 3 2 Q6QRP5, Q8I882

AIKKNDEIDLQK 6 2 1 3 0 Q6QRP5, Q8I882, Q8I881

NDEIDLQK 3 0 1 2 0 Q6QRP5, Q8I882, Q8I881

SQGSCGSCWAFSTTGSVESHNFIK 2 0 0 2 0 Q6QRP5, Q8I882 A2I7Q0, A2I7P4, A2I7P3, D ASQWSGETK 167 38 30 57 42 A2I7P2, A2I7P9 A2I7P4, A2I7Q3, A2I7P3, DYGIQSEK 76 22 14 24 16 A2I7P2 AVGTIGPISIAMNSDPLQLYYSGIISGK 47 13 6 20 8 A2I7Q0, A2I7P6 A2I7P6, A2I7P4, A2I7P3, GCSHDLDHGVLVVGYGK 28 7 7 6 8 A2I7P2, A2I7P9 KAVGTIGPISIAMNSDPLQLYYSGIISGK 25 9 2 9 5 A2I7Q0, A2I7P6

EGGDMSAAFEYVR 24 8 3 9 4 A2I7P4, A2I7Q3, A2I7P3 A2I7P4, A2I7Q3, A2I7P3, KQTECQYDASK 23 6 4 9 4 A2I7P2 A2I7P4, A2I7Q3, A2I7P3, NVTTSEEGLRK 17 5 2 6 4 A2I7P2 A2I7P4, A2I7Q3, A2I7P3, ISLSEQQLLDCSAAYGNGNCK 14 3 4 3 4 A2I7P2 A2I7Q0, A2I7P4, A2I7P3, ASQWSGETKFWR 12 2 3 3 4 MANUSCRIPTA2I7P2, A2I7P9 GAVLEVKDQNPCGSCWAFSATGALEGQNAILNNVK 11 3 2 3 3 A2I7Q3, A2I7P3 A2I7P4, A2I7Q3, A2I7P3, QTECQYDASK 7 3 2 2 0 A2I7P2 A2I7P6, A2I7P4, A2I7P3, IWGENGYFR 6 0 0 0 6 A2I7P2, A2I7P9, A2I7Q0 DQNPCGSCWAFSATGALEGQNAILNNVK 5 0 0 2 3 A2I7Q3, A2I7P3

EGGDMSAAFEYVRDYGIQSEK 4 1 0 3 0 A2I7P4, A2I7Q3, A2I7P3 A2I7P6, A2I7P4, A2I7P3, NSWGKIWGENGYFR 3 0 0 3 0 A2I7P2, A2I7P9, A2I7Q0 A2I7P4, A2I7Q3, A2I7P3, KQTECQYDASKTILK 3 1 0 2 0 A2I7P2 A2I7P4, A2I7Q3, A2I7P3, GYKNVTTSEEGLRK 1 1 0 0 0 A2I7P2 A2I7P4, A2I7Q3, A2I7P3, NVTTSEEGLR 1 1 0 0 0 A2I7P2 E GADLGVKDQGKACCEPTED 13 3 2 6 2 A2I7N9

GIEAGSSYPYQGR 10 3 2 3 2 A2I7N9, A2I7N8

KAVGTIGPISVAVSSEHLR 10 2 2 5 1 A2I7N9, A2I7N8

AVGTIGPISVAVSSEHLR 8 2 2 3 1 A2I7N9, A2I7N8

Page 2 ACCEPTED MANUSCRIPT

Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

LYGGGVITTR 6 1 0 3 2 A2I7N9, A2I7N8

ASEVELKK 4 0 0 3 1 A2I7N9, A2I7N8

AKGIEAGSSYPYQGR 3 0 0 3 0 A2I7N9, A2I7N8

TWGDHGYFK 3 0 1 2 0 A2I7N9, A2I7N8

DLDHAVLAVGYGSENGR 2 0 0 2 0 A2I7N9, A2I7N8

APESIDWTQK 1 0 1 0 0 A2I7N9, A2I7N8

F SIETVCQYDASK 5 0 0 3 3 A2I7Q1, A2I7Q2

DYGIEAEESYPYK 6 0 0 2 4 A2I7Q1, A2I7Q2

AVGTIGPISIAMNADPLQLYFGGILSGR 4 0 0 2 2 A2I7Q1, A2I7Q2, A2I7N6

NVTASEEELRK 4 0 0 2 2 A2I7Q1, A2I7Q2, A2I7N6

GCTDELDHGVLAVGYGEVSQSSGNTK 2 0 0 2 0 A2I7P5

VKNSWGDYWGEK 2 0 0 2 0 A2I7P5, A2I7N6, A2I7P8 A2I7P6, A2I7P5, A2I7P4, D, F DANNLCGIADDPTYPVL 24 6 6 8 4 A2I7P3, A2I7P2, A2I7P9, A2I7P8, A2I7Q0, A2I7N7 A2I7P6, A2I7P5, A2I7P4, RDANNLCGIADDPTYPVL 2 0 0 0 2 A2I7P3, A2I7P2, A2I7P9, A2I7P8, A2I7Q0, A2I7N7 MANUSCRIPTA2I7P4, A2I7P3, A2I7P2, EDLEVPDSIDWTEK 1 1 0 0 0 A2I7Q1, A2I7Q2, A2I7Q3 A2I7P4, A2I7P3, A2I7P2, DLEVPDSIDWTEK 1 0 1 0 0 A2I7Q1, A2I7Q2, A2I7Q3

a Control (Ctrl) larvae were fed potato leaves painted with a gelatin, no-inhibitor solution. b Total spectral counts for three biological replicates.

ACCEPTED

Page 3 ACCEPTED MANUSCRIPT

Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

Table S4 Unique peptides detected following MS/MS analysis of intestains captured with wild-type SlCYS8, P2I or T6R in a midgut extract of L. decemlineata larvae raised on potato, cv. Kennebec

a Family Unique peptides Spectral counts Accession numbers

------Total WT P2I T6R

B 1 NSWGKDWGEQGYFR 21 4 11 6 Q6QRP6, Q6QRP7, Q8I884, Q8I885

2 DANNLCGIADKASYPIL 19 10 6 3 Q6QRP6, Q6QRP7, Q8I884, Q8I885

3 GIEADSSYPYK 14 1 12 1 Q8I886, Q6QRP7, Q8I884, Q6QRP6

4 DANNLCGIADK 14 0 14 0 Q6QRP6, Q6QRP7, Q8I884, Q8I885

Q6QRP6, Q6QRP7, Q8I886, Q8I884, Q8I885 5 IPLSEQQLLDCSK 11 0 6 5 Q8I883

6 DWGEQGYFR 10 0 6 0 Q6QRP6, Q6QRP7, Q8I884, Q8I885

7 VKNSWGKDWGEQGYFR 9 0 5 4 Q6QRP6, Q6QRP7, Q8I884, Q8I885

8 GIEADSSYPYKGIDTPCQYDAKK 5 0 5 0 Q8I886, Q6QRP7, Q6QRP6

Q6QRP6, Q6QRP7, Q8I886, Q8I884, Q8I885 9 SATGALEGQNAIVNNVK 5 0 5 0 MANUSCRIPT Q8I883 10 YQGGCGSCWAFSATGALEGQNAIVNNVK 4 0 3 1 Q6QRP6, Q6QRP7, Q8I883

11 GIDTPCQYDAK 3 0 3 0 Q8I886, Q6QRP7, Q8I885, Q6QRP6

12 GIDTPCQYDAKK 3 0 3 0 Q8I886, Q6QRP7, Q8I885, Q6QRP6

13 WGKDWGEQGYFR 2 0 2 0 Q6QRP6, Q6QRP7, Q8I884, Q8I885

C LISLSEQQLVDCVK 10 1 5 4 Q6QRP5, Q8I882, Q8I881

NSWGAEYGMDGYLR 8 1 5 2 Q6QRP5, Q8I882

A2I7P6, A2I7P4, A2I7P3, A2I7P2, A2I7P9 D IWGENGYFR 6 1 3 2 A2I7Q0

EGGDMSAAFEYVR 5 0 5 0 A2I7P4, A2I7Q3, A2I7P3 ASQWSGETKFWRACCEPTED 5 0 5 0 A2I7Q0, A2I7P4, A2I7P3, A2I7P2, A2I7P9 GCSHDLDHGVLVVGYGK 5 0 5 0 A2I7P6, A2I7P4, A2I7P3, A2I7P2, A2I7P9

ISLSEQQLLDCSAAYGNGNCK 3 0 2 1 A2I7P4, A2I7Q3, A2I7P3, A2I7P2

a Total spectral counts for three biological replicates.

Page 1 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

Multiple sequence alignment by MUSCLE (3.8) | CLUSTALW Format output

AF544844 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSVVEEKRRFSVFQKNLIDIQEHNKK AF544839 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLVDIQEHNKK AF544837 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLVDIQEHNKK AF544840 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLVDIQEHNKK AF544843 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLVDIQEHNKK AF544838 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLIDIQEHNKK AF544835 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLIDIQEHNKK AF544841 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSLVEEKRRFSVFQKNLIDIQEHNKK AF544836 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSVVEEKRRFSVFQKNLIDIQEHNKK AF544834 MKFLVVVATLVLVAGASSVYEEWQQFKLDHGKTYRSVVEEKRRFSVFQKNLIDIQEHNKK AF190653 --LFIAFAAVILSAGALSLNQHWESFKVQHGKVYKNPIEERVRFSVFQANLKTINEHNAK EF154432 MKLFALIAAVLVTVNASTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQNNLRTIEEHNAK EF154433 ---FALIAAVLVTVNASTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQNNLRTIEEHNAK AY159368 ------AY159369 ------AY528226 MKLLTFFAAVLMTVSASTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQNNLRTIEKHNAK AY528228 MKLLTFFAAVLMTVSASTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQNNLRTIEKHNAE AY528227 MKLLTFFAAVLMTVSASTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQNNLRTIEKHNAK AY528229 MKLLAVFASVLLAVNALTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQSNLRKIEEHNAK AY528230 MKLLAVFATVLLAVNALTDKDQWVAFKQTHGKTYKSLLEERTRFGIFQSNLRKIEEHNAK AY159373 ------AY159372 ------EF154438 MKFLAIFATVLIAVTASTNEDQWIAFKQTHGKTYKNLLEERTRFGIFQRNLIKIKEHNAR EF154436 MKFLAIFATVLIAVTASTNEDQWIAFKQTHGKTYKNLLEEKTRFGIFQRNLIKIKEHNAR EF154437 MKLLAIFATVLIAVTASTNEDQWIAFKQTHGKTYKNLLEEKTRFGIFQRNLIKIKEHNAR **.: .*:::: . * : : * **MANUSCRIPT ***.*.. :**. **.:** ** *::**

Fig. S1. Amino acid alignment of the 25 Coleoptera Cys proteases selected for the positive selection analysis. The alignments were generated online using the Multiple Sequence Comparison by Log- Expectation (MUSCLE) alignment program (http://www.ebi.ac.uk/Tools/msa/muscle/) (Edgar, 2004). Asterisks indicate conserved amino acid residues, dots partially conserved residues. Open and black circles on page 2 point, respectively, to the first residue of the mature enzyme after N-terminal signal peptide removal, and to the starting residue for the positive selection analysis. Corresponding protease names for the 25 accession numbers are given in Table S1.

______Edgar, R.C., 2004. MUSCACCEPTEDLE: multiple sequence alignment with high accuracy and high-throughput. Nucl. Acids Res.

32, 1792–1797.

Page 1 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 YERGEVSFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSDAVYF---EETDIEEKDAVDW AF544839 YERGEESFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDIDMEEKDAIDW AF544837 YERGEESFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNSEDIDMEEKDAVDW AF544840 YERGEESFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDIDMEEKDAVDW AF544843 YERGEESFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDIDMEEKDAVDW AF544838 YERGEVSFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDTDMEEKDAVDW AF544835 YERGEVSFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDTDMEEKDAVDW AF544841 YERGEVSFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDTDMEEKDAVDW AF544836 YERGEVSFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDTDMEEKDAVDW AF544834 YERGEVSFAKKVTQFADMTHEEFLDLLKLQ--GVPALPSNAVHFDNFEDTDMEEKDAVDW AF190653 YEQGLVGYTMAVNQFADMTPEEFKAKLGMQAKNMPKIKKSRHVK----NVNAEVPDSVDW EF154432 YDKGEETYYMGVNQFADMTAEEFRHMLGLQNGARPNLNATLHVF----SENLQAPESIDW EF154433 YDKGEETYYMGVNQFADMTAEEFRHMLGLQNGARPNLNATLHVF----SENLQAPESIDW AY159368 ------AY159369 ------AY528226 YEEGKVTYYMAVTQFADMTRDEFRKKLGLQNNRRPNLNATLQVF----PEDLELPEQIDW AY528228 YEEGKVTYYMAVTQFADMTRDEFRKKLGLQNNRRPNLNATLRVF----PEDLELPEQIDW AY528227 YEEGKVTYYMAVTQFADMTRDEFRKKLGLQNNRRPNLNATLRVF----PEDLELPEQIDW AY528229 YDKGEESYFLGVTPFADLTHDEFKDELRRQIKTKPNVEATLAVF----PEGLEVPDSIDW AY528230 YDKGEESYFLGVTPFADLTHDEFKDKLRRQIKTKPNVEATLAVF----PEGLEVPDSIDW AY159373 ------AY159372 ------EF154438 CDKGEETYLLGVTRFADLTHEEFKDILKGQIKNKPRLNATPTVF----PEDLEVPDSIDW EF154436 YDKGEETYLLGVTRFADLTHEEFKDILKGQIKNKPRLNATPTVF----PEDLEVPDSIDW EF154437 YDKGEETYLLGVTRFADLTHEEFKDILKGQIKNKPRLNATPTVF----PEDLEVPDSIDW : * : *. ***:* :** * * * : . : : :** !

AF544844 RKEGAVTPVKNQGHCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEYYGNEGCN AF544839 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEDYGNNGCK AF544837 REEGAVTPAKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEDYGNNGCKMANUSCRIPT AF544840 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKK NGTLVSLSAQELVDCATEDYGNNGCK AF544843 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEDYGNNGCK AF544838 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEDYGNNGCK AF544835 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEEYGNNGCR AF544841 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEEYGNNGCR AF544836 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEEYGNNGCR AF544834 REEGAVTPVKDQANCGSCWAFSAVGAIEGQFFKKNGTLVSLSAQELVDCATEEYGNNGCR AF190653 RQKGAVLGVKDQGQCGSCWAFSATGSLEGQNYIVNGKSEPLSEQELLDCSVE-YGNGDCD EF154432 TQKGADLGVKNQGKCGSCWAFSSTGSLEGQNAIHHKVKTPLSERQLLDCSSS-YGNGDCD EF154433 TQKGADLGVKDQGKCGSCWAFSSTGSLEGQNAIHHKVKTPLSEQQLLDCSSS-YGNGDCD AY159368 ------EQGECGSCWAFSTTGSVEGQNAIHNKVKTPLSEQQLLDCSAS-YGNGDCH AY159369 ------EQGECGSCWAFSTTGSLEGQNAIHNKVKTPLSEQQLLDCSAS-YGNGDCD AY528226 TEKGAVLPVKNQGNCRSCWAFSTTGSLEGQNAIHNKVKTPLSEQQLLDCSAS-YGNGDCD AY528228 TEKGAVLPAKNQGNCRSCWAFSTTGSLEGQNAIHNKVKTPLSEQQLLDCSAS-YGNGDCD AY528227 TEKGAVLPVKNQGNCRSCWAFSTTGSLEGQNAIHNKVKTPLSEQQLLDCSAS-YGNGDCD AY528229 TQKGAVLDVKYQGGCGSCWAFSATGALEGQNAIVNNVKIPLSEQQLLDCSKP-YGNDDCE AY528230 TQKGAVLDVKYQGGCGSCWAFSATGALEGQNAIVNNVKIPLSEQQLLDCSKP-YGNDDCE AY159373 ------EQGECGSCWAFSATGALEGQNAIVNNVKIPLSEQQLLDCSKP-YGNDDCE AY159372 ACCEPTED ------EQGECGSCWAFSATGALEGQNAIVNNVKIPLSEQQLLDCSKP-YGNDDCE EF154438 TEKGAVLEVKGQNPCGSCWAFSATGALEGQNAILNNAKISLSEQQLLDCSAA-YGNGNCK EF154436 TEKGAVLEVKDQNPCGSCWAFSATGALKGQNAILNNVKISLSEQQLLDCSAA-YGNGNCK EF154437 TEKGAVLEVKDQNPCGSCWAFSATGALEGQNAILNNVKISLSEQQLLDCSAA-YGNGNCK ::** .* * * ******:.*:::** : .** .:*:**: *** .* "

Fig. S1. Cont’d

Page 2 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 -GGLMGQAFDFVEDEGIQTEESYPYKAKRSICQMNGE-YVTKVKTY-HLLLNEQEIARAV AF544839 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGE-YVTKVKTY-VFPLDEQEMARTV AF544837 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGE-YVTKVKTY-VFPLDEQEMARTV AF544840 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGE-YVTKVKTY-VFPLDEQEMARTV AF544843 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGE-YVTKVKTY-VFPLDEQEMARTV AF544838 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGE-YVTKVKTY-VFPLDEQEMARTV AF544835 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGD-YVTKVKTY-VFPLDEQEMARTV AF544841 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGD-YVTKVKTY-VFPLDEQEMARTV AF544836 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGD-YVTKVKTY-VFPLDEQEMARTV AF544834 -GGLMGQAFDFVQDEGIQTEESYPYEGRRSSCKKSGD-YVTKVKTY-VFPLDEQEMARTV AF190653 EGGLMTLAFEFVEENGIVSEASYPYEAIQGDCRTTNDKAVLHIQGYNEVYPSEEALRQAV EF154432 EGGLMTNAFKYIKAKGIEAGSSYPYQGRVGSCRYNAQKTILRIKGFKELRASEVELKKAV EF154433 EGGLMTNAFKYIKAKGIEAGSSYPYQGRVGSCRYNAQKTILRIKGFKELRASEVELKKAV AY159368 DGGLMTKAFNYIIDNGIEAESSYPYVEQMTECQYDAKKTIVQIKGYKKLLADEDELKKAV AY159369 DGGLMTEAFDYIIDNGIEAESSYPYVEQMTECQYDAKKTIVQIKGYKKLLADEDELKKAV AY528226 DGGLMTEAFDYIIDNGIEAESSYPYVEQMTECQYDAKKTIVQIKGYKKLLADEDELKKAV AY528228 DGGLMTEAFDYIIDNGIEAESSYPYVEQMTECQYDAKKTIVQIKGYKKLLADEDELKKAV AY528227 DGGLMTEAFDYIIDNGIEAESSYPYVEQMTECQYDAKKTIVQIKGYKKLLADEDELKKAV AY528229 HGGLMSFAFDYVLDKGIEADSSYPYKGIDTPCQYDAKKTVLKIKGYKNVSNSEEELKKAV AY528230 HGGLMSFAFDYVLDKGIEADSSYPYKGIDTPCQYDAKKTVLKIKGYRNVSISEEELKKAV AY159373 HGGLMSFAFDYVLDKGIEADSSYPYKGTDTPCQYDAKKTVLKIKGYKNVSISEEELKKAV AY159372 HGGLMSFAFDYVLDKGIEADSSYLYKGIDTPCQYDAKKTVLKIKGYKNVSISEEELKKAV EF154438 EGGDMSAAFEYVRDYGIQSEKSYPYIRKQTECQYDASKTILKIKGYKNVTTSEEGLRKAV EF154436 EGGDMSAAFDYVRDYGIQSEKSYPYIRKQTECQYDASKTILKIKGYKNVTTSEEGLRKAV EF154437 EGGDMSAAFEYVRDYGIQSEKSYPYIRKQTECQYDASKTILKIKGYKNVTTSEEGLRKAV ** * **.:: ** : ** * *. . : .:: : . .* : .:*

AF544844 SAKGPVAVAIDASQLSFYDQGIVDEKCKCSKKREDLNHGVLVVGYGSEN----GVDYWIV AF544839 AAKGPVAVAIEASQLSFYDKGIVDERCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF544837 AAKGPVAVAIEASQLSFYDKGIVDERCRCSNKREDLNHGVLVVGYGSENMANUSCRIPT----GVDYWIV AF544840 AAKGPVAVAIEASQLSFYDKGIVDERCRCSNKREDLNHGVLVVGYGSEN ----GVDYWIV AF544843 AAKGPVAVAIEASQLSFYDKGIVDERCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF544838 AAKGPVAVAIEASQLSFYDKGIVDERCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF544835 AAKGPVAVAIEASQLSFYDKGIVDETCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF544841 AAKGPVAVAIEASQLSFYDKGIVDETCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF544836 AAKGPVAVAIEASQLSFYDKGIVDEKCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF544834 AAKGPVAVAIEASQLSFYDKGIVDEKCRCSNKREDLNHGVLVVGYGSEN----GVDYWIV AF190653 GTVGPISAAIWAEPIQFFSSGIYDDPN-CLNYVEYLDHGILVVGYGEEN----GTPYWIV EF154432 GTIGPISVAVSSEHLRLYGGGVITTR--CIK---DLDHAVLAVGYGSEN----GRKYWKI EF154433 GTIGPISVAVSSEHLRLYGGGVITTR--CIK---DLDHAVLAVGYGSEN----GRKYWKI AY159368 GAVGPISVGMSSENLHMYGGGILDDQ--CYF---DMDHAVLVVGYGEAN----GKKFWRV AY159369 GTVGPISVGMSSENLHMYGGGVLDDQ--CYF---GMDHAVLVVGCGEAN----GKKFWKV AY528226 GTVGPISVGMSSENLHMYGGGVLGDQ--CYF---GMDHAVLVVGYGEAN----GKKFWKV AY528228 GTVGPISVGMSSENLHMYGGGVLDDQ--CYF---GMDHAVLVVGYGEAN----GKKFWKV AY528227 GTVGPISVGMSSENLHMYGGGVLDDQ--CYF---GMDHAVLVVGYGEAN----GKKFWKV AY528229 GTVGPVSVAIDADPIQLYFGGILDGLF-CTH---NLNHGVLAVGYGEEDHLFGKKKFWKV AY528230 GTVGPVSVAIDADPIQLYSGGILDGLF-CTH---NLNHGVLAVGYGEEDHLFGKKKFWKV AY159373 GTVGPVSVAIDADPIQLYSGGILDGLF-CTH---NLNHGVLAVGYGEEDHLFGKKKFWKV AY159372 ACCEPTED GTVGPVSVAIDADPIQLYFGGILDGLF-CTH---NLNHGVLAVGYGEEDHLFGKKKFWKV EF154438 GTIGPMSIAMNSGPLQLYYSGIFSGKG-CSH---DLDHGVLVVGYGKASQWSGETKFWRV EF154436 GTIGPISIAMNSDPLQLYYSGTISGKG-CSH---DLDHGVLVVGYGKASQWSGETKFWRV EF154437 GAIGPISIAMNSDPLQLYYSGIISGKG-CSH---DLDHGVLVVGYGKASQWSGETKFWRV .: **:: .: : : :: * * ::*.:*.** *. . :* :

Fig. S1. Cont’d

Page 3 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 KNSWGADWGEKGYFRLKKDV-KACGIGNYNTYPVLL AF544839 KNSWGADWGEKGYFRLKKDV-KACGIGTYNTYPVLL AF544837 KNSWGADWGEKGYFRLKKDV-KACGIGYYNTYPILL AF544840 KNSWGADWGEKGYFRLKKDV-KACGIGYYNTYPIFL AF544843 KNSWGADWGEKGYFRLKKDV-KACGIGYYNTYPILL AF544838 KNSWGADWGEKGYFRLKKDV-KACGIGYYNPYPILL AF544835 KNSWGADWGEKGYFRLKKDV-KACGIGYYNPYPILL AF544841 KNSWGADWGEKGYFRLKKDV-KACGIGYYNTYPIFL AF544836 KNSWGADWGEKGYFRLKKDV-KACGIGYYNTYPILL AF544834 KNSWGADWGEKGYFRLKKDV-KACGIDYYNTYPILL AF190653 KNSWGATWGEEGYFRLKRNI-ALCGLAQMASYPVL- EF154432 RNSWGKTWGDHGYFKLARDAGNLCGVASMASYPLL- EF154433 RNSWGKTWGDHGYFKLARDAGNLCGVASMASYPLL- AY159368 KNSWGTTWGEDGYFRIERDADNLCDIASMCSYPILL AY159369 KNSWGTTWGEDGYFRIERDADNLCDIASMCSYPILL AY528226 KNSWGATWGEDGYFRIERDADNLCDIASMCSYPILL AY528228 KNSWGTTWGEDGYFRIERDANNLCDIASMCSYPILL AY528227 KNSWGTTWGEDGYFRIERDADNLCDIASMCSYPILP AY528229 KNSWGKDWGEQGYFRIKRDANNLCGIADKASYPIL- AY528230 KNSWGKDWGEQGYFRIKRDANNLCGIADKASYPIL- AY159373 KNSWGKDWGEQGYFRIKRDANNLCGIADKASYPIL- AY159372 KNSWGKDWGEQGYFRIKRDANNLCGIADKASYPIL- EF154438 KNSWGKIWGENGYFRIKRDANNLCGIADDPTYPVL- EF154436 KNSWGKIWGENGYFRIKRDANNLCGIADDPTYPVL- EF154437 KNSWGKIWGENGYFRIKRDANNLCGIADDPTYPVL- .**** **: ***.: .: *.: .**::

Fig. S1. Cont’d MANUSCRIPT

ACCEPTED

Page 4 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

Multiple sequence alignment by MUSCLE (3.8) | CLUSTALW Format output

AF544844 (CPB1) GAAGAAAAAGACGCGGTCGACTGGAGAAAAGAGGGAGCCGTGACA AF544839 (CPA16) GAAGAAAAGGACGCGATCGACTGGAGAGAAGAGGGAGCCGTGACA AF544838 (CPA15) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544837 (CPA13) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544840 (CPA21) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544843 (CPA27) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544834 (CPA1) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544836 (CPA11) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544835 (CPA9) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF544841 (CPA23) GAAGAAAAGGACGCGGTCGACTGGAGAGAAGAGGGAGCCGTGACA AF190653 (CAL1) GAGGTTCCGGACTCCGTTGATTGGAGACAAAAAGGTGCCGTTTTA EF154432 (IntE2) CAAGCTCCAGAATCTATAGATTGGACACAGAAGGGAGCAGATTTG EF154433 (IntE3) CAAGCTCCAGAATCTATAGATTGGACACAGAAGGGAGCAGATTTG AY159368 (IntA1) ------AY159369 (IntA2) ------AY528228 (IntA27) GAATTGCCAGAACAGATCGATTGGACGGAAAAAGGAGCAGTTTTG AY528226 (IntA24) GAATTGCCAGAACAGATCGATTGGACGGAAAAAGGAGCAGTTTTG AY528227 (IntA26) GAATTGCCAGAACAGATCGATTGGACGGAAAAAGGAGCAGTTTTG AY159373 (IntB3) ------AY159372 (IntB2) ------AY528229 (IntB11) GAGGTTCCAGACTCTATCGATTGGACTCAAAAAGGAGCTGTCCTC AY528230 (IntB12) GAGGTTCCAGACTCTATCGATTGGACTCAAAAAGGAGCTGTCCTC EF154438 (IntD6) GAAGTTCCAGATTCCATAGATTGGACCGAGAAGGGAGCAGTTTTG EF154436 (IntD4) GAAGTTCCAGATTCCATAGATTGGACCGAGAAGGGAGCAGTTTTG EF154437 (IntD5) GAAGTTCCAGATTCCATAGATTGGACCGAGAAGGGAGCAGTTTTG a gaMANUSCRIPT t ga tgga a a gg gc g !

Fig. S2. Alignment of Coleoptera Cys protease-encoding codons for the positive selection analysis. Coding sequences of the mature proteases were aligned online using the Multiple Sequence Comparison by Log-

Expectation (MUSCLE) alignment program (http://www.ebi.ac.uk/Tools/msa/muscle/) (Edgar, 2004), and then edited manually based on the protein sequence alignments (see Fig. S1) to eliminate indel mismatches between the two sequence datasets. Open and black circles under the sequence alignment point, respectively, to the first residue of the mature enzymes and to the starting residue for the positive selection analysis. Asterisks point to conserved nucleotides among the 25 tested sequences. Lowercase letters identify conserved nucleotides among sequence data from incomplete datasetsACCEPTED. Protease names for the 25 accession numbers are given in parentheses.

______Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high-throughput. Nucl. Acids Res.

32, 1792–1797.

Page 1 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 CCTGTCAAGAACCAGGGACATTGCGGGTCATGCTGGGCATTCAGTGCGGTGGGGGCTATC AF544839 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCTTTCAGTGCGGTGGGGGCTATC AF544838 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCTTTCAGTGCGGTGGGGGCTATC AF544837 CCTGCCAAGGACCAGGCAAATTGCGGATCATGCTGGGCTTTCAGTGCGGTGGGGGCTATC AF544840 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCTTTCAGTGCGGTGGGGGCTATC AF544843 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCTTTCAGTGCGGTGGGGGCTATC AF544834 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCCTTCAGTGCGGTGGGGGCTATC AF544836 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCCTTCAGTGCGGTGGGGGCTATC AF544835 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCCTTCAGTGCGGTGGGGGCTATC AF544841 CCTGTCAAGGACCAGGCAAATTGCGGATCATGCTGGGCCTTCAGTGCGGTGGGGGCTATC AF190653 GGTGTTAAAGATCAAGGACAGTGTGGATCCTGCTGGGCTTTCAGTGCTACTGGTTCTCTT EF154432 GGAGTGAAAAATCAAGGAAAATGTGGATCTTGCTGGGCTTTCAGTTCCACCGGGTCATTG EF154433 GGAGTGAAAGATCAAGGAAAATGTGGATCTTGCTGGGCTTTCAGTTCCACCGGGTCATTG AY159368 ------GAGCAAGGGGAGTGTGGGTCGTGTTGGGCGTTCAGTACTACTGGATCAGTG AY159369 ------GAGCAAGGGGAGTGTGGGTCGTGCTGGGCTTTCAGTACTACTGGATCACTG AY528228 CCCGCGAAAAACCAGGGCAATTGTAGATCTTGCTGGGCTTTCAGTACTACTGGATCACTG AY528226 CCTGTGAAAAACCAGGGCAATTGTAGATCTTGCTGGGCTTTCAGTACTACTGGATCACTG AY528227 CCTGTGAAAAACCAGGGCAATTGTAGATCTTGCTGGGCTTTCAGTACTACTGGATCACTG AY159373 ------GAGCAGGGGGAGTGTGGGTCGTGTTGGGCTTTCAGTGCGACTGGAGCTTTG AY159372 ------GAGCAGGGGGAGTGCGGGTCGTGTTGGGCTTTCAGTGCGACTGGAGCTTTG AY528229 GATGTTAAATATCAAGGAGGTTGTGGTTCGTGCTGGGCTTTCAGTGCGACTGGAGCTTTG AY528230 GATGTTAAATATCAAGGAGGTTGTGGTTCGTGCTGGGCTTTCAGTGCGACTGGAGCTTTG EF154438 GAAGTTAAAGGTCAGAATCCATGTGGTTCTTGCTGGGCTTTCAGTGCTACCGGAGCTTTG EF154436 GAAGTTAAAGATCAAAATCCATGTGGTTCTTGCTGGGCTTTCAGTGCTACCGGAGCTTTG EF154437 GAAGTTAAAGATCAAAATCCATGTGGTTCTTGCTGGGCTTTCAGTGCTACCGGAGCTTTG g aa ** ** * ** ** ***** ****** * ** * * "

AF544844 GAGGGTCAGTTTTTCAAAAAGAACGGAACTTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544839 GAGGGTCAGTTTTTCAAGAAAAACGGGACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544838 GAGGGTCAGTTTTTCAAGAAAAACGGGACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTGMANUSCRIPT AF544837 GAGGGTCAGTTTTTCAAGAAAAACGGGACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544840 GAGGGTCAGTTTTTCAAGAAAAACGGGACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544843 GAGGGTCAGTTTTTCAAGAAAAACGGGACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544834 GAGGGTCAGTTTTTCAAAAAAAACGGAACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544836 GAGGGTCAGTTTTTCAAAAAAAACGGAACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544835 GAGGGTCAGTTTTTTAAAAAAAACGGAACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF544841 GAGGGTCAGTTTTTCAAAAAAAACGGAACGTTGGTCAGCTTGAGCGCCCAGGAACTAGTG AF190653 GAAGGACAAAACTACATCGTCAATGGAAAATCAGAACCCCTCAGTGAGCAGGAACTTCTG EF154432 GAAGGGCAGAACGCTATCCACCATAAAGTGAAGACTCCTCTGAGCGAACGACAACTTCTT EF154433 GAAGGGCAGAACGCTATCCACCATAAAGTGAAGACTCCTCTGAGCGAACAACAACTTCTT AY159368 GAAGGTCAAAACGCTATTCACAACAAAGTGAAGACTCCCCTAAGTGAGCAACAACTTCTG AY159369 GAAGGTCAAAACGCTATTCACAACAAAGTGAAGACTCCCCTAAGTGAGCAACAACTTCTG AY528228 GAAGGTCAAAACGCTATTCACAACAAAGTGAAGACTCCCCTAAGTGAGCAACAACTTCTG AY528226 GAAGGTCAAAACGCTATTCACAACAAAGTGAAGACTCCCCTAAGTGAGCAACAACTTCTG AY528227 GAAGGTCAAAACGCTATTCACAACAAAGTGAAGACTCCCCTAAGTGAGCAACAACTTCTG AY159373 GAAGGACAGAATGCCATTGTCAATAATGTGAAGATTCCTTTGAGTGAACAGCAGCTATTG AY159372 GAAGGACAGAATGCCATTGTCAATAATGTGAAGATTCCTTTGAGTGAACAGCAGCTATTG AY528229 GAAGGACAGAATGCCATTGTCAATAATGTGAAGATTCCTTTGAGTGAACAGCAGCTATTG AY528230 ACCEPTED GAAGGACAAAATGCCATTGTCAATAATGTGAAGATTCCTTTGAGTGAACAGCAGCTATTG EF154438 GAAGGACAAAACGCAATTCTGAACAACGCGAAGATCTCTTTGAGTGAACAGCAGTTACTC EF154436 AAGGGACAAAACGCAATTCTGAACAACGTGAAGATCTCTTTGAGTGAACAGCAGTTACTC EF154437 GAGGGACAAAACGCAATTCTGAACAACGTGAAGATCTCTTTGAGTGAACAGCAGTTACTC * ** ** * * * ** * * * * *

Fig. S2. Cont’d

Page 2 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 GACTGTGCGACTGAATACTACGGCAATGAGGGTTGCAAT---GGTGGTCTGATGGGTCAG AF544839 GACTGTGCGACTGAGGACTACGGCAATAATGGTTGCAAA---GGTGGCCTGATGGGTCAG AF544838 GACTGTGCGACTGAGGACTACGGCAATAATGGTTGCAAA---GGTGGCCTGATGGGTCAG AF544837 GACTGTGCGACTGAGGACTACGGCAATAATGGTTGCAAA---GGTGGCCTGATGGGTCAG AF544840 GACTGTGCGACTGAGGACTACGGCAATAATGGTTGCAAA---GGTGGCCTGATGGGTCAG AF544843 GACTGTGCGACTGAGGACTACGGCAATAATGGTTGCAAA---GGTGGCCTGATGGGTCAG AF544834 GACTGTGCAACTGAGGAGTACGGCAATAATGGTTGTCGG---GGTGGTCTGATGGGTCAG AF544836 GACTGTGCAACTGAGGAGTACGGCAATAATGGTTGTCGG---GGTGGTCTGATGGGTCAG AF544835 GACTGTGCAACTGAGGAGTACGGCAATAATGGTTGTCGG---GGTGGTCTGATGGGTCAG AF544841 GACTGTGCAACTGAGGAGTACGGCAATAATGGTTGTCGG---GGTGGTCTGATGGGTCAG AF190653 GATTGTTCCGTAGAA---TACGGAAATGGTGACTGCGATGAAGGTGGTCTTATGACCCTT EF154432 GACTGTTCCTCCAGT---TACGGAAACGGTGATTGTGATGAAGGAGGACTCATGACGAAC EF154433 GACTGTTCCTCCAGT---TACGGAAACGGTGATTGTGACGAAGGAGGACTCATGACGAAC AY159368 GATTGTTCTGCCAGT---TATGGAAATGGTGATTGTCATGACGGTGGTCTCATGACTAAA AY159369 GATTGTTCTGCCAGT---TATGGAAATGGTGATTGTGATGACGGTGGTCTCATGACTGAA AY528228 GATTGTTCTGCCAGT---TATGGAAATGGTGATTGTGATGACGGTGGTCTCATGACTGAA AY528226 GATTGCTCTGCCAGT---TATGGAAATGGTGATTGTGATGACGGTGGTCTCATGACTGAA AY528227 GATTGTTCTGCCAGT---TATGGAAATGGTGATTGTGATGACGGTGGTCTCATGACTGAA AY159373 GACTGTTCCAAACCT---TACGGTAATGATGATTGCGAACATGGAGGCCTGATGAGCTTC AY159372 GACTGTTCCAAACCT---TACGGTAATGATGATTGCGAACATGGAGGCCTGATGAGCTTC AY528229 GACTGTTCCAAACCT---TACGGTAATGATGATTGCGAACATGGAGGCCTGATGAGCTTC AY528230 GACTGTTCCAAACCT---TACGGTAATGATGATTGCGAACATGGAGGCCTGATGAGCTTC EF154438 GATTGTTCTGCTGCT---TATGGCAATGGTAACTGTAAAGAAGGAGGTGATATGTCAGCC EF154436 GATTGTTCTGCTGCT---TATGGCAATGGTAACTGTAAAGAAGGAGGTGATATGTCAGCC EF154437 GATTGTTCTGCTGCT---TATGGCAATGGTAACTGTAAAGAAGGAGGTGATATGTCAGCC ** *** * a ** ** ** ** ** ** ***

AF544844 GCATTCGATTTCGTGGAAGATGAGGGCATCCAGACTGAAGAATCGTATCCTTACAAAGCC AF544839 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTATCCTTACGAAGGC AF544838 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTATCCTTACGAAGGCMANUSCRIPT AF544837 GCATTCGATTTCGTGCAAGATGAGG GCATCCAGACTGAAGAATCGTATCCTTACGAAGGC AF544840 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTATCCTTACGAAGGC AF544843 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTATCCTTACGAAGGC AF544834 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTACCCTTACGAAGGC AF544836 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTACCCTTACGAAGGC AF544835 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTACCCTTACGAAGGC AF544841 GCATTCGATTTCGTGCAAGATGAGGGCATCCAGACTGAAGAATCGTACCCTTACGAAGGC AF190653 GCTTTTGAGTTCGTCGAGGAAAACGGAATCGTATCTGAAGCTAGCTATCCGTACGAAGCT EF154432 GCTTTCAAATATATCAAAGCCAAGGGAATAGAAGCAGGAAGTTCTTATCCTTATCAAGGA EF154433 GCTTTCAAATATATCAAAGCCAAGGGAATAGAAGCAGGAAGTTCTTATCCTTATCAAGGA AY159368 GCTTTCAATTATATTATAGACAACGGAATTGAGGCAGAATCTTCTTACCCCTATGTAGAA AY159369 GCTTTCGATTATATTATAGACAACGGAATTGAGGCAGAATCTTCTTACCCCTATGTNGAA AY528228 GCTTTCGATTATATTATAGACAACGGAATTGAGGCAGAATCTTCTTACCCCTATGTAGAA AY528226 GCTTTCGATTATATTATAGACAACGGAATTGAGGCAGAATCTTCTTACCCCTATGTAGAA AY528227 GCTTTCGATTATATTATAGACAACGGAATTGAGGCAGAATCTTCTTACCCCTATGTAGAA AY159373 GCTTTTGATTATGTCCTAGACAAAGGAATCGAAGCAGATAGTTCTTACCCGTACAAGGGA AY159372 GCTTTTGATTATGTCCTAGACAAAGGAATCGAAGCAGATAGTTCTTACCTGTACAAGGGA AY528229 GCTTTTGATTATGTCCTAGACAAAGGAATCGAAGCAGATAGTTCTTACCCGTACAAGGGA AY528230 ACCEPTED GCTTTTGATTATGTCCTAGACAAAGGAATCGAAGCAGATAGTTCTTACCCGTACAAGGGA EF154438 GCATTCGAATATGTCAGAGACTACGGTATACAATCAGAGAAGTCTTATCCATACATACGC EF154436 GCATTCGATTATGTCAGAGACTACGGTATACAATCAGAGAAGTCTTATCCATACATACGC EF154437 GCATTCGAATATGTCAGAGACTACGGTATACAATCAGAGAAGTCTTATCCATACATACGC ** ** * * * * * ** ** * * ** * **

Fig. S2. Cont’d

Page 3 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 AAAAGGTCAATCTGCCAAATGAATGGCGAA---TATGTGACTAAAGTCAAAACCTAC--- AF544839 CGTAGGTCAAGCTGCAAAAAAAGTGGCGAA---TATGTGACCAAGGTCAAGACGTAC--- AF544838 CGTAGGTCAAGCTGCAAAAAAAGTGGCGAA---TATGTGACCAAGGTCAAGACGTAC--- AF544837 CGTAGGTCAAGCTGCAAAAAAAGTGGCGAA---TATGTGACCAAGGTCAAGACGTAC--- AF544840 CGTAGGTCAAGCTGCAAAAAAAGTGGCGAA---TATGTGACCAAGGTCAAGACGTAC--- AF544843 CGTAGGTCAAGCTGCAAAAAAAGTGGCGAA---TATGTGACCAAGGTCAAGACGTAC--- AF544834 CGTAGGTCGAGCTGCAAAAAAAGTGGAGAC---TATGTGACCAAGGTCAAGACATAC--- AF544836 CGTAGGTCGAGCTGCAAAAAAAGTGGAGAC---TATGTGACCAAGGTCAAGACATAC--- AF544835 CGTAGGTCGAGCTGCAAAAAAAGTGGCGAC---TATGTGACCAAGGTCAAGACATAC--- AF544841 CGTAGGTCGAGCTGCAAAAAAAGTGGCGAC---TATGTGACCAAGGTCAAGACATAC--- AF190653 ATCCAAGGAGATTGCAGAACAACCAATGACAAGGCAGTACTTCATATTCAAGGTTACAAT EF154432 AGAGTGGGTTCCTGCAGGTATAATGCCCAAAAAACTATTCTCAGAATAAAAGGATTCAAA EF154433 AGAGTGGGTTCCTGCAGGTATAATGCCCAAAAAACTATTCTCAGAATAAAAGGATTCAAA AY159368 CAAATGACTGAATGCCAATACGATGCCAAGAAAACCATTGTGCAAATAAAGGGATATAAG AY159369 CAAATGACTGAATGCCAATACGATGCCAAGAAAACCATTGTTCAAATAAAGGGATATAAG AY528228 CAAATGACTGAATGCCAATACGATGCCAAGAAAACCATTGTTCAAATAAAGGGATATAAG AY528226 CAAATGACTGAATGCCAATACGATGCCAAGAAAACCATTGTTCAAATAAAGGGATATAAG AY528227 CAAATGACTGAATGCCAATACGATGCCAAGAAAACCATTGTTCAAATAAAGGGATATAAG AY159373 ACAGATACCCCGTGCCAATACGACGCTAAAAAAACGGTTTTGAAAATCAAGGGTTACAAA AY159372 ATAGATACCCCGTGCCAATACGACGCTAAAAAAACGGTTTTGAAAATCAAGGGTTACAAA AY528229 ATAGATACCCCGTGCCAATACGACGCTAAAAAAACGGTTTTGAAAATCAAGGGTTACAAA AY528230 ATAGATACCCCGTGCCAATACGACGCTAAAAAAACGGTTTTGAAAATCAAGGGTTACAGA EF154438 AAGCAAACTGAGTGCCAGTACGATGCAAGCAAAACAATTTTGAAAATTAAAGGGTACAAA EF154436 AAGCAAACTGAGTGCCAGTACGATGCAAGCAAAACAATTTTGAAAATCAAAGGGTACAAA EF154437 AAGCAAACTGAGTGCCAGTACGATGCAAGCAAAACAATTTTGAAAATCAAAGGGTACAAA *** aa * * * * aa

AF544844 CACTTACTTTTGAATGAACAGGAAATTGCTAGGGCTGTGTCTGCCAAAGGCCCAGTAGCT AF544839 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544838 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544837 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAMANUSCRIPTGGTCCAGTAGCT AF544840 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544843 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544834 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544836 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544835 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF544841 GTCTTCCCTTTGGATGAGCAGGAAATGGCTAGGACTGTGGCTGCCAAAGGTCCAGTAGCT AF190653 GAAGTATATCCAAGTGAAGAAGCTTTGAGACAAGCTGTCGGTACTGTTGGTCCCATTTCT EF154432 GAATTGCGCGCTTCTGAAGTAGAACTGAAAAAAGCCGTAGGTACCATTGGCCCCATATCA EF154433 GAATTGCGCGCTTCTGAAGTAGAACTGAAAAAAGCCGTAGGTACCATTGGCCCCATATCA AY159368 AAATTGCTAGCGGACGAAGATGAACTCAAAAAAGCTGTAGGAGCTGTCGGTCCCATATCA AY159369 AAATTGCTAGCGGATGAAGATGAACTCAAAAAAGCTGTAGGAACTGTCGGTCCCATATCA AY528228 AAATTGCTAGCGGATGAAGATGAACTCAAAAAGGCTGTAGGAACTGTCGGTCCCATATCA AY528226 AAATTGCTAGCGGATGAAGATGAACTCAAAAAAGCTGTAGGAACTGTCGGTCCCATATCA AY528227 AAATTGCTAGCGGATGAAGATGAACTCAAAAAGGCTGTAGGAACTGTCGGTCCCATATCA AY159373 AATGTCAGCATCTCAGAAGAAGAACTGAAGAAAGCTGTTGGAACCGTTGGTCCCGTATCA AY159372 AATGTCAGCATCTCAGAAGAAGAACTGAAGAAAGCTGTTGGAACCGTTGGTCCCGTATCA AY528229 AATGTCAGCAACTCAGAAGAAGAACTGAAGAAAGCTGTTGGAACCGTTGGTCCCGTATCA AY528230 AATGTCAGCATCTCAGAAGAAGAACTGAAGAAAGCTGTTGGAACCGTTGGTCCCGTATCA EF154438 ACCEPTED AATGTGACCACATCTGAAGAGGGGCTCAGAAAAGCTGTTGGAACCATCGGTCCCATGTCG EF154436 AATGTGACCACATCTGAAGAGGGGCTCAGAAAAGCTGTTGGAACCATCGGTCCCATATCG EF154437 AATGTGACCACATCTGAAGAGGGGCTCAGAAAAGCTGTTGGAGCCATCGGTCCCATATCG * ** * * * ** * ** ** * *

Fig. S2. Cont’d

Page 4 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 GTAGCCATAGACGCGTCTCAGCTGTCATTTTACGATCAGGGAATTGTGGATGAGAAGTGC AF544839 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAGGTGC AF544838 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAGGTGC AF544837 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAGGTGC AF544840 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAGGTGC AF544843 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAGGTGC AF544834 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAAGTGC AF544836 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGAAGTGC AF544835 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGACGTGC AF544841 GTAGCCATAGAAGCGTCTCAGCTGTCATTTTACGATAAGGGAATTGTGGACGAGACGTGC AF190653 GCAGCTATTTGGGCTGAACCAATCCAGTTCTTCTCAAGCGGTATATATGACGACCCAAAT EF154432 GTAGCTGTTAGTTCAGAACATTTGCGGTTGTACGGTGGAGGAGTTATAACTACTAGG--- EF154433 GTAGCTGTTAGTTCAGAACATTTGCGGTTGTACGGTGGAGGAGTTATAACTACTAGG--- AY159368 GTGGGAATGAGCTCTGAAAATTTGCATATGTATGGTGGAGGAATCCTTGATGATCAA--- AY159369 GTGGGAATGAGCTCTGAAAATTTGCATATGTATGGTGGAGGAGTCCTTGATGATCAA--- AY528228 GTGGGAATGAGCTCTGAAAATTTGCATATGTATGGTGGAGGAGTCCTTGATGATCAA--- AY528226 GTGGGAATGAGCTCTGAAAATTTGCATATGTATGGTGGAGGAGTCCTTGGTGATCAA--- AY528227 GTGGGAATGAGCTCTGAAAATTTGCATATGTATGGTGGAGGAGTCCTTGATGATCAA--- AY159373 GTTGCCATCGATGCTGACCCAATTCAACTATATTCCGGAGGAATTCTGGATGGATTATTC AY159372 GTTGCCATCGATGCTGACCCAATTCAACTATATTTCGGAGGAATTCTGGATGGATTATTC AY528229 GTTGCCATCGATGCTGACCCAATTCAACTATATTTCGGAGGAATTCTGGATGGATTATTC AY528230 GTTGCCATCGATGCTGACCCAATTCAACTATATTCCGGAGGAATTCTGGATGGATTATTC EF154438 ATTGCCATGAATTCTGGTCCATTACAACTTTATTATTCAGGAATATTCAGCGGAAAAGGC EF154436 ATTGCCATGAATTCTGATCCATTACAACTTTATTATTCAGGAACAATCAGCGGAAAAGGC EF154437 ATTGCCATGAATTCTGATCCATTACAACTTTATTATTCAGGAATAATCAGCGGAAAAGGC * * * * * * **

AF544844 AAATGCAGTAAGAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTCGGATATGGAAGC AF544839 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTCGGATATGGAAGC AF544838 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTAGGATATGGAAGCMANUSCRIPT AF544837 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTAGGATATGGAAGC AF544840 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTAGGATATGGAAGC AF544843 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTAGGATATGGAAGC AF544834 AGATGTAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTCGGATATGGAAGC AF544836 AGATGTAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTCGGATATGGAAGC AF544835 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTAGGATATGGCAGC AF544841 AGATGCAGCAACAAACGGGAAGATCTCAACCACGGCGTCCTGGTGGTAGGATATGGCAGC AF190653 ---TGCTTGAACTATGTCGAATACTTGGATCACGGAATTCTCGTCGTAGGTTACGGTGAA EF154432 ---TGTATCAAA------GACCTTGATCATGCTGTTCTCGCTGTTGGATATGGTTCT EF154433 ---TGTATCAAA------GACCTTGATCATGCTGTTCTCGCCGTTGGATATGGTTCT AY159368 ---TGTTATTTT------GACATGGATCACGCTGTTCTTGTTGTTGGATATGGTGAA AY159369 ---TGTTATTTT------GGCATGGATCACGCTGTTCTTGTTGTTGGATGTGGTGAA AY528228 ---TGTTATTTT------GGCATGGATCACGCTGTTCTTGTTGTTGGATATGGTGAA AY528226 ---TGTTATTTT------GGCATGGATCACGCTGTTCTTGTTGTTGGATATGGTGAA AY528227 ---TGTTATTTT------GGCATGGATCACGCTGTTCTTGTTGTTGGATATGGTGAA AY159373 ---TGCACACAT------AATCTGAACCATGGAGTGCTGGCTGTCGGGTATGGGGAG AY159372 ---TGCACACAT------AATCTGAACCATGGAGTGCTGGCTGTCGGGTATGGGGAG AY528229 ---TGCACACAT------AATCTGAACCATGGAGTGCTGGCTGTCGGGTATGGGGAG AY528230 ACCEPTED ---TGCACACAT------AATCTGAACCATGGAGTGCTGGCTGTCGGGTATGGGGAG EF154438 ---TGTTCACAT------GACCTAGATCATGGCGTACTTGTTGTAGGTTACGGAAAG EF154436 ---TGTTCACAT------GACCTAGATCATGGCGTACTTGTTGTAGGTTACGGAAAG EF154437 ---TGTTCACAT------GACCTAGATCATGGCGTACTTGTTGTAGGTTACGGAAAG a a** gaa * * ** * * ** * ** ** * **

Fig. S2. Cont’d

Page 5 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544839 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544838 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544837 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544840 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544843 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544834 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544836 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544835 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF544841 GAAAAC------GGCGTGGACTACTGGATCGTGAAGAATTCTTGGGGAGCGGAC AF190653 GAGAAT------GGCACACCCTACTGGATTGTCAAGAACTCATGGGGAGCAACT EF154432 GAAAAC------GGTAGGAAGTACTGGAAGATTAGGAACTCTTGGGGAAAAACT EF154433 GAAAAC------GGTAGGAAGTACTGGAAGATTAGGAACTCTTGGGGAAAAACT AY159368 GCCAAT------GGAAAGAAATTTTGGAGAGTGAAAAATTCTTGGGGAACAACT AY159369 GCCAAT------GGAAAGAAATTTTGGAAAGTGAAAAATTCTTGGGGAACAACT AY528228 GCCAAT------GGAAAGAAATTTTGGAAAGTGAAAAATTCTTGGGGAACAACT AY528226 GCCAAT------GGAAAGAAATTTTGGAAAGTGAAAAATTCTTGGGGAGCAACT AY528227 GCCAAT------GGAAAGAAATTTTGGAAAGTGAAAAATTCTTGGGGAACAACT AY159373 GAAGACCATTTGTTCGGCAAGAAAAAGTTCTGGAAGGTGAAAAACTCCTGGGGGAAAGAC AY159372 GAAGACCATTTGTTCGGCAAGAAAAAGTTCTGGAAGGTGAAAAACTCCTGGGGAAAAGAC AY528229 GAAGACCATTTGTTCGGCAAGAAAAAGTTCTGGAAGGTGAAAAACTCCTGGGGAAAAGAC AY528230 GAAGACCATTTGTTCGGCAAGAAAAAGTTCTGGAAGGTGAAAAACTCCTGGGGAAAAGAC EF154438 GCATCCCAGTGGTCTGGTGAAACCAAGTTCTGGAGAGTCAAAAACTCATGGGGAAAAATC EF154436 GCATCCCAGTGGTCTGGTGAAACCAAGTTCTGGAGAGTCAAAAACTCATGGGGAAAAATC EF154437 GCATCCCAGTGGTCTGGTGAAACCAAGTTTTGGAGAGTCAAAAACTCATGGGGAAAAATC * ca t gt gg * **** * * ** ** *****

AF544844 TGGGGGGAGAAGGGATACTTCAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544839 TGGGGCGAGAAGGGATACTTCAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544838 TGGGGCGAGAAGGGATACTTTAGGCTCAAGAAGGACGTCMANUSCRIPT---AAAGCGTGCGGAATTGGC AF544837 TGGGGCGAGAAGGGATACTTCAGGC TCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544840 TGGGGCGAGAAGGGATACTTTAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544843 TGGGGCGAGAAGGGATACTTTAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544834 TGGGGCGAGAAGGGATACTTCAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGAC AF544836 TGGGGCGAGAAGGGATACTTCAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544835 TGGGGCGAGAAGGGATACTTCAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF544841 TGGGGCGAGAAGGGATACTTCAGGCTCAAGAAGGACGTC---AAAGCGTGCGGAATTGGC AF190653 TGGGGTGAAGAAGGTTACTTCAGGTTGAAGAGAAATATT---GCATTATGTGGTTTGGCT EF154432 TGGGGAGATCATGGATACTTCAAACTTGCAAGAGATGCTGGCAACCTATGCGGTGTTGCC EF154433 TGGGGAGATCATGGATACTTCAAACTTGCAAGAGATGCTGGCAACCTATGCGGTGTTGCC AY159368 TGGGGAGAAGACGGATACTTCCGAATTGAAAGAGATGCTGATAATCTGTGTGACATTGCT AY159369 TGGGGAGAAGACGGATACTTCCGAATTGAAAGAGATGCCGATAATTTGTGTGACATTGCT AY528228 TGGGGAGAAGACGGATACTTCCGAATTGAAAGAGACGCCAATAATTTGTGTGACATTGCT AY528226 TGGGGAGAAGACGGATACTTCCGAATTGAAAGAGATGCCGATAATTTGTGTGACATTGCT AY528227 TGGGGAGAAGACGGATACTTCCGAATTGAAAGAGATGCCGATAATTTGTGTGACATTGCT AY159373 TGGGGAGAGCAAGGATACTTCAGGATCAAGCGAGATGCCAATAACTTATGCGGTATCGCT AY159372 TGGGGAGAGCAAGGATACTTCAGGATCAAGCGAGATGCCAATAACTTATGCGGTATCGCT AY528229 TGGGGAGAGCAAGGATACTTCAGGATCAAGCGAGATGCCAATAACTTATGCGGTATCGCT AY528230 ACCEPTED TGGGGAGAGCAAGGATACTTCAGGATCAAGCGAGATGCCAATAACTTATGCGGTATCGCT EF154438 TGGGGAGAAAACGGGTACTTCAGAATCAAAAGAGATGCTAACAACCTATGTGGTATTGCC EF154436 TGGGGAGAAAACGGGTACTTCAGAATCAAAAGAGATGCTAACAACCTATGTGGTATTGCC EF154437 TGGGGAGAAAACGGGTACTTCAGAATCAAAAGAGATGCTAACAACCTATGTGGTATTGCC ***** ** * ** ***** * * ** * * *

Fig. S2. Cont’d

Page 6 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

AF544844 AACTATAACACTTACCCCGTTCTTCTG AF544839 ACCTATAACACTTACCCCGTTCTTCTG AF544838 TACTACAACCCTTACCCCATTCTTCTG AF544837 TACTACAACACTTACCCCATTCTTCTG AF544840 TACTACAACACTTACCCCATTTTTCTG AF544843 TACTACAACACTTACCCCATTCTTCTG AF544834 TACTACAACACTTACCCCATTCTTCTG AF544836 TACTACAACACTTACCCCATTCTTCTG AF544835 TACTACAACCCTTACCCCATTCTTCTG AF544841 TACTACAACACTTACCCCATTTTTCTG AF190653 CAGATGGCCAGTTATCCAGTTTTG--- EF154432 TCTATGGCATCATATCCGCTTCTT--- EF154433 TCTATGGCATCATATCCACTTCTT--- AY159368 TCTATGTGTTCATATCCTATTCTCCTC AY159369 TCTATGTGTTCATATCCTATTCTCCTC AY528228 TCTATGTGTTCATATCCTATTCTCCTC AY528226 TCTATGTGTTCATATCCTATTCTCCTC AY528227 TCTATGTGTTCATATCCTATTCTCCCC AY159373 GATAAAGCTTCTTACCCAATTCTA--- AY159372 GATAAAGCTTCTTACCCAATTCTA--- AY528229 GATAAAGCTTCTTACCCAATTCTA--- AY528230 GATAAAGCTTCTTACCCAATTCTA--- EF154438 GATGACCCCACTTACCCAGTTCTA--- EF154436 GATGACCCCACTTACCCAGTTCTA--- EF154437 GATGACCCCACTTACCCAGTTCTA--- ** ** ** * c

MANUSCRIPT

ACCEPTED

Page 7 ACCEPTED MANUSCRIPT Vorster et al. 2015 Positive selection of herbivorous Coleoptera proteases

MANUSCRIPT Fig. S3. Unrooted phylogram for the 25 Coleoptera Cys protease sequences submitted to the MEC evolutionary model of Doron-Faigenboim and Pupko (2006). Phylogenetic relationships were inferred by the neighbor-joining distance method of Saitou and Nei (1987) after generating a sequence similarity matrix based on Kimura’s two-parameter model (Kimura, 1983). See main article for reference details.

The tree shown above was produced using the following Newick string :

(((((AF544844:0.03218,(AF190653:0.21142,(((EF154432:0.00440,EF154433:0.00395):0.13852,(AY159368:0.01695,(AY159369:0.0

0836,(AY528228:0.00426,(AY528226:0.00354,AY528227:0.00265):0.00090):0.00912):0.00786):0.11751):0.03615,(((AY159373:0.

00537,AY159372:0.00643):0.00036,(AY528229:0.00435,AY528230:0.00279):0.01007):0.12055,(EF154438:0.00782,(EF154436:- 0.00063,EF154437:0.00777):0.00646):0.14902):0.01839):0.02089):0.21267):0.02501,((AF544834:0.00144,AF544836:0.00138):0.ACCEPTED 00653,(AF544835:0.00318,AF544841:-

0.00036):0.00307):0.00805):0.00665,AF544838:0.00481):0.00479,(AF544840:0.00242,AF544843:0.00040):0.00198):0.00129,AF

544839:0.00492,AF544837:0.00640);

Page 1