Structural venomics reveals evolution of a complex venom by duplication and diversification of an ancient peptide-encoding gene

Sandy S. Pinedaa,b,1,2,3,4, Yanni K.-Y. China,c,1, Eivind A. B. Undheimc,d,e, Sebastian Senffa, Mehdi Moblic, Claire Daulyf, Pierre Escoubasg, Graham M. Nicholsonh, Quentin Kaasa, Shaodong Guoa, Volker Herziga,5, John S. Mattickb,6, and Glenn F. Kinga,2

aInstitute for Molecular Bioscience, The University of Queensland, St Lucia, QLD 4072, Australia; bGarvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; cCentre for Advanced Imaging, The University of Queensland, St Lucia, QLD 4072, Australia; dCentre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, 7491 Trondheim, Norway; eCentre for Ecological & Evolutionary Synthesis, Department of Biosciences, University of Oslo, 0316 Oslo, Norway; fThermo Fisher Scientific, 91941 Courtaboeuf Cedex, France; gUniversity of Nice Sophia Antipolis, 06000 Nice, France; and hSchool of Life Sciences, University of Technology Sydney, Broadway, NSW 2007, Australia

Edited by Adriaan Bax, National Institutes of Health, Bethesda, MD, and approved March 18, 2020 (received for review August 21, 2019) are one of the most successful venomous , with N-terminal strand is sometimes present (12). The cystine knot more than 48,000 described species. Most venoms are comprises a “ring” formed by two disulfide bonds and the in- dominated by cysteine-rich peptides with a diverse range of phar- tervening sections of the peptide backbone, with a third disulfide macological activities. Some spider venoms contain thousands of piercing the ring to create a pseudoknot (11). This knot provides unique peptides, but little is known about the mechanisms used to generate such complex chemical arsenals. We used an integrated transcriptomic, proteomic, and structural biology approach to Significance demonstrate that the lethal Australian funnel-web spider pro- duces 33 superfamilies of venom peptides and proteins. Twenty- The venom of the Australian funnel-web spider is one of the six of the 33 superfamilies are disulfide-rich peptides, and we show most complex chemical arsenals in the natural world, com- BIOCHEMISTRY that 15 of these are knottins that contribute >90% of the venom prising thousands of peptide toxins. These toxins have a di- proteome. NMR analyses revealed that most of these disulfide-rich verse range of pharmacological activities and vary in size from peptides are structurally related and range in complexity from sim- short (3 to 4 kDa) to long (8 to 9 kDa). It is unclear how spiders ple to highly elaborated knottin domains, as well as double-knot evolved such complex venoms and whether there is an evolu- toxins, that likely evolved from a single ancestral toxin gene. tionary relationship between short and long toxins. Here, we introduce a “structural venomics” approach to show that the spider venom | venom evolution | structural venomics | transcriptomics | venom of Australian funnel-web spiders evolved primarily by proteomics duplication and elaboration of a single ancestral knottin gene; short toxins are simple knottins whereas most long toxins are piders evolved from an ancestor in the Late Or- either highly elaborated single-domain knottins or double-knot Sdovician around 450 million years ago (1), and they have toxins created by intragene duplications. since become one of the most successful lineages on the planet, with >100,000 extant species (2). A key contributor to Author contributions: S.S.P., E.A.B.U., J.S.M., and G.F.K. designed research; S.S.P., Y.K.-Y.C., E.A.B.U., S.S., M.M., C.D., P.E., G.M.N., Q.K., S.G., and V.H. performed research; their evolutionary success is the use of venom to capture prey S.S.P., Y.K.-Y.C., E.A.B.U., S.S., M.M., C.D., P.E., G.M.N., Q.K., S.G., V.H., and G.F.K. analyzed and defend against predators. The constant selection pressure on data; and S.S.P., Y.K.-Y.C., E.A.B.U., and G.F.K. wrote the paper. venoms over hundreds of millions of years has enabled them to Competing interest statement: C.D. is affiliated with Thermo Fisher Scientific. evolve into complex mixtures of bioactive compounds with a This article is a PNAS Direct Submission. diverse range of pharmacological activities. Published under the PNAS license. Spider venoms are a heterogeneous mixture of salts, low Data deposition: Atomic coordinates for protein structures determined in this study were molecular weight organic compounds (<1 kDa), linear and deposited in the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8K disulfide-rich peptides (DRPs) (typically, 3 to 9 kDa with three while corresponding NMR chemical shifts were deposited in BioMagResBank under to six disulfide bonds), and proteins (10 to 120 kDa) (3–5). accessions BMRB25774, BMRB25778, BMRB25853,andBMRB30352. Metadata and annotated nucleotide sequences were deposited in the European Nucleotide Archive However, peptides are the major components of most spider under project accessions PRJEB6062 (ERA298588)andPRJEB35693. Mass spectrometry venoms, with some containing >1,000 peptides (6). The majority data has been deposited to ProteomeXchange Consortium via the PRIDE partner of these peptides are “short” DRPs (2.5 to 5 kDa), but there is repository with the dataset identifier PXD016886. also a significant proportion of “long” DRPs (6.5 to 8.5 kDa) (5). 1S.S.P. and Y.K.-Y.C. contributed equally to the work. As the primary function of spider venom is to rapidly immobilize 2To whom correspondence may be addressed. Email: [email protected] or glenn. prey, it is perhaps not surprising that most spider-venom DRPs [email protected]. that have been functionally characterized target neuronal ion 3Present address: Brain and Mind Centre, University of Sydney, Camperdown, NSW 2050, channels and receptors (5, 7). Australia. 4 ’ Although spider-venom DRPs have been shown to adopt a Present address: St Vincent s Clinical School, University of New South Wales, Sydney, NSW 2010, Australia. variety of three-dimensional (3D) structures, including the 5 β Present address: School of Science & Engineering, University of the Sunshine Coast, Sippy Kunitz (8), prokineticin/colipase (9), disulfide-directed -hairpin Downs, QLD 4556, Australia. (DDH) (10), and inhibitor cystine knot (ICK) fold (11), the 6Present address: School of Biotechnology & Biomolecular Sciences, University of New majority of spider-venom DRP structures solved to date conform South Wales, Sydney, NSW 2010, Australia. to the ICK motif. The ICK motif is defined as an antiparallel This article contains supporting information online at https://www.pnas.org/lookup/suppl/ β-sheet stabilized by a cystine knot (11). In spider toxins, the doi:10.1073/pnas.1914536117/-/DCSupplemental. β-sheet typically comprises only two β-strands although a third

www.pnas.org/cgi/doi/10.1073/pnas.1914536117 PNAS Latest Articles | 1of10 Downloaded by guest on September 28, 2021 ICK peptides (also known as knottins) (13) with exceptional The distribution of peptide masses in H. infensa venom is bi- resistance to chemicals, heat, and proteases (14, 15), which has modal. Most peptides fall in the mass range 2.5 to 5.5 kDa, but made them of interest as drug and insecticide leads (5, 14, 16). there is a significant cohort of larger peptides with mass 6.5 to Some spider toxins show minor (17) or more significant (18) 8.5 kDa (Fig. 1 C and D). This bimodal distribution matches that elaborations of the basic ICK fold involving an additional sta- previously described for venom from related funnel-web spiders bilizing disulfide bond. More recently, “double-knot” spider (6, 25), various tarantulas (26), and the spitting spider Scytodes toxins have been reported in which two structurally independent thoracica (27), and it is also reflected in the mass profile generated ICK domains are joined by a short linker (19, 20). for all spider toxins reported to date (5). As reported previously Like other folds with stabilizing disulfide bridges, knottins for Australian funnel-web spiders (6, 25), the 3D venom landscape display a remarkable diversity of biological functions, including (Fig. 1F) revealed no correlation between peptide mass and modulation of many different types of ligand- and voltage-gated peptide hydrophobicity, as judged by reversed-phase (RP) high- ion channels (5). Despite strong conservation of the knottin pressure liquid chromatography (HPLC) retention time. scaffold across a taxonomically diverse range of spiders, several factors have hampered analysis of their evolutionary history (21). Transcriptomics Uncovers the Biochemical Diversity of H. infensa First, it is not until recently that a large number of knottin pre- Venom. Consistent with MS analysis of secreted venom, se- cursor sequences have become available from venom-gland quencing of a venom-gland transcriptome from H. infensa transcriptomes. Second, the disulfide framework in small DRPs revealed a biochemically diverse venom, with at least 33 toxin generally constrains the peptide fold to such an extent that most superfamilies (Fig. 2). In light of their likely toxic function, each noncysteine residues can be mutated without damaging the pep- superfamily of toxins was named, as suggested previously (28), tide’s structural integrity, a luxury not afforded to most globular after gods/deities of death, destruction, or the underworld. proteins (21). Thus, evolution of DRPs is typically characterized Expressed sequence tags were sequenced using the 454 platform by the accumulation of many mutations, leaving very few con- and assembled using MIRA v3.2 (29). This produced a total of served residues available for deep evolutionary analyses (21). 26,980 contigs and 7,194 singlets, with an average contig length Third, very few structures have been solved for DRPs larger than 5 of 496 base pairs (bp) (maximum length 3,159 bp, N50 674 bp). kDa, making it unclear whether just a few or many of the larger After assembly, all contigs and singlets were submitted to DRPs are highly derived or duplicated knottins. Blast2GO (30) to acquire BLAST and functional annotations. The only attempt so far to study the phylogeny of spider- Sequences were then grouped into three categories: 1) proteins venom knottins concluded that both orthologous diversification and other enzymes (61%); 2) toxins and toxin-like peptides and lineage-specific paralogous diversification were important in (14%); and 3) sequences with no hits in ArachnoServer (31) or generating the diverse arsenal of spider-venom knottins (22). the National Center for Biotechnology Information (NCBI, These authors also reported that knottins that act as gating 25%) databases (SI Appendix, Fig. S1). Recovered BLAST hits modifiers of voltage-gated ion channels likely evolved on multi- included other spider toxins, as well as peptides and proteins ple independent occasions. However, whether this resulted from found in other venomous with recently annotated convergent functional radiation of toxins or independent evolu- genomes, including the tick Ixodes scapularis, the honey bee Apis tion of gating modifiers from nonvenom knottins remains un- mellifera, the jumping ant Harpegnathos saltator, and the para- clear. The lack of nonvenom gland homologs and the use of only sitoid wasp Nasonia vitripennis. a single nonspider knottin outgroup precluded firm conclusions To maximize discovery of DRPs, we used an in-house algo- regarding the toxin, or nontoxin, origins of these convergently rithm to identify potential open reading frames (ORFs). Both evolved spider-venom knottins. Thus, the deep evolutionary annotated and previously unidentified peptides were then com- history of spider-venom knottins remains enigmatic (22). bined into a single file that was used to search for processing Here, we report the combined use of proteomics, transcriptomics, signals in all precursors. After determination of cleavage regions and structural biology to determine the complete repertoire and (SI Appendix, Fig. S2), toxins were classified into superfamilies evolutionary history of DRPs expressed in the venom of (SFs) based on their signal-peptide sequence (SI Appendix, Fig. infensa, a member of the family of lethal Australian funnel-web S3), total number of cysteine residues, and their known or pre- spiders. This structural venomics approach has provided the most dicted cysteine framework (Fig. 2 and SI Appendix, Fig. S3). A comprehensive overview to date of the toxin arsenal of a spider, and brief description of each superfamily is provided in SI Appendix, it revealed that the incredibly diverse repertoire of spider-venom Figs. S4–S36. DRPs is largely derived from a single ancestral knottin gene. Transcript abundance for each superfamily was estimated us- ing RSEM v1.2.31 (32), and these values are reported as tran- Results and Discussion scripts per million (TPM) in SI Appendix, Table S1 and Fig. 3. Mass Spectrometry (MS) Reveals a Complex Spider-Venom Peptidome. Abundance varied greatly from very low for SF11, SF21, and MS analyses revealed that the venom peptidome of H. infensa is SF25 (TPM 2 to 12) to very high for SF2, SF3, SF4, SF13, and extremely complex. We used two complementary MS platforms to SF23 (TPM 9,000 to 13,000). Due to the low throughput of 454 examine the distribution of peptide masses in pooled venom from sequencing compared to newer sequencing technologies, we in- female H. infensa (23). Analysis of secreted venom using matrix- vestigated the extent of sequence coverage by random sampling assisted laser desorption/ionization tandem time-of-flight (MAL- of the total read data to produce 11 fractional datasets and then DI-TOF/TOF) and Orbitrap MS resulted in 2,053 and 1,649 dis- determined the total number of superfamilies recovered for each tinguishable peptide masses, respectively (Fig. 1 A–D); only 651 data subset (SI Appendix, Fig. S37). Even at the lowest level of masses were shared between the two MS datasets, leading to a sampling (5%), 16 of 33 superfamilies were recovered, and de- total of 3,051 unique venom peptides (Fig. 1E). This is consider- tection saturation (30 of 33 superfamilies) was reached when ably more than the ∼1,000 peptides reported to be present in 80% of reads were used for reassembly. Not surprisingly, the venom of the Australian funnel-web spider Hadronyche versuta (6) more reads added to the assembly, the more isoforms we re- although this difference is likely a reflection of recent improve- covered, and, similarly, superfamilies with low transcript abun- ments in MS sensitivity rather than an indication of major dif- dance needed higher sampling to be detected. We conclude that ferences in venom complexity between these closely related the 454 sequencing platform provided good coverage of the H. species. We conclude that Australian funnel-web spiders have one versuta venome. of the most complex venom peptidomes reported for a terrestrial Using this approach, we determined that the venom gland of animal, with similar diversity to those of marine cone snails (24). H. infensa expresses at least 26 families of DRPs, three families

2of10 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al. Downloaded by guest on September 28, 2021 3 –5

A LC-MALDI-TOF-TOF Counts/HPLC fraction B LC-MALDI-TOF-TOF

2

1 Mass (m/z)

Absorbance 215 nm x 10 0

Retention time (min) Retention time (min) Cumulative mass count ( ) Cumulative mass count ( ) LC-MALDI-TOF-TOF LC-ESI Orbitrap CDn = 2052

n = 1648 Counts per mass bin ( ) Counts per mass bin ( ) 1.0–1.5 4.5–5.0 5.0–5.5 5.5–6.0 7.0–7.5 7.5–8.0 8.5–9.0 1.5–2.0 2.0–2.5 2.5–3.0 3.0–3.5 3.5–4.0 4.0–4.5 6.0–6.5 6.5–7.0 8.0–8.5 9.0–9.5 9.0–9.5 3.0–3.5 3.5–4.0 4.5–5.0 5.0–5.5 5.5–6.0 6.0–6.5 6.5–7.0 7.0–7.5 7.5–8.0 8.0–8.5 8.5–9.0 1.5–2.0 2.0–2.5 2.5–3.0 4.0–4.5 1.0–1.5 9.5.–10.0 9.5.–10.0 11.0.–11.5 10.5.–11.0 11.5.–12.0 12.5.–13.0 13.5.–14.0 10.0.–10.5 12.0.–12.5 13.0.–13.5 14.0.–14.5 14.0.–14.5 14.5.–15.0 11.0.–11.5 10.5.–11.0 11.5.–12.0 13.0.–13.5 13.5.–14.0 14.0.–14.5 10.0.–10.5 12.0.–12.5 12.5.–13.0 14.0.–14.5 Mass, Bin Range (kDa) Mass, Bin Range (kDa)

E F 3D VENOM LANDSCAPE 3051 venom peptides

25000

Orbitrap MALDI 20000

15000 110 998651 1402 10000 100 90

Abundance 80 5000 70 60 50 3400 BIOCHEMISTRY 3600 number 3800 40 4000 4200 HPLC fraction 4400 4400 4800 Mass (Da) 5000 5200

Fig. 1. Mass profile of H. infensa venom. (A) RP-HPLC chromatogram showing fractionation of crude H. infensa venom. Absorbance at 215 nm is shown in dark gray (left ordinate-axis) while mass count for each HPLC fraction is shown in light gray (right ordinate-axis). (B) Distribution of MALDI-TOF/TOF masses as a function of RP-HPLC retention time. (C and D) Histograms showing distribution and abundance of peptides in H. infensa venom as detected using (C) MALDI-TOF/TOF and (D) Orbitrap MS. Masses are grouped in 500-Da bins. Gray bars indicate cumulative total number of toxins (right ordinate-axis). (E) Euler plot showing overlap between mass counts generated via MALDI-TOF/TOF and Orbitrap MS. (F) A 3D landscape of H. infensa venom showing the correlation between RP-HPLC retention time and peptide mass and abundance generated by MALDI-TOF/TOF analysis. (Inset) A photo of a female H. infensa. Image credit: Bastian Rast (photographer).

of enzymes (hyaluronidase, lipase, and phospholipase A2 [PLA2]), analyses. Using the same criteria but a different search program one family of cysteine-rich secretory proteins (CRiSPs), and (Mascot v2.5.1), we identified 877 of the same of the 1,224 three families of uncharacterized secreted proteins (Figs. 2 and venom proteins and peptides, 849 of which were identified by 3). Hyaluronidase and PLA2 have been convergently recruited both programs (SI Appendix, File 1). These proteomically iden- into many animal venoms (33). Venom hyaluronidases act as tified venom components spanned 22 (ProteinPilot) or 20 hemostatic factors (33) or facilitate spread of neurotoxins by (Mascot) of the predicted 33 superfamilies, including all of the enhancing tissue permeability (34). Sequence diversity across most diverse superfamilies, and three of the seven predicted taxa is minimal, and no new activities have been reported for protein superfamilies. There was a strong correlation between venom hyaluronidases (33). In contrast, venom PLA2s are di- low expression and lack of proteomic detection; the predicted verse, and they have acquired a range of new functions, some of peptide superfamilies with cumulative expression values <100 which are independent of their catalytic activity (35). There are TPM (Fig. 3) accounted for all peptide superfamilies not de- only a few reports of lipases in animal venoms, but the H. infensa tected in the venom. Taken together, our transcriptomic and venom lipase has significant homology to those found in wasp proteomic analyses indicate that H. infensa has a highly complex venoms (36, 37). It is unclear what function lipases might play in peptide-dominated venom. spider venom. Finally, consistent with an actively regenerating venom gland, the transcriptome is enriched with transcripts The Venom of H. infensa Contains both Classical and Highly Elaborated encoding proteins involved in translation, processing, folding, ICK Toxins. Twenty-six of the 33 toxin superfamilies identified in the and posttranslational modification of venom toxins (SI Appendix, venom-gland transcriptome of H. infensa are DRPs (Fig. 2 and SI Fig. S38). Appendix, Fig. S3). Thus, DRPs are the dominant polypeptides in the venom, even when transcript abundance is considered (Fig. 3). Proteomic Analyses Confirm the Biochemical Complexity of H. infensa Eight of the 26 DRP superfamilies could be confidently classified Venom. To further probe the proteomic complexity of H. infensa as knottins based on prior determination of the 3D structure of venom, we searched our annotated transcriptome with tandem one of the superfamily members (SF3) or strong homology with an MS spectra generated from fractionated, reduced, alkylated, and orthologous knottin (i.e., from another spider venom) whose trypsin-digested venom. Using a stringent confidence threshold structure has been elucidated (SF8 to -10, SF13, SF16, SF17, and (<1% false discovery rate [FDR] calculated from a decoy-based SF20). A further three superfamilies were confidently predicted to FDR analysis in ProteinPilot v.4.5.1), we detected 1,108 of the 1,224 be knottins because their cysteine framework and intercysteine venom proteins and peptides predicted from the transcriptomic spacing conform to the ICK motif (SF4, SF19, and SF24), albeit

Pineda et al. PNAS Latest Articles | 3of10 Downloaded by guest on September 28, 2021 Toxin superfamilies

SF1 SF2 SF3 SF4 SF5 SF6 SF7 SF8 SF9 SF10 SF11 AhPuch Anat Hades Erlik Chi-You ‘Oro Pele Ixtab Shiva Kisin Thanatos C-C-CC-C-C- C-C-CC-C-C-CPC-C-C C-C-CC-C-C-C-C C-C-C-CC-C-C-C C-C-C-C C-C-CC-C-C-C-C C-C-C-C-C C-C-CC-C-C C-C-CC-C-C C-C-CC-C-C C-C-C-C C-C-CC-C-C MIT/Bv8-likeμ-agatoxin-like Gomesin -grammotoxin-like -hexatoxin-1 -hexatoxin-2 Cystatin-like

Predicted ICK fold Unknown fold #1 2N8F 2KRA 1KQH 1KFP 2N6R 1KOZ 1AXH 1G9P 3L0R Double ICK non-ICK ICK ICK non-ICKICK non-ICK ICK ICK ICK non-ICK

SF12 SF13 SF14 SF15 SF16 SF17 SF18 SF19 SF20 SF21 SF22 Sekhmet Ares Wurrukatte Pluto Zahhak Nergal Grim Reaper Yama Ereshkigal Anubis Loke C-C-C-C-CPC-C-C C-C-CCC-C-C-C C-C-C-CC-C-C-C C-C-C-CC- C-C-CC-C-C C-C-CC-C-C C-C-C-C-C-C-C-C C-C-CC-C-C C-C-CC-C-C C-C-C-C C-C-CC-C-C-C-C-C-C C-C-C-C-C MIT/Bv8-like -hexatoxin Imperatoxin-like VSTx1-like

Modified colipase/ Bv8/MIT1/ Predicted prokineticin ICK fold fold Unknown Unknown Unknown Unknown fold #2 fold #3 fold #4 fold #5 1VTX 1IE6 1S6X 2H1Z 2N8K non-ICK ICK ICK ICK ICK ICK non-ICK non-ICK

SF23 SF24 SF25 SF26 SF27 SF28 SF29 SF30 SF31 SF32 SF33 Qumaits Ankuo Tezcatlipoca Set Mara Tuoni Mars Sauron Hulda Midgardsormen Mehnhit C-C-CC-C-C-C-C C-C-CC-C-C C-C-C-C-C-C- C-C-CC-C-C-C-C ENZYMES CRISP SECRETED PROTEINS C-C-C-C-C-C Lipase Hyaluronidase Phospholipase A2 Venom allergen 5 Techylectin-like

Predicted ICK fold Unknown Unknown Unknown fold #8 fold #6 fold #7 2N6N 6BA3 1W52 1FCU 1POC 1QNX 1JC9 ICK ICK ICK non-ICK

Fig. 2. Overview of the venom proteome of H. infensa. The venom proteome of H. infensa consists of 33 toxin superfamilies (SF1 to SF33). For each SF, the cysteine framework is shown in blue, and the 3D fold is classified as ICK, putative ICK, double ICK, non-ICK, or unknown. Light blue boxes enclose previously solved structures of superfamily members from H. infensa. Black boxes enclose structures of orthologous superfamily members from related mygalomorph spiders or, in the case of enzymes and CRiSP proteins, orthologs from venomous hymenopterans (bees and wasps). Red boxes enclose structures solved in the current study (SF6, SF22, SF23, and SF26). Stars enclosed by dashed black boxes signify toxin superfamilies for which no structural information is currently available. For each of the structures, β-strands are shown in blue, helices are green, and core disulfide bonds are shown as solid red tubes. For DRPs containing an ICK motif, additional disulfide bonds that do not form part of the core ICK motif are shown as orange tubes. Toxin superfamilies are named after gods or deities of death, destruction, and the underworld. For each structure shown, PDB accession numbers are given in the lower right corner.

with an additional disulfide bond in the case of SF4. In addition, which contain only low-abundance transcripts (Fig. 3). Peptides SF1 is a family of double-knot toxins, one of whose structure was were expressed in the periplasm of Escherichia coli using a pre- recently solved (Fig. 4A) (20). Thus, 12 of the 26 DRP super- viously optimized system (40), then purified using nickel-affinity families can be assigned as knottins. Only six DRP superfamilies chromatography followed by RP-HPLC (SI Appendix, Fig. S5). could be confidently assigned as not being knottins: SF2 corre- All of the chosen peptides were successfully produced with the sponds to the MIT/Bv8/prokineticin superfamily (38), SF5 is a exception of SF14, which failed to express in E. coli. The SF6, family of β-hairpin toxins with high homology to the antimicrobial SF22, SF23, and SF26 peptides were uniformly labeled with 15N peptide gomesin isolated from hemocytes of the tarantula Acan- and 13C, and their structures were determined using an NMR thoscurria gomesiana (39), SF11 is a family of cystatin-like toxins, strategy we described previously that takes advantage of non- SF12 is a derived MIT/Bv8/prokineticin family with four rather uniform sampling to expedite data acquisition and improve than the canonical five disulfide bonds, and SF7 and SF21 have structure quality (41). Each of the determined structures is of less than the minimum of six cysteine residues required to form an high precision and has excellent stereochemical quality, as evi- ICK motif. denced by the structural statistics presented in SI Appendix, Eight of the 26 DRP superfamilies with three or more disul- Table S2. fide bonds have sequences with unknown structure and bi- The SF6 family member adopts a highly derived knottin fold ological activity (SF6, SF14, SF15, SF18, SF22, SF23, SF25, and with several unique elaborations (Fig. 4B): 1) The β-hairpin loop SF26). In order to better understand the structural diversity and encompasses an eight-residue α-helix; 2) an extended C-terminal evolutionary origin of spider-venom DRPs, we attempted to tail is locked in place by a disulfide bridge to the adjacent determine the structure of representative members of SF6, SF14, C-terminal β-strand of the ICK motif; and 3) a highly extended SF26, and homologous sequences to those of SF22 and SF23. N-terminal region (i.e., preceding the first cysteine residue) in- These superfamilies were prioritized over SF15, SF18, and SF25, cludes a short six-residue α-helix and a β-strand that forms a

4of10 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al. Downloaded by guest on September 28, 2021 II III 14000 Disulfide-rich peptides is inserted in the Cys –Cys loop in the ICK motif. The SF23 11000 family member adopts a prototypical knottin fold with an addi- tional stabilizing disulfide bond at the base of the β-hairpin loop 8000 (Fig. 4D), which is a relatively common elaboration in spider 5000 toxins (42). Superimposition of the 3D structures of SF6, SF23, ICK 2000 Non-ICK SF26, and either of the two ICK domains of SF1 reveals that, Proteins regardless of the size and structural elaborations acquired in 500 these peptides (Fig. 4E and Movie S1), the central ICK motif 400 remains essentially the same, despite enormous variations in 300 amino acid sequence. 200 Proteins Elucidation of the 3D structure of toxin U33-TRTX-Cg1c from Transcripts per million Transcripts the Chinese tarantula Chilobrachys guanxiensis, which is homol- 100 ogous to SF22, revealed that these toxins do not adopt an ICK 0 1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930313233 fold. The overall topology of SF22 (Fig. 4F) is similar to the Superfamily number prokineticin/colipase fold adopted by other five-disulfide DRPs, such as the SF2 and SF12 toxins, but they are not homologous. Fig. 3. Abundance of transcripts encoding DRPs and proteins obtained The prokineticin/colipase fold found in vertebrate proteins can from sequencing of an H. infensa venom-gland transcriptome. Blue bars be viewed as a head-to-tail duplication of two DDH motifs with represent protein-encoding transcripts while red and gray bars denote various elaborations on the loops (10). SF22 also contains two ’ transcripts encoding DRPs that do or don t have an ICK scaffold, respectively. core motifs. The C-terminal motif is structurally similar to the ICK transcripts dominate the venom-gland transcriptome. Superfamily numbers correspond to those shown in Fig. 2. C-terminal subdomain of prokineticin and colipase, and it adopts a DDH motif with a long 17-residue loop between CysVI and CysVIII (cf. four to six residues in the equivalent loop of most β IV– ICK motifs). The extended loop bridges over to the N-terminal small -sheet via hydrogen bonds with residues in the Cys III V motif, stabilized by an intersubdomain disulfide bond (Cys – Cys intercystine loop. The SF26 family member also adopts a VII IV highly modified knottin fold (Fig. 4C), with the following elab- Cys ) and a hydrogen bond between Cys15 (Cys ) and Tyr49. However, in contrast with prokineticin and colipase, the orations of the classical ICK motif: 1) Similar to SF6 toxins, an

N-terminal region does not adopt a DDH fold. Rather, the BIOCHEMISTRY extended C-terminal tail is stapled in place by a disulfide bridge N-terminal core is composed of two stacked antiparallel between the C-terminal Cys residue and a Cys residue at the base β-hairpins in which the parallel strands in the adjacent hairpins β of the adjacent C-terminal -strand of the ICK motif; 2) the long are covalently linked by two disulfide bonds (Fig. 4F). This N-terminal region includes a short six-residue 310 helix followed stacked-hairpin fold resembles the “mini-granulin” fold seen in by an extended segment that stretches across toward the much diverse proteins, such as granulin, serine proteases, growth fac- longer β-sheet of the cystine knot; and 3) a five-residue 310 helix tors, and cone snail venom peptides (43, 44). Given the uncertain

Fig. 4. Structures of selected DRPs found in the venom of H. infensa, highlighting key structural innovations in “short” and “long” peptide toxins. (A) Schematic representation of the SF1 double-knot toxin, which is comprised of two independently folded ICK domains joined by an inflexible linker. (B)SF6 family peptides adopt a typical ICK fold with several unique elaborations, including an extended C-terminal tail that is stapled to the rest of the structure via an additional disulfide bond (“tail-lock”), an enlarged intercystine loop 4 that contains an α-helical insertion, and a highly extended N-terminal region that includes a six-residue α-helix and a short two-stranded β-sheet. (C) SF26 DRPs also adopt a highly elaborated knottin fold with a C-terminal tail tail-lock, a five- II III residue 310 helix in the core ICK region between Cys and Cys , and a long N-terminal extension that includes a short α-helix. (D) SF23 DRPs only differ from classical ICK toxins by having an extra disulfide bond (“β-sheet staple”) that stabilizes the β-hairpin loop. (E) Structural alignment of the ICK core regions of DRPs from SF1, SF6, SF17, SF22, and SF23, highlighting the strong conservation of the ICK motif (DDH in the case of SF22) regardless of the extent of structural elaborations outside this core region. (F) SF22 DRP represents a previously unidentified toxin fold comprised of two independent structural domains con- nected by a disulfide bond. The C-terminal domain forms a DDH core while the N-terminal domain adopts a DABS fold. In all panels, disulfide bonds comprising the ICK motif are shown as red tubes while additional noncore disulfide bonds are shown as orange tubes. β-sheets and α-helices are highlighted in blue and green, respectively.

Pineda et al. PNAS Latest Articles | 5of10 Downloaded by guest on September 28, 2021 evolutionary history and diverse taxa that express this protein H. infensa venom peptidome, with 15 of the 26 identified DRP fold, we refer to it here as a “disulfide-stabilized antiparallel superfamilies comprised of simple or highly derived knottins. β-hairpin stack” (DABS). Covalent linkage of a DDH and DABS Moreover, analysis of transcript abundance revealed that 11 of domain has not been reported in venom peptides, and thus SF22 the 14 most highly expressed DRP superfamilies are ICK toxins represents a new class of toxins. (Fig. 3). Overall, based on transcript abundance, ICK toxins represent ∼91% of the total venom peptidome. If one includes ICK Toxins Dominate the Venom Arsenal of H. infensa. Our combined the seven superfamilies of putative protein toxins (SF27–SF33), transcriptomic, proteomic, and structural biology data revealed which are all expressed at very low levels, the venom proteome is that ICK toxins are the major contributor to the diversity of the composed of 90.5% ICK toxins, 9.1% non-ICK DRPs, and 0.4%

SF9

4i_

1g _0

SF26 c _73643 U 21 TXP9_APOSC -HXTX-Hi1a

TXP4_APOSC A0A04Q8K2H9 A0A1D0BNE8 A0A04Q8KAQ8 A0A04Q8K7Y5

A0A04Q8KAQ9

N TXP1_APOSC D

o-Hi1b_

H _ 2 6

S0F1N0

Y _

_

T

h I g

W4VRX8 1

1

N i iH

W4VRU3 I ATRO H

1

R - -

1

T o o o-Hi1c_2 A0A4Q8KAG8 ICK17_TRILK P0DMQ3 S0F1N7 O1F_HADVN W4VSB7 CDF44148 T

TO1F_ O1C_HADMO P0DMQ0 AOA4Q8K621 T P83580 A5A3H1 ICK19_TRILK W4V 93 S0F1M6 CDF44163 TOK1H_HADVE R 65 75 W4VS32 V1 SF19U M5AXR5 14 W4VSI6 W4VSI7 94 B3FIP1 -HXTX-Hi1c B1P1J5 U 93 14 U -HXTX-Hi1b 90 14 83 85 -HXTX-Hi1a_2 SF16 97 A0A088BPH1

97

70 U

1 A0A4V2H9N91 94 -HXTX-Hi1a A0A482ZF69 SF17 B6DCT3 SF24

83 SF8 A0A4K1D8L1 A0A0P0D113 A0A482ZD57 -HXTX-Hi1a_2

A0A1D0C02812 A0A4Q8KAF9U

A0A4Q8K7P6 -HXTX-Hi1a_2 A0A4Q8K7E1 92 A0A1D0BR9819 -HXTX-Hi1b_3 A0A4K1D8B0 U 98 98 12 U -HXTX-Hi1b 6 A0A482ZAB2 82 U A0A482Z8A8 95 -HXTX-Hi1a_1 6 U B1P1H6 75 D2Y2B6 JZT56_CHIGU 70 A0A1D0BNA3A0A4Q8K0R5 ICK8_TRILK BPP1I1 W4VSB9

75 W4VR 94 W4VS46 BAN13533 V2 99 81 BAM15662 H18A1_HAPHA 98 D5J6X8 I2FKH7 99 JZT65_CHIGU A0A4Q8K436 96 86 JZT64_CHIGU D2Y251 71 56 A0A482ZHB5B1P1I3 V3 SF20 75 97 W4VR 97 TXPT6_MACGS A0A482Z6E1 91 ICK6_TRILKP83558 96 TXMG2_MACGS A0A4Q8K5T9 64 87 72 99 A0A4Q8K6Z3 A0A4Q8K4S4 -HXTX-Hi1a ABY77683 91 U 15 TX18A_HAPSC A0A4Q8K442 B3FIQ7 A0A4Q8K1V5 68 TX16C_HAPSC D2Y299 B3FIN4 A0A4Q8K1T3 64 land tr D2Y283 92 92 g an s ABY77693 77 78 m c o r B3FIQ5 U n i 3-HXTX-Hi1a_1U e p H16D1_HAPHA 3-HXTX-Hi1b t H16B1_HAPHA A0A4Q8K7H1 V s 93 D2Y271 A0A4V6MK97 A3F7X1 A0A4Q8K8J7 SF4 TXLT4_LASSB A0A4Q8KA16 99 98 B3FIV1 A0A4Q8K630 82 TXBS1_BRASMADF28491 U3-HXTX-Hi1c_1 99 99 A0A4Q8K2N9 A0A4Q8K9A4 A0A4Q8K765 82 ICK2_TRILK A0A4Q8K1W5 A0A4V2H993 A0A4Q8K4K9 SF13 A0A4Q8K1B6 Q75WG7 A0A4Q8K3R5 A0A4Q8K088 99 A0A1D0BPK9 99 δ-HXTX-Hi1a_16 A0A4Q8K3Q5 98 W4VSI5 98 TXM14_MACGS 99 83 Cshyp4_0_g2_i1_11 ICK24_TRILK P83560 Mg2_c0_g1_i1 A0A2P6L7U6 W4VRX0 Na_c0_g1_i2 96 75 62 73 U9-HXTX-Hi1a_2 A0A4Y2B2M7 83 A0A2D0PBN7 A0A1D0BNB4 A0A4Q8K2G6 SF14 Cshyp5_0_g1_i1_1A0A4Q8K501 A0A1D0BR67 83 A0A4Q8K427 1 A0A4Q8KAE2 51 67 lmL2_0_g2_i1_1 95 88 A0A4Q8K559 55 88 LmL2_0_g1_i1_191 B1P1H8 97 A0A4Q8K5W3 A0A482ZAI5A0A4Q8K2E3 A0A2D0PBU5 82 99 M. giganteus A0A4Y2B7R1 62 Cshyp6_0_g2_i1_7 A0A4Q8K4G6 Hh_DN37914_c0_g1_i1 -HXTX-Hi1d_d2 A0A087TBW8 (whip scorpion) A0A4Q8K3N9 A0A4Q8K9M1 -HXTX-Hi1b_d2 95 A0A4Y2B988 98 99 A0A087TD82 -HXTX-Hi1e_d2 A0A2D0PC75 Q5Y4U4 -HXTX-Hi1a_d2 SF1 A0A087T772 -HXTX-Hi1c_d2 -HXTX-Hi1b_d1 A0A4Q8KCP8 -HXTX-Hi1d_d2 -HXTX-Hi1a_d1 56 o-Hi2g_1 80 99 o-Hi1-2d 50 99 A0A4Q8K123 A_HADIN A0A4V2H9S7 TOT 97 P83562 A0A1D0C005 A0A4Q8K2P9 A0A4Q8K995 84 A0A4Q8K6G9 93 A0A4Q8KC10ATRIL Non-venom gland tissue SF10 A_ P60272 TOT Q9BJV7 -HXTX-Hi1a_2 A0A4Q8KB62 U2 84 86 99 SF3 A0A4Q8K2E8 Q5Y4U8 98 0.7 A0A4Q8KBP3 97 D2Y2C5 Q75WH5 A0A1D0BRB0-HXTX-Hi1a U 18 -HXTX-Hi1b_3 Q5Y4V1 18 A0A4Q8K0Y5 U A0A1D0BZG7 67

SF23 P60980 82 P0DQJ5

B1P1D2 A0A482Z9S1

A0A482Z7T0B1P1B9 A0A482Z6H4B1P1C6 B1P1A4 P0DM67 B1P1C8 B1P1C7 Haplopelma hainanum clade

Fig. 5. Evolution of spider-venom ICK toxins. Maximum likelihood tree showing the phylogenetic relationship between H. infensa DRPs and DRPs isolated from the venom gland or other tissues of other spider species. The tree was rooted using the whip scorpion M. giganteus as the outgroup. Bootstrap values are shown at each node, except for nodes where support was 100. The tree shows that, although many of the phylogenetic relationships between super- families remain unresolved, all venom-derived DRPs form a well-supported monophyletic clade. Superfamily sequences belonging to H. infensa are high- lighted in blue text, and representative structures for each superfamily are shown. DRP sequences from muscle and other tissues are highlighted in red; all other sequences (denoted by the light blue broken circle) represent venom DRPs. Venom peptides isolated from other species have their corresponding accession numbers/common toxin names listed in the labels, with the exception of the H. hainanum superfamily XVI clade. A summary of all accessions used can be found in SI Appendix.

6of10 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al. Downloaded by guest on September 28, 2021 protein toxins. We conclude that ICK toxins dominate the recruitment as toxins. Phylogenetic reconstruction of these and venom arsenal of H. infensa both in terms of molecular diversity all other spider ICK peptides identified using our approach and abundance. grouped all confirmed venom knottins into a well-supported To expand on the activities reported for ICK toxins from monophyletic clade (Fig. 5). Our analysis also clustered several funnel-web spider venom (SI Appendix, Table S4), we examined putative venom toxins with sequences from the leg muscle whether SF4, SF6, SF23, and SF26 are insecticidal. All four transcriptome of Liphistius malayanus, which could indicate ei- toxins induced paralysis when injected into sheep blowflies (SI ther a physiological role for these putative toxins or multiple Appendix, Table S3). SF4 and SF23 were only active at high dose, recruitment of ICK peptides into spider venom. However, re- and flies recovered within 24 h. SF6 caused irreversible paralysis, gardless of the number of ICK recruitments into venom, our re- leading to death at moderate doses (9.7 nmol/g), while SF26 sults suggest that the vast majority of the structural diversity of paralyzed all flies within 30 min, even at low dose (0.3 nmol/g), ICK toxins, which make up the bulk of the molecular diversity in but the effects were reversible. The activity of SF26 is compa- the venom of H. infensa, arose from a single lineage of weaponized rable to other potent insecticidal spider toxins tested in this “classical” ICK fold. assay (45). Elaborations on the Basic Knottin Disulfide Architecture Facilitate ICK Phylogenetic Analyses Suggest Structural Radiation of a Single Lineage Diversification. Although we were unable to resolve the higher of Spider-Venom ICK Toxins. In order to examine whether ICK order phylogenetic relationships between ICK toxin superfam- toxins in funnel-web spider venom have a common origin, we ilies, our analyses highlight the importance of loss and gain of examined their phylogenetic relationship with homologous se- disulfide bonds in their evolutionary radiation (Fig. 5). Func- quences from nonvenom gland spider tissues (hypodermal tissue, tional radiation of ICK and other DRP folds is thought to occur book lung, and leg), as well as tissue from the whip scorpion primarily by diversification of the intercystine loop regions, Mastigoproctus giganteus, a nonvenomous arachnid. All known which does not affect the structural integrity of the peptides. spider-venom sequences deposited in UniProtKB were also used However, our data show that elaborations on the disulfide ar- (Fig. 5 and SI Appendix). Briefly, sequence files were downloaded, chitecture itself can play a prominent role in the diversification trimmed, and assembled, and then contigs/singlets were analyzed of DRPs. While the core ICK motif contains only three disulfide to identify ICK variants. These ICK variants included non-venom- bonds (11), our data suggest that the ancestral ICK spider toxin gland transcripts from H. infensa (leg contig DN34637_c0_g1_i4) either contained a fourth disulfide bond stabilizing the β-sheet in and Haplopelma hainanum (book-lung contigs DN16443_c0_g1_i1, loop 4 or that this additional disulfide was gained on at least two DN40049_c0_g1_i5, and i6, and DN40049_c0_g2_i1) that encoded separate occasions. A loss of intramolecular disulfide bonds may BIOCHEMISTRY peptide sequences that are identical or near identical to confirmed seem counterintuitive during the evolution of venom DRPs for venom components. In the case of H. infensa, the leg transcript was which stability is paramount. However, intriguingly, the stabi- identical to that of ω-HXTX-Hi1h (venom gland contig RL5_rep_ lizing loop-4 disulfide bridge was present in all homologs iden- c1832), which was confidently identified in the venom (SI Appendix). tified in the outgroup datasets. Moreover, disulfide loss appears Similarly, three transcripts from the book lung transcriptome of H. to have been crucial during evolution of DDH toxins (10) from hainanum are identical with toxins previously characterized from ICK precursors in scorpions (21), suggesting that this scenario the venom of this species. Detection of these sequences in non- may not be as unlikely as previously postulated (10, 47). venom gland transcriptomes is likely due to “leaky” expression of Regardless of whether the ancestral ICK toxin contained the bona fide toxin genes, which has been reported in other venomous loop-4 disulfide, our data indicate that gain of disulfide bonds lineages (46). underlies the myriad of structural variations in ICK spider toxins. Strikingly, our analyses of nonvenom gland tissues only This not only highlights the importance of disulfide bonds in returned hits to peptides with simple “classical” ICK folds. Thus, stabilizing the structure of spider toxins but further illustrates the the highly elaborated ICK structures observed for several extraordinary versatility of the ICK fold. The permissiveness of the venom-peptide superfamilies in H. infensa appear to represent ICK fold to both sequence variation (21) and disulfide elabora- extreme cases of structural diversification following functional tions likely explains why it appears to be the most widespread

Conjugation: limited conjugation to produce mixed-domain toxins

7 10 8 2 5 Conjugated 1 4 6 9 DDH-DABS fold 3

N C 2N8K Other domains

Domain duplication: limited intragenic duplication to produce double-knot and double-DDH toxins

7 6 3 9 12 5 2 10 8 Double DDH toxin 2 5 8 11 Double-knot 1 4 6 9 toxin (prokineticin/MIT1 fold) 1 4 7 10 3 DDH fold ICK fold N C N C 2N8F 2KRA

4 2 6 3 1 3 Ancestral 2 5

fold 1 4

NC N C C 6 6 6 3 6 3 3 6 3 3 C 6 3 2 2 5 2 5 2 5 5 2 5 2 5

1 4 1 4 1 4 1 4 1 4 1 4 C C C N N N N N N C 2N6R 1DL0 1G9P 1VTX 2N6N 2M36 Domain elaboration: single-knot toxins with extended termini, enlarged loops, additional SS bonds

Fig. 6. Overview of structural innovations in spider-venom ICK peptides. Schematic overview of the mechanisms by which an ancestral ICK or DDH toxin was duplicated, conjugated, and elaborated upon to form the diversity of ICK scaffolds found in extant mygalomorph spider venoms. Gray numbered circles represent cysteine residues, with core disulfide bonds that form the cystine knot or DDH motif indicated by solid red lines. Additional noncore disulfides are highlighted in orange, with dashed lines indicating interdomain disulfide bonds. Blue arrows and green rectangles denote β-strands and α-helices, re- spectively. N and C termini are labeled, and PDB accession codes for representative structures are given below each schematic peptide fold.

Pineda et al. PNAS Latest Articles | 7of10 Downloaded by guest on September 28, 2021 cysteine-rich fold known, being found in diverse taxa, including B (90% ACN, 0.1% FA) in 0.1% FA over 45 min, using a flow rate of , fungi, insects, molluscs, plants, sea anemones, sponges, 180 μL/min. MS1 survey scans were acquired at 300 to 1,800 m/z over 250 and even viruses (21, 48). ICK peptides also appear to constitute ms, and the 20 most intense ions with a charge of +2 to +5 and an in- the most abundant fold in the “dark proteome” (49), suggesting tensityofatleast120countspersecond were selected for MS2. The unit mass precursor ion inclusion window was ±0.7 Da, and isotopes within that we still have much to learn about their structural and ±2DawereexcludedfromMS2,whichscanswereacquiredat80to functional plasticity. 1,400 m/z over 100 ms and optimized for high resolution. For protein In summary, our structural venomics approach revealed the identification, MS/MS spectra were searched against the translated an- pathway for evolution of complex venoms in Australian funnel- notated H. infensa venom-gland transcriptome using ProteinPilot v4.5.1 web spiders. Multiple duplications of an ancestral ICK gene were (AB SCIEX, Framingham, MA). Searches were run as thorough identifi- followed by periods of diversification, generation of structural cation searches, specifying tryptic digestion and cysteine alkylation by variants with additional disulfide bonds, and selection of variants iodoacetamide. Decoy-based FDRs were estimated by ProteinPilot; for via adaptive evolution to generate small disulfide-rich toxins. protein identification, we used a protein confidence cutoff corresponding to a < Limited intragenic duplication of an ICK gene produced double- local FDR of 1%. Spectra were also manually examined to further eliminate > false positives. MS/MS data were also searched against the venom-gland tran- knot toxins, but most venom peptides with mass 6 kDa arose scriptome using Mascot v2.5.1 (Matrix Science, Boston, MA) using a decoy-based from diverse elaborations of a single ICK scaffold, often ac- FDR cutoff of <1%. For this search, we set the instrument to ESI-quadrupole companied by inclusion of additional disulfide bonds (Fig. 6). time-of-flight (QUAD-TOF), specified trypsin as the cleavage enzyme, allowed Thus, the extraordinary pharmacological diversity of spider up to one missed cleavage, and used a peptide tolerance of ±10 parts per venom, which is one of the most complex chemical arsenals in million and MS/MS tolerance of ±0.1 Da, with cysteine carbamidomethylation as the natural world, is conferred by a panel of structurally related a fixed modification. C-terminal amidation and N-terminal pyroglutamate for- DRPs that all evolved from an ancestral ICK toxin. mation were allowed as variable modifications. These searches returned 3,012 and 2,944 hits for ProteinPilot and Mascot, respectively. Materials and Methods Spider and Venom Collection. H. infensa (mudjar nhiling guran) were col- Complementary DNA (cDNA) Library Construction and Sequencing. Three spi- lected on private land from Orchid Beach, K’gari (Fraser Island), QLD, Aus- ders were milked via aggravation to deplete their glands of venom. tralia. Spiders were housed individually at 23 to 25 °C in plastic containers in Three days later, they were anesthetized, and their venom glands were dark cabinets. Adult female spiders were aggravated using forceps until dissected and placed in TRIzol reagent (Life Technologies). Total RNA from venom was expressed on the fang tips, and then the venom was collected pooled venom glands was extracted following the standard TRIzol protocol. continuously by aspiration until no more venom could be obtained. We Messenger RNA (mRNA) enrichment from total RNA was performed using an assume this corresponds to exhaustion of the venom-gland contents, and Oligotex direct mRNA mini kit (Qiagen). RNA quality and concentration were hence the venom collected should closely represent the complete venom measured using a Bioanalyzer 2100 pico chip (Agilent Technologies). μ proteome. Venom was stored at –20 °C. If impurities, such as soil or sand, A cDNA library was constructed from 100 g of mRNA using the standard were detected, the venom was diluted 10-fold with ultrapure water, and Roche cDNA rapid library preparation and emPCR method. Sequencing was contaminants were removed using a low-protein–binding 0.22-μm Ultrafree- carried at the Australian Genome Research Facility using a ROCHE GS-FLX MC centrifugal filter (Millipore). sequencer. The raw standard flowgram file (.SFF) was processed using Cangs software, and low-quality sequences were discarded (Phred score cutoff of Venom Profiling. The venom profile of H. infensa was obtained using a 25) (51). De novo assembly was performed using MIRA software v3.2 (29) = – = – = combination of RP-HPLC and MS techniques (6). Offline liquid chromatog- using the following parameters: -GE:not 4 project Hinfensa job = = = raphy (LC)-MALDI-TOF/TOF was utilized to maximize the amount of mass denovo,est,accurate,454 454_SETTINGS -CL:qc no -AS:mrpc 1 -AL:mrs = information extracted. Liquid chromatography-electrospray ionization 99,egp 1. Consensus sequences from contigs and singlets were submitted (LC-ESI) OrbiTrap MS data were acquired as described (50). A conservative to Blast2GO to acquire functional annotations (30). In parallel to the func- | analysis was used to manually curate both MS datasets. Analysis criteria in- tional annotation, an in-house algorithm (Toxin seek) was written to allow cluded: 1) Only spectra with well-defined peaks were considered as “real”;2) prediction of potential toxin ORFs that may represent novel DRPs. Once a signal-to-noise ratio of 15 was set before masses were recorded; 3) masses annotated and predicted, lists were merged, redundancies were removed, were excluded if they met the following criteria relative to another recorded and signal sequences were determined using SignalP v3.1 (52). Putative mass: +16 Da and +32 Da (oxidized), –18 Da (dehydrated), –17 Da (deami- propeptide cleavage sites were predicted using a sequence logo analysis (53) dated), +22 Da (sodium adduct); 4) masses <1,000 Da, which were presumed of all known spider precursors (SI Appendix, Fig. S2). After identifying all to correspond to small organic compounds such as polyamines or matrix processing signals, toxins were classified into superfamilies based on their clusters, rather than peptides, were excluded from the analysis. Duplicated signal sequence and cysteine framework. Signal sequences were considered < masses were eliminated from adjacent fractions, and a final list of sorted different if the level of amino acid sequence identity was 65%; the pair- masses was generated. Potential dimers and doubly charged species were wise comparison of signal sequences in SI Appendix, Fig. S3 shows that, for < also eliminated (±3 to 5 Da). Then 3D contour plots, or “venom landscapes,” most signal sequence comparisons, the level of identity is 30%. were constructed from LC-MALDI-TOF/TOF MS data as described (6). To examine the extent of sequence coverage provided by the 454 plat- For proteomic analysis, 5 mg of milked venom pooled from three speci- form, we generated 11 subsets of data that randomly sampled the entire read mens was fractionated on a Shimadzu Prominence HPLC (Kyoto, Japan) using set in 10% increments. Each randomly sampled dataset was assembled as previously described using MIRA v3.2. Each dataset was then submitted to a Vydac C18 column (300 Å, 5 μm, 4.6 mm × 250 mm). Venom was eluted | | using a gradient of 5 to 80% solvent B (90% acetonitrile [ACN], 0.043% Tox Blast, a module of Tox Note (31), or Transdecoder v5.5.0. Hits were | trifluoroacetic acid [TFA]) in solvent A (0.05% TFA) over 2 h. Twelve fractions isolated from the Tox Blast output. We used cd-hit2d v4.6 (54) to cluster and | were manually collected (one every 10 min), and absorbance was monitored compare results from the Tox Blast/Transdecoder outputs (settings: -c 0.6 -n at 214 nm and 280 nm. Fractions were lyophilized and reconstituted in ul- 4 -d 100) and then recorded the number of superfamilies recovered for each trapure water, and 1/10 were removed for proteomic analyses. Fractionated data subset (SI Appendix, Fig. S37). proteins and peptides were reduced with dithiothreitol (5 mM in 50 mM ammonium bicarbonate, 15% ACN, pH 8), alkylated with iodoacetamide Nomenclature. Toxins were named using the rational nomenclature described (10 mM in 50 mM ammonium bicarbonate, 15% ACN, pH 8), and digested previously (55). Spider was taken from the World Spider Catalog overnight with trypsin (30 ng/μLin15μL of 50 mM ammonium bicarbonate, v19.5 (https://wsc.nmbe.ch). 15% ACN, pH 8). Upon completion of digestion, formic acid (FA) was added to a final concentration of 5%, and the digested samples were desalted Peptide Expression and Purification. Genes encoding toxins of interest were using a C18 ZipTip (Thermo Fisher, Waltham, MA). synthesized using codons optimized for E. coli expression and subcloned into Desalted, digested samples were dried in a vacuum centrifuge and dis- a pLICC_D168 vector by GeneArt (Regensburg, Germany). Plasmids were solved in 0.5% FA, and 2 μg were analyzed by LC–tandem MS (LC-MS/MS) on transformed into E. coli BL21 (λDE3), and toxin expression and purification an AB Sciex 5600TripleTOF (AB Sciex, Framingham, MA) equipped with a were carried out using published methods (40) (see SI Appendix, Fig. S5 for

Turbo-V source heated to 550 °C and coupled to a Shimadzu Nexera UHPLC. an example). Peptides were separated from salts, His6-MBP and His6-TEV Samples were fractionated on an Agilent Zorbax stable-bond C18 column protease using a Shimadzu Prominence HPLC system (Kyoto, Japan). Sepa-

(2.1 × 100 mm, 1.8 μm, 300 Å pore size) using a gradient of 1 to 40% solvent ration was performed on a Jupiter C4 reverse-phase HPLC column (250 × 10 mm,

8of10 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al. Downloaded by guest on September 28, 2021 300 Å, 10 μm; Phenomenex), using a flow rate of 5 mL/min and an elution peptide predicted using SignalP v4.1 (52) and aligned with MAFFT v7.304b gradient of 5 to 60% solvent B over 30 min. Absorbance was monitored at using a local Smith–Waterman alignment method with affine gap cost (L-INS-I 214 nm and 280 nm, and fractions were manually collected. Purified pep- option) (65), and then the alignment was adjusted manually to correct mis- tides were lyophilized and reconstituted in water or buffer, and approxi- aligned structurally equivalent cysteine residues before realigning the inter- mate peptide concentration was determined from absorbance at 280 nm cysteine loops using the MAFFT regional realignment script. The resulting measured using a NanoDrop spectrophotometer (Thermo Scientific). ESI-MS alignment was used to estimate phylogenetic relationships via ML re- spectra were acquired using an API 2000 LC-MS/MS triple quadrupole mass construction using IQ-TREE v.6.12 (66). The TESTNEW command was used to spectrometer. Mass spectra were obtained using positive ion mode over m/z allow IQ-Tree to determine the best fitting model according to Bayesian in- range of 400 to 1,900 Da. formation criterion analysis, and node support was tested by ultrafast boot- strap approximation (UFBoot) (67) using 10,000 iterations. Structure Determination. The structures of toxins from SF6, SF22, SF23, and We also estimated phylogenetic relationships via BI using MrBayes v3.2.2 SF26 were determined using heteronuclear NMR. Each sample contained (68), with standard settings, except that we specified the same amino acid μ 13 15 μ 300 Lof C/ N-labeled peptide (100 to 300 M) in 20 mM 2-(N-morpho- substitution model identified by IQ-TREE (69, 70). The Markov chain Monte lino)ethanesulfonic acid (MES) buffer, 0.02% NaN3,5%D2O, pH 6. NMR data Carlo algorithm was run for 10,000,000 generations with sampling fre- were acquired at 25 °C on an Avance II+ 900 MHz spectrometer (Bruker quency of 500. Two or four runs were performed simultaneously with chain BioSpin) equipped with a cryogenically cooled triple resonance probe. Res- 1 15 temperatures of 0.05, 0.1, 0.2, or 0.5. However, none of the calculations onance assignments were obtained from two-dimensional (2D) H- N het- reached convergence as judged by comparison of mean SD of split fre- eronuclear single-quantum coherence (HSQC), 2D 1H-13C HSQC, 3D HNCACB, quencies (SD > 0.5). A comparison of phylogenies obtained by ML and BI 3D CBCA(CO)NH, 3D HNCO, 3D HBHA(CO)NH, and four-dimensional (4D) analysis of a reduced dataset that did converge (SI Appendix, Fig. S40) HCC(CO)NH–total correlated spectroscopy (41) spectra. The 3D and 4D showed good agreement between the two approaches and supported the spectra were acquired using nonuniform sampling and were reconstructed ML phylogeny based on the full sequence alignment (SI File 3). For the BI using the maximum entropy algorithm in the Rowland NMR Toolkit. tree summary, the log-likelihood score of each saved tree was plotted 13C-aliphatic, 13C-aromatic and 15N-edited nuclear Overhauser effect spec- against the number of generations to establish the point at which the log troscopy (NOESY)-HSQC spectra were acquired using uniform sampling for likelihood scores reached asymptote. Posterior probabilities for clades were extraction of interproton distance restraints. established by constructing a majority-rule consensus tree for all trees Resonance assignments and integration of NOESY peaks were achieved generated after burn-in. Trees were visualized and exported using Archae- using SPARKY 3 (56), followed by automatic peak list assignment, extraction of distance restraints, and structure calculation using the torsion angle dy- opteryx v0.99 (www.phylosoft.org/archaeopteryx) and FigTree v1.3.1 (http:// namics package CYANA 3.0 (57). Dihedral-angle restraints derived from tree.bio.ed.ac.uk/software/figtree/). TALOS chemical shift analysis (58) were also integrated in the calculation, with the restraint range set to twice the estimated SD. CYANA was used to Blowfly Toxicity Assay. Recombinant toxins were dissolved in water and in-

calculate 200 structures from random starting conformations, and then the jected into the ventrolateral thoracic region of sheep blowflies (Lucilia BIOCHEMISTRY best 20 structures were selected based on their stereochemical quality as cuprina; mass 23.9 to 36.5 mg) as previously described (42). The extent of judged using MolProbity (59). Structural statistics and Protein Data Bank lethality and paralysis was scored at 0.5, 1, and 24 h postinjection. (PDB) accession codes are summarized in SI Appendix, Table S2. Data Deposition. Metadata and annotated nucleotide sequences were de- Homology Modeling. Superfamilies identified to have a structural counterpart posited in the European Nucleotide Archive under project accessions in the PDB (SI Appendix, Table S5) were modeled using automodel param- PRJEB6062 (ERA298588) and PRJEB35693. Atomic coordinates were deposited eters in Modeler v9.12 (60). A total of 100 to 500 models were generated for in the Protein Data Bank under accession codes 2N6N, 2N6R, 6BA3, and 2N8K each structure. The lowest energy models were selected and visualized using while the corresponding NMR chemical shifts were deposited in PyMOL v.2.0 (Schrödinger, LLC). BioMagResBank under accession codes BMRB25774, BMRB25778, BMRB25853, and BMRB30352. MS data we deposited to the ProteomeXchange Consortium Phylogenetic Analyses. Bayesian inference (BI) of phylogeny and maximum (71) via the PRIDE (72) partner repository with the dataset identifier likelihood (ML) reconstruction were used to examine the evolution of ICK PXD016886. toxins. To search for nonvenom outgroups, we examined datasets from one species of whip scorpion (M. giganteus [Uropygi]; SRR1145698), a non- ACKNOWLEDGMENTS. We thank David Wilson for collection of H. infensa; venomous sister group to spiders (61), and nonvenom gland transcriptomes Philip Lawrence for curating MS data; Julie Klint, Niraj Bende, and Jessie Er from six species of taxonomically diverse spiders: leg muscle from H. infensa for recombinant peptide production; Lars Ellgaard for alerting us to the (PRJEB35676), Macrothele calpeiana (SRR6994009), L. malayanus (SRR1145736), mini-granulin fold; Geoff Brown (Department of Agriculture and Fisheries, and Neoscona arabesca (SRR1145741); hypodermal tissue from Cupiennius salei Queensland, Australia) for supply of blowflies; and Michael Nuhn and Lien Li (European Molecular Biology Laboratory-Australia Bioinformatics Resource) (SRR880446); and book-lung tissue from H. hainanum (SRR2155568). Fastq files for help with data submission to European Nucleotide Archive. This research were downloaded, trimmed using trimmomatic v0.39 (62) with window quality was supported by Australian National Health & Medical Research Council 30, window size 4, minimum read length 60 bp, and then assembled using de- Principal Research Fellowship APP1136889 (to G.F.K.), Australian Research fault settings in Trinity v2.3 (63). The resulting assemblies were combined, and Council Discovery Early Career Researcher Award Fellowship DE160101142 predicted coding DNA sequences (CDSs) were extracted using a combination of (to E.A.B.U.), Research Council of Norway Funding Scheme for Independent tools (31, 64). Accession numbers for all sequences are given in SI Appendix, Projects–Young Research Talents Fellowship 287462, and a University of File 2. The resulting sequences were filtered based on the presence of a signal Queensland International Postgraduate Scholarship (to S.S.P.).

1. J. Lozano-Fernandez et al., A molecular palaeobiological exploration of ar- 8. C.-H. Yuan et al., Discovery of a distinct superfamily of Kunitz-type toxin (KTT) from thropod terrestrialization. Philos. Trans. R. Soc. Lond. B Biol. Sci. 371, 20150133 tarantulas. PLoS One 3, e3414 (2008). (2016). 9. R. A. V. Morales et al., Chemical synthesis and structure of the prokineticin Bv8. 2. C. Mora, D. P. Tittensor, S. Adl, A. G. Simpson, B. Worm, How many species are there ChemBioChem 11, 1882–1888 (2010). on Earth and in the ocean? PLoS Biol. 9, e1001127 (2011). 10. X. Wang et al., Discovery and characterization of a family of insecticidal neurotoxins 3. F. C. Schroeder et al., NMR-spectroscopic screening of spider venom reveals sulfated with a rare vicinal disulfide bridge. Nat. Struct. Biol. 7, 505–513 (2000). nucleosides as major components for the brown recluse and related species. Proc. 11. P. K. Pallaghy, K. J. Nielsen, D. J. Craik, R. S. Norton, A common structural motif in- Natl. Acad. Sci. U.S.A. 105, 14283–14287 (2008). corporating a cystine knot and a triple-stranded β-sheet in toxic and inhibitory 4. L. Kuhn-Nentwig, R. Stocklin, W. Nentwig, Venom composition and strategies in polypeptides. Protein Sci. 3, 1833–1839 (1994). spiders: Is everything possible? Adv. In Insect Phys. 40,1–86 (2011). 12. G. F. King, H. W. Tedford, F. Maggio, Structure and function of insecticidal neuro- 5. G. F. King, M. C. Hardy, Spider-venom peptides: Structure, pharmacology, and po- toxins from australian funnel-web spiders. Toxin Rev. 21, 361–389 (2002). tential for control of insect pests. Annu. Rev. Entomol. 58, 475–496 (2013). 13. G. Postic, J. Gracy, C. Périn, L. Chiche, J. C. Gelly, KNOTTIN: The database of inhibitor 6. P. Escoubas, B. Sollod, G. F. King, Venom landscapes: Mining the complexity of spider cystine knot scaffold after 10 years, toward a systematic structure modeling. Nucleic venoms via a combined cDNA and mass spectrometric approach. Toxicon 47, 650–663 Acids Res. 46, D454–D458 (2018). (2006). 14. H. Kolmar, Natural and engineered cystine knot miniproteins for diagnostic and 7. J. J. Smith et al., ““Therapeutic applications of spider-venom peptides”” in Venoms to therapeutic applications. Curr. Pharm. Des. 17, 4329–4336 (2011). Drugs: Venom as a Source for the Development of Human Therapeutics, G. F. King, 15. V. Herzig, G. F. King, The cystine knot is responsible for the exceptional stability of the Ed. (The Royal Society of Chemistry, 2015), pp. 221–244. insecticidal spider toxin ω-hexatoxin-Hv1a. Toxins (Basel) 7, 4366–4380 (2015).

Pineda et al. PNAS Latest Articles | 9of10 Downloaded by guest on September 28, 2021 16. G. F. King, Venoms as a platform for human drugs: Translating toxins into thera- 43. A. H. Jin et al., Conotoxin Φ-MiXXVIIA from the superfamily G2 employs a novel peutics. Expert Opin. Biol. Ther. 11, 1469–1484 (2011). cysteine framework that mimics granulin and displays anti-apoptotic activity. Angew. 17. J. I. Fletcher et al., The structure of a novel insecticidal neurotoxin, ω-atracotoxin-HV1, Chem. Int. Ed. Engl. 56, 14973–14976 (2017). from the venom of an Australian funnel web spider. Nat. Struct. Biol. 4, 559–566 44. L. D. Nielsen et al., The three-dimensional structure of an H-superfamily conotoxin (1997). reveals a granulin fold arising from a common ICK cysteine framework. J. Biol. Chem. 18. N. S. Bende et al., The insecticidal neurotoxin Aps III is an atypical knottin peptide that 294, 8745–8759 (2019). potently blocks insect voltage-gated sodium channels. Biochem. Pharmacol. 85, 45. N. S. Bende et al., A distinct sodium channel voltage-sensor locus determines insect 1542–1554 (2013). selectivity of the spider toxin Dc1a. Nat. Commun. 5, 4350 (2014). 19. C. J. Bohlen et al., A bivalent tarantula toxin activates the capsaicin receptor, TRPV1, 46. J. J. Smith, E. A. B. Undheim, True lies: Using proteomics to assess the accuracy of by targeting the outer pore domain. Cell 141, 834–845 (2010). transcriptome-based venomics in centipedes uncovers false positives and reveals 20. I. R. Chassagnon et al., Potent neuroprotection after stroke afforded by a double- startling intraspecific variation in Scolopendra subspinipes. Toxins (Basel) 10,96 knot spider-venom peptide that inhibits acid-sensing ion channel 1a. Proc. Natl. Acad. (2018). 47. J. J. Smith et al., Unique scorpion toxin with a putative ancestral fold provides insight Sci. U.S.A. 114, 3750–3755 (2017). into evolution of the inhibitor cystine knot motif. Proc. Natl. Acad. Sci. U.S.A. 108, 21. E. A. B. Undheim, M. Mobli, G. F. King, Toxin structures as evolutionary tools: Using 10478–10483 (2011). conserved 3D folds to study the evolution of rapidly evolving peptides. BioEssays 38, 48. B. Madio, E. A. B. Undheim, G. F. King, Revisiting venom of the sea anemone Sti- 539–548 (2016). chodactyla haddoni: Omics techniques reveal the complete toxin arsenal of a well- 22. R. C. Rodríguez de la Vega, A note on the evolution of spider toxins containing the studied sea anemone genus. J. Proteomics 166,83–92 (2017). ICK-motif. Toxin Rev. 24, 383–395 (2005). 49. N. Perdigão et al., Unexpected features of the dark proteome. Proc. Natl. Acad. Sci. 23. P. Escoubas, L. Quinton, G. M. Nicholson, Venomics: Unravelling the complexity of U.S.A. 112, 15898–15903 (2015). animal venoms with mass spectrometry. J. Mass Spectrom. 43, 279–295 (2008). 50. C. Dauly, P. Escoubas, G. F. King, G. M. Nicholson, “Characterization of spider venom 24. S. Dutertre et al., Deep venomics reveals the mechanism for expanded peptide di- peptides by high-resolution LC-MS/MS analysis” (Application Note: 511, Thermo versity in cone snail venom. Mol. Cell. Proteomics 12, 312–329 (2013). Fisher Scientific, 2011). 25. A. Palagi et al., Unravelling the complex venom landscapes of lethal Australian 51. R. V. Pandey, V. Nolte, C. Schlötterer, CANGS: A user-friendly utility for processing and funnel-web spiders (Hexathelidae: Atracinae) using LC-MALDI-TOF mass spectrome- analyzing 454 GS-FLX data in biodiversity studies. BMC Res. Notes 3, 3 (2010). – try. J. Proteomics 80, 292 310 (2013). 52. T. N. Petersen, S. Brunak, G. von Heijne, H. Nielsen, SignalP 4.0: Discriminating signal 26. P. Escoubas, L. Rash, Tarantulas: Eight-legged pharmacists and combinatorial chem- peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011). – ists. Toxicon 43, 555 574 (2004). 53. T. D. Schneider, R. M. Stephens, Sequence logos: A new way to display consensus 27. P. A. Zobel-Thropp, S. M. Correa, J. E. Garb, G. J. Binford, Spit and venom from sequences. Nucleic Acids Res. 18, 6097–6100 (1990). scytodes spiders: A diverse and distinct cocktail. J. Proteome Res. 13, 817– 835 (2014). 54. L. Fu, B. Niu, Z. Zhu, S. Wu, W. Li, CD-HIT: Accelerated for clustering the next- 28. S. S. Pineda et al., Diversification of a single ancestral gene into a successful toxin generation sequencing data. Bioinformatics 28, 3150–3152 (2012). superfamily in highly venomous Australian funnel-web spiders. BMC Genomics 15, 55. G. F. King, M. C. Gentz, P. Escoubas, G. M. Nicholson, A rational nomenclature for 177 (2014). naming peptide toxins from spiders and other venomous animals. Toxicon 52, 29. B. Chevreux et al., Using the miraEST assembler for reliable and automated mRNA 264–276 (2008). transcript assembly and SNP detection in sequenced ESTs. Genome Res. 14, 1147–1159 56. T. D. Goddard, D. G. Kneller, SPARKY (Version 3.11.5, University of California, San (2004). Francisco, 2001). 30. A. Conesa et al., Blast2GO: A universal tool for annotation, visualization and analysis 57. P. Güntert, Automated NMR structure calculation with CYANA. Methods Mol. Biol. in functional genomics research. Bioinformatics 21, 3674–3676 (2005). 278, 353–378 (2004). 31. S. S. Pineda et al., ArachnoServer 3.0: An online resource for automated discovery, 58. Y. Shen, F. Delaglio, G. Cornilescu, A. Bax, TALOS+: A hybrid method for predicting analysis and annotation of spider toxins. Bioinformatics 34, 1074–1076 (2018). protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR 44, 32. B. Li, C. N. Dewey, RSEM: Accurate transcript quantification from RNA-seq data with 213–223 (2009). or without a reference genome. BMC Bioinformatics 12, 323 (2011). 59. V. B. Chen et al., MolProbity: All-atom structure validation for macromolecular crys- 33. B. G. Fry et al., The toxicogenomic multiverse: Convergent recruitment of proteins tallography. Acta Crystallogr. D Biol. Crystallogr. 66,12–21 (2010). into animal venoms. Annu. Rev. Genomics Hum. Genet. 10, 483–511 (2009). 60. A. Sali, T. L. Blundell, Comparative protein modelling by satisfaction of spatial re- 34. O. Biner et al., Isolation, N-glycosylations and function of a hyaluronidase-like enzyme straints. J. Mol. Biol. 234, 779–815 (1993). from the venom of the spider Cupiennius salei. PLoS One 10, e0143963 (2015). 61. P. P. Sharma et al., Phylogenomic interrogation of arachnida reveals systemic conflicts – 35. R. M. Kini, Excitement ahead: Structure, function and mechanism of snake venom in phylogenetic signal. Mol. Biol. Evol. 31, 2963 2984 (2014). phospholipase A2 enzymes. Toxicon 42, 827–840 (2003). 62. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: A flexible trimmer for Illumina se- – 36. D. C. de Graaf et al., Insights into the venom composition of the ectoparasitoid wasp quence data. Bioinformatics 30, 2114 2120 (2014). Nasonia vitripennis from bioinformatic and proteomic studies. Insect Mol. Biol. 19 63. M. G. Grabherr et al., Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011). (suppl. 1), 11–26 (2010). 64. E. Afgan et al., Genomics virtual laboratory: A practical bioinformatics workbench for 37. B. Vincent et al., The venom composition of the parasitic wasp Chelonus inanitus the cloud. PLoS One 10, e0140829 (2015). resolved by combined expressed sequence tags analysis and proteomic approach. 65. K. Katoh, D. M. Standley, MAFFT multiple sequence alignment software version 7: BMC Genomics 11, 693 (2010). Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013). 38. S. Wen et al., Discovery of an MIT-like atracotoxin family: Spider venom peptides that 66. L. T. Nguyen, H. A. Schmidt, A. von Haeseler, B. Q. Minh, IQ-TREE: A fast and effective share sequence homology but not pharmacological properties with AVIT family stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. proteins. Peptides 26, 2412–2426 (2005). 32, 268–274 (2015). 39. P. I. Silva Jr., S. Daffre, P. Bulet, Isolation and characterization of gomesin, an 67. B. Q. Minh, M. A. T. Nguyen, A. von Haeseler, Ultrafast approximation for phyloge- 18-residue cysteine-rich defense peptide from the spider Acanthoscurria gomesiana netic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013). hemocytes with sequence similarities to horseshoe crab antimicrobial peptides of the 68. F. Ronquist, J. P. Huelsenbeck, MrBayes 3: Bayesian phylogenetic inference under – tachyplesin family. J. Biol. Chem. 275, 33464 33470 (2000). mixed models. Bioinformatics 19, 1572–1574 (2003). 40. J. K. Klint et al., Production of recombinant disulfide-rich venom peptides for struc- 69. T. Müller, M. Vingron, Modeling amino acid replacement. J. Comput. Biol. 7, 761–776 tural and functional analysis via expression in the periplasm of E. coli. PLoS One 8, (2000). e63865 (2013). 70. J. Soubrier et al., The influence of rate heterogeneity among sites on the time de- 41. M. Mobli, A. S. Stern, W. Bermel, G. F. King, J. C. Hoch, A non-uniformly sampled 4D pendence of molecular rates. Mol. Biol. Evol. 29, 3345–3358 (2012). HCC(CO)NH-TOCSY experiment processed using maximum entropy for rapid protein 71. E. W. Deutsch et al., The ProteomeXchange consortium in 2017: Supporting the cul- sidechain assignment. J. Magn. Reson. 204, 160–164 (2010). tural change in proteomics public data deposition. Nucleic Acids Res. 45, 42. N. S. Bende et al., The insecticidal spider toxin SFI1 is a knottin peptide that blocks the D1100–D1106 (2017). pore of insect voltage-gated sodium channels via a large β-hairpin loop. FEBS J. 282, 72. Y. Perez-Riverol et al., The PRIDE database and related tools and resources in 2019: 904–920 (2015). Improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1914536117 Pineda et al. Downloaded by guest on September 28, 2021