Unique with a putative ancestral fold provides insight into evolution of the inhibitor cystine knot motif

Jennifer J. Smitha, Justine M. Hillb, Michelle J. Littlec, Graham M. Nicholsonc, Glenn F. Kinga,1, and Paul F. Alewooda,1

aInstitute for Molecular Bioscience and bSchool of Chemistry and Molecular Biosciences, University of Queensland, St Lucia QLD 4072, Australia; and cNeurotoxin Research Group, School of Medical and Molecular Biosciences, University of Technology, Sydney NSW 2007, Australia

Edited by Jerrold Meinwald, Cornell University, Ithaca, NY, and approved May 10, 2011 (received for review March 7, 2011)

The three- inhibitor cystine knot (ICK) motif is a fold Here we describe the discovery of a scorpion- peptide common to venom peptides from spiders, , and aquatic that represents an example of a unique fourth scorpion toxin fold. cone snails. Over a decade ago it was proposed that the ICK motif Moreover, this structure is an example of a native peptide that is an elaboration of an ancestral two-disulfide fold coined the adopts the previously hypothetical DDH fold. Thus, the current disulfide-directed β-hairpin (DDH). Here we report the isolation, work provides support for the hypothesis that the DDH fold is

characterization, and structure of a novel toxin [U1-liotoxin-Lw1a the evolutionary precursor of the ICK motif. (U1-LITX-Lw1a)] from the venom of the scorpion Liocheles waigien- sis that is the first example of a native peptide that adopts the DDH Results

fold. U1-LITX-Lw1a not only represents the discovery of a missing Mass Profile of the Liocheles waigiensis Venom Peptidome. Research link in venom protein evolution, it is the first member of a fourth on scorpion has focused primarily on members of the structural fold to be adopted by scorpion-venom peptides. Addi- medically important Buthidae family. There have been far fewer

tionally, we show that U1-LITX-Lw1a has potent insecticidal activity studies on nonbuthids, and very little research has been performed across a broad range of pest species, thereby providing a on Australian scorpions despite their evolutionary significance. unique structural scaffold for bioinsecticide development. We therefore decided to examine the venom of the nonbuthid Australian scorpion L. waigiensis (Scorpiones: Scorpionoidea: molecular evolution ∣ NMR ∣ protein structure Liochelidae) as this species had not been studied previously. MS analysis of L. waigiensis venom revealed 200 distinct masses >10 corpions are one of the most ancient venomous , with ranging from 1 kDa to kDa. The majority (88%) of observa- Sthe oldest fossil scorpion dating to the Silurian Period around ble masses were smaller than 5 kDa (Fig. 1B), indicating that the 430 million years ago (1). Scorpion venoms are consequently venom is dominated by small peptides. Mass profiling of the ve- the product of millions of years of evolutionary fine-tuning, re- nom using both nanoelectrospray and MALDI-TOF instruments sulting in the chemically and pharmacologically complex mixture proved to be complementary, with only approximately 30% of the present in extant venoms (2). The predominant components of total masses (57 of 192) observed using both methods (Fig. 1B). scorpion venoms are bioactive peptides, many of which have potential application in the pharmaceutical (3) and agrochemical Isolation and Sequencing of a Unique Peptide. A unique peptide of (4) industries. mass 4,171.91 Da was purified from L. waigiensis venom using Over the past 60 years, approximately 730 scorpion-venom reversed-phase (rp) HPLC (Fig. 1A). Because its molecular target peptides have been isolated from 56 species (5). The vast majority remains to be determined, the peptide was named U1-liotoxin- of scorpion characterized to date contain a common core Lw1a (U1-LITX-Lw1a) based on the recently introduced nomen- topology comprising one or two short α-helices connected to a clature for venom peptides (14). Reduction of the purified triple-stranded antiparallel β-sheet stabilized by three or four peptide followed by alkylation with iodoacetamide led to a mass disulfide bonds (6). This fold, known as the cystine-stabilized increase of 232 Da, indicating the presence of two disulfide bonds. The reduced and alkylated peptide was subjected to α/β (CSα/β) motif, is one of only three structural scaffolds that N-terminal sequencing that, combined with tandem mass spectro- have been found in disulfide-containing scorpion-venom pep- metry (MS/MS) analysis, revealed the primary structure of the tides. The topologically unrelated cystine-stabilized α-helix-loop- 36-residue peptide as DFPLSKEYESCVRPRKCKPPLKCNKA- helix (CSα/α) fold consists of two short α-helices connected by a QICVDPNKGW. β-turn; only seven scorpion peptides have been discovered that adopt this fold (7). The third fold found in scorpion-venom pep- U -LITX-Lw1a Is Expressed as a Prepropeptide. The DNA sequences tides is the inhibitor cystine knot (ICK) motif (8) that dominates 1 of clones from 5′ RACE analysis of a venom-gland cDNA library the venom peptidome of spiders (9) and is also present in pep- revealed the presence of a unique transcript upstream of the tides from evolutionarily unrelated organisms (10). The CSα/β scorpion neurotoxins evolved by recruitment into the venom and subsequent neofunctionalization of CSα/β defen- Author contributions: J.J.S., M.J.L., G.M.N., and P.F.A. designed research; J.J.S., J.M.H., sins (11). The CSα/β defensins are antimicrobial peptides that are M.J.L., and G.F.K. performed research; J.J.S., M.J.L., G.M.N., G.F.K., and P.F.A. analyzed data; widely distributed in plants, fungi, nematodes, and the and J.J.S., G.M.N., G.F.K., and P.F.A. wrote the paper. classes Insecta and Arachnida. The conservation of gene struc- The authors declare no conflict of interest. ture between scorpion CSα/β defensins, which are found in the This article is a PNAS Direct Submission. haemolymph, and the CSα/β venom neurotoxins supports a para- Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 2KYJ). The NMR chemical shifts have been logous relationship between these two classes of peptides (12). deposited in the BioMagResBank, www.bmrb.wisc.edu (accession no. 16963). In contrast, little is known about the evolutionary origins of the 1To whom correspondence may be addressed. E-mail: [email protected] or CSα/α and ICK folds. It was previously suggested that the three- [email protected]. disulfide ICK fold is an elaboration of a simpler, ancestral two- This article contains supporting information online at www.pnas.org/lookup/suppl/ disulfide fold coined the disulfide-directed β-hairpin (DDH) (13). doi:10.1073/pnas.1103501108/-/DCSupplemental.

10478–10483 ∣ PNAS ∣ June 28, 2011 ∣ vol. 108 ∣ no. 26 www.pnas.org/cgi/doi/10.1073/pnas.1103501108 Downloaded by guest on October 1, 2021 A Structure of U1-LITX-Lw1a. The three-dimensional structure of 1.8 90 U1-LITX-Lw1a was determined using homonuclear NMR meth- 1.6 80 ods. Statistics highlighting the high precision and stereochemical 1.4 70 quality of the ensemble of 20 U1-LITX-Lw1a structures are * shown in Table S1. The highest-ranked member of the ensemble 1.2 60 oS

vl

e has a MolProbity score of 2.54, placing it in the 45th percentile 1.0 50 n t relative to all other structures ranked by MolProbity.

0.8 40 % B

(arbritrary units) The structure of U1-LITX-Lw1a is remarkable for a scorpion- 0.6 30 α β α α 214nm venom peptide. It does not contain the CS / ,CS / , or ICK A 0.4 20 motif common to other disulfide-rich scorpion toxins but instead 0.2 10 contains a unique two-disulfide scaffold (Fig. 2). The only ele- 0.0 ments of secondary structure are two short but well-defined two- 10 20 30 40 50 60 70 80 90 100 110 stranded β-sheets. The N-terminal β-sheet comprises β-strands 1 Retention time (min) and 2 (residues 4–5 and 16–17, respectively), whereas the C-term- B inal β-sheet is composed of β-strands 3 and 4 (residues 22–23 and 80 250 29–30, respectively) (Fig. 2). Nanospray 200 uC MALDI-TOF Remarkably, a search for structural homologs of U1-LITX- 60 m

u (105) (144)

l Lw1a using DALI (17) yielded a total of 29 unique matches with 150 a

vit ≥2

40 e a statistically significant Z score . Of these structural homo- 100 ot logs, 28 contain an ICK motif, including 23 spider toxins and one at 48 57 87 20 50 l cone snail toxin. Notably, however, all of these structurally homo- Number of masses 0 0 logous toxins contain at least one additional disulfide bond. <1 ICK toxins have proliferated in spider venoms to the point >10 1 to 22 to 33 to 44 to 55 to 66 to 77 to 88 to 9 9 to 10 where they now dominate most spider-venom peptidomes (18). Mass (kDa) The marked insensitivity of this structural scaffold to changes in Fig. 1. Analysis of the venom peptidome of L. waigiensis.(A) rpHPLC intercystine resides has enabled spiders to develop diverse phar- chromatogram of crude venom from the scorpion L. waigiensis. The aceto- macologies using the same disulfide framework. Fig. 4 shows an nitrile gradient is shown as a solid line, and the peak marked with an asterisk overlay of U1-LITX-Lw1a on the structure of ICK spider toxins corresponds to the U1-LITX-Lw1a peptide. (B, Left) Histogram showing the with different pharmacologies, including toxins with high affinity abundance of peptide toxins in the venom of L. waigiensis, sorted into for P2X3 receptors, NaV channels, KV channels, and calcium- 1-kDa molecular mass bins. The overlaid curve shows the cumulative total K number of peptide masses identified from MS analyses. (Right) Venn diagram activated potassium ( Ca) channels. The inherent functional showing overlap of peptide masses determined using electrospray and diversity of ICK toxins means that these structural homologs MALDI-TOF mass spectrometry. provide few clues as to the likely target of U1-LITX-Lw1a. In con- trast, as discussed below, U1-LITX-Lw1a might provide an indi- region encoding the mature U1-LITX-Lw1a toxin. Analysis of cation of the evolutionary origin of ICK venom peptides. the transcript using SignalP 3.0 (15) indicated that it is comprised of a 5′ UTR, signal peptide, propeptide, mature toxin encoding U1-LITX-Lw1a Is a Potent Insecticidal Toxin. U1-LITX-Lw1a was region, and a 3′ UTR (Fig. S1), with a polyadenylation signal lethal to all three insect pest species tested (Fig. 3). In crickets, (AATAAA) 41 nucleotides downstream of the stop codon. The intermittent twitching of appendages was initially observed and signal peptide consists of 25 residues, 16 of which are hydropho- by 2 h postinjection the degree of paralysis ranged from slight at low doses to complete paralysis at high doses. Blowfly larvae bic. The propeptide comprises 14 residues and half of these are also exhibited a dose-dependent contractile paralysis. The LD50 acidic. Although acidic propeptide regions are a common feature ranged from 0.78 nmol∕g in mealworms to 5 nmol∕g in adult of spider-venom toxin transcripts (16), of those scorpion toxin blowflies (Fig. 3). precursors known to have a propeptide region, only a few mem- We attempted to determine the molecular target of the toxin bers of the calcine family have a high number of acidic residues in by analyzing its effect on ion channel currents in cockroach dorsal the propeptide region. unpaired median (DUM) neurons. However, no significant effects were observed on the amplitude, kinetics, or the voltage Synthesis of U1-LITX-Lw1a and Determination of Disulfide-Bond dependence of activation of voltage-gated sodium (NaV ), potas- Connectivity. Synthetic U1-LITX-Lw1a was produced by solid- K μ phase peptide synthesis. Oxidation of the synthetic peptide sium ( V ), and calcium (CaV ) currents when up to 3 MU1- LITX-Lw1a was applied to DUM neurons. resulted in one major peak that was shown to coelute with the native peptide on rpHPLC (Fig. S2). Because the synthetic pep- tide coeluted with the native peptide, and there was only a limited amount of native material, the disulfide-bond connectivity was determined by tryptic digest of the synthetic toxin. There are

three connectivities possible for a peptide with two disulfide BIOCHEMISTRY bonds (Fig. S3A). However, as there is a tryptic cleavage site be- tween each of the residues in U1-LITX-Lw1a, digestion of the fully oxidized peptide with trypsin should give a unique mass fingerprint for each of the three possible disulfide-bond Fig. 2. Stereo view of the structure of U1-LITX-Lw1a. The 20 members of connectivities. Indeed, the mass spectra obtained for the oxidized the structural ensemble are overlaid for best fit over the backbone atoms – β – peptide following tryptic digestion (Fig. S3B) allowed unambig- of residues 2 36. -strands 1 4 are shown in cyan, and the side chains of the two disulfide bonds are highlighted in red. The N and C termini are uous determination that the cysteine residues in U1-LITX-Lw1a labeled. The hydrophobic core of the protein is composed of the two disul- are arranged in a 1–3, 2–4 connectivity (Fig. S3A, Top). fide bonds and the side chain of Leu21, which is shown in magenta.

Smith et al. PNAS ∣ June 28, 2011 ∣ vol. 108 ∣ no. 26 ∣ 10479 Downloaded by guest on October 1, 2021 Mealworm Cricket 100 100 unperturbed by removal of the N-terminal disulfide bridge (20, 80 80 21). Thus, we previously proposed that the ICK fold is a minor 60 60 elaboration of a simpler ancestral fold that we referred to as the 40 40 DDH (13). % lethality 20 LD50 784 pmol/g % lethality 20 LD50 1284 pmol/g The DDH fold is shown schematically in Fig. 4F, and its amino 0 0 acid consensus sequence was determined to be CX5–19CX2 -11 -10 -9 -8 -7 -11 -10 -9 -8 -7 ½ log Dose (moles/gram) log Dose (moles/gram) GorP X2CX6–19C, where X is any (13). The DDH fold differs from the ICK fold in that there are only two manda- Blowfly larvae Blowfly adult 100 100 tory disulfide bridges that form the bulk of the hydrophobic core, 80 80 so that loop 1 is no longer necessarily bounded by an N-terminal 60 60 Cys residue, and loop 3 is generally five residues in length with a 40 40 central Gly or Pro to ensure a tight turn prior to the first β-strand. % lethality % lethality LD 4382 pmol/g LD 4995 pmol/g 20 50 20 50 The residue following the Gly/Pro in loop 3 is generally a hydro- 0 0 phobe as this residue, along with the two buried disulfide bridges, -11 -10 -9 -8 -7 -11 -10 -9 -8 -7 log Dose (moles/gram) log Dose (moles/gram) constitute the minihydrophobic core of this domain. Based on their studies on EETI II, Heitz et al. (20) also proposed the ex- Fig. 3. Dose-response curves resulting from injection of U1-LITX-Lw1a into istence of an elementary motif, which they named the cystine- various . The LD50 values are indicated. Data points are the mean SE stabilized β-sheet (CSB). The CSB motif is in fact a more strin- of three experiments. gently defined DDH, containing smaller first and third loops and a triple-stranded β-sheet. Because the DDH motif is less rigidly Discussion defined, it should be considered the ancestral fold as it includes Evolution of the Inhibitor Cystine Knot. Despite its abundance in many proteins that are excluded from the CSB fold. invertebrate venoms such as spiders, scorpions, and cone snails, It is clear from Fig. 4 that the two in U1-LITX-Lw1a the evolutionary origin of the ICK motif is unclear as there are no overlay well with the two central disulfides in the ICK motif; in invertebrate body proteins known to contain this motif. The ICK contrast, the “outer” solvent-exposed disulfide that closes the “ ” motif comprises a ring formed by two disulfides and the inter- ring of the ICK motif is missing in U1-LITX-Lw1a. Thus, the vening sections of polypeptide backbone, with a third disulfide structure of U1-LITX-Lw1a corresponds exactly to the previously piercing the ring to create a pseudoknot (Fig. 4F). The compact hypothesized DDH fold. Although the Liochelidae family to hydrophobic core of the ICK motif consists largely of the two cen- which L. waigiensis belongs is not believed to be an evolutionarily tral disulfide bridges that emanate from the two β-strands that ancient scorpion (22), U1-LITX-Lw1a adopts the heretofore- characterize the ICK fold (19). In contrast, the N-terminal disul- missing native ancestral DDH fold that may be the evolutionary fide bridge contributes very little to the hydrophobic core, and it precursor to the ICK motif. Thus, U1-LITX-Lw1a lends support has been demonstrated that the tertiary structure and thermal to the hypothesis that the ICK toxins, at least in arachnids, may stability of the ICK-containing trypsin inhibitor EETI II is largely have originated from a DDH-encoding gene that was recruited

Fig. 4. Overlay of U1-LITX-Lw1a and structurally homologous ICK toxins. The NMR structure of U1-LITX-Lw1a (red) is overlaid on the structures of different structurally homologous spider ICK toxins (gray), including (A) guangxitoxin (PDB ID code 2WH9), (B) GsMTX-4 (PDB ID code 1TYK), (C) hainantoxin-I (PDB ID code 1NIX), (D) purotoxin (PDB ID code 2KGU), and (E) κ-hexatoxin-Hv1c (PDB ID code 1DL0). The two central disulfides of the ICK toxins, shown as gray tubes, overlap with the two disulfides of U1-LITX-Lw1a (shown as gold tubes). Additional disulfide bonds in the ICK toxins are shown as blue tubes. The molecular target of each spider toxin is indicated. (F) Graphical representation of the DDH and ICK motifs, with the disulfides and β-sheet of the DDH motif shown in orange and green, respectively. The third disulfide necessary for the formation of the ICK motif is shown in blue.

10480 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1103501108 Smith et al. Downloaded by guest on October 1, 2021 into the venom and subsequently elaborated to encode an addi- bined data suggest that U1-LITX-Lw1a is an insect-selective toxin tional disulfide bond. that represents a previously undescribed molecular scaffold for The phylogeny of scorpions is highly controversial, with recent bioinsecticide development. estimates of the number of extant families ranging from 13 to 18 (23–25). Nevertheless, it is clear that Liochelidae is not a basal Structure of the U1-LITX-Lw1a Transcript. It is becoming evident that group (22). Thus, it is possible that U1-LITX-Lw1a is not the pre- venom from nonbuthid species possess fewer high molecular cursor of an ICK peptide but rather derived from it. However, weight long-chain NaV channel toxins than found in Buthidae this would leave unsolved the question of how the ICK fold scorpion venoms (2). The venom profile of L. waigiensis was recruited into the venom peptidome. In contrast, as pointed continues this trend as most of the observed peptide masses were out previously (13), the DDH fold is present in numerous eukar- less than 5 kDa. However, although the venom peptidome of yotic body proteins, and hence it seems likely that the highly L. waigiensis is consistent with that of other nonbuthid venoms, stable ICK fold was derived by simple modification of an ances- the transcript encoding U1-LITX-Lw1a is atypical for scorpion- trally recruited DDH motif. venom peptides. The mRNA transcript encoding U1-LITX-Lw1a comprises a A Unique Bioinsecticide Scaffold. The primary structure of U1- 25-residue signal sequence and a 14-residue propeptide preced- LITX-Lw1a shows homology to three scorpion peptides: LaIT1 ing the 36-residue mature toxin sequence. Propeptide sequences (91% identity) from Liocheles australasiae (family Liochelidae) are ubiquitous in transcripts encoding peptide toxins from cone (26), OcyC10 (68% identity) from Opisthacanthus cayaporum snails and spiders, but they are uncommon in scorpion toxin tran- (Liochelidae) (27), and SmpIT2 (80% identity) from Scorpio scripts (18). Of the 250 scorpion toxin transcripts that have been maurus palmatus (Scorpionidae) (28) (Fig. 5). LaIT1 and sequenced, only approximately 20 have a proregion following the SmpIT2, like U1-LITX-Lw1a, are insecticidal, which is not sur- signal peptide (5); these correspond to members of the long-chain β α prising given that insects are the primary prey of most scorpions, and short-chain potassium channel toxins, non-disulfide- – and hence this is the function of most venom toxins. LaIT1 bonded antimicrobial peptides, and the calcine family (29 31). induced limb spasms when injected into crickets at a dose of Ten of the proregion containing peptides belong to the calcine approximately 4.8 nmol∕g (i.e., approximately 4-fold higher than family. The calcines are -penetrating peptides (CPPs) that – – the LD50 value for U1-LITX-Lw1a) (26), whereas injection of a target ryanodine receptors (32 34). They contain 33 35 residues, venom fraction containing SmpIT2 into Sarcophaga falculata adopt the ICK fold (35), and, like most CPPs, have a high net blowfly larvae caused flaccid paralysis (28). This contrasts with positive charge (+7) (36, 37). An alanine scan of maurocalcine and imperatoxin-A revealed a critical basic region containing an the contractile paralysis caused by U1-LITX-Lw1a in Lucilia cu- essential Arg residue at position 24 that is essential for activity at prina blowfly larvae; thus, SmpIT2 and U1-LITX-Lw1a possibly have different effects on the same molecular target or act on the ryanodine receptor (37, 38). Although U1-LITX-Lw1a and different receptors altogether. the calcines both have numerous basic residues and are similar in size, U1-LITX-Lw1a has a net charge of only +4, and it does The similar LD50 values obtained when U1-LITX-Lw1a was tested across a range of invertebrate orders implies that the not appear to have the attributes necessary for ryanodine recep- molecular target of the toxin is evolutionarily conserved. More- tor activation (Fig. S4). Nevertheless, the conservation of tran- script architecture and the similarity in size and structure of the over, this molecular target may be unique for arachnid toxins mature toxins suggests that the calcines and U1-LITX-Lw1a because high doses of U1-LITX-Lw1a (3 μM) did not affect might be evolutionarily related. NaV , KV ,orCaV currents in cockroach DUM neurons, ruling In conclusion, the solution structure of U1-LITX-Lw1a re- out most of the common targets of scorpion and spider toxins. presents a previously undescribed scorpion toxin fold. This U1-LITX-Lw1a did not alter NaV or CaV currents in rat dorsal previously undescribed fold corresponds to the previously pro- root ganglion neurons at a concentration of 1 μM, and no toxic posed but heretofore hypothetical DDH motif, and it provides effects were reported when approximately 240 pmol LaIT1 support for the hypothesis that the DDH motif is the ancestral (which shares 91% sequence identity with U1-LITX-Lw1a) was precursor of scorpion-venom ICK toxins such as the calcines. applied intracerebroventricularly to mice (26). Thus, the com- Further research into the venoms of scorpions and other inverte- brates might reveal the presence of additional peptides with the DDH fold, and analysis of their structure and function should Signal peptide Propeptide provide a greater understanding of the evolutionary relationship between DDH and ICK toxins. Materials and Methods Venom Extraction. Liocheles waigiensis were purchased from The Green Scorpion. Venom was extracted manually by inducing the scorpion to sting a parafilm-covered Eppendorf tube. Venom was lyophilized before storage at −20 °C. Each milking occurred at least 2 wk after the previous milking.

Mature peptide Venom Fractionation. Lyophilized venom was reconstituted in 0.05% TFA in water (solvent A) before HPLC fractionation using a narrow bore reversed- phase C8 column (Zorbax C8, 2.1 × 150 mm, 300 Å) on a Shimadzu VP system. 0 25 ∕ Venom components were eluted at a flow rate of . mL min using a linear BIOCHEMISTRY gradient of 5–60% solvent B (90% acetonitrile, 0.043% TFA) over 115 min preceded by 5% solvent B for 5 min. Absorbance was monitored at 10A 214 nm and 280 nm using a SPD- VP UV detector. Fractions were manually collected and lyophilized.

Fig. 5. Alignment of the amino acid sequences of U1-LITX-Lw1a, OcyC10, Mass Spectrometric Analysis. Lyophilized HPLC fractions were dissolved in LaIT1, and SmpIT2. The amino acid translation of the predicted signal and solvent A and 0.5 μL was spotted onto a MALDI-TOF plate with 0.5 μL α- predicted propeptide regions of the U1-LITX-Lw1a and OcyC10 transcripts cyano-4-hydroxycinnamic acid matrix (10 mg∕mL in 60% acetonitrile). Spots are shown. Only the mature toxin regions of LaIT1 and SmpIT2 are available. were analyzed using an ABI 4700 Voyager mass spectrometer (MS) in positive Signal, propeptide, and mature toxin regions are indicated with arrows. reflector mode. For nanospray MS analysis, crude venom was dissolved in a

Smith et al. PNAS ∣ June 28, 2011 ∣ vol. 108 ∣ no. 26 ∣ 10481 Downloaded by guest on October 1, 2021 minimal amount of a 50∶50 mixture of solvent A and B and then diluted to was determined as described previously (42). Whole-cell recordings of ionic 0.5 mg∕mL in solvent A. Dissolved venom (0.75 μL) was fractionated by currents in DUM neurons isolated from the American cockroach Periplaneta rpHPLC using a Vydac C18 column (300 Å, 5 μm, 150 × 0.3 mm). Venom com- americana were made as described previously (43), with minor modifications ponents were eluted at a flow rate of 4 μL∕ min using a gradient of 2% buffer as described in SI Text. B (90% acetonitrile, 0.1% formic acid) for 10 min, followed by 10–40% buffer – B over 110 min, then 40 80% buffer B over 10 min. Online nanoelectrospray NMR Structure Determination. Lyophilized U1-LITX-Lw1a was resuspended at analysis was performed using a QSTAR Elite mass spectrometer. a final concentration of 300 μM in 90% H2O∕10% D2O, pH 3.1. The sample was filtered using a low-protein-binding Ultrafree-MC centrifugal filter Peptide Sequencing. U1-LITX-Lw1a was isolated from crude L. waigiensis (0.22 μm), then 550 μL was added to a 5-mm outer-diameter susceptibility- venom. Disulfides were reduced with 100 mM tris(2-carboxyethyl)phosphine matched microtube (Shigemi). NMR spectra were acquired at 283 K using in 100 mM ammonium bicarbonate (pH 6.5) at 50 °C for 1 h. were a cryoprobe-equipped 900-MHz Avance spectrometer (Bruker BioSpin). alkylated by incubation with 200 mM iodoacetamide in 100 mM ammonium Two-dimensional total correlation spectroscopy (spin-lock time of 80 ms), bicarbonate (pH 9.0) for a further hour at 50 °C and the peptide desalted NOESY (mixing time of 250 ms), and double-quantum filtered COSY spectra using rpHPLC. Reduced and alkylated U1-LITX-Lw1a was sequenced using were acquired. an Applied Biosystems Procise HT Protein Sequencer at the Biomolecular NMR spectra were analyzed using the program XEASY (44). A complete Research Facility. For de novo sequencing using MS/MS, reduced and alky- set of sequence-specific resonance assignments was obtained, and chemical 1∶50 lated U1-LITX-Lw1a was digested by incubation with an approximately shifts were deposited in BioMagResBank (accession no. 16963). The NOESY molar ratio of trypsin (Proteomics Grade, Sigma) at pH 9 for 14 h at 37 °C, and spectrum was manually peak picked and integrated; then the peak lists were the resultant cleavage products were separated using rpHPLC. A 4700 Voya- assigned and an ensemble of structures calculated automatically using the ger mass spectrometer was used for analysis of tryptic fragments; MS ions CANDID module of the torsion angle dynamics package CYANA (45, 46). were selected for MS/MS followed by manual analysis of spectra using The tolerances used for CANDID were 0.025 ppm in both 1H dimensions. GPS DeNovo Explorer (version 3.6) and MS-Product (http://prospector.ucsf. The Phe2–Pro3 and Lys18–Pro19 peptide bonds were determined to be in edu/prospector/cgi-bin/msform.cgi?form=msproduct). the trans conformation on the basis of characteristic Hα–Hδ NOEs, whereas the Arg13–Pro14 and Pro19–Pro20 peptide bonds were both clearly identi- 1 Chemical Synthesis of U1-LITX-Lw1a. U -LITX-Lw1a was chemically synthesized fied as cis based on characteristic Hα–Hα NOEs (47). Initial structure calcula- using the stepwise in situ neutralization protocol for Boc chemistry followed tions in combination with analysis of NOE networks allowed assignment by HF cleavage (39). Boc-protected amino acids were purchased from Nova- of χ1 restraints for all four Cys residues. biochem and the following side-chain protected amino acids were used: The disulfide-bond configuration of U1-LITX-Lw1a was unambiguously C(MeBzl), D(Chxl), E(Chxl), K(ClZ), N(Xan), Q(Xan), R(Tos), S(Bzl), and Y(BrZ). determined via chemical methods (see Results), thus allowing disulfide-bond All other amino acids were unprotected. The cleaved peptide was purified restraints of 2.0 ≤ d ≤ 2.1 ÅforSγ ðiÞ-Sγ ðjÞ, and 3.0 ≤ d ≤ 3.1 Å for both using rpHPLC, then oxidized/folded in 30% isopropanol/0.1 M ammonium CβðiÞ-Sγ ðjÞ and Sγ ðiÞ-CβðjÞ to be used in the structure calculations. Six hydrogen bicarbonate, pH 8.0 for 3 d. The folded peptide was purified to homogeneity bonds were clearly identified in preliminary rounds of structure calculation. using rpHPLC. Hydrogen-bond restraints of 1.7–2.2 Å and 2.7–3.2 Å were employed for the

HN-O and N-O distances, respectively, in subsequent structure calculations Sequencing of U1-LITX-Lw1a Transcript. A QuickPrep Micro mRNA Purification (48). In the final round of structure calculations, CYANA was used to calculate Kit (Illustra, GE Healthcare) was used to extract mRNA from the telson of one 100 structures from random starting conformations; then the 20 conformers L. waigiensis specimen 3 d after milking. The FirstChoice RLM-RACE Kit with highest stereochemical quality as judged by MolProbity (49) were (Ambion) was used to ligate a 5′ adapter sequence to the mRNA. Single- selected to represent the solution structure of U1-LITX-Lw1a. During the stranded cDNA was synthesized from the ligated mRNA using Superscript automated NOESY assignment/structure calculation process the CANDID III reverse transcriptase (Invitrogen) and a poly dT primer. The 5′ region of module of CYANA assigned 93.6% of all NOESY cross-peaks (1,699 out of the U1-LITX-Lw1a transcript was established by amplifying the region of 1,815), a significantly higher assignment rate than in most previous reports cDNA between the 5′ adaptor and the start of the mature toxin. The 5′ adap- (46, 50). Coordinates for the final ensemble of structures are available from tor primer used was as per the Ambion kit, and a degenerate primer was the Protein Data Bank (PDB ID code 2KYJ). designed based on the N-terminal amino acid sequence of U1-LITX-Lw1a determined from Edman sequencing. The complete U1-LITX-Lw1a prepro- peptide transcript was found by PCR using the 5′ adaptor primer and a poly ACKNOWLEDGMENTS. We thank Dr. Geoffrey W. Brown (Agri-Science dT primer. Amplified fragments were subsequently cloned and sequenced. Queensland, Brisbane, Australia) for supply of sheep blowfly larvae, and Dr. Brid Callaghan [Queensland Brain Institute, University of Queensland (UQ), St Lucia, Australia] for DRG assays. We acknowledge financial support Insect Bioassays. U1-LITX-Lw1a solutions in insect saline (40) were injected from Australian Research Council Discovery Grants DP0774245 and into Acheta domestica (house crickets, mass 60–100 mg), Tenebrio molitor DP1095728 (to G.F.K.), the National Health and Medical Research Council larvae (mass 190–210 mg), and Lucilia cuprina larvae and adults (mass (NHMRC) of Australia (Program Grant 351446), and UQ. J.M.H. is an NHMRC 38–42 mg and 18–22 mg, respectively) (41). The median lethal dose (LD50) RD Wright Fellow (401748).

1. Dunlop JA, Selden PA (2009) Calibrating the chelicerate clock: A paleontological reply 12. Zhu S, Li W, Jiang D, Zeng X (2000) Evidence for the existence of insect defensin-like to Jeyaprakash and Hoy. Exp Appl Acarol 48:183–197. peptide in scorpion venom. IUBMB Life 50:57–61. 2. Rodriguez de la Vega RC, Schwartz EF, Possani LD (2010) Mining on scorpion venom 13. Wang X, et al. (2000) Discovery and characterization of a family of insecticidal biodiversity. Toxicon 56:1155–1161. neurotoxins with a rare vicinal disulfide bridge. Nat Struct Biol 7:505–513. 3. Mamelak AN, Jacoby DB (2007) Targeted delivery of antitumoral therapy to glioma 14. King GF, Gentz MC, Escoubas P, Nicholson GM (2008) A rational nomenclature for nam- and other malignancies with synthetic chlorotoxin (TM-601). Expert Opin Drug Deliv ing peptide toxins from spiders and other venomous animals. Toxicon 52:264–276. – 4:175 186. 15. Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal 4. Zlotkin E, Fishman Y, Elazar M (2000) AaIT: From neurotoxin to insecticide. Biochimie peptides: SignalP 3.0. J Mol Biol 340:783–795. – 82:869 881. 16. Tedford HW, Sollod BL, Maggio F, King GF (2004) Australian funnel-web spiders: 5. He QY, et al. (2008) ATDB: A uni-database platform for toxins. Nucleic Acids Res Master insecticide chemists. Toxicon 43:601–618. 36:D293–D297. 17. Holm L, Rosenstrom P (2010) Dali server: Conservation mapping in 3D. Nucl Acids Res 6. Rodriguez de la Vega RC, Possani LD (2004) Current views on scorpion toxins specific 38:W545–W549. for Kþ-channels. Toxicon 43:865–875. 18. Sollod BL, et al. (2005) Were arachnids the first to use combinatorial peptide libraries? 7. Chagot B, et al. (2005) An unusual fold for potassium channel blockers: NMR structure Peptides 26:131–139. of three toxins from the scorpion Opisthacanthus madagascariensis. Biochem J 19. Fletcher JI, et al. (1997) The structure of a novel insecticidal neurotoxin, ω-atracotoxin- 388:263–271. – 8. Pallaghy PK, Nielsen KJ, Craik DJ, Norton RS (1994) A common structural motif HV1, from the venom of an Australian funnel web spider. Nat Struct Biol 4:559 566. incorporating a cystine knot and a triple-stranded β-sheet in toxic and inhibitory 20. Heitz A, Le-Nguyen D, Chiche L (1999) Min-21 and min-23, the smallest peptides that polypeptides. Protein Sci 3:1833–1839. fold like a cystine-stabilized beta-sheet motif: design, solution structure, and thermal – 9. King GF, Tedford HW, Maggio F (2002) Structure and function of insecticidal stability. Biochemistry 38:10615 10625. neurotoxins from Australian funnel-web spiders. J Toxicol-Toxin Rev 21:359–389. 21. Le-Nguyen D, Heitz A, Chiche L, El Hajji M, Castro B (1993) Characterization and 2D 10. Craik DJ, Daly NL, Waine C (2001) The cystine knot motif in toxins and implications NMR study of the stable [9–21, 15–27] 2 disulfide intermediate in the folding of the 3 for drug design. Toxicon 39:43–60. disulfide trypsin inhibitor EETI II. Protein Sci 2:165–174. 11. Fry BG, et al. (2009) The toxicogenomic multiverse: Convergent recruitment of 22. Prendini L (2000) Phylogeny and classification of the superfamily Scorpionoidea proteins into animal venoms. Annu Rev Genomics Hum Genet 10:483–511. Latreille 1802 (Chelicerata, Scorpiones): An exemplar approach. Cladistics 16:1–78.

10482 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1103501108 Smith et al. Downloaded by guest on October 1, 2021 23. Prendini L, Wheeler WC (2005) Scorpion higher phylogeny and classification, taxo- 36. Boisseau S, et al. (2006) Cell penetration properties of maurocalcine, a natural venom nomic anarchy, and standards for peer review in online publishing. Cladistics peptide active on the intracellular ryanodine receptor. Biochim Biophys Acta 21:446–494. 1758:308–319. 24. Fet V, Soleglad ME (2005) Contributions to scorpion systematics. I. On recent changes 37. Mabrouk K, et al. (2007) Critical amino acid residues of maurocalcine involved in in high-level . Euscorpius 31:1–13. pharmacology, lipid interaction and cell penetration. Biochim Biophys Acta 25. Volschenk ES, Mattoni CI, Prendini L (2008) Comparative anatomy of the mesosomal 1768:2528–2540. organs of scorpions (Chelicerata, Scorpiones), with implications for the phylogeny of 38. Lee CW, et al. (2004) Molecular basis of the high-affinity activation of type 1 ryanodine the order. Zool J Linn Soc 154:651–675. receptors by imperatoxin A. Biochem J 377:385–394. 26. Matsushita N, Miyashita M, Sakai A, Nakagawa Y, Miyagawa H (2007) Purification and 39. Schnolzer M, Alewood P, Jones A, Alewood D, Kent SB (1992) In situ neutralization in characterization of a novel short-chain insecticidal toxin with two disulfide bridges Boc-chemistry solid phase peptide synthesis. Rapid, high yield assembly of difficult from the venom of the scorpion Liocheles australasiae. Toxicon 50:861–867. sequences. Int J Pept Protein Res 40:180–193. 27. Silva EC, et al. (2009) Cloning and characterization of cDNA sequences encoding for 40. Eitan M, et al. (1990) A scorpion venom neurotoxin paralytic to insects that affects new venom peptides of the Brazilian scorpion Opisthacanthus cayaporum. Toxicon sodium current inactivation: Purification, primary structure, and mode of action. 54:252–261. Biochemistry 29:5941–5947. 28. Lazarovici P, Yanai P, Pelhate M, Zlotkin E (1982) Insect toxic component from the 41. Wang X, et al. (1999) Structure-function studies of omega-atracotoxin, a potent venom of a chactoid scorpion, Scorpio maurus palmatus (Scorpionidae). J Biol Chem antagonist of insect voltage-gated calcium channels. Eur J Biochem 264:488–494. 257:8397–8404. 42. Tedford HW, Fletcher JI, King GF (2001) Functional significance of the beta hairpin in 29. Yuan W, et al. (2010) Cloning and functional characterization of a new antimicrobial the insecticidal neurotoxin ω-atracotoxin-Hv1a. J Biol Chem 276:26568–26576. peptide gene StCT1 from the venom of the scorpion Scorpiops tibetanus. Peptides 43. Gunning SJ, et al. (2008) The Janus-faced atracotoxins are specific blockers of inverte- 31:22–26. brate K(Ca) channels. FEBS J 275:4045–4059. 30. Zeng XC, et al. (2000) Cloning and characterization of a cDNA sequence encoding the 44. Bartels C, Xia TH, Billeter M, Güntert P, Wüthrich K (1995) The program XEASY for precursor of a chlorotoxin-like peptide from the Chinese scorpion Buthus martensii computer-supported NMR spectral analysis of biological macromolecules. J Biomol Karsch. Toxicon 38:1009–1014. NMR 6:1–10. 31. Zhu S, et al. (1999) Molecular cloning and sequencing of two ‘short chain’ and two 45. Güntert P (2004) Automated NMR structure calculation with CYANA. Methods Mol ‘long chain’ Kþ channel-blocking peptides from the Chinese scorpion Buthus martensii Biol 278:353–378. Karsch. FEBS Lett 457:509–514. 46. Herrmann T, Güntert P, Wüthrich K (2002) Protein NMR structure determination with 32. Nabhani T, et al. (2002) Imperatoxin a enhances Ca2þ release in developing skeletal automated NOE assignment using the new software CANDID and the torsion angle muscle containing ryanodine receptor type 3. Biophys J 82:1319–1328. dynamics algorithm DYANA. J Mol Biol 319:209–227. 33. Fajloun Z, et al. (2000) Chemical synthesis and characterization of maurocalcine, a 47. Wüthrich K (1986) NMR of Proteins and Nucleic Acids (John Wiley, New York). scorpion toxin that activates Ca2þ release channel/ryanodine receptors. FEBS Lett 48. Fletcher JI, Chapman BE, Mackay JP, Howden MEH, King GF (1997) The structure of 469:179–185. versutoxin (δ-atracotoxin-Hv1) provides insights into the binding of site 3 neurotoxins 34. Shahbazzadeh D, et al. (2007) Hemicalcin, a new toxin from the Iranian scorpion to the voltage-gated sodium channel. Structure 5:1525–1535. Hemiscorpius lepturus which is active on ryanodine-sensitive Ca2þ channels. Biochem 49. Davis IW, et al. (2007) MolProbity: All-atom contacts and structure validation for J 404:89–96. proteins and nucleic acids. Nucleic Acids Res 35:W375–W383. 35. Mosbah A, et al. (2000) A new fold in the scorpion toxin family, associated with an 50. Robson SA, King GF (2006) Domain architecture and structure of the bacterial cell activity on a ryanodine-sensitive calcium channel. Proteins 40:436–442. division protein DivIB. Proc Natl Acad Sci USA 103:6700–6705. BIOCHEMISTRY

Smith et al. PNAS ∣ June 28, 2011 ∣ vol. 108 ∣ no. 26 ∣ 10483 Downloaded by guest on October 1, 2021