FUNCTIONAL PREDICTIONOF BIOACTIVE IN THROUGH BIOINFORMATICS TAN THIAM JOO ,PAUL ( B. Appl. Sc. (Hons.), NUS ) ATHESISSUBMITTED FORTHEDEGREEOFDOCTOROFPHILOSOPHY DEPARTMENTOFBIOCHEMISTRY NATIONAL UNIVERSITYOFSINGAPORE 2005

Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics Acknowledgements

ThroughoutmyPh.D.candidature,Ihavebeenaccompaniedandsupportedby friendsandfamilymemberstocompletethisthesis.SoitiswithdeepgratitudethatI expressmyheartfeltappreciationtothefollowing:  AlmightyGodwhohasblessedmewithgiftsandtalentstosharewithothers.  Professor Vladimir Brusic, my supervisor and mentor, whom I owe lots of gratitude. Through his guidance and advice, I have improved on my writing skillandlearnttobeanindependentresearcher.Itisalsothroughhisfaithin methatIhaverealisedmypotential.  ProfessorShobaRanganathan,mycosupervisor,forhervaluableadvicesand supportwhichmotivatedmetopursuePh.D.  Seng Hong, Fahad, ZongHong, Anitha and XuanLinh for their computing assistanceinmyresearch.  Asif, Heiny, Stephanie and Wilson for their critique of my dissertation and companionshipduringlunchandatI2R.  Judice,Chris,YewKwangandLynnfortheirlisteningearsandencouragement duringdifficulttimes.  Bernett,Lesheng,Vivek,VictorandJustin,myfellowpostgraduatefriendsfor theircomradeship.  My mother, Madam Soong Kim Song, for her perseverance in the face of adversity.  My family especially my eldest sister, Anna, for their love, encouragement, prayersandsupport.

Mydeepestandsinceregratitude,

PaulTanThiamJoo

November,2005

I Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics TableofContents

Acknowledgements...... I TableofContents…...... II Summary…………...... VI ListofTables……...... VIII ListofFigures……...... IX PartI:Chapter1 Introduction ...... 1 1.1 Researchissuesinvestigatedinthisthesis...... 6 1.2 Contributionofthisthesis...... 8 1.3 Asummaryofthethesis ...... 9 PartI:Chapter2 Literaturereview ...... 11 2.1 Useofbioinformaticstocomplementexperimentalstudies...... 12 2.2 Genomesequencingofvenomousanimals...... 13 2.3 Sourcesofdataandrelatedinformation...... 14 2.3.1 GenBankandGenPeptdatabases...... 14 2.3.2 SwissProtandTrEMBLdatabases ...... 15 2.3.3 ProteinDataBank(PDB)...... 15 2.3.4 PubMedliteraturedatabase...... 16 2.3.5 Issuesondatacollection,cleaning,annotation ...... 16 2.4 Datawarehousesoftoxins ...... 18 2.5 Bioinformatictools ...... 18 2.6 Bioinformaticapplications...... 19 2.7 Predictionofstructureandfunctionoftoxins...... 20

II Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics Chaptersummary...... 22 PartII:Chapter3Classificationofscorpiontoxindata ...... 24 3.1 Classificationofscorpiontoxins...... 27 3.2 Dataclassificationofscorpiontoxinsequences ...... 29 3.3 MaterialsandMethods...... 30 3.3.1 ClassificationofsequencesintogroupsbyBLAST ...... 31 3.3.2 DataclassificationintosubgroupsbyClustalW ...... 34 3.3.2 VerificationofgroupsandsubgroupsbyMEGA3.0 ...... 34 3.4 Results–Classificationofscorpiontoxinsequences ...... 36 3.5 Discussionandconclusions ...... 46 Chaptersummary...... 48 PartII:Chapter4 Extraction of functional peptide motifs in scorpion toxins ……………… ...... 49 4.1 MaterialsandMethods...... 51 4.1.1 Scalingofbindingaffinitiestoacommonscaleinmutanttoxindata...... 52 4.1.2 Dataanalysis ...... 52 4.2 Resultsanddiscussion ...... 53 4.2.1 Chloridechannelmotif...... 56 4.2.2 Sodiumchannels–βexcitatorymotif ...... 58 4.2.3 Sodiumchannels–βmammalmotif ...... 60 4.2.4 Sodiumchannels–αmotif ...... 62 4.2.5 Sodiumchannels–αlikemotif ...... 64 4.2.6 Potassiumchannelsubtype–EtheragogorelatedK +channelmotif ...... 66 4.2.7 Potassium channel subtype – Small conductance Ca 2+ activated K + channel motif… ...... 67

III Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics 4.2.8 Potassium channel subtypes – Large conductance Ca 2+ activated K + channel andvoltagedependentK +channelmotifs ...... 68 4.3 Conclusion ...... 70 Chaptersummary...... 72 PartII:Chapter5 Functionalpredictionofbioactivetoxinsinscorpionvenom ...... 73 5.1 Predictionoffunctionalpropertiesofnovelscorpiontoxinsbynearest neighbouranalysis,sequencecomparisonanddecisionrules ...... 75 5.2 MaterialsandMethods...... 76 5.2.1 Scorpiontoxindata ...... 76 5.2.2Algorithm–nearestneighbourandrulebased...... 77 5.3 Results–Accuratepredictionoffunctionalpropertiesofnovelscorpiontoxins. 79 5.4 Discussionandconclusions ...... 89 Chaptersummary...... 91 Part III: Chapter 6 Implementation of scorpion toxin data warehouse …………………… ...... 93 6.1 Datawarehouseforinformationusageandknowledgediscovery ...... 95 6.2 Implementationofthedatawarehouseofscorpiontoxins,SCORPION2...... 97 6.3 Materialsandmethods...... 97 6.3.1 Datacollectionofnativeandmutantscorpiontoxinsequencesandtheir3D structures...... 98 6.3.2 Generationofhomologymodelsofscorpiontoxins...... 99 6.3.3 Datacleaning...... 100 6.3.4 Dataannotation ...... 100 6.4 Results...... 101 6.4.1 Databasedescription ...... 106 6.4.2 DescriptionoftheSCORPION2records ...... 110

IV Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics 6.5 Discussionandconclusion...... 114 Chaptersummary...... 116 PartIII:Chapter7Exploringbioinformaticapproachesforfunctionalprediction ofbioactivescorpiontoxins ...... 118 7.1 MaterialsandMethods...... 119 7.1.1 Algorithmforpredictingstrengthofbindingaffinityofscorpiontoxins...... 121 7.2 Results...... 121 7.3 Discussionandconclusion...... 124 Chaptersummary...... 125 PartIV:Chapter8Generaldiscussion ...... 126 Chaptersummary...... 130 PartIV:Chapter9Conclusion ...... 132 9.1 Largescaleclassification...... 133 9.2 Largescaleanalysis...... 134 9.3 Developmentoffunctionalpredictiontool...... 135 9.4 Datawarehouseofscorpiontoxins...... 136 9.5 Evaluationofapplicationofbioinformaticsinvenomresearch ...... 137 Conclusionsummary ...... 138 9.6 Futureworks ...... 139 References…………...... 142 Author’sPublications...... 171 Appendix1………...... 172 Appendix2………...... 193

V Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics Summary

Scorpionsarevenomousanimalsthatproduceamyriadofimportantbioactivetoxins that are used in ion channel studies, drug discovery, and even formulation of insecticides.Determiningtheirstructurefunctionrelationshipsareofgreatinterestfor scientific, medical and industrial applications. This thesis presents a systematic bioinformatics approach to a largescale study of structurefunction relationships in scorpion toxin sequences. Systematic characterisation of their structural features and functionalpropertiesofevenoneindividualtoxinrequiresasignificantexperimental effort.Consequently,mostresearchgroupsfocusondeterminingfunctionalproperties of individual toxins or small groups of toxins. Bioinformatic analyses improve the efficacy of research by assisting in selection of critical experiments. Bioinformatic approaches involve access to toxin data across multiple databases, inspection for errors, analysis and classification of toxin sequences and their structures, and the designanduseofpredictivemodelsforsimulationoflaboratoryexperiments.

Several novel aspects are presented in this thesis. This is, to the author’s knowledge,thefirstlargescaleclassificationofcurrentlyknownscorpiontoxinsbased on ion channel specificity and primary sequence similarity. This classification is important for identification of the general patterns in their structurefunction relationships.Theauthorproposedaclassificationthathasdefinedseveralnewgroups ofscorpiontoxins.

A new approach to extract functionally relevant motifs from scorpion toxins basedonanalysesofmultiplesequencealignmentofnativescorpiontoxinsequences,

3D structures and mutated scorpion toxin data was developed in this work. This approachidentifiedcriticalfunctionalresiduesatkeypositionsinthetoxinsequences whichlackconservedresiduesinthemultiplesequencealignment.Thefirstreportof

VI Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics eightfunctionallyrelevantbindingmotifstosodiumandpotassiumchannelsfacilitates thedeterminationofspecificityofnewlyidentifiedscorpiontoxinstovariouschannel subtypes.

The most important contribution to scorpion venom research is a new bioinformatic tool for accurate identification of functional properties in newly identifiedscorpiontoxins.Itwasdevelopedfromthelargescaleanalysisofscorpion toxin sequences. The prediction algorithm includes sequence comparison, nearest neighbour analysis and decision rules. High prediction accuracy of ion channel specificity, toxin subtype, toxicity action and cellular specificity was validated by experimentaldata.

Thefirstdatabaseofnativeandmutantscorpiontoxinsequences,developedas partofthiswork,isamajorresourceforefficientsearchingofscorpiontoxinrelated information.Therecordswerecleanedoferrorsandcontainhighlyenrichedstructural and functional information extracted from the literature. The 548 new homology models contribute to threedimensional analyses of scorpion toxins. Integration of search, extraction, prediction and threedimensional visualisation tools allows researcherstoanalysescorpiontoxinsequencesefficiently.

The bioinformatics approach employed in this study is novel, generic and applicableforthestudiesofstructurefunctionrelationshipsofbioactivetoxinsfrom othervenomousorganisms.Becausetoxinsarefunctionallydiverse,butbelongtoa limited number of structural families, they are ideal for application of data mining techniquesfordiscoveryofpreviouslyunknownrelationshipsamongdata.

VII Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics

ListofTables Table1 Examplesofvenomousanimalslivingonlandandatsea...... 2 Table2 Differentcriteriacanbeusedtoclassifyscorpiontoxins...... 28 Table3 Summaryoftheclassifiedgroupsfor393scorpiontoxins...... 37 Table4 Classificationof135K +scorpiontoxinsequences...... 40 Table5 Classificationof222Na +scorpiontoxinsequences...... 44 Table6 MotifsofscorpiontoxinsextractedforNa +,K +andCl channels...... 55 Table 7 Functional properties predicted for the first test set of 52 new toxin sequences...... 81 Table 8 Functional properties predicted for the second test set of 127 new toxin sequences...... 86 Table9 Asummaryof82scorpiontoxinPDBstructuresinSCORPION2 ...... 104 Table10 DescriptionoffieldsinaSCORPION2record. ...... 111 Table11 Fourcategoriesofstrengthofbindingaffinity...... 120 Table 12 Predicted ion channel specificity and strength of binding affinity for 26 newlyidentifiedscorpiontoxins...... 122 Table13 Physicalpropertiesofthe20Lαaminoacids....... 195

VIII Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics ListofFigures

Figure1 The3Dstructuresofscorpiontoxins...... 21

Figure2 Flowchartofthelargescaleclassificationofscorpiontoxindata...... 31

Figure3 ClassificationofscorpiontoxinsequencesintogroupsusingBLAST ...... 33

Figure4 ClassificationintosubgroupsusingClustalW...... 35

Figure5 Verificationofgroupsandsubgroupsbyphylogeneticanalysis...... 36

Figure6 Phylogenetictreeofrepresentativescorpiontoxins...... 38

Figure7 RepresentativescorpiontoxinsfromK+subfamilies...... 41

Figure8 MultiplesequencealignmentofγKTxtoxins...... 42

Figure9 MultiplesequencealignmentofCsEv1,Cn5andCssII...... 43

Figure10 RepresentativescorpiontoxinsfromNa +toxingroups1–18...... 45

Figure11 RepresentativescorpiontoxinsfromCa 2+ toxingroups1–4...... 45

Figure12 Scalingbindngaffinitiesof2anditsmutantsequences ...... 54

Figure13 Conservedresiduesof18Cl specificscorpiontoxins...... 57

Figure14 Cl specificscorpiontoxinsadoptthecysteinestabilisedαhelixfold ...... 57

Figure15 Conservedresiduesof19Na +βexcitatorytoxins ...... 59

Figure16 Functionalmotifofβexcitatorytoxins...... 60

Figure17 Conservedresiduesof13experimentallydeterminedβtoxins...... 61

Figure18 SpatialorganisationofthefunctionalresiduesofCss4...... 62

Figure19 Conservedresiduesof14experimentallydeterminedαtoxins...... 63

Figure20 FunctionalandstructuralresiduesofLqhαIT...... 63

Figure21 FunctionalandstructuralresiduesofBmKM1...... 64

Figure22 Conservedresiduesofeightexperimentallydeterminedαliketoxins ...... 65

Figure23 FunctionalresiduesofBeKm1...... 66

IX Functionalpredictionofbioactivetoxinsinscorpionvenomthroughbioinformatics Figure 24 Functionalresiduesofscorpiontoxinstargetingsmall conductance Ca 2+

activatedK +channels...... 67

Figure25 Functionalresiduesof...... 69

Figure26 Mutiplesequencealignmentofscorpiontoxinstargetingvoltagedependent

K+,largeandsmallconductanceCa 2+ activatedK +channels...... 69

Figure27 Accuracyoffunctionalpredictionof Annotate Scorpion module...... 80

Figure28 StatisticsofSCORPION2databaseasofNovember2005...... 103

Figure29 Numberofrecordshavingerrorsordiscrepancies...... 103

Figure30 SitemapofSCORPION2database...... 106

Figure31 ThewebinterfaceoftheSCORPION2database...... 107

Figure32 BLASTresultuponsubmissionof...... 108

Figure33 Visualisationofscorpiontoxin3DstructuresusingJmol...... 109

Figure 34 Flowchart of predicting ion channel specificity and strength of binding

affinity...... 120

Figure35 PredictedbindingaffinityofKTX3from Buthus occitanus tunetanus ..... 123

Figure 36 Predicted binding affinity of AmmVIII from Androctonus mauretinicus

mauretinicus ...... 124

Figure 37 Venn diagram of the 20 naturally occurring amino acids based on their

physicochemicalproperties...... 194

X Chapter 1: Introduction

PartI:Chapter1 Introduction ‘Man'smindstretchedtoanewideanever goesbacktoitsoriginaldimensions.’ SridaAvabhas(AdiDaSamraj)

1 Chapter 1: Introduction

1. Introduction

Scorpionsareamongthefirstlandanimals.Theyappearedsome450millionyearsago

(Briggs,1987).Therearemorethan1,500distinctspeciesworldwide,livinginevery continent except Antarctica (Lourenco, 1994). All scorpion species produce venom whichtheyuseforhuntingpreyanddefenseagainstpredators.Venomisacomplex mixtureoftoxins–proteins,amines,lipidsandother components (MartinEauclaire and Couraud, 1995). Venomderived protein toxins are highly bioactive molecules belongingtoarelativelysmallnumberofstructuralfamilies.Theydisplayavarietyof functionalpropertieswhichincludeinteraction withcellularreceptors,ionchannels, andassistinginpreydigestion(Maslennikov et al .,1999;Kini,2002;Zhu et al .,2003;

Zhu et al ., 2004a). The likely ancestral function of was enzymatic activity involvedinpreydigestion,however,insomevenomousanimalsincludingscorpions, their venom glands have evolved to produce potent toxins (Valentin and Lambeau,

2000)( Table1 ).

Table1Examplesofvenomousanimalslivingonlandandatsea.Unlikepoisonous animals(e.g.toads,puffer)whichhavetoxinsbuthavenomethodofdelivery, venomousanimalshavespecialisedorganstodelivertheirvenoms. Terrestrial Organs Marine Organs Ant Stinger/bite Blueringoctopus Bite Assassinbug Proboscis Conesnail Proboscis Bee Stinger Coral Tentacle Blackwidowspider Fang Crownofthornsstarfish Spine Centipede Fang Cuttlefish Bite Duckbilledplatypus(male) Spike Jellyfish Tentacle/Stinger Gilamonster Bite Seaanemone Tentacle/Stinger Kingcobra Fang Seaurchin Spine/pedicellaria Komododragon Bite Scorpionfish Spine/Stinger Rattlesnake Fang Stingray Spine/Stinger Scorpion Stinger Stonefish Spine/Stinger Wasp Stinger Yellowlippedseakrait Fang

2 Chapter 1: Introduction

Mortalityandmorbidityfromanimalenvenomationremains a serious health issue (Theakston et al ., 2003), accounting for more than 150,000 deaths per year

(White, 2000). However, venoms also contain highly bioactive compounds for discovery of molecules with interesting pharmacological properties and potential therapeuticsforanarrayofmedicaldisorders (Alonso et al .,2003;Bradbury,2003;

Lewis and Garcia, 2003; Rajendra et al ., 2004). An assortment of highly bioactive toxins characterised by high specificity and selectivityareused asresearchtoolsto characterise different ion channels subtypes and molecular isoforms of receptors

(Grant et al .,2004;LiandTomaselli,2004;RodriguezdelaVegaandPossani,2004;

Lewis,2004;TsetlinandHucho,2004).Analysesoftheinterfacesbetweentoxinsand their channels/receptors facilitate design of synthetic equivalents of toxins without toxicpropertieswhichcanbedevelopedaspotentialtherapeutics.

Rapidlyemergingknowledgefromstudiesofthemolecularmechanismofthe ionchannelsisusedinthedevelopmentofnoveltherapeuticsforionchannelrelated diseasessuchasepilepsy,cardiacarrhythmiaandpersistentpainsyndromes(Curran,

1998;Catterall,2002;Kohling,2002;Wickenden,2002a;Wickenden,2002b;Wulff et al .,2003;Gottlieb et al .,2004).Therapeuticssuccessfullydevelopedfromstudiesof animalvenomsincludeAncrodandCaptoprilthatweredevelopedfromsnakevenom fortreatmentofhypertensionandcardiacfailure(vonSegesser et al .,2001;Smithand

Vane, 2003). Another example is Ziconotide, developed from marine cone snail venom,fortreatingseverechronicpain(Miljanich,2004).Antivenoms arecurrently developedfromanimalantiseratotreatenvenomation(Harrison,2004;Gazarian et al .,

2005).Animalvenomsarepromisingalternativestochemicalinagricultural pestmanagement.Theincreasedpestresistancetochemicalpesticides,coupledwith heightened awareness of the potential environmental, human and animal health

3 Chapter 1: Introduction impactsofthesechemicals,havepromptedthesearchfordevelopmentofbio fromanimalvenoms(e.g.Sun et al .,2002;Gilles et al .,2003;Szolajska et al .,2004).

Identificationofnewtoxinsequencesanddeterminationoftheirfunctionalsitesand structuralpropertiesisthereforeofgreatinterestandvalueforscientific,medicaland commercialapplications.

Thenumberofdifferentvenomcomponentsinanindividualscorpionconsists ofapproximately100differenttoxins(Lourenco,1994).Given1500scorpionspecies exist, the natural library of scorpion toxins is therefore estimated to contain some

100,000 different toxins (Lourenco, 1994). However, toxin entries in public protein andDNAdatabasesrepresentonlyatinyfraction,lessthan1%oftheestimatednatural venomlibrary(asofNovember2005).

Sequenceandthreedimensional(3D)structuredataonthesetoxinsareusually deposited in public repositories such as GenBank (Benson et al ., 2005), SwissProt

(Bairoch et al .,2004)andProteinDataBank(Deshpande et al .,2005).Functionaland structural properties of toxins are reported mainly in published articles, while such annotations of entries in public sequence databases are very limited (Brusic et al .,

2000).AdvancesinsequencingprojectsinvolvingcDNAsandmassfingerprintingby massspectrometryresultedinexponentialaccumulationoftoxindata(e.g.Batista et al .,2004;Davies et al .,2004;He et al .,2004).Forinstance,asetof170 sequences weredepositedintoGenBankin2001(Conticello et al .,2001)whichalmost doubledthenumberofpublicconotoxinentriesatthattime.However,noneofthese sequenceshadanystructuralorfunctionalannotations,onlysequenceswerereported.

Experimental characterisation 1 of structurefunction relationships for the many

1Throughoutthisthesis,term‘characterised’describesproceduresoflaboratorybenchworkorwetlab experimentation.

4 Chapter 1: Introduction individualtoxinsequencesislaborious,expensiveandtimeconsuming.Increasingly, researchers are exploiting bioinformatics to expedite characterisation of the growing numberofnewlyidentifiedtoxinsequencesthroughinformationgatheredfromtoxin datascatteredinpublicrepositoriesandtheliterature.

Bioinformatics is an interdisciplinary field incorporating computer science, mathematicsandbiology,formanagementandanalysisofbiologicaldata.Themain branchesofbioinformaticsare:1)biologicaldatabases,2)analysisandinterpretation ofbiologicaldata,and3)developmentofanalysistoolsandalgorithms.Thebiological databases, tools and algorithms are important methodologies in scientific research especiallyingenomicsandproteomics,whichgeneratehugeamountsofdata.These dataarestoredinbiologicaldatabaseswhichcontinuetogrowinsizeandcomplexity where more than 700 biological databases are publicly available (Galperin, 2005).

Insights gained from analyses and interpretations of the data are used for the developmentofnewanalysistoolsandalgorithmsforanalysesofdata,andplanning andminimisationofthenumberoffurtherexperiments.

Thisthesisdescribesoriginalfindingsfromapplicationofbioinformaticbased approachtothelargescalestudyofstructurefunctionrelationshipsofscorpiontoxins.

In this thesis, the word ‘structure’ encompasses primary, secondary and tertiary structuresofproteinsunlessstatedotherwise.Thecurrentnumberofscorpiontoxins thatarestructurallyandfunctionallycharacterisedissmallandmeasuresonlyinthe hundreds,incontrasttothenaturallibraryoftoxinsthatisestimatedtobe100times larger. However, with the expected rapid growth of toxin data through largescale sequencing,experimentalapproachwillneedtobecomplementedwithbioinformatic analysesforfacilitatingcharacterisationofthelargenumberofnewlyidentifiedtoxin sequences.

5 Chapter 1: Introduction

1.1 Researchissuesinvestigatedinthisthesis

Largescaleanalysisofscorpiontoxinsprovidesaglobalviewofthegeneral pattern of their structurefunction relationships. This analysis in turns supports experimental studies by assisting in planning of critical experiments and, when properly used, it significantly improves the efficiency of experimental studies of structurefunction relationships. However, such largescale analyses are hindered by inadequate data management where scorpion toxin data are scattered across public databases. Records in the databases typically contain sequence information, while structurefunction information is available in the literature. Thus, consolidating the scattereddataintoacentraliseddatabaseandenrichingthetoxindatawithstructure functioninformationisaprerequisiteforasystematiclargescaleanalysis.Information gained from such analysis is useful for developingnewanalyticaltoolsforstudyof noveltoxinsequencesandpredictionoftheirstructuralandfunctionalproperties.

The author of this work was earlier involved in building the SCORPION database (Srinivasan et al., 2002a) which contained 277 native scorpion toxin sequences. Mutation studies (such as sitemutagenesis) of scorpion toxins, which providebiologicallyrelevantinformationoncriticalresiduesandtheirpositions,are availableintheliteratureandarenormallynotusedforextractionoffunctionalmotifs inscorpiontoxins.

The original contribution of this work is the systematic application of bioinformaticbasedapproachto:1)buildthefirstcomprehensivemoleculardatabase of scorpion toxins which includes native and synthetic variants, SCORPION2 for improveddatamanagementanddetailedanalysis, 2)classify393nativescorpiontoxin sequences into functional groups based on ion channel specificity and sequence similarityusingBLAST,multiplesequencecomparisonandphylogeneticanalysis,and

6 Chapter 1: Introduction

3)developapredictiontoolforpredictingthefunctionalpropertiesofuncharacterised scorpiontoxins.Thisdatabasewasbuiltusingmoleculardatawarehousingprinciples.

It contains subjectorientated (scorpion toxins), integrated (comprehensive information),nonvolatile(validated),andexpertinterpreted(annotated)collectionof biological data (e.g. a family of proteins that have similar structures and functions)

(Schonbach et al .,2000).Totheauthor’sknowledge,datawarehousinghasnotbeen appliedneithertothelargescalemanagementofscorpiontoxindatanorthesystematic studyoftheirstructurefunctionrelationships.

The systematic application of bioinformatics to the study of venoms – venominformatics–isacombinationofbioinformaticsandvenomresearchwhichhas the potential to revolutionise the way that researchers manage toxin data and information.Forexample,currentlythereisnotoolavailableforaccuratepredictionof functional properties of toxins. This research area is important for prediction of function in newly identified toxins. In general, toxins display an array of diverse functions where detailed examination of their molecular functional sites allows alterations of their pharmacological specificity, selectivity and potency especially in thefieldofdrugdesignanddiscovery.Therefore,thespecificobjectivesofthisthesis weretofocusonscorpiontoxinsandincludethefollowingsubprojects:

1) buildadatawarehouseofscorpionnativeandmutanttoxinsequenceswith

integratedquery,extractionandpredictiontools,

2) enrichrecordswithstructureandfunctioninformation extracted from the

literatureandpublicinformationrepositories,

3) predict tertiary structures of scorpion toxins by homology modeling for

toxinswithoutexperimentallydetermined3Dstructures,

4) analyse the toxin dataset (primary, secondary and tertiary structures) for

7 Chapter 1: Introduction

identificationoffunctionalmotifsand,

5) developatooltopredictspecificfunctionalpropertiesofnewlyidentified

scorpiontoxins.

1.2 Contributionofthisthesis

Theauthor’soriginalcontributionstothefieldofvenomresearchinclude:

1) Organisedalargeanduniquedatasetof819entriesofscorpiontoxindata

from public databases and literature, inclusive of 426 scorpion mutant

toxin sequences extracted solely from literature to develop the

SCORPION2database.Thisdatawarehouseofscorpiontoxinsisamajor

resource for researchers to identify scorpion toxins and analyse their

sequence which otherwise would involve multiple querying of other

databases.

2) Extractedfunctionalinformationofbindingaffinityandtoxicitydatafrom

approximately 500 scientific articles and deposited them into

SCORPION2.

3) Classifiedcurrentlyknownscorpiontoxinsequencesintofunctionalgroups

forabroadviewonthegeneralpatternofstructurefunctionrelationships.

Thegroupingscontainscorpiontoxinsequencegroupsthathavenotbeen

previouslydefinedandclassified.

4) Developed a new approach to extract functionally relevant motifs from

scorpion toxins based on analyses of multiple sequence alignment of

scorpion toxins, 3D structures and scorpion mutant data. This approach

also helped in the identification of critical functional residues at key

positionsintoxinsequenceswhichlackconservedresidues.

8 Chapter 1: Introduction

5) Developed the first prediction tool, Annotate Scorpion which accurately

predictsthefunctionalpropertiesofnewlyidentifiedscorpiontoxins.The

accuracywasvalidatedbynewexperimentaldata.Thistoolhelpsreduce

the number of experiments needed to characterise their functional

properties.

6) Generated 548 homology models of scorpion toxins not available

previouslyandmadethempubliclyaccessiblefor3Danalysis.

1.3 Asummaryofthethesis

This thesis consists of four parts. Part I provides an introduction to the importanceandissuesofvenomresearch,andhowbioinformaticscanfacilitatevenom research (Chapter 1). A review on venominformatics applications and related information in major public databases, and bioinformatics applications available for analysinglargenumberoftoxindataisdiscussed(Chapter2).

Part II presents the original findings of the research undertaken in this dissertationwhichincludesalargescaleclassificationofscorpiontoxinsbyfunctional propertiesandprimarysequencesimilarity,foraglobalviewontheirgeneralpattern ofstructurefunctionrelationships(Chapter3).Functionalmotifswereextractedfrom analyses of multiple sequence alignment of scorpion toxins, 3D structures and information from scorpion mutant data (Chapter 4). A new algorithm, based on sequence comparison, nearest neighbour analysis and decision rules to predict the functional properties of novel scorpion toxins, was implemented (Chapter 5). High prediction accuracy was achieved as validated by experimentally characterised scorpiontoxinsequences.

9 Chapter 1: Introduction

PartIIIdescribestheimplementationofspecialiseddatawarehouseofscorpion toxins–SCORPION2–integratedwithbioinformaticstools(Chapter6).Thecurrent limitations of bioinformatics for functional prediction of scorpion toxins was also explored(Chapter7).

Part IV (Chapters 8 and 9) draws conclusions from the bioinformaticbased approachtolargescaleanalysisofscorpiontoxinsandalsodiscussesfuturedirections.

The work presented in this thesis has been published in a series of journal articles. These include: the review on bioinformatics for venom science, Tan et al .

(2003) – Chapter 2; Tan et al . (2006a) – Chapter 4 where functionally relevant scorpiontoxinmotifswereextractedfromtheapproachofincludingscorpionmutant toxin data in the analysis; Tan et al . (2005) – Chapter 5 where the first functional predictiontoolwasdevelopedforscorpiontoxinresearch;Tan et al .(2006b)–Chapter

6 discussed the data warehouse of scorpion native and mutant toxin data with integratedbioinformatictoolsfordataanalysis.

10 Chapter 2: Literature review

PartI:Chapter2 Literaturereview ‘Idonotfearcomputers.Ifearthelackofthem.’ IsaacAsimov

11 Chapter 2: Literature review

Researchers currently spend significant time and effort in searching for all available information on animal toxins because a centralised repository is lacking. Toxin sequencesarescatteredacrossnumerouspublicdatabases.Mostofthestructuraland functionalinformationthatcanimproveourunderstandingofbioactivetoxinsisstored intheliterature.Thescatteredtoxindataandliteratureinformationhascreatedaneed forimproveddatamanagementinthefieldoftoxin research. A data warehouse of toxins serves as a major repository for analysis and interpretation of consolidated, cleaned and enriched toxin data. The data warehouse with integrated bioinformatic toolsalsofacilitatescharacterisationoftheincreasingnumberofnewlyidentifiedtoxin sequenceswithunknownfunction.

Venominformatics is a field combining venom biology and bioinformatics.

Venom biology generates large quantities of biological data, while bioinformatics providesaneffectivemeanstostoreandanalyselargevolumesofcomplexbiological data.Combiningthetwofieldsprovidesthepotentialforgreatstridesinunderstanding andincreasingtheeffectivenessofvenomresearch.Themaingoalistheextractionof newknowledgefromlargescaleanalysisoftoxindata. The bioinformatic approach providesameansforthesystematicstudyofalargenumberoftoxins,andfacilitates experimental design and selection of key experiments. This chapter focuses on resourcescontainingtoxindata,bioinformaticapplicationsforanalysisoftoxindata, andpredictionoftheirstructurefunctionrelationships.

2.1 Useofbioinformaticstocomplementexperimentalstudies

Animalvenomscontainadiversearrayofbioactivetoxinsthathaveavariety ofbiochemicalandpharmacologicalfunctions(KordisandGubensek,2000;Fry et al .,

12 Chapter 2: Literature review

2003).Establishedmethodsfordeterminingspecificfunctionsoftoxinsarebasedon experimentalstudiesofnaturallyoccurringpeptides (e.g.Inceoglu et al .,2002;Zhu et al ., 2004b), sitedirected mutagenesis (e.g. Everhart et al ., 2004; Ivanovski et al .,

2004),oruseofchemicallymodifiedvariants(e.g.Chang et al .,2004;Chang et al .,

2005).Thepharmacologicalpropertiesoftoxinsaretestedinanimalmodelssuchas mice, rats, crustaceans or insects. The experimentation is often supported by computationalalgorithmsforsequencecomparison(Wang et al .,2003;Cohen et al .,

2004;Cohen et al .,2005)orformodellingoftoxin3Dstructures(Mourier et al .,2003;

Benkhadir et al ., 2004). Systematic functional study of even one individual toxin requires a significant experimental effort. Consequently, most research groups focus on determining functional properties of individual toxins or small groups of toxins.

Bioinformaticanalysescanimprovetheefficacyofresearchbyassistinginselectionof criticalexperiments.Bioinformaticapproachesinvolveaccesstotoxindatascattered across multiple databases, inspection for errors, classification and analysis of toxin sequences and their structures, and the design and use of predictive models for simulationoflaboratoryexperiments.

2.2 Genomesequencingofvenomousanimals

Currently,genomesofhoneybee,seaurchinandduckbilledplatypusarebeing sequenced ( http://www.genome.gov/10002154 ). Largescale studies of toxins from identificationofexpressedsequencesgeneratelargeamountofunannotatedsequence datadepositedinpublicdatabases.Forexample,8966uniqueputativesequenceswere assembled from the honey bee brain expressed sequence tag project

(http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=7460 ). The cDNA libraries constructed with mRNA isolated from venom glands have been used for

13 Chapter 2: Literature review sequencingtoxinsinscorpions,snakes,andconesnails(e.g.Peng et al .,2003;Jiqun et al .,2004;Santos et al .,2004).Someoftheseprojectshaveresultedinidentificationof hundredsofnewtoxinsequences.Forexample,170novelwereidentified from cone snail expressedsequence tag assemblage (Conticello et al ., 2001).

Bioinformaticsaidsinthelargescalestudiesoftoxinswhereputativefunctioncanbe assignedefficientlyforlargenumberoftoxinsequences.

2.3 Sourcesoftoxindataandrelatedinformation

Toxindataandinformationarescatteredacrossmultiple resources. The data includenucleotideandaminoacidsequences,secondarystructuresand3Dstructures deposited in public databases such as GenBank (Benson et al ., 2005), SwissProt

(Bairoch et al ., 2004) and PDB (Deshpande et al ., 2005). Structurefunction information, particularly mutation studies (such as sitedirected mutagenesis), is availableintheliterature.Theadvantagesanddisadvantagesofthesedatabasesforthe creation of data warehouses of toxins would be reviewed in the subchapters. The issuesofdatacollection,cleaningandannotationwhenconsolidatingthescattereddata wouldalsobedescribed.

2.3.1 GenBankandGenPeptdatabases

ToxindataareextractedfromGenBank(Benson et al .,2005)databasebecause it contains a comprehensive collection of publicly available nucleotide sequences.

GenBank encourages direct submissions of new data and batch submissions from largescale sequencing projects to help maintain accuracy, relevance and comprehensiveness of the database. GenPept protein database contains translated

14 Chapter 2: Literature review nucleotidesequencesfoundinGenBank.However,recordsinthesedatabasescontain onlybasicinformationsuchasthetoxinsequence,itsname,taxonomyofthesource organism,andwhenavailable,alistofbasicsequence features and references. The records need to be enriched with structural and functional information (such as residues important for folding, binding affinity and toxicity information) which is availableintheliterature(seesection2.3.4).

2.3.2 SwissProtandTrEMBLdatabases

Toxin data are also extracted from SwissProt and TrEMBL (Bairoch et al .,

2004)databasesbecausetheyhaveacomprehensivecollectionofproteinsequences.

SwissProtcontainsahighlevelofcuratedstructuralandfunctionalinformationthat may include disulfide connectivity, secondary structure information, ion channel specificity and protein family classification, among others. TrEMBL contains computationallyannotatedtranslationsofallEMBL(Cochrane et al .,2006)nucleotide sequenceentriesnotyetintegratedinSwissProt.TheinformationinSwissProtand

TrEMBL records expedites subsequent annotation when new structurefunction informationisavailable.

2.3.3 ProteinDataBank(PDB)

Analysingtoxin3Dstructuresareimportantbecausetoxinfunctionisrelatedto itsstructuralfolding.Inclusionof3Dstructuralinformationtotoxinsequenceanalysis facilitatesidentificationofresiduesthatareimportantforstructureandfunction.Asof

May2005,thestructuraldatabasePDB(Deshpande et al .,2005)containsonly823D structuresofscorpiontoxinsincontrasttotheestimated100,000differenttoxinsinthe

15 Chapter 2: Literature review naturalvenomlibrary.Becausethegrowthofnewscorpiontoxinsequencesoutpaces thatofexperimentallysolved3Dstructures,toxinstructurepredictionisnecessaryto overcomethisdisparity.Theexperimentallysolved3D structures serve as templates for generatinghomology modelsoftoxinsequencesbecause a majority of scorpion toxins share a common scaffold (Kobayashi et al .,1991;Rodriguezdelavegaand

Possani,2004,2005).Thegeneratedhomologymodelsdonotreplace,butserveasan alternative to, experimentally determined structures because homology models may notbeasaccurateasthelatter.

2.3.4 PubMedliteraturedatabase

Thewealthofinformationfromtheliteratureisimportantforinterpretationof experiments and predictions. Most structural and functional information of toxin sequencesisreportedinpublishedliteraturewhereabstractsofthepublishedliterature can be searched in PubMed (http://www.pubmed.gov/ ) or similar data sources.

Extraction of structurefunction information is important for enriching toxin data records,particularlythosewhichhavelimitedornoannotation.Theenrichedrecords enableamoredetailedanalysisincontrasttorecordswithonlysequenceinformation.

2.3.5 Issuesondatacollection,cleaning,annotation

Thecollectionoftoxindatafromdifferentdatabasesishamperedbydifferent databaseformatsandvariationsinfieldnamesthatdescribethesameinformation.For example,atoxinprimarysequenceisdescribedinthe‘translation’fieldofaGenBank record but in SwissProt, it is described in the ‘sequence’ field. The differences in fieldnamesdescribingthesameinformationneedtobestandardisedtoauniformdata

16 Chapter 2: Literature review representation. For example, a standard field such as ‘translation’ can be used to describe toxin primary sequence regardless of data sources. The uniform data representationiscriticalbecauseconsistencyisrequiredforefficiencyofsubsequent analyses.

When consolidating records from different databases, the same data may be duplicatedinanotherdatabase,resultingindataredundancy. Data cleaning involves removing these redundant records to improve on data quality. For example, of all snakevenomphospholipaseA 2toxinentriesintheGenBankandSwissProtdatabases,

55%wereredundantandneededtobefilteredoutpriortodataanalysis(Tan et al .,

2003). Data cleaning also involves detecting discrepancies in data information, highlighting, and subsequently correcting the conflicts. Some examples include detectingdiscrepancyinthetoxinprimarysequencebetweenliteratureanddatabase, different names for the same sequence and missing links between databases

(Srinivasan et al .,2002a).

Records in the public databases typically contain basic information. Data annotation, also known as data enrichment or enhancement, is the process of furnishingcriticalcommentaryorexplanatorynotes1.Dataannotationenrichesthedata for extrapolation of meaningful insights from multisource bits of information.

Correlatingtherelevantinformationfrommultiplesourcesiscriticalforincreasingthe overallknowledgeandforimprovingtheunderstandingofaspecificsubjectinthedata warehouse (Karasavvas et al ., 2004). It is important to differentiate experimentally determinedfunctionfromthosethathavebeenpredictedcomputationally(Karp et al .,

2001)becausethelatterrequiresubsequentvalidation.Thiswouldallowresearchersto verify and decrease the propagation of incorrect predicted function during data

1http://dictionary.reference.com/search?q=annotation

17 Chapter 2: Literature review annotation.

2.4 Datawarehousesoftoxins

To the author’s knowledge, only four toxin data warehouses are currently available as major resources for the study of toxins. The databases contain entries collectedfromdifferentsources,cleaned,organised,analysedandclassifiedaccording totheirstructurefunctionrelationships.TheSCORPION(Srinivasan et al .,2002a)had

277entriesofnativescorpiontoxinsequences,annotatedandclassifiedaccordingto theirstructuralandfunctionalproperties.TheSCORPION2databasehas819entriesof native and mutant scorpion toxin sequences annotated with functional information extracted from literature and 624 3D structures. The MOLLUSK 2database contains

457 peptides from the cone snail venoms where each entry has a unique field to facilitate comparison of conotoxin entries. Functionally annotated entries of snake venomphospholipaseA 2(svPLA 2)and(svNTXs)arefoundinthesvPLA 2

(Tan et al .,2003) andsvNTXs(Siew et al .,2004)databases,respectively.

2.5 Bioinformatictools

Generalbioinformatictoolscommonlyusedinanalysesoftoxindatainclude butarenotlimitedtoBLAST(Altschul et al .,1997)andClustalW(Thompson et al .,

1994). The BLAST search tool finds regions of local similarity between query sequences and database sequences by calculating the statistical significance of matches. Uses of BLAST include inferring functionalandevolutionary relationships betweensequencesaswellashelpidentifymembersofgenefamilies.ClustalWisa

2Molluskdatabaseofconesnailtoxins.http://research.i2r.astar.edu.sg/MOLLUSK/

18 Chapter 2: Literature review general purpose multiple sequence alignment program for nucleotide or protein sequences. It involves the optimal alignment of the greatest number of identical or similarresiduesintocolumnsacrossmanynucleotideorproteinsequences.Patternsof aligned sequences can be used in the analysis of function, structure and phylogeny relationshipbetweensequences.Phylogenetictoolssuch as Mega 3.0 (Kumar et al .,

2004) have been developed as easytouse computer programs for inference of evolutionaryrelationshipbetweensequenceswhichprovidesaguidetotheirstructure functionrelationships.Differenthomologymodelingserverse.g.SDPMOD(Kong et al .,2004)andSwissModel(Schwede et al .,2003)areavailabletogeneratehomology modelsoftoxinslackingexperimentalstructures.

Specialised tools for analysis of toxins is currently lacking since toxin data needstobeanalysedpriortodevelopmentofanalysistoolandsuchdetailedanalyses havebeenoflimitedscope.

2.6 Bioinformaticapplications

Commonlyusedbioinformaticmethodsforanalysingtoxindataare:

• phylogeneticanalysis,

• multiplesequencealignments,

• 3Dstructureanalysis,and

• homologymodeling.

Phylogeneticanalysishasbeenusedtostudydiversificationofscorpiontoxins, snaketoxinsandconotoxins(Conticello et al .,2001;FryandWuster,2004;Zhu et al .,

2004a),andclassificationofscorpion(RodriguezdelaVegaandPossani,2004)and

19 Chapter 2: Literature review snake toxins (Fry, 2005). Multiple sequence alignment and analysis of their 3D structures provide a complementary approach to sitedirected mutagenesis for identificationoffunctionalresiduesintoxins(ChioatoandWard,2003;Everhart et al .,

2004;Karbat et al .,2004b).Homologymodelinghasbeenusedindesigningmutants to determine function of toxins and computational simulations of ligand channel/receptor interactions to determine interacting residues and structure guided drugdevelopment(ChenandPellequer,2004;Dutertre et al .,2004;LiuandLin,2004;

Yu et al ., 2004). Researchers are increasingly using a combination of these bioinformatic methods to establish structurefunction relationships (Bagdany et al .,

2004;Giangiacomo et al .,2004;Karbat et al .,2004a;RamosandSelistredeAraujo,

2004;Siew et al .,2004).

2.7 Predictionofstructureandfunctionoftoxins

Crystallisation of macromolecules is a slow and complex process, which requires optimisation of various interdependent physical, chemical, and biological parameters(McPherson,1999).Therefore,predictionof3Dstructuresofproteinsfrom primarystructuresbycomparativeanalysisandhomologymodelingtechniquesisan attractive alternative for studying structurefunctionrelationshipsinlargenumberof toxins.Thecomparisonofhomologymodelswithexperimentallysolved3Dstructures oftoxinsenabledidentificationofputativefunctionalresiduesinvolvedinbindingand catalyticsite,whichweresubsequentlyexperimentallyvalidated(Hains et al .,1999;

ChurchandHodgson,2002;MorenoMurciano et al .,2003).3Dmolecularsimulations of toxinreceptor complexes have been used for determination of critical interacting residuesonthesurfaceoftoxins(Grant et al .,2004;Wu et al .,2004;Yu et al .,2004).

20 Chapter 2: Literature review

The majority of scorpion toxin 3D structures determined share a common structural motif, called the cysteinestabilised αhelix (CSH) fold ( Figure 1A). The

CSHfoldcomprisesan αhelixcrosslinkedbythreetofourdisulfidebridges to an extended βsheets(Kobayashi et al .,1991).Thus,scorpiontoxinsareagoodexample of dissimilar proteins sharing similar structural scaffolds. The CSHtype scorpion toxinshavedifferentlengthsofloopsandtypesofturns,resultinginawiderangeof pharmacological properties. This makes the prediction of function from structure

(primary,secondary,and3D)aloneadifficulttask.Anewfold,consistingoftwoshort helixescrosslinkedwithtwodisulfidebridges,was recently characterised in a new familyofweakK +toxins(Srinivasan et al .2002b,Nirthanan et al .,2005)( Figure1B ).

A) B)

Figure1The3Dstructuresofscorpiontoxins. A)Cysteinestabilised αhelixfoldwas sharedbymajorityofscorpiontoxins.Thefoldconsistedofanαhelixandtwo–three βsheetscrosslinkedbythree–fourdisulfidebonds. Representedby charybdotoxin (PDB ID: 2CRD). B)Anew fold,consistingoftwoparallelheliceslinked by two disulfide bridges, was determined in a group of new family of weak K + scorpion toxins.Representedbyhefutoxin(PDBID:1HP9).

21 Chapter 2: Literature review

To the best of the author’s knowledge, a specialised bioinformatic tool for functional prediction of toxins does not exist, other than the tool presented in this thesis. Function of uncharacterised toxins is inferred from identification of characterised similar sequences using BLAST (Altschul et al ., 1997) or FASTA

(Pearson,2000)programs.Alternatively, functionisassignedby searchesinpattern databasessuchasPROSITE(Hulo et al .,2004).Generally,allpatterndatabasesuse statistical approaches to assign confidence levelstoquerymatchestothemotifsbut statisticalsignificancedoesnotnecessarilyequatetobiologicalproof(Attwood,2000).

For example,‘ProteinkinaseC’ (accession ID: PDOC00005)and‘CaseinkinaseII’

(accession ID: PDOC0006) phosphorylation sites were found in a sodium specific toxin, AaHIT2 (Loret et al ., 1990) upon submission in PROSITE. Both phosphorylation sites which are irrelevant for the function of sodium toxins have a highprobabilityofoccurrenceinmostproteinsequences.

Conversely,mutationstudiesoftoxins(suchassitedirectedmutagenesisand chemical modification) have identified critical residues important forboth structural andfunctionalproperties.However,thisinformationhasnotbeenusedinlargescale analysisoftoxindataforidentifyingcriticalstructuralandfunctionalresidues.Insights gainedfromtheanalysiscanbeusedtodevelopfunctionalpredictiontoolsthatinclude biologicalinformationfrommutantdata.

Chaptersummary

• Large amountoftoxindatawithlimitedornoannotation is generated from

increasedautomationofexperimentaltechniques(e.g. largescale sequencing

ofthethreevenomousanimalgenomes).Thesetoxindataarescatteredacross

general databases whose aim is to contain as many genomic or protein

22 Chapter 2: Literature review

sequencesaspossible.

• Structurefunctioninformation,inparticularthatofmutationstudiesoftoxins,

isavailableintheliteraturebutisusuallynotusedtoenrichthetoxinrecordsin

thegeneraldatabasesorextractionoffunctionallymotifs.

• The scattered toxin data and structurefunction information requires an

improveddatamanagementinthefieldoftoxinresearch.Venominformatics,a

field combining toxin research and bioinformatics, allows systematic large

scalestudiesoftoxindatawhereitfacilitatesexperimentaldesignandselection

ofkeyexperimentsbydevelopmentoffunctionalpredictiontools.

• Specialiseddatawarehousesoftoxinsarededicatedrepositoriesoftoxindata

extracted from public databases, literature or other public repositories, and

experimental measurements. Data warehouses have integrated bioinformatic

toolsfordetaileddataanalysisandmining.

• The bioinformatic tools commonly used include BLAST for inference of

functional and evolutionary relationships and Clustal W for analyses of

structure, function and phylogeny relationships. Comparative homology

modeling servers help predict tertiary structures of toxins which lack

experimentallysolved3Dstructures.

23 Chapter 3: Data classification

PartII:Chapter3 Classificationofscorpiontoxindata ‘Thegreatchallengeinbiologicalresearch todayishowtoturndataintoknowledge. I have met people who think data is knowledge but these people are then strivingforameansofturningknowledge intounderstanding.’ SydneyBrenner

24 Chapter 3: Data classification

Classification of all currently known scorpion toxins according to their function is necessaryforclarifyingtheglobalperspective,includinganoverviewofthefunctional repertoireofthetoxins.Suchknowledgewillfacilitatefunctionalassignmentofnewly identifiedscorpiontoxins.Classificationalsoprovidesaneffectivemeanstoretrieve relevantbiologicalinformationfromvastamountsoftoxindata. Advancesingenomics and proteomics have identified new scorpion toxin sequences at an everincreasing rate. For example, more than 100 different components were identified from the proteome analysis of Tityus cambridgei scorpion venom, of which 26 have been partially sequenced (Batista et al ., 2004). Many of these toxin sequences have yet unknown function. Consequently, there is a need to analyse and organise these sequences with currently known and annotated scorpion toxin data (nearly 1000 sequencesasofNovember2005)forabroadviewontheirgeneralpatternsinstructure andfunction.

However,theavailablelargescaleclassificationofscorpiontoxinsequencesis limited and is based mainly on the analysis of evolutionary properties of scorpion toxinscurrentlyknown.Theseclassificationswereperformedontoxinsisolatedfrom distinct scorpion species (Corona et al ., 2002, Goudet et al ., 2002) and also by specificitytodifferentionchannels.Forexample,Possani et al .(1999)classified36 sodiumspecificscorpiontoxinsinto10groupsbasedonanimalspeciesspecificityand pharmacologicaleffectsonsodiumchannels.Sincethen,thenumberofknownsodium specific scorpion toxins has increased to 213 toxins. For potassium specific toxins,

Tytgat et al .(1999)classifiedthemintothreefamilies(α,βandγscorpiontoxins).

Theαandβtoxinswereclassifiedbasedonpeptidelengthandalignmentofcysteines andotherconservedresidueswhileγtoxinswerebasedonspecificitytoetheragogo

25 Chapter 3: Data classification potassium channel subtype. The αtoxin family by that time contained 49 different toxins, comprising 12 subfamilies (Tytgat et al ., 1999). This classification has expandedto18subfamiliesasnewscorpiontoxinsequenceswereidentifiedbutdid notfitintotheformer12subfamilies(RodriguezdelaVegaandPossani,2004).In protein classification databases such as (Bateman et al ., 2004) and ProDom

(Servant et al .,2002),proteinfamiliesareobtainedfrommultiplesequencealignments ofsimilarproteins.Thesegroupshoweverarebasedonsequencesimilarityandarenot necessarily functionally relevant. For example, Toxin 3 family (accession ID:

PF00537)inPfamrelease18.0classifiedscorpiontoxinsalongwithplant.

Here, the author describes a systematic largescale classification of scorpion toxin sequences into groups based on ion channel specificity and primary sequence similarity, combined with multiple sequence alignments and phylogenetic analyses

(Tan et al .,2005).ThisclassificationisbasedonTytgat’sapproach(1999)ofprimary sequencesimilarityandmultiplesequencealignmentbutrefinedusingalargernumber of scorpion toxin sequences. The toxin sequences were classified with reference to their structural and functional properties. This largescale classification of currently knownscorpiontoxinsreflectstheunderlyingtoxinfamiliesforaglobalviewoftheir structurefunctionrelationships.Thisisadynamicfieldwhereclassifiedgroupscanbe definedandredefinedasthenumberofknowntoxinsequencesgrows.Manygroups contain scorpion toxin sequences that have not been classified. Highly accurate functional predictions of novel scorpion toxin sequences have been obtained by comparisonwiththeclassifiedgroups(Chapter5).

26 Chapter 3: Data classification

3.1 Classificationofscorpiontoxins

Scorpion toxins are important physiological probes for characterising ion channels.Theyhavebeenclassifiedintofourbroadgroups,namelythosethatinteract with sodium (Na +), potassium (K +), calcium (Ca 2+ ), or chloride (Cl ) ion channels

(GordonandGurevitz,2003;Fuller et al .,2004;Giangiacomo et al .,2004;Lacinova,

2004)( Table2).Scorpiontoxinsarealsoclassifiedaslongchaintoxinscontaining60

–70aminoacidresidueswithfourdisulfidebridgesorshortchaintoxinscontaining

30–40aminoacidresidueswiththreeorfourdisulfidebridges(Goudet et al .,2002).

Na +toxinsbelongtothelongchaintoxinfamilywhileK+,Ca 2+ orCl toxinsbelongto theshortchaintoxinfamily.Additionally,scorpiontoxinscanbeclassifiedaccording to speciesspecificity of toxicity (insect, crustacean or mammal). Some toxins show crossspecificity; for example, BmK M1 from Buthus martensii Karsch targets both insectandmammaliancells(Liu et al .,2005).

Based on electrophysiological studies, scorpion toxins that interact with Na + channelshavebeenclassifiedintothreetypes:α,αlikeandβtoxins.Theαtoxins(e.g.

AaHII from Androctonus australis Hector) slow or block the inactivation of Na + channel in a voltagedependent mechanism whereas β toxins (e.g. Cn2 from

Centruroides noxius ) affect the Na + channel activation independently of membrane potential (Couraud et al ., 1982). The third type is αlike toxins (e.g. LqhIII from

Leiurus quinquestriatus Hebraeus)thatinducesodiumcurrentinneuronalpreparation but do not compete for AaHII binding (Gordon et al ., 1996; Gordon and Gurevitz,

2003).Theαtoxinsandαliketoxinsbindtosite3,whileβtoxinsbindtosite4onthe

Na + channel (Jover et al ., 1984). β toxins are further classified into depressant and excitatory toxins. Depressant toxins induce a block of action potentials whereas excitatorytoxinscausearepetitiveactivityonNa+axonalmembrane(Zlotkin et al .,

27 Chapter 3: Data classification

1985).

ThesubtypesofK + channelstargetedbyscorpiontoxinsincludevoltagegated

K+ channels (Pragl et al ., 2002), inward rectifier K + channels (Lu and MacKinnon,

1997),etheragogorelatedgeneK+channels(Frenal et al .,2004;Korolkova et al .,

2004) and Ca 2+ activated K + channels that include large, intermediate and small conductanceCa 2+ activatedK +channels(RodriguezdelaVega et al .,2003;Jouirou et al .,2004;Xu et al .,2004a).TwoCa 2+ channelssubtypeswerereportedtobetargeted byscorpiontoxins:Type1ryanodine(Zamudio et al .,1997;Fajloun et al .,2000;Zhu et al ., 2004b) and Ttype voltagegated Ca 2+ channels (Chuang et al .,1998; Lopez

Gonzalez et al ., 2003). The ability of scorpion toxins to block Cl channels is controversial(Maertens et al .,2000;Dalton et al .,2003;Fuller et al .,2004).

Table2Differentcriteriacanbeusedtoclassifyscorpiontoxins:peptidelength,ion channelspecificityandelectrophysiology. Peptidelength Ionchannel Electrophysiology specificity Long chain Na + α 60–70residues αlike β(depressant,excitatory) Short chain K+ Voltagegated 30–40residues Inwardrectifier Ca 2+ activated Etheragogo Ca 2+ Type1ryanodine Ttypevoltagegated Cl Cysticfibrosistransmembraneconductanceregulator

28 Chapter 3: Data classification

3.2 Dataclassificationofscorpiontoxinsequences

Dataclassificationisanimportantstepforeffectiveinformationmanagement as it provides an overview of the categories of related biological sequences. It also describes the key relationships between particular characteristics and the correspondingdata.Anadequateclassificationofrelatedbiologicalsequencescanbe usedtopredictthefunctionofanunknownsequencebasedoninferenceofhomology between the unknown and the characterised sequences in a class. Homologous sequences are assumed to descend from a common evolutionary ancestor and thus likelysharesimilarfunction.

Largescaleclassificationofscorpiontoxinsequencedatawasachievedinthis work by a combination of bioinformatic approaches through pairwise and multiple sequence alignments, and phylogenetic analyses. This is necessary because each individualapproachhasitslimitations.Thepairwisealignmentdoesnotgiveaclear indicationofthedomainstructureofproteins(Bateman et al .,2000)ascomparedto multiplesequencealignmentwhichgivesabetterpictureofmostconservedresiduesin aproteinfamily.Formultiplesequencealignment,becauseoflargenumberofpossible alignments,alignmentmethodsoftenproducemistakeswhichcompromisethequality ofresults(Cline et al .,2002).Differentalgorithmshavebeendesignedforassemblyof multiple sequence alignments. However, none of these algorithms performs consistentlybetterthantheothers(Poirot et al .,2003).Anindividualalgorithmmaybe betterthanothersforcertaintypesofproblems,butnoneoftheseisthebestacrossa broadrangeofalignmentproblems(LassmannandSonnhammer,2002).Inphylogeny, the accuracies of multiple sequence alignment affect the estimation of phylogenetic relationshipamongsequencedataanalysed.Ifthesequencesalignwell,theyarelikely tobederivedfromacommonancestralsequence.Ontheotherhand,agroupofpoorly

29 Chapter 3: Data classification aligned sequences may share a complex and distant evolutionary relationship.

Inaccurate multiple alignments will result in incorrect trees and erroneous interpretationofthesetrees.Thus,bycombiningmultipleapproaches,thelimitations canbeminimisedandmoreaccurateanalysesofstructurefunctionofscorpiontoxins canresultfromtheseanalyses.

3.3 MaterialsandMethods

Scorpiontoxinsequenceswerecollectedfrompublicdatabasesandliterature, and deposited into the SCOPRION2 database (discussed in Section 6.3). Of 393 currently known scorpion toxins, 383 sequences were classified into four broad families, namely sodium (Na +), potassium (K +), calcium (Ca 2+ ) and chloride (Cl ) specifictoxins.Potassiumfamilywasfurtherdividedintothreesubfamilies:α,βandγ according to Tytgat et al . (1999). One sequence belonged to the scorpion familywhilethemoleculartargetsofninesequenceswerenotreportedandwerecalled

‘orphans’.Theorphansandthedefensinwerenotincludedintheclassification.Within each broad group, sequences were further classified into groups based on primary structuresimilarityusingBLAST(Altschul et al .,1997) andClustalW(Thompson et al .,1994),andverifiedbyphylogeneticanalysisusingMEGA3.0(Kumar et al .,2004)

(Figure 2 ). The classification process has been illustrated using sodium group 1

(Na01)asanexample.

30 Chapter 3: Data classification

Scorpiontoxindata

Classifybyionchannelspecificity

Na + K+ Ca 2+ Cl Orphan Defensin

BLAST:classifyintogroups

ClustalW:classifyintosubgroups

MEGA3.0:verifygroupings

Figure2Flowchartofthelargescaleclassificationofscorpiontoxindata.

3.3.1 ClassificationofsequencesintogroupsbyBLAST

Withineachbroadionchannelfamily,arepresentativesequencetoxinfroma listofunclassifiedtoxinsequenceswassubmitted totheblastpprogram againstthe nonredundant (nr) database at NCBI ( http://ww.ncbi.nlm.nih.gov/BLAST/ ). BLAST resultobtainedaftereverysubmissionrankedproteinsequencesfromthemostsimilar totheleastsimilartothequery.Bylookingattheexpectation(E) value,acutoffwas determined by manual inspection for each group of sequences. The E value is a parameter that describes the probability of matches against database sequences. The cutoffvaluewouldvarydependingonthesimilarityofthetopscoringsequences,but usuallyoccurredwheretherewasalargechangeinEvaluesbetweentwoconsecutive sequences.ThehigherEvaluewouldserveasthecutoffvalue.Sequenceswhichhad

31 Chapter 3: Data classification

E valueslowerthanthiscutoffwereclusteredintoa group. The sequences scoring higherthanthecutoffwereaddedtoanewgroup.Forexample,inFigure3,thequery sequence was Na + toxin, BmKIT1 isolated from Buthus martensii Karsch. A large difference inthe Evaluesfrom2×e20 to1×e7betweensequences17and18(BjxtrIt andCseV5,respectively)servesasacutoff(Figure3A ).Thehigher Evalueof1×e7 wassetasthecutoffvaluetoclustersequences1–17intoapreliminarygroupi.e. (KIT)toScorpiontoxinBjxtrIT.

Thepreliminarygroupingofthe17sequenceswasconfirmedbysubmittingthe last sequence (BjxtrIT) before the cutoff value 1×e7 for a second BLAST search

(Figure3B ).IntheresultreturnedbythesecondBLASTsearch,thecutoffvalueof

8×e8wassetbecausealargedifferenceinthe Evaluesoccurredbetweensequences

Toxin 1 and TbITI. Sequences with E values lower than 8×e8 were clustered and comparedwiththelistoftoxinsequencesinthepreliminarygroup.Thegroupingwas finalised if both BLAST results clustered the same toxin sequences together. The confirmed group was named according to ion channel specificity followed by a numbere.g.Na01.Theclassifiedsequenceswereremovedfromthelistofunclassified toxinsequencesandtheprocesswasrepeatedwithsubsequentsequencesuntilthelist wasexhausted.

32 Chapter 3: Data classification

A) ScoreE Sequencesproducingsignificantalignments:(Bits)Value gi|3036821|emb|CAA76604.1|neurotoxin(KIT)[Mesobuthusmartensi1523e36 gi|3063655|gb|AAC14130.1|neurotoxinBmKITprecursor[Buthus...1501e35 gi|3649606|gb|AAC61256.1|insectneurotoxinprecursor[Buthus...1262e28 gi|58176732|pdb|1T0Z|BChainB,StructureOfAnExcitatoryIn...1262e28 gi|161147|gb|AAA29950.1|neurotoxinAaHIT11217e27 gi|69545|pir||XISR1Ainsecttoxin1Saharascorpion>gi|223...1217e27 gi|232628|gb|AAA03882.1|insectspecificneurotoxinprecursor...1217e27 gi|134372|sp|P01497|SIX1_ANDAUInsecttoxin1precursor(AaH...1217e27 gi|56404741|sp|P68722|SIXB_LEIQHInsectneurotoxin1bprecursor1185e26 gi|102777|pir||G34444insecttoxin2precursorSaharascorp...1186e26 gi|56404740|sp|P68721|SIXA_LEIQHInsectneurotoxin1aprecursor1153e25 gi|84690|pir||S08267toxin1scorpion(Leiurusquinquestria...1149e25 gi|56404742|sp|P68723|SIXC_LEIQHInsectneurotoxin1cprecursor1085e23 gi|56404743|sp|P68724|SIXD_LEIQHInsectneurotoxin1dprecursor1049e22 gi|6094296|sp|P56637|SIXE_BUTJUExcitatoryinsecttoxinBjxtr...1002e20 gi|3858953|emb|CAA09988.1|neurotoxin,variant38K[Hottentotta1002e20 gi|4140001|pdb|1BCG|ScorpionToxinBjxtrIt1002e20 gi|102790|pir||C23727neurotoxinV5barkscorpion>gi|2052...57.81e07 gi|15825922|pdb|1I6G|AChainA,NmrSolutionStructureOfThe...57.81e07 B) ScoreE Sequencesproducingsignificantalignments:(Bits)Value gi|6094296|sp|P56637|SIXE_BUTJUExcitatoryinsecttoxinBjxtr...1753e43 gi|4140001|pdb|1BCG|ScorpionToxinBjxtrIt1753e43 gi|3858953|emb|CAA09988.1|neurotoxin,variant38K[Hottentotta1741e42 gi|3063655|gb|AAC14130.1|neurotoxinBmKITprecursor[Buthus...1016e21 gi|3649606|gb|AAC61256.1|insectneurotoxinprecursor[Buthus...1018e21 gi|58176732|pdb|1T0Z|BChainB,StructureOfAnExcitatoryIn...1018e21 gi|3036821|emb|CAA76604.1|neurotoxin(KIT)[Mesobuthusmartensi1002e20 gi|56404741|sp|P68722|SIXB_LEIQHInsectneurotoxin1bprecursor95.93e19 gi|56404743|sp|P68724|SIXD_LEIQHInsectneurotoxin1dprecursor93.62e18 gi|161147|gb|AAA29950.1|neurotoxinAaHIT193.22e18 gi|56404740|sp|P68721|SIXA_LEIQHInsectneurotoxin1aprecursor93.22e18 gi|134372|sp|P01497|SIX1_ANDAUInsecttoxin1precursor(AaH...93.22e18 gi|69545|pir||XISR1Ainsecttoxin1Saharascorpion>gi|223...92.05e18 gi|232628|gb|AAA03882.1|insectspecificneurotoxinprecursor...92.05e18 gi|102777|pir||G34444insecttoxin2precursorSaharascorp...90.12e17 gi|56404742|sp|P68723|SIXC_LEIQHInsectneurotoxin1cprecursor89.72e17 gi|84690|pir||S08267toxin1scorpion(Leiurusquinquestria...86.72e16 gi|41017886|sp|P60275|SCXI_TITBAInsecttoxinTbITI58.28e08 gi|58379083|gb|AAW72463.1|putativelongchaintoxinprecursor57.41e07

Figure 3 BLAST search results were used for classification of scorpion toxin sequences into groups based on high primary sequence similarity. First column providesthedatabaseaccessionnumbersandtoxinnames.Secondandthirdcolumns representthescoreand Evalues.Sequenceswereclassifiedintogroupsbyanalysing their Evalues. A)ResultofsubmittingBmKIT1,aNa +toxinfrom Buthus martensii Karsch,againstthenrdatabaseatNCBI.Amarkedincreaseinthe Evaluesfrom2×e 20 to 1×e 07 would serve as a borderline where 1×e 07 was designated as the cutoff value.Sequenceshaving Evalues<1×e 07wereclusteredintoapreliminarygroup. B) ResultofsubmittingBjxtrIT1from Hottentotta judaica ,whichwasthelastsequence before the cutoff value, 1×e 07 into nr database. The cutoff value for this second BLASTsearchwas8×e 08,wheresequenceshaving Evalueslowerthan8×e 08were clusteredintothegroup.

33 Chapter 3: Data classification

3.3.2 DataclassificationintosubgroupsbyClustalW

Withineachgroup,multiplesequencealignmentwasperformedusingClustal

W. Sequences with distinct primary structure patterns were further classified into subgroups. Subgroups were given an alphabet after the group name e.g. Na01a. For example, the two BjxtrIT sequences had five distinctresiduesatpositions16to21 from the rest of the sequences inNa01 and was classified into a subgroup, Na01c

(Figure4).TheCterminalresiduesofBmKITAP,BmKIT1,Bm32VIandBmKIT were distinct, sharing pattern of Na01c. These four were classified into a second subgroupwhiletherestofthesequencesformthelastsubgroup.

3.3.2 VerificationofgroupsandsubgroupsbyMEGA3.0

The author verified these groupings by phylogenetic analysis using MEGA

3.0. Phylogenetic trees were reconstructed using the neighbourjoining algorithm

(SaitouandNei,1987).SequencesinNa01bwereclusteredandseparatedfromNa01a andNa01c( Figure5).Thegroupandsubgroupclassificationofeachscorpiontoxin sequencecanbefoundintheSCORPION2databaseentries(SeeAppendix1).

34 Chapter 3: Data classification

1102030405060 ||||||| AaHIT1 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 AaHIT1v17L KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 AaIT KK NGY AV DSSG KAP EC LL SNYCNNQCT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 AaHIT2 KK NGY AV DSSG KAP EC LL SNYCYN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 AaHIT3 KK DGY AV DSSG KAP EC LL SNYCYN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 IsomTx1 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLSDD KK VL E55 IsomTx2 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLSDD KK VL D55 LqqIT1v33EKK NGY AV DSSG KAP EC LL SNYCYN ECT KVHY ADKGYCC LL SCYC VGLSDD KK VL E55 LqqIT1v33DKK NGY AV DSSG KAP EC LL SNYCYN ECT KVHY AEKGYCC LL SCYC VGLSDD KK VL E55 LqhIT1a KK NGY AV DSKGKAP EC FL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 LqhIT1b KKNGY AV DSKGKAP EC FL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 LqhIT1c KK NGY AV DSKGKAP EC FF SNYCNN ECT KVHY AEKGYCC LL SCYC VGLNDD KK VM E55 LqhIT1d KK NG FAV DSNG KAP EC FF DHYCNS ECT KVYY AEKGYCCT LSCYC VGLDDD KK VL D55 BmKITAP KK NGY AV DSSG KVA EC LF NNYCNN ECT KVYY ADKGYCC LL KCYC FGLA DD KPVL D55 BmKIT1 KK NGY AV DSSG KVSEC LL NNYCNN ICT KVYY ATSGYCC LL SCYC FGLDDD KAVL K55 Bm32VI KK NGY AV DSSG KVSEC LL NNYCNN ICT KVYY ATSGYCC LL SCYC FGLDDD KAVL K55 BmKITKK NGY AV DSSG KVSEC LL NNYCN INCTKVYY ATSGYCC LL SCYC FGLDDD KAVL K55 BjxtrITv56E KK NGY PL DRNG KTT ECSG VNAIAP HYCNS ECT KVYY AESGYCC WGACYC FGLEDD KPI GP60 BjxtrITv56K KK NGY PL DRNG KTT ECSG VNAIAP HYCNS ECT KVYY AKSGYCC WGACYC FGLEDD KPI GP60 ** :*:.:*.**..** :: ** **** :** .**** ***.** *** : 70 | AaHIT1 ISDTRK SYC DTT II N70 AaHIT1v17L ISDTRK SYC DTT II N70 AaIT ISDTRK SYC DTT II N70 AaHIT2 ISDTRK SYC DTPII N70 AaHIT3 ISDTRK SYC DTPII N70 IsomTx1 ISDTRKK YC DYT II N70 IsomTx2 ISDTRKK YC DYT II N70 LqqIT1v33EISDARKK YC DFV TIN70 LqqIT1v33DISDARKK YC DFV TIN70 LqhIT1a ISGTT KK YC DFTII N70 LqhIT1b ISDTT KK YC DFTII N70 LqhIT1c ISDTRKK ICDTT II N70 LqhIT1d ISDTRKK LCDFTLF N70 BmKITAP IW DST KNYC DVQII DLS72 BmKIT1 IKDATKSYC DVQIN69 Bm32VI IKDATKSYC DVQII G70 BmKIT IKDATKSYC DVQII N70 BjxtrITv56E MKDITKK YC DVQIIP S76 BjxtrITv56K MKDITKK YC DVQIIP S76 : . *. ** Figure 4 FurtherclassificationoftoxinsequencesinNa01intosubgroupsbasedon distinct primary structure patterns in their multiple sequence alignment. The first columnshowstoxinnames;secondcolumnshowsthetoxinprimarysequences;third column displays the peptide length. The shaded boxes highlight distinct patterns amongsequences.

35 Chapter 3: Data classification

AaHIT1 76 AaIT

71 AaHITv17L

AaHIT2

87 AaHIT3

IsomTx1

78 IsomTx2 Na01b 81 LqqIT1v33E

98 LqqIT1v33D

LqhITa 98 LqhIT1b

LqhITc

LqhIT1d

Bmk ITAP

Bm32 VI 70 Na01a 99 BmKIT1

Bmk IT

BjxtrITv56E Na01c 100 BjxtrITv56K

Figure5VerificationofgroupsandsubgroupsbyphylogeneticanalysisusingMEGA 3.0. Each bootstrap number represents the percentage of 1000bootstrap value. SubgroupNa01bwasseparatedfromNa01aandNa01c.WithinthebranchofNa01a and Na01c, the two BjxtrIT sequences were classified into Na01c, while BmK IT, BmKIT1,Bm32VIandBmkITAPwereclassifiedasNa01a.

3.4 Results–Classificationofscorpiontoxinsequences

The393scorpiontoxinsequenceswereclassifiedinto62groups( Figure 6 ).

The general distribution of scorpion toxins in the 62 groups is not significantly differentfrompreviousclassificationintheliterature(Possani et al .,1999;Rodriguez de la Vega and Possani, 2004) except for the larger number of toxin sequences analysed.Sometoxinsequenceshadbeenreclassifiedandnewgroupswereintroduced

36 Chapter 3: Data classification forsequenceswhichdidnotfitintothepreviousclassification.Asnewscorpiontoxin sequencesareidentifiedandnewinformationisavailable,revisionoftheclassification willbenecessary.Thedetaileddescriptionofthe62groupsisavailableinAppendix1.

ThelongchainNa +toxinswereseparatedfromtheshortchainscorpiontoxins namely,K +,Ca 2+ andCl excepttoxinsinBmP09andBeI2groups.BmP09shareshigh homologywithNa +toxinsbuthasasulfoxideattheCterminuswhichresultedina dramaticswitchfromaNa +channelblockertoaK+channelblocker(Yao et al .,2005).

BeI2wasmisalignedwithshortK +toxins.Of393scorpiontoxins,135K+toxinswere classifiedinto32groups,222Na +toxinswereclassifiedinto18groups,whileeight

Ca 2+ toxins were classified into four groups ( Table 3 ). K + group 6 was further classifiedintothreesubgroups.Na +groups2and9wereclassifiedintofoursubgroups each,Na +group1intothreesubgroupsandNa +group12intotwosubgroups.19Cl toxinswereclassifiedintotwosubgroups.Scorpineandninescorpiontoxinsequences with no annotated molecular target were assigned to the ‘defensin’ and ‘orphan’ groups,respectively.

Table3Summaryoftheclassifiedgroupsfor393scorpiontoxins.Kurtoxinhasbeen classifiedintoNa +andCa 2+ sinceittargetsbothionchannels(OlamendiPortugal et al .,2002). K+ Na + Ca 2+ Cl Defensin Orphan No.ofgroups 32 18 4 1 1 6 No.oftoxins 135 222 8 19 1 9

37

Chapter 3: Data classification

3 8 2

9

4 2 6

W

6 0 7 7

6 5 B 6 7

9 1

5 1 N

O 1

0

9 P P

x 2

0 4

T

Q

9 6 P T

8

I

4

8 h 4

T 4 a

0

0 a 1 T B

9 h I

P 0 A 5

K p h l

5 P

2 a 9 a m

P 9 s

2 A 1

h

s 0 B 0

5 h 3

q P T

C 6 a 6

I L t

9 P A m 2

4

o B 2 75

1

B 1 8

0 n 5

2

P C P

1

6 n

3 i

6 x s

5 o 7 T t

P r 9

i 4

1

1 B

t 01

s 2

7 P T

2

B 1 5

5 P

K IT MJ

1 h

B a Q5

9

P A in

x

Q to 2

o

I d

1 aio

e h

m

8 P

B

J

K

N e

5

9 B

Q

4 0

K 48

K 1

m 0 B P

h3

Aa

1

B

U 8

M 67

8 56

Q P

3

8 qh 3 L

K m B

Toxin1 Q9BLM0 1

x

T m O

2

1

N

B 9

Q

1 i n x o t s u l u m a T 90 BmTXPL2 Q95P

15 TsK4 01 P562 P46114 P

BmP02 Q9NJ Pi2 P55927 P7 BmTX KB Q8W 407 S33 BmK37 P83 3 Ts 24 TXK 83 B P 1 P 69 Tc 8 940 02 46 PB O Tx 1 1 P x 60 T 6 B 1 o 2 m 64 C 7 L T 0 qh X 1 1 KS Q 51 1 i1 P Q P 5 4 96 1 56 3F 8 T 6 8 2 8 C c 0 6 0 3 6 P ll 2 4 E P 2 X C r P g 6 T 6 C s T 0 3 E 2 X h X N O 1 T 9 A r T 2 g 1

K 9 a s x

K T Q 5 T

9 X

P 2 8 P x 6

F 2 1 1 x P Q 7

3 Q

T Q 8 U 6 4 u 3 8 9 9

8 8 H

B 2 6 6 f 7 Q 4 Q T

7 x 4 1 U 1 F P K 5 4 8 K 2 8

m 5 0 B

L n i e x T

o x t 4 P o 5

r 1 2 o 6 l 0 3 h 6 4

C P

1

a c M Figure6PhylogenetictreeofrepresentativescorpiontoxinsforNa +,K +,Ca 2+ and Cl channels.Forclarity,only52branchesaredisplayed.Theorphananddefensin groups were not included. The tree was calculated using the neighbourjoining algorithm. Na +,K+, Ca 2+ andCl .Thebranchesarelabelledwithtoxin namesandSwissProtaccessionnumbersexceptOmTx1(Chagotet al .,2005)and BmP09(Yao et al .,2005).TreewasgeneratedusingMEGA3.0.

38 Chapter 3: Data classification

ForK+toxins,afourthsubfamily,δKTxwasintroducedforκtoxinsfollowing thenomenclatureforα,βandγKTxsubfamiliesproposedbyTytgat et al .(1999)

(Table4 ).Thehairpinstructureoftwoshorthelicescrosslinkedwithtwodisulfidesin theδKTxsubfamily(Srinivasan et al .,2002b;Nirthanan et al .,2005)differsfromthe conserved3Dstructuresofallknownscorpiontoxins,comprisingofdoubleortriple strandedβsheetsandanαhelixmaintainedbytwo–fourdisulfidebridges(Possani et al ., 1999; Possani et al ., 2000). Toxins in the δKTx subfamily are non or weak ligandsofvoltagedependentpotassiumchannels(Srinivasan et al .,2002;Nirthanan et al .,2005).

FivenewαKTxsubfamilieswereproposedwhichdidnotfitintotheexpanded

18αKTxsubfamilies(RodriguezdelaVegaandPossani,2004),namelyBmTXKS1

(αKTx21),Tamulustoxins(αKTx22),OmTX(αKTx23),K+channelinhibitingtoxin

(αKTx24) and BmK38 (αKTx25) ( Figure 7 ).TheauthorhadreclassifiedLqh151

(α1.7)andPbTx3(α1.10)inαKTx1subfamilyintoαKTx16andαKTx4subfamilies, respectively.Lqh151sharedhighersequenceidentity(~80%)withtoxinsinαKTx16 thanαKTx1(<50%)andwasclusteredwithinthesamecladeasαKTx16toxinsin thephylogenetictree.PbTx3sharedhighersequenceidentity(average50%)withα

KTx4toxinsthanthoseinαKTx1(average40%).αKT6subfamilywasdividedinto three subfamilies because they were separated into three clades. Tc1 (α13.1) and

OsK2 (α13.2) were split into αKTx13 and αKTx20 because submission of each toxinintoBLASTsearchesatNCBI( http://www.ncbi.nlm.nih.gov/BLAST/ ) did not returntheothertoxinandtheywerepresentondifferentclades.

TheβKTxsubfamilyconsistsofK+longchaintoxinsof60–66residues.The authorclassifiedBmTXKβ(β3)intoβKTx2asithadonly21%identitywithTsTXKβ

(β1),AaTXKβ(β2)andBmTXKβ2(β4)whichformtheβKTx1subfamily(average

39 Chapter 3: Data classification

70%).Anewlongchainpotassiumtoxin,BmP09(Yaoet al .,2005)formstheβKTx3.

Table 4 Classification of 135 K +scorpiontoxinsequences.Thefirstcolumnshows subfamiliesfromthiswork.δKTxsubfamilywasintroducedafterα,βandγKTx subfamilies proposed by Tytgat et al . (1999). Second column lists the number of peptidesineachsubfamily.Thirdcolumnshowsthedistributionofclassifiedtoxinsas comparedtothatofthe18αKTx,4βKTxand5γKTxsubfamiliesbyRodriguezde la Vega and Possani (2004). ‘New toxin’ represents the number of new toxins not previouslyincludedinthesubfamilyofRodriguezdelaVegaandPossani(2004). K+ Peptideno. RodriguezdelaVegaandPossani(2004) Subfamily Sequenceincrement αKTx1 8 αKTx1 0 αKTx2 9 αKTx2 3 αKTx3 9 αKTx3 0 αKTx4 8 αKTx4 4 αKTx5 5 αKTx5 0 αKTx6(a,b,c) 12(4,7,1) αKTx6 2 αKTx7 2 αKTx7 0 αKTx8 4 αKTx8 0 αKTx9 6 αKTx9 2 αKTx10 2 αKTx10 0 αKTx11 3 αKTx11 0 αKTx12 2 αKTx12 1 αKTx13 1 αKTx13 1 αKTx14 4 αKTx14 0 αKTx15 7 αKTx15 0 αKTx16 5 αKTx16 2 αKTx17 1 αKTx17 0 αKTx18 1 αKTx18 0 αKTx19 1 Cai et al .,2004 0 αKTx20 1 αKTx13 0 αKTx21 1 New 1 αKTx22 2 New 2 αKTx23 3 New 3 αKTx24 1 New 1 αKTx25 1 New 1 βKTx1 3 β1,β2,β4 0 βKTx2 1 β3 0 βKTx3 1 New 1 γKTx1 2 γKTx2 1 γKTx2 12 γKTx4 0 γKTx3 14 γKTx1,3,4,5 1 δKTx1 3 κhefutoxins 1

40 Chapter 3: Data classification

Gp Name (SP ID) Sequence

Alpha-KTx 1ChTx(P13487)ZFTNVSCTTSKECWSVCQRLHNTSRGKCMNKKCRCYS 2NTX(P08815)TIINVKCTSPKQCSKPCKELYGSSAGAKCMNGKCKCYNN 3KTX(P24662)GVEINVKCSGSPQCLKPCKDAGMRFGKCMNRKCHCTPK 4TsK4(P46114)VFINAKCRGSPECLPKCKEAIGKAAGKCMNGKCKCYP 5LeTx(P16341)AFCNLRMCQLSCRSLGLLGKCIGDKCECVKH 6Pi1(Q10726)LVKCRGTSDCGRPCQQQTGCPNSKCINRMCKCYGC 7Pi2(P55927)TISCTNPKQCYPHCKKETGYPNAKCMNRKCKCFGR 8P01(P56215)VSCEDCPEHCSTQKAQAKCDNDKCVCEPI 9BmP02(Q9NJP7)VGCEECPMHCKGKNAKPTCDDGVCNCNV 10CoTx1(O46028)AVCVYRTCDKDCKRRGYRSGKCINNACKCYPY 11PBTx1(P60164)DEEPKESCSDEMCVIYCKGEEYSTGVCDGPQKCKCSD 12BuTx(P59936)WCSTCLDLACGASRECYDPCFKAFGRAHGKCMNNKCRCYT 13Tc1(P83243)ACGSCRKKCKGSGKCINGRCKCY 14BmKK1(Q967F9)TPFAIKCATDADCSRKCPGNPSCRNGFCACT 15AaTx1(Q867F4)ZIETNKKCQGGSCASVCRRVIGVAAGKCINGRCVCYP 16Lqh151(P45660)GLIDVRCYDSRQCWIACKKVTGSTQGKCQNKQCRCY 17BmKK4(Q95NJ8)QTQCQSVRDCQQYCLTPDRCSYGTCYCKTT 18Tc32(P60211)TGPQTTCQAAMCEAGCKGLGKSMESCQGDTCKCKA 19BmK37(P83407)AACYSSDCRVKCVAMGFSSGKCINSKCKCYK 20OsK2(P83244)ACGPGCSGSCRQKGDRIKCINGSCHCYP 21BmTXKS1(Q963F8)GIVCKVCKIICGMQGKKVNICKAPIKCKCKK 22Tamulustoxin1(Q9BN12)RCHFVVCTTDCRRNSPGTYGECVKKEKGKECVCKS 23OmTx1DPCYEVCLQQHGNVKECEEACKHPVE 24Tx(O96669)CQNECCGISSLRERNYCANLVCINCFCQGRTYKICRCFFSIHAIR 25BmK38(Q8MUB1)KTATFCTQSICQESCKRQNKNGRCVIEAEGSLIYHLCKCY Beta-KTx 1TsTXKB(P69940)KLVALIPNDQLRSILKAVVHKVAKTQFGCPAYEGYCNDHCNDIERKDGECHGFKCKCAKD 2BmTXKB(Q8WS33)KNIKEKLTEVKDKMKHSWNKLTSMSEYACPVIEKWCEDHCAAKKAIGKCEDTECKCLKLRK 3BmP09DNGYLLNKYTGCKIWCVINNESCNSECKLRRGNYGYCYFWKLACYCEGAPKSELWAYETNKCNGKM Gamma-KTx 1BeKm1(Q9BKB7)RPTDIKCSESYQCFPVCKSRFGKTNGRCVNGFCDCF 2CllErgTX2(Q86QU9)DRDSCVDKSKCSKYGYYGQCDECCKKAGDRAGNCVYFKCKCNP 3CsErgTX2(Q86QU5)DRDSCVDKSRCAKYGYYGQCEVCCKKAGHRGGTCDFFKCKCV Delta-KTx 1HfTx1(P82850)GHACYRNCWREGNDEETCKERC Figure7RepresentativescorpiontoxinsfromfourK+subfamilies,namelyα,β,γandδ.Thegroupnumberisfollowedbythetoxinnameandthe accessionnumberfromSwissProt,exceptOmTx1(Chagot et al .,2005)andBmP09(Yao et al .,2005)whichwereextractedfromtheliterature .

41 Chapter 3: Data classification

The author reclassified toxins targeting etheragogo potassium channel subtypeintothreesubfamilies:γKTx1–3insteadoffivesubfamiliesasproposedby

RodriguezdelaVegaandPossani(2004).Threedistinctpatternswereobservedfrom theirmultiplesequencealignment( Figure8 ).TheγKTx1consistsofBeKm1(γ2.1) andBmKK7(SwissProtid:P59938),γKTx2consistsofalltheγKTx4toxinsandγ

KTx3consistsofγKTx1,3,4and5.

γ4.13γKTx2 DRDSC VDKSKCG KYGYYGQC DE CC KKAGDRAGTC VYY KCKCN P γ4.12 ERDSC VEKSKCG KYGYYGQC DE CC KKAGDRAGTC VYY KCKCN P γ4.09DRDSC VDKSRCG KYGYYGQC DD CC KKAGDRAGTC VYY KCKCN P γ4.10 DRDSC VDKSRCG KYGYYGQC DE CC KKAGDRAGTC VYY KCKCN P γ4.11 DRDSC VDKSQCG KYGYYGQC DE CC KKAGERVGTC VYY KCKCN P γ4.03 DRDSC VDKSKCG KYGYYGQC DE CC KKAGDRAGICEYY KCKCN P γ4.01 DRDSC VDKSKCS KYGYYGQC DE CC KKAGDRAGNC VYFKCKCN P γ4.06 DRDSC VDKSKCS KYGYYGQC DKCC KKAGDRAGNC VYFKCKCNQ γ4.07 DRDSC VDKSKCAKYGYYGQC DE CC KKAGDRAGNC VYLKCKCNQ γ4.08 DRDSC VDKSKCG KYGYYHQC DE CC KKAGDRAGNC VYY KCKCN P γ4.04 DRDSC VDKSKCAKYGYYYQC DE CC KKAGDRAGTC EYFKCKCN P γ4.05 DRDSC VDKSQC AKYGYYYQC DE CC KKAGDRAGTC EYFKCKCN P γ4.02γKTx3 DRDSC VDKSKCG KYGYYQ ECQ DCC KNAGHNGGTC VYY KCKCN P γ1.01 DRDSC VDKSRCAKYGYYQ ECQ DCC KNAGHNGGTC MFF KCKCA γ1.01isoform DRDSC VDKSRCAKYGYYQECQ DCC KNAGHNGGTQ MFF KCKAP γ1.04 DRDSC VDKSRCAKYGYYQ ECQ DCC KKAGHNGGTC MFF KCKCA γ1.05 DRDSC VDKSRCS KYGYYQ ECQ DCC KKAGHNGGTC MFF KCKCA γ1.06 DRDSC VDKSRCAKYGYYQ ECQ DCC KKAGHSGGTC MFF KCKCA γ1.02 DRDSC VDKSRCAKYGYYQ ECT DCC KKYGHNGGTC MFF KCKCA γ1.03 DRDSC VDKSRCAKYGHYQ ECT DCC KKYGHNGGTC MFF KCKCA γ5.01 DRDSC VDKSRCAKYGYYGQC EVCC KKAGHNGGTC MFF KCMCVNS KMN γ3.01 GRDSC VNKSRCAKYGYYSQC EVCC KKAGH KGGTC DFF KCKCKV γ3.03 DRDSC VDKSRCAKYGYYGQC EVCC KKAGH RGGTC DFF KCKCKV γ3.02 DRDSC VDKSRCAKYGYYQQC EICC KKAGH RGGTC EFF KCKCKV γ3.04 DRDSC VDKSRCQ KYGNY AQCT ACC KKAGHN KGTC DFF KCKCT γ5.02 DRDSCVDKSRCQ KYG PYGQCT DCC KKAGHTGGTC IYFKCKCG AESG R γ2.01γKTx1RP TDIKCS ES YQC FPV CKSRFGKTNG RCVNG FCDCF BmKK7RP TDIKCS AS YQC FPV CKSRFGKTNG RCVNG LCDCF * .: * *: * ** :.*. * *. Figure 8 Multiple sequence alignment of γKTx toxins targeting etheragogo K + channelsubtype.ThefirstcolumnrepresentstheclassificationinRodriguezdelaVega and Possani (2004); second column represents the classification in this work; third column represents the toxin primary sequences. Multiple sequence alignment was generatedbyClustalW.

42 Chapter 3: Data classification

Inthiswork,theauthorclassified222Na +scorpiontoxinsinto18groupsbased onsequencesimilarityandionchannelspecificity( Table5 ).Incontrast,Possani et al .

(1999)classified36sequencesinto10groupsbasedonanimalspeciesspecificityand pharmacologicalproperties.Informationontoxicitytoanimalmodelssuchasinsects

(cockroach,locust),mammals(mice,rat)andcrustaceans(crayfish,prawn)islimited intheliterature.ToxinsequencesindifferentgroupsproposedbyPossani et al .(1999) sharedhighsequencesimilarity.Forexample,CssII,CsEv1andCn5inPossani et al .’s groups2,8and9shared60–86%identity( Figure9 ).Na +toxinswerethusclassified basedonprimarystructuresimilarityandbroadionchannelspecificity.Thiswork’s groups1,2,10and11correspondedtoPossani et al .’sgroups10,3,7and4exceptfor thelargernumberoftoxinmembersanalysed.Groups3–7and9containedsequences distributedinPossani et al .’sgroups1,2,5,6,8and9.Finally,newgroups8,12–18 containing scorpion toxin sequences that were dissimilar to other groups were introduced.Representativesequencesofthe18Na +groupsareshownin Figure10 .

Toxin Possani’s Sequence CsEv12 KEGY LV KK SDGC KYDCFWL GKNEHCNT ECKAKNQGGSYGYCY AFA CWCEGLP EST PTY PLPNKSC Cn59 KEGY LV NKSTGC KYGC LLL GKNEGC DKECKAKNQGGSYGYCY AF GC WCEGLP EST PTY PLPNKSCS CssII8 KEGY LV SKSTGC KYECLKLGDNDYC LRECKQQYG KSSGGYCY AFA CWCTH LYEQAVVWPLPNKTCN ******.** **** *: **.* : * *** : .* ******.*** **. : .:*****:*. Figure 9Multiplesequence alignmentofCsEv1,Cn5andCssII in Possani et al .’s groups2,8and9.Inthiswork,thesesequenceshadbeenclassifiedintogroup9. Scorpion toxins targeting Ca 2+ channels were classified into four groups

(Figure 11 ) as they shared less than 28% sequence identity among themselves.

ImperatoxinIfrom Pandinus imperator istheonlyscorpiontoxinisolatedthusfarthat existsasaheterodimer(Zamudio et al .,1997).

43 Chapter 3: Data classification

Table5Classificationof222Na +scorpiontoxinsequences.Thefirstcolumnshows groups and subgroups determined in this work. Second column lists the number of peptidesineachgrouporsubgroup.Thirdcolumnshowsthedistributionofclassified toxinsascomparedtothatofthe10subfamiliesbyPossani et al . (1999).‘Newtoxin’ represents the number of new toxins not previously included in the subfamily of Possaniet al .(1999). Na + Peptideno. Possaniet al .(1999) Group(Subgroup) Subfamily NewToxin 1(a,b,c) 18(4,12,2) 10 15 2(a,b,c,d) 19(3,4,7,5) 3 18 3 38 5,6,8 31 4 5 1 3 5 7 6 5 6 19 1,5 15 7 6 1,8 3 8 2 New 2 9(a,b,c,d) 52(6,20,24,2) 2,8,9 42 10 5 7 4 11 33 4 27 12(a,b) 4(3,1) New 4 13 1 New 1 14 3 New 3 15 5 New 5 16 3 New 3 17 1 New 1 18 1 New 1

44 Chapter 3: Data classification & analyses

Gp Name (SP ID) Sequence 1AahIT1(P01497)KKNGYAVDSSGKAPECLLSNYCNNECTKVHYADKGYCCLLSCYCFGLNDDKKVLEISDTRKSYCDTTIIN 2Tst1(P56612)KEGYLMDHEGCKLSCFIRPSGYCGRECTLKKGSSGYCAWPACYCYGLPNWVKVWDRATNKC 3LqhalphaIT(P17728)VRDAYIAKNYNCVYECFRDAYCNELCTKNGASSGYCQWAGKYGNACWCYALPDNVPIRVPGKCR 4Aah3(P01480)VRDGYIVDSKNCVYHCVPPCDGLCKKNGAKSGSCGFLIPSGLACWCVALPDNVPIKDPSYKCHS 5Lqh3(P56678)VRDGYIAQPENCVYHCFPGSSGCDTLCKEKGGTSGHCGFKVGHGLACWCNALPDNVGIIVEGEKCHS 6Aah2(P01484)VKDGYIVDDVNCTYFCGRNAYCNEECTKLKGESGYCQWASPYGNACYCYKLPDHVRTKGPGRCH 7Ts3(P01496)KKDGYPVEYDNCAYICWNYDNAYCDKLCKDKKADSGYCYWVHILCYCYGLPDSEPTKTNGKCKS 8Aah6(P56743)GRDGYVVKNGTNCKYSCEIGSEYEYCGPLCKRKNAKTGYCYAFACWCIDVPDDVKLYGDDGTYCSS 9Css2(P08900)KEGYLVSKSTGCKYECLKLGDNDYCLRECKQQYGKSSGGYCYAFACWCTHLYEQAVVWPLPNKTCN 10AahIT4(P21150)EHGYLLNKYTGCKVWCVINNEECGYLCNKRRGGYYGYCYFWKLACYCQGARKSELWNYKTNKCDL 11BotIT5(P55904)DGYIRKRDGCKVSCLFGNEGCDKECKAYGGSYGYCWTWGLACWCEGLPDDKTWKSETNTCG 12BeI2(P15221)ADGYVKGKSGCKISCFLDNDLCNADCKYYGGKLNSWCIPDKSGYCWCPNKGWNSIKSETNTC 13BmKBT(Q9NBW2)KKSGYPTDHEGCKNWCVLNHSCGILCEGYGGSGYCYFWKLACWCDDIHNWVPTWSRATNKCRA 14Toxin1(Q9BLM0)VRDGYFVEPDNCVVHCMPSSEMCDRGCKHNGATSGSCKAFSKGGNACWCKGLR 15Birtoxin(P58752)ADVPGNYPLDKDGNTYKCFLLGGNEECLNVCKLHGVQYGYCYASKCWCEYLEDDKDSV 16Phaiodotoxin(Q5MJP5)KFIRHKDESFYECGQLIGYQQYCVDACQAHGSKEKGYCKGMAPFGLPGGCYCPKLPSNRVKMCFGALESKCA 17BmTXPL2(Q95P90)CSMVYGDSLSPWNEGDTYYGCQRQTDEFCNKICKLHLASGGSCQQPAPFVKLCTCQGIDYDNSFFFGALEKQCPKLRE 18Cn12(P63019)RDGYPLASNGCKFGCSGLGENNPTCNHVCEKKAGSDYGYCYAWTCYCEHVAEGTVLWGDSGTGPCRS Figure10 RepresentativescorpiontoxinsfromNa +toxingroups1–18.Thegroupnumberisfollowedbythetoxinnameandtheaccession numberfromSwissProt. Gp Name (SP ID) Sequence 1Mca(P60254)GDCLPHLKLCKENKDCCSKKCKRRGTNIEKRCR 2IpTxI(P59888)TMWGTKWCGSGNEATDISELGYWSNLDSCCRTHDHCDNIPSGQTKYGLTNEGKYTMMNCKCETAFEQCLRNVTGGMEGPAAGFVRKTYFDLYGNGCY 3Kurtoxin(P58910)KIDGYPVDYWNCKRICWYNNKYCNDLCKGLKADSGYCWGWTLSCYCQGLPDNARIKRSGRCRA 4BjTx1VGCEECPAHCKGKNAKPTCDDGVCNCNV 1Mca(P60254) 2IpTxI(P59888)NVQCPSQSEECPDGVATYTGEAGYGAWAINKLNG 3Kurtoxin(P58910) 4BjTx1 Figure11RepresentativescorpiontoxinsfromCa 2+ toxingroups1–4.Thegroupnumberisfollowedbythetoxinnameandtheaccession numberfromSwissProtexceptBjTx1whichwasextractedfromZhu et al .(2004b).

45 Chapter 3: Data classification

3.5 Discussionandconclusions

Theauthorproducedthefirstlargescaleclassificationof393currentlyknown scorpiontoxinswhichincludealltoxinsspecificto Na +, K +, Ca 2+ and Cl channels.

Withalargerepositoryoftoxinsequences,abroadperspectiveofthegeneralpatterns intheirstructureandfunctioncanbeobserved.Thisclassificationeliminatesanumber of errors that resulted from analysing small sample sizes. For example, prior to isolationofK +longchaintoxinsof60–64residues(Legros et al .,1998),K+toxins wereclassifiedasshortchaintoxinsof30–40residues.Withthisnewinformation, the primary sequence structure, but not peptide length, can be used to discriminate betweenlongchainNa +andK +toxins.

This classification was based on broad ion channel specificity and primary structure similarity because of three important reasons. First, the pharmacology of manyofthesetoxinstoionchannelsremainstobedetermined. For example,tothe author’s knowledge, the pharmacological properties of BmK 412 (GenBank ID:

AF327643),AamTx(Chen et al .,2005)andAamH2(Chen et al .,2003)havenotyet beenvalidated.Second,forK +toxins,fewofthepeptidesareselectiveforagivenK + channel subtypes, for example, charybdotoxin (Gao and Garcia, 2003), maurotoxin

(Vissan et al .,2004)andseveralothersblockbothvoltagegatedandCa 2+ activatedK + channels. However, iberiotoxin is a selective blocker of largeconductance Ca 2+ activated K + channel (Galvez et al .,1990)andP05isspecifictosmallconductance

Ca 2+ activated K + channel (Wu et al ., 2002). Third, for Na + toxins, information on toxicitytodifferentanimalmodelsislacking.Forexample,toxicitytoanimalmodels wasnotdetermined for the16peptidesidentifiedfrom the cloning of Centruroides sculpturatus Ewingvenomousgland(Corona et al .,2001).Further,toxinsknownas

46 Chapter 3: Data classification mammalspecific can be toxic to insects and vice versa (Gordon et al ., 1996). The animalgroupspecificityisonlyrelativeanddefinitecrossreactivityexists(Selisko et al .,1996).Informationoncrossreactivityremainstobedetermined.Asaresult,the primary structure similarity and broad ion channel specificity appear to be the best criteria available for classification purposes in scorpion toxins. The phylogenetic analysesandhighlyaccuratepredictionsofscorpiontoxinfunctionalproperties(see

Chapter 5) indicate that current groupings are suitable for scorpion toxin characterisation.

Classificationandnomenclatureofbioactivetoxinsinanimalvenomsarestill being developed – they are important prerequisites for structural and functional studies.Alackofconsistencyinthenomenclaturemayledtoconfusioninscorpion toxin classification with possibility of multiple names ascribed to the same toxins

(Goudet et al .,2002)orthesamenameusedfordifferenttoxins(Becerril et al .,1997).

Toxinshavebeennamedaccordingtotheorderoffractionspurifiedfromthevenom

(e.g.Batista et al .,2004;Murgia et al .,2004).Currently,classificationoftwoscorpion species Centruroides exilicauda Woodand Centruroides sculpturatus Ewingremainto beresolved(ValdezCruz et al .,2004b)wheredefinitionofspecies,genera,families and others is often subjective. A detailed classification based on structurefunction relationshipswillprovideaconvenientsolutiontothenamingofthetoxins.

The author proposed new classification groups for scorpion toxin sequences thathavenotbeenclassified,reflectingthediversityofdifferenttoxinsavailableinthe naturallibraryofscorpionvenoms.Withthegrowth of number of newly identified scorpiontoxins,theviewofthestructurefunctionrelationshipsisconstantlychanging.

Itisimportanttonotethatthispresentlargescaleclassificationof393differenttoxins inthisworkrelatestocurrentknowledgeofthefield.Inthefuture,itisexpectedthat

47 Chapter 3: Data classification revision of classification will be necessary as more information becomes available.

This classification approach, as a generic tool, can be applied to largescale classificationofotherbioactivetoxinsandfamiliesofbioactivepeptides.

Chaptersummary

• Largescale classification of all known scorpion toxins provides a global

overviewontheirstructurefunctionrelationships,aidsfunctionalinferenceof

novelscorpiontoxinsandfacilitatesretrievalofrelevantbiologicalinformation

fromlargenumberoftoxindata.

• Inthiswork,393currentlyknownscorpiontoxinshavebeenclassifiedinto62

groups based on ion channel specificity and primary sequence similarity,

combined with multiple sequence alignments and phylogenetic analyses. 14

newclassificationgroupswereproposedforscorpiontoxinsequencesthathave

notbeenclassified.

• Inthefuture,thepresentclassificationisexpectedtoberevisedtoincorporate

novelscorpiontoxinsequencesidentifiedandnewinformationmadeavailable.

48 Chapter 4: Extraction of functional motifs in scorpion toxins

PartII:Chapter4 Extractionoffunctionalpeptidemotifs inscorpiontoxins ‘Manyoflife'sfailuresarepeoplewhodid notrealisehowclosetheyweretosuccess whentheygaveup.’ ThomasEdison

49 Chapter 4: Extraction of functional motifs in scorpion toxins

Newly identified scorpion toxins are deposited in public databases which provide limited functional annotation. To aid experimental characterisation of these toxins, functionisinferredfromidentificationofsimilarcharacterisedsequencesbypairwise alignment,suchasusingBLAST(Altschul et al .,1997)orFASTA(Pearson,2000) tools.However,thereareknowncaseswheresimilarstructuresdonotnecessaryimply functional similarity. The highly similar sequences may differ in function from the characterised sequences where even variation at a critical functional residue can markedly affectstheactivityofaprotein.Forexample,scorpiontoxinsIkitoxinand

Birtoxindifferbyonlyoneresidueatposition23,whereIkitoxinhasglutamateinstead of glycine, resulting in Ikitoxin having 1000fold reduced activity than Birtoxin

(Inceoglu et al ., 2002). Pairwise alignment does not differentiate critical functional residues (e.g. an active site) from residues with no critical role (Hulo et al ., 2004).

Anotherapproachtotheinferenceoffunctioninvolvessearchingsequencesignature databasessuchasPROSITE(Hulo et al .,2004)orPRINTS(Attwood et al .,2003)for conserved motifs which usually relate to structural or functional meaning. These motifs were derived statistically from multiple sequence alignments of protein sequencesthatformafamily.Forexample,thescorpionshorttoxinssignature,Cx(3)

Cx(6,9)[GAS]KC[IMQT]x(3)CxC,hasbeendepositedinPROSITE(accession

ID:PS01138).However,themotifsthatwerederivedstatisticallyarenotnecessarily biologicallyrelevant.

Mutation studies of scorpion toxins (such as sitedirected mutagenesis and chemical modification) have identified critical residuesimportantforbothstructural andfunctionalproperties(e.g.Cohen et al .,2005;Legroset al .,2005).Themutation datawhichprovidebiologicallyrelevantinformationoncriticalresidueandposition

50 Chapter 4: Extraction of functional motifs in scorpion toxins are available in the literature and normally are not used for extraction of functional motifs in scorpion toxins. Additionally, analyses of 3D structures of toxins can aid identification of the motif where noncontiguous residues in the primary sequence clusterspatially.Herewith,asetofscorpiontoxinmotifsextractedfromanalysesof multiple sequence alignment of classified scorpion native toxin sequences, 3D structuresandinformationfrommutationstudieswasdescribed(Tan et al .,2006a).

ThisisthefirstreportofeightfunctionallyrelevantbindingmotifstoNa +and

K+ channels which can facilitate the determination of specificity of newly identified scorpion toxins to various ion channel subtypes. Also, these motifs can help in detectionofdistantrelationshipsbetweentoxinsequenceswhichmaybeoverlookedin pairwisealignmentanalysis.

4.1 MaterialsandMethods

LiteratureonmutationstudiesofscorpiontoxinsweresearchedinthePubMed database ( http://www.pubmed.gov/ ) using keywords such as ‘scorpion toxin’ and

‘mutation’. Data of scorpion mutant toxins and information of mutation on toxin function were extracted from the literature and deposited into the SCORPION2 database. The positions and residue identities of mutation were noted. Effects of mutationonbindingaffinitywerescaledtoacommonscaleforcomparison.Mutations thataffectedtoxinfunctionbymorethan10foldweredefinedasimportantresidues.

These residues were then compared to multiple sequence alignment of classified groupsofnativetoxins(discussedinSection3.3)toverifypossible conservationof residues.Thespatialorganisationoftheresiduesimportantforstructureandfunction wasdeterminedfrom3Dstructureanalysisandmappedontothenativetoxinprimary sequencesforextractionofmotifs.

51 Chapter 4: Extraction of functional motifs in scorpion toxins

4.1.1 Scalingofbindingaffinitiestoacommonscaleinmutanttoxindata

Bindingaffinitydatafrommutationstudieswereextractedfromtheliterature andscaledtoacommonscaleforcomparison(Equation1).OnlyK +andNa +binding affinitydataweremappedbecausenobindingaffinitydatawasavailableforCl toxins andonly asetof fivebindingaffinitymeasurements was available for Ca 2+ toxins.

Within K + and Na + groups, eight binding categories were defined for each group, namelyof1)Nonbinding2)Verylowbinding3)Low binding 4) Moderately low binding5)Moderatebinding6)Moderatelyhighbinding7)Highbinding,and8)Very highbinding.ThehighestbindingaffinitydataobservedforK +andNa +were0.08pM

(Koschak et al .,1998)and4pM(Hassani et al .,1999),respectively.Thelowestlimit forbothK +andNa +wassetat10,000nMbecauseanyconcentrationhigherthanthis wouldmeanthetoxinisanonbindertotheionchannel.

Equation 1 Formula for scaling binding affinities of independent experiments to a commonscale.log10 representsthecommonbase10logarithm, x isthevalueofthe bindingaffinitytobescaled.Thehighestbinding affinity values were observed for eachionchannelfromindependentbindingexperiments.Thelowestbindingaffinity foreachionchannelwassetat10,000nM. 1 1 log − log 10 x 10 LowestBindingAffinity Scaled X = 1+ *7 1 1 log − log 10 HighestBindingAffinity 10 LowestBindingAffinity

4.1.2 Dataanalysis

Mutations that affected structural folding, as detected by circular dichroism spectroscopy, were annotated as important to structural integrity. Mutations that affectedbindingaffinityandtoxicitybymorethan10and100foldascomparedto native toxins were termed as ‘influential’ and ‘critical’, respectively. Multiple

52 Chapter 4: Extraction of functional motifs in scorpion toxins sequencealignmentsof nativetoxinstargetingthe same ion channel subtype as the mutant toxins were performed using Clustal W (Thompson et al ., 1994). These

‘influential’ and ‘critical’ residues were then compared to the multiple sequence alignment of native toxins to verify possible conservation of residues. The spatial organisationoftheresiduesimportantforstructureandfunctionwasvisualisedusing

Molsoft ( http://www.molsoft.com/ ) and mapped onto the native toxin primary sequences for extraction of motifs. The motifs are represented as used in PROSITE database(Hulo et al .,2004).Totesttherelevanceofthesemotifs,eachwassearched inSwissProt(release48.0)andTrEMBL(release31.0)databasesusingScanProsite

(Gattiker et al .,2002)toscanforproteinsequencescontainingthemotif.

4.2 Resultsanddiscussion

InSCORPION2,atotalof426scorpionmutanttoxinrecordswereextracted from 72 literature sources on mutation studies of scorpion toxins (as of November

2005).Scalingbindingaffinitiesofscorpionmutanttoxinstoacommonlogarithmic scaleisusefulforcomparingeffectsofmutationonfunction.Anexampleisshownin

Figure 12 , where Agitoxin 2 targets Shaker K + channel at moderate range after mappingtothecommonscale(K d=0.741nM)(Ranganathan et al .,1996).Mutation study on Agitoxin 2 suggested that K27 and N30 are critical for binding affinity towardsthischannel(>600foldreducedbindingaffinityascomparedtothatofnative

Agitoxin 2), while S11, R24, F25, M29 and T36 influence the ligandchannel interaction(>10foldbut<100foldreduction),andT9,G10,R31,K32,H34andP37 canbesubstitutedwithoutsignificanteffectonbindingaffinity(<10foldreduction)

(Ranganathan et al .,1996).

53 Chapter 4: Extraction of functional motifs in scorpion toxins

Theauthorextractedeightnewmotifsfromanalysesofnativescorpiontoxin sequences, 3D structures and mutation studies of Na+ (four motifs) and K + (four motifs)toxins( Table6 ).Thesemotifsarehighlyspecifictoscorpiontoxinswhereall thesequencesreturnedbyScanProsite(Gattiker et al .,2002)wereofscorpionsi.e.no false positives were extracted. Further, these motifs can be used to search uncharacterisedscorpiontoxinsequencesforinferenceoftheirspecificitiestodifferent ionchannelsubtypes.

Figure 12 Scaling binding affinities of Agitoxin 2 and its mutant sequences for analysisofresiduesimportantforinteractionwithShakerK+channel.NativeAgitoxin 2bindsmoderatelytoShakerK +channel.Mutationsatpositions9,10,31,32,34and 37 did not affect binding affinity significantly (<7fold). However, mutations at positions11,24,25,29and36resultedinmoderatelylowaffinities.Further,mutations oflysineandasparaginetomethionineandalanineatpositions27and30respectively resultedinlowaffinitiestowardsthechannel.Thissuggeststhatlysineandasparagine atthesepositionsareimportantforbindingtoShakerK +channel.

54 Chapter 4: Analysis of scorpion toxin data

+ + + 2+ Table6MotifsofscorpiontoxinsextractedforNa ,K andCl channels.K V–voltagedependentK channels,BK Ca –largeconductanceCa 2+ + activatedchannel,SK Ca –smallconductanceCa activatedchannel,ERG–EtheragogoK channel.‘Hits’–numberofsequencesthatcontain thesubmittedmotifuponsearchinSwissProtandTrEMBLdatabasesusingScanProsite.‘Expt.’–experimentallydeterminedfunction.‘No info.’–Functionalinformationisunavailable.

Na + Hits Expt . Noinfo . α R-D-x-Y-I-x(4)-N-C-x-Y-x-C-x(5,7)-C-N-x-x-C-T-x-x-G-A-x(3,4)-Y-C-x(6)-G-N-x-C-x-C-x-x-L-P-x(4)-I-x(4,5)-[KR]-C-[HR] 22 4 15 αlike R-D-x-Y-I-A-x(3)-N-C-x(3)-C-x(3,6)-C-x-x-L-C-x(3)-G-x(3)-G-x-C-x(6)-G-x-x-C-W-C-x-x-L-P-x-x-V-x-I-x(3)-G-K-C-H 16 4 9 βexcitatory G-x(3)-D-x-x-G-K-x-x-E-C-x(4,9)-Y-C-x-x-E-C-x-K-V-x-Y-A-x-x-G-Y-C-C-x(3)-C-Y-C-x-G-L-x(16)-C 9 4 5 βmammal K-x-G-Y-x-V-x(4)-G-C-x(3)-C-x-x-L-G-x-N-x-x-C-x-x-E-C-x(9)-G-Y-C-Y-x-F-x-C[WY]-C-x-x-L-x(8)-L-x-x-K-x-C 25 9 14 K+ Kv C-x-x-[SP]-x(1,2)-C-[YWIDLG]-x-x-C-x(8,10) -K-C-[MI]-N-x-x-C-[KRH]-C 27 19 8

BK Ca C-x-x-[SP]-x(1,2)-C-[YWIDLG]-x-x-C-x(8,10) -K-C-[MI]-[NG]-x-x-C-[KRH]-C 28 7 19

SK Ca C-x-x-[RK]-[RM]-C-x(3)-C-[RK]-x(7) -C-x(4)-C-x-C 5 5 ERG C-x(3)-Y-x-C-x(3)-C-K-x-R-F-x-K-x(3)-R-C-x(4)-C-x-C 2 1 1 Cl C-x-P-C-F-T-x(8)-C-x(2)-C-C-x(5,7)-C-x(2,3)-Q-C-[LI]-C 14 14

55 Chapter 4: Analysis of scorpion toxin data

4.2.1 Chloridechannelmotif

InPROSITEdocumentofscorpionshorttoxinssignature (PDOC00875), Cl toxinswasannotatedashavingthepattern,Cx(3)Cx(6,9)[GAS]KC[IMQT]x(3)

CxC.However,themotifthattheauthorextractedbasedonconservationofresidues among 18 Cl toxins was CxPCFTx(8)Cx(2)CCx(5,7)Cx(2,3)QC[LI]C

(Figure13 ).TheconservedcysteinepatterninCl toxinsisdifferentfromthatreported in the PROSITE document. This conserved cysteine pattern is distinct from those observed in Na + and K +toxinsandcanbeusedtodifferentiateCl toxins from the othertoxins.

Cl toxinsadoptthecysteinestabilisedαhelix(CSH)foldcommonlyobserved inscorpiontoxinsandtheconservedresiduesareclusteredspatiallyattheβsheetsand loopsurfaces( Figure14 ).Sincethese18toxinsshared48–100%identityandtheir

3Dfoldingsareconserved,thisclustercouldformtheputativeinteractionsiteinCl toxins.Mutationperformedontheseconservedresidues would clarify their roles in functionandstructure.

56 Chapter 4: Analysis of scorpion toxin data

1102030 |||| D000104 MCMP CFTT DPNMA KK CRDCCGGNG K CFGPQC LCN R35 D000206 MCMP CFTT DHN MA KK CRDCCGGNG K CFGPQC LCN R35 D000207 MCMP CFTT DPNMA NKCRDCCGGG KK CFGPQC LCN R35 D000154CG PCFTT DANMA RK CRECCGG IGK CFGPQC LCN RI35 D000615CG PCFTT DANMA RK CRECCGGNG K CFGPQC LCN RE35 D000106 MCMP CFTT DHQ MA RK CDD CCGG KGRGKCYG PQC LCR36 D000237 MCMP CFTT DHQ MA RK CDD CCGG KGRGKCYG PQC LCRG37 D000205 MCMP CFTT DHQT ARR CRDCCGG RGRKCFGQC LCGY D36 D000110 MCMP CFTT RPDMA QQC RACC KGRGK CFGPQC LCGY D36 D000111CG PCFTT DPYT ESKCATCCGG RGK CVGPQC LCN RI35 D000171CG PCFTKDPETEKK CATCCGG IGR CFGPQC LCN RGY 36 D000238CG PCFTT DHQT EQKCAECCGG IGK CYG PQC LCRG34 D000239CG PCFTT DRQMEQKCAECCGG IGK CYG PQC LCRG34 D000240CG PCFTT DHQT EQKCAECCGG IGK CYG PQC LCN RG35 D000105 RCS PCFTT DQQ MTKK CY DCCGG KGKGKCYG PQC ICAP Y38 D000109 RCKPCFTT DPQMSKK CADCCGG KGKGKCYG PQC LC35 D000140 RCG PCFTT DPQTQ AKCS ECCG RK GGVCKGPQC ICG IQ37 D000629 RCPP CFTTN PNMEADCRK CCGG RGY CASYQC ICPGG 36 *****. * ** *.** :* Figure 13 Conserved residues of 18 Cl specific scorpion toxins are CxPCFT x(8)Cx(2)CCx(57)Cx(23)QC[LI]C.Thefirstcolumnrepresentstherecord accessionnumberfoundintheSCORPION2database;secondcolumnrepresentsthe toxin primary sequences; third column represents the peptide length. Multiple sequencealignmentwasgeneratedbyClustalW.

Figure14 ScorpiontoxinsspecificforCl channeladoptthecysteinestabilisedαhelix foldwheretheβsheetsandαhelixareconnectedbythreedisulfidebonds.Thefourth disulfide bond is formed between the βsheet and the loop. A) Superimposition of chlorotoxin (PDB: 1CHL, grey) and insectotoxin I5A (PDB: 1SIS, red) with root meansquaredistance(rmsd)=2.00Ådemonstratedthat3Dfoldingsareconserved. B) The conserved residues among 18 Cl toxins are P4, F6, T7, Q32 and L34 (numberedasinchlorotoxin)wheretheyareclusteredspatiallyattheβsheetsandthe precedingloop.

57 Chapter 4: Analysis of scorpion toxin data

4.2.2 Sodiumchannels–βexcitatorymotif

Inclusion of mutation studies in the analyses of conserved residues helps determinetheirimportanceforstructureandfunctionoftoxinpeptides.Forexample, in Na + βexcitatory toxins, the conserved residues are:

KK xGxxxDxxGKxxECx(4,9)YCxxxC TKVxYAxxGYCCxxxCYCxGLx DDK x(9) Kx xC D ( Figure 15 ). Mutation studies of BjxtrIT (Cohen et al ., 2004; Karbat et al .,

2004a)demonstratedthattoxicitytoinsectsandbindingaffinitytoinsectNa +channels ascomparedtothatofwildtypeBjxtrITweremildlyaffected(<7fold)forK1,K2,

T32,D54,D55,K56,K66andD70(showninbold).However,mutationsatD8,K12,

Y26,K33andV34stronglyaffectedbindingaffinity,rangingfrom12to>10,000fold reductioninbindingaffinity.Thus,differentconservedresiduesplaydifferentrolesin function and structure of scorpion toxins. Further, when negativecharged E15 was mutated to positivecharged arginine, toxicity to insects was severely affected

(>10,000fold)butnotbindingaffinity(<7fold).Thissuggeststhatnegativecharged

E15isinvolvedintoxicaction.

Another residue important for toxicity and binding affinity is glutamate at position 30 in the multiple sequence alignment of βexcitatory toxins (Figure 15 ).

Residues are not conserved at position 30 because glutamine, isoleucine and asparaginewerealsoobserved.MutationofE30toglutamine, leucine, arginine and even conservative substitution to aspartate caused reduction of more than 8fold in toxicityand43foldinbindingaffinity.Withoutmutantdata,thispositionandresidue identity(negativechargedandhydrophilic),whichisimportantforfunction,wouldbe overlooked.Thus,byincorporationofmutationinformation,thefunctionalmotifofβ excitatorytoxinstoinsectNa +channelscanbesummarisedasGx(3)DxxGKxx

ECx(4,9)YCxxECxKVxYAxxGYCCx(3)CYCxGLx(16)C

58 Chapter 4: Analysis of scorpion toxin data whereEisimportantfortoxicitytoinsects.These functional residues are clustered spatiallyatonesurfaceofBjxtrIT( Figure16 ).

1102030405060 ||||||| D000004 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000137 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000125 KK NGY AV DSSG KAP EC LL SNYCNNQCT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000005 KK NGY AV DSSG KAP EC LL SNYCYNECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000139 KK DGY AV DSSG KAP EC LL SNYCYN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000824 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLSDD KK VL E55 D000825 KK NGY AV DSSG KAP EC LL SNYCNN ECT KVHY ADKGYCC LLSCYC FGLSDD KK VL D55 D000006 KK NGY AV DSSG KAP EC LL SNYCYN ECT KVHY ADKGYCC LL SCYC VGLSDD KK VL E55 D000007 KK NGY AV DSSG KAP EC LL SNYCYN ECT KVHY AEKGYCC LL SCYC VGLSDD KK VL E55 D000233 KK NGY AV DSKGKAP EC FL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000234 KK NGY AV DSKGKAP EC FL SNYCNN ECT KVHY ADKGYCC LL SCYC FGLNDD KK VL E55 D000235 KK NGY AV DSKGKAP EC FF SNYCNN ECT KVHY AEKGYCC LL SCYC VGLNDD KK VM E55 D000236 KK NG FAV DSNG KAP EC FF DHYCNS ECT KVYY AEKGYCCT LSCYC VGLDDD KK VL D55 D000165 KK NGY AV DSSG KVA EC LF NNYCNN ECT KVYY ADKGYCC LL KCYC FGLA DD KPVL D55 D000003 KK NGY AV DSSG KVSEC LL NNYCNN ICT KVYY ATSGYCC LL SCYC FGLDDD KAVL K55 D000136 KK NGY AV DSSG KVSEC LL NNYCNN ICT KVYY ATSGYCC LL SCYC FGLDDD KAVL K55 D000184 KK NGY AV DSSG KVSEC LL NNYCN INCT KVYY ATSGYCC LL SCYC FGLDDD KAVL K55 D000001 KK NGY PL DRNG KTT ECSG VNAIAP HYCNS ECT KVYY AESGYCC WGACYC FGLEDD KPI GP60 D000002 KK NGY PL DRNG KTT ECSG VNAIAP HYCNS ECT KVYY AKSGYCC WGACYC FGLEDD KPI GP60 ** :*:.:*.**..** :: ** **** :** .**** ***.** *** : 70 | D000004 ISDTRK SYC DTT II N70 D000137 ISDTRK SYC DTT II N70 D000125 ISDTRK SYC DTT II N70 D000005 ISDTRK SYC DTPII N70 D000139 ISDTRK SYC DTPII N70 D000824 ISDTRKK YC DYT II N70 D000825 ISDTRKK YC DYT II N70 D000006 ISDARKK YC DFV TIN70 D000007 ISDARKK YC DFV TIN70 D000233 ISGTT KK YC DFTII N70 D000234 ISDTT KK YC DFTII N70 D000235 ISDTRKK ICDTT II N70 D000236 ISDTRKK LCDFTLF N70 D000165 IW DST KNYC DVQII DLS72 D000003 IKDATKSYC DVQIN69 D000136 IKDATKSYC DVQII G70 D000184 IKDATKSYC DVQII N70 D000001 MKDITKK YC DVQIIP S76 D000002 MKDITKK YC DVQIIP S76 : . *. ** Figure 15 Conserved residues of 19 Na + βexcitatory toxins are KKxGxxxDxxGKxxECx(4,9)YCxxxCTKVxYAxxGYCCxxxCYCxGLxDDKx(9)Kxx CDwherexrepresentsanyresidueandnumber(s)inparenthesisrepresentsthenumber ofinterveningresidues.MultiplesequencealignmentwasgeneratedbyClustalW.The first column represents the record accession number found in the SCORPION2 database; second column represents the toxin primary sequences; third column represents the peptide length. Mutation studies were performed on BjxtrIT (D000001).

59 Chapter 4: Analysis of scorpion toxin data

Figure 16 Functional motif of βexcitatory toxins represented by BjxtrIT (PDB: 1BCG). Only residues that affected binding affinity and toxicity determined in mutation studies are displayed: D8, K12, E15, Y26, E30, K33 and V34. These are clusteredspatiallyatonesurfaceofBjxtrIT.

4.2.3 Sodiumchannels–βmammalmotif

In βmammal specific toxins, the conserved residues in this group are

KxGYxVx(4)GC KxxCxxLGxNxxCxxECx(9)GYCYxFxCxCxxLx(7) PLxxKxC

(Figure17 ).MutationstudiesofCss4(Cohen et al .,2005)demonstratedthatmutation ofK13andP61(showninbold)didnotaffectbindingaffinity tomammalsodium channelssignificantly(≤2fold).Y4,W47andK63areinvolvedinstructuralintegrity

–theirmutationstoalaninedisruptedsecondarystructures.SubstitutionofN22,Y40,

Y42andF44toalanineresultedin600foldreductionofbindingaffinitywhileL19to alanineresultedin150foldreductioninbindingaffinitytomammalsodiumchannels.

Thissuggestsdifferentpositionsinthetoxinsequencearemoreimportantinbinding affinitythanothers.Informationsuchasthephysicochemicalpropertiesoffunctional residuescanalsobeobtainedfrommutationstudies.Forexample,astrongerreduction

60 Chapter 4: Analysis of scorpion toxin data onbindingaffinitywasobtaineduponchargeinversionofE28toarginine(900fold) thanchargeneutralisingsubstitutionstoalanine(600fold),glutamine(400fold)and leucine (50fold). The functional motif for βmammal specific toxins can be summarised as KxGYxVx(4)GCx(3)CxxLGxNxxCxxECx(9)G

YCYxFxC[WY]CxxLx(8)LxxKxC. The 3D structure of Css 4 was modeledwithtemplate1CN2wherethesequenceidentitybetweentargetandtemplate is83%.Thefunctionalresiduesareclusteredspatiallyatthesurface( Figure18 ).

1102030405060 ||||||| D000181 KEGY LV NSYTGC KFECFKLGDNDYC LRECRQQYG KGSGGYCY AF GC WCTH LYEQAVVWPL 60 D000182 KEGY LV NSYTGC KFECFKLGDNDYC KR ECKQQYG KSSGGYCY AF GC WCTH LYEQAVVWPL 60 D000176 KEGY LV NHSTGC KYECY KLGDNDYC LRECKQQYG KGAGGYCY AF GC WCTH LYEQAVVWPL 60 D000177 KEGY LV NHSTGC KYECFKLGDNDYC LRECKQQYG KGAGGYCY AF GC WCNH LYEQAVVWPL 60 D000057 KEGY IV NLSTGC KYECY KLGDNDYC LRECKQQYG KGAGGYCY AF GC WCTH LYEQAVVWPL 60 D000056 KEGY LV NHSTGC KYECFKLGDNDYC LRECRQQYG KGAGGYCY AF GC WCTH LYEQAVVWPL 60 D000041 KEGY LV DKNTGC KYECLKLGDNDYC LRECKQQYG KGAGGYCY AFA CWCTH LYEQAIVWPL 60 D000051 KEGY LV SKSTGC KYECLKLGDNDYC LRECKQQYG KSSGGYCY AFA CWCTH LYEQAVVWPL 60 D000054 KEGY LV ELGTGC KYECFKLGDNDYC LRECKARYG KGAGGYCY AF GC WCTQ LYEQAVVWPL 60 D000052 KEGY LV KK SDGC KYGC LKLGENEGC DTECKAKNQGGSYGYCY AFA CWCEGLP EST PTY PL 60 D000058 KEGY LV NKSTGC KYGC LKLGENEGCDKECKAKNQGGSYGYCY AFA CWCEGLP EST PTY PL 60 D000053 KEGY LV NKSTGC KYGC FWL GKNENC DKECKAKNQGGSYGYCYS FA CWCEGLP EST PTY PL 60 D000035 KDGY LV EKTGC KK TCY KLGENDFCN RECKWKHIGGSYGYCYG FGCYC EGLP DSTQT WPL 59 *:** :*. *** * **.* : * ** :: .: ****.*.* :* *: .: .:** D000181 PNKTCN 66 D000182 PNKTCN 66 D000176 PKK TCN 66 D000177 PKK TCN 66 D000057 PKK TCT 66 D000056 PNKTCS 66 D000041 PNKR CS 66 D000051 PNKTCN 66 D000054 KNKTC R66 D000052 PNKSC 65 D000058 PNKSCS 66 D000053 PNKSCS 66 D000035 PNKTC 64 : ** Figure17 Conservedresiduesof13experimentallydeterminedβmammaltoxinsare KxGYxVx(4)GCKxxCxxLGxNxxCxxECx(9)GYCYxFxCxCxxLx(7)PLxxKxC where x represents any residue and number(s) in parenthesis represents the number of interveningresidues.Multiplesequencealignmentwas generated by Clustal W. The first column represents the record accession number found in the SCORPION2 database; second column represents the toxin primary sequences; third column representsthepeptidelength.MutationstudieswereperformedonCss4(D000181).

61 Chapter 4: Analysis of scorpion toxin data

Figure18 ThefunctionalresiduesofCss4,aβmammaltoxin,areclusteredspatially atonesurfaceofthemolecule.Onlyresiduesthathadmutantinformationareshown. Thisisahomologymodelbuiltfromthetemplate,1CN2whichhavesequenceidentity of83%withCss4.

4.2.4 Sodiumchannels–αmotif

Inαtoxins,themotifextractedfrom14experimentallydeterminedαtoxinsis

RDxYIx(4)NCxYxCx(5,7)CNxxCTxxGAx(3,4)YCx(6)GNx

CxCxxLPx(4)Ix(4,5)[KR]C[HR].MutationstudiesofLqhαIT(Zilberberg et al .,1997;Karbat et al .,2004b)showedthatresiduesatpositions15,25,28,56and54

(asinLqhαIT)canbemutatedtoalaninewithoutsignificanteffectonbindingaffinity toinsectNa +channels(<2foldreduction),whichcorrelatedwellwiththeobservation thatthereisnoconservationofresiduesatthesepositions( Figure19 ).ConservedN44 isinvolvedinstructuralandfunctionalintegritybecausemutationtoalaninedisrupted thesecondarystructuresandreducedbindingaffinityby31fold( Figure20 ).Mutation ofI57toalanineandthreoninecausedmorethan92foldreductioninbindingaffinity.

Positivelychargedresiduesatposition62and64wereessentialtobindingaffinityas mutation to the neutral alanine resulted in more than 50fold reduction of peptide binding.

62 Chapter 4: Analysis of scorpion toxin data

1102030405060 ||||||| D000025 VRDAYIA KNYNC VYECFRD AYCN ELCT KNG ASSG YCQ WA GKYGN ACWCY ALP DNVPI 57 D000029 VRDAYIA KNYNC VYECFRD SYCN DLCT KNG ASSG YCQ WA GKYGN ACWCY ALP DNVPI 57 D000030 VRDAYIA QNYNC VYFCMKD DYCN DLCT KNG ASSG YCQ WA GKYGN ACWCY ALP DNVPI 57 D000040 VRDAYIA KPENC VYHC ATN EGCN KLCT DNG AESG YCQ WGG RYGN ACWCIKLP DRVPI 57 D000042 VRDAYIA KPENC VYECATN EYCN KLCT DNG AESG YCQ WV GRYGN ACXCIKLP DRVPI 57 D000037 VRDAYIA KPENC VYECG IT QDCN KLCT ENG AESG YCQ WGG KYGN ACWCIKLP DSVPI 57 D000038 VRDAYIA KPHNC VYECARN EYCN DLCT KNG AKSG YCQ WV GKYGNGC WCIELP DNVPI 57 D000634 VRDGY IALP HNC AYGC LNN EYCNN LCT KDGAKIGYCN IV GKYGN ACWCIQLP DNVPI 57 D000045 ARDAYIA KPHNC VYECYN PKGSYCN DLCT ENG AESG YCQ IL GKYGN ACWCIQLP DNVPI 59 D000028 GRDAYIA QPENC VYECAKN SYCN DLCT KNG AKSG YCQ WL GRWGN ACYC IDLP DKVPI 57 D000068 GRDAYIA QPENC VYECAQN SYCN DLCT KNG ATSG YCQ WL GKYGN ACWCKDLP DNVPI 57 D000031 GRDAYIA DSENCTYTC AL N PYCN DLCT KNG AKSG YCQ WA GRYGN ACWCIDLP DKVPI 57 D000047 GRDAYIA DSENCTY FCGSN PYCN DVCT ENG AKSG YCQ WA GRYGN ACYC IDLPA SERI57 D000027 ERDGY IV QLHNC VYHCG LN PYCNG LCT KNG ATSGSYCQ WM TKWGN ACYCY ALP DKVPI 58 **.**. **.* * ** : **. :** *** ::: **.* * ** * D000025 RVP GKCH RK 66 D000029 RVP GKCH 64 D000030 RIP GKCHS 65 D000040 RVP GKCH R65 D000042 RVW GKCHG 65 D000037 RVP GKCQ R65 D000038 RVP GKCH R65 D000634 RVP GRCH PA 66 D000045 RIP GKCH 66 D000028 RIEGKCH F65 D000068 RIP GKCH F65 D000031 RISGSC R64 D000047 KEPGKCG 64 D000027 KWL DPKCY 66 : * Figure 19 Conserved residues of 14 experimentally determined αtoxins are RDxYIx(4)NCxYxCx(5,7)CNxxCTxxGAxxGYCx(6)GNxCxCxxLPx(4)Ix(5,6)C wherexrepresentsanyresidueandnumber(s)inparenthesisrepresentsthenumberof interveningresidues.Multiplesequencealignmentwas generated by Clustal W. The first column represents the record accession number found in the SCORPION2 database; second column represents the toxin primary sequences; third column represents the peptide length. Mutation studies were performed on Lqh αIT (D000025).

Figure 20 FunctionalandstructuralresiduesofLqhαIT(PDB: 1LQH), an αtoxin. N45isinvolvedinstructuralandfunctionalintegrity.I58,K63andR65areimportant forbindingaffinitytowardsinsectNa +channel.

63 Chapter 4: Analysis of scorpion toxin data

4.2.5 Sodiumchannels–αlikemotif

ThemotifforαliketoxinsisRDxYIAx(3)NCx(3)Cx(3,6)CxxLC x(3)Gx(3)GxCx(6)GxxCWCxxLPxxVxIx(3)GKCH. The site directedmutagenesisofBmKM1(Sun et al .,2003;Wang et al .,2003)highlightedthe importance of four residues important for function and structure in αlike toxins

(Figure21 ).MutationsatY5andN11disruptedthesecondarystructuresasmeasured by circular dichroism spectroscopy, suggesting their role in maintaining structural integrity.ThisiscorroboratedbythecrystalstructureofBmKM1whereN11forms hydrogen bonds with residues 58 and 59 (He et al ., 1999). Positive charged at positions 62 and 64 is important for binding affinity to insect Na + channel where negativecharged residues (D and E) resulted in more than 167fold reduction.

MutationofconservedP9didnotsignificantlyaffectfunction(twofoldreductionin toxicityandbindingaffinity)( Figure22 ).

Figure 21 Functional and structural residues of BmK M1 (PDB: 1DJT), an αlike toxin as determined by mutation studies. Y5 and N11 are involved in structural integrity where N11 forms hydrogen bonds with R58 and V59. K62 and H64 are importantforbindingaffinitytowardsinsectNa +channel.

64 Chapter 4: Analysis of scorpion toxin data

1102030405060 ||||||| D000621 VRDGY IA QPENC VYHC IP DCDTLCKDNGGTGGHCG FKLGHG IA CWCN ALP DNVGIIV 57 D000622 VRDGY IA KPENC AHHC FP GSSGC DTLCKENGGTGGHCG FKVGHGT ACWCN ALP DKVGIIV 60 D000073 VRDGY IA QPENC VYHC FP GSSGC DTLCKEKGGTSGHCG FKVGHG LA CWCN ALP DNVGIIV 60 D000077 GRDGY IA QPENC VYHC FP GSSGC DTLCKEKGATSGHCG FLP GSG VA CWCDNLP NKVPIVV 60 D000038 VRDAYIA KPHNC VYECARNEYCN DLCT KNG AKSGYCQ WV GKYGNGC WCIELP DNVPI RV59 D000143 GRDAYIA QPENC VYECAKNSYCN DLCT KNG AKSGYCQ WL GKYGN ACWCED LP DNVPI RI59 D000634 VRDGY IALP HNC AYGC LNNEYCNN LCT KDGAKIGYCN IV GKYGN ACWCIQLP DNVPI RV59 D000126 VRDAYIA KPENC VYHC AGNEGCN KLCT DNG AESGYCQ WGG RYGN ACWCIKLP DD VPI RV59 **.*** *.**. : * *: **...*. *:* *.*** ** :.* *: D000621 DGVKCH K64 D000622 DGVKCH 66 D000073 EGEKCHS 67 D000077 GG EKCH 66 D000038 PGKCH R65 D000143 PGKCH F65 D000634 PGRCH PA 66 D000126 PGKCH R65 *: ** Figure 22 Conserved residues of eight experimentally determined αlike toxins are RDxYIAxPxNCxxxCx(3,5,6)CxxLCxxxGxxxGxCx(6)GxxCWCxxLPxxVxIx(3)GKC H,wherexrepresentsanyresidueandthenumberinparenthesisrepresentsthenumber ofinterveningresidues.MultiplesequencealignmentwasgeneratedbyClustalW.The first column represents the record accession number found in the SCORPION2 database; second column represents the toxin primary sequences; third column represents the peptide length. Mutation studies were performed on BmK M1 (D000038).

From the analysis of 222 aligned Na + native toxin sequences, a conserved tyrosine was observed at the Nterminal (data not shown) where mutation studies suggestitsinvolvementinmaintainingthestructuralfoldsofthetoxins(Wang et al .,

2003; Cohen et al .,2005).Also, atyrosineisconserved atoneposition before the fourth cysteine in excitatory toxins and the fifth cysteine in the rest of Na + toxins whichisinvolvedinstructureandfunctionofNa +toxins(Sun et al .,2003).Inbinding studies, the α and αlike toxins compete to bind at receptor site 3 of insect Na + channels(Gilles et al .,1999),whichcouldexplainthesimilarityofpositiveresidues observedinthesetoxinsattheCterminalthatareinvolvedinbindingaffinity.Inβ toxins, the presence of glutamate before the third cysteine in excitatory toxins and fourthcysteineintherestisconservedwhichisimportantforbindingaffinity(Cohen et al .,2005).Thedifferenceobservedbetween bothα(include αlike toxins) andβ

65 Chapter 4: Analysis of scorpion toxin data toxins is that α toxins have an asparagine before the first cysteine but glycine is observed at this position instead in β toxins. Mutation in BmK M1 suggested asparagine was involved in folding (Wang et al ., 2003). There is a need for more mutation studies performed on other Na + toxins and other positions in the toxin sequencestodeterminetheirstructurefunctionrelationships.

4.2.6 Potassiumchannelsubtype–EtheragogorelatedK +channelmotif

ScorpiontoxinstargetvariousK +channelsubtypeswithdifferentspecificities.

InetheragogorelatedgeneK+(ERG)channelsubtype, residuesthatareimportant for binding of BeKm1 to the ERG channel had been identified by mutagenesis

(Korolkova et al .,2002).Y11,K18,R20andK23werecriticaltobindingwhileF21 andR27wereinvolvedinstructuralfolding.Thefunctionalresidueswerelocatedon theαhelixandthefollowingloop( Figure23 ).TherecognitionmotifforERGsubtype canbeexpressedasCx(3)YxCx(3)CKxRFxKx(3)RCx(4)CxC.

1102030 |||| RPTDIK CSES YQCFPV CK SRF GKTNG RC VNGF CDCF Figure23MutagenesisstudyofBeKm1identifiedY11,K18,R20andK23(inbold) wereimportantforbindingtohumanERGchannel.F21andR27wereinvolvedin structuralfolding.

66 Chapter 4: Analysis of scorpion toxin data

4.2.7 Potassiumchannelsubtype–SmallconductanceCa 2+ activatedK +channel

motif

TheERGfunctionalsurfaceissimilarlysharedbyscorpiontoxinsthattarget

2+ + smallconductanceCa activatedK channels(SK Ca ).Mutagenesisofleiurotoxinand

P05identifiedtwopositions,6and7,asimportantforbindingtothissubtype(Sabatier et al ., 1993; Sabatier et al ., 1994;Shakkottai et al ., 2001;Wu et al ., 2002) (Figure

24A ).However,theTsκwhichalsotargetsthissubtype has two known functional residues, R6 and R9, located outside the αhelix structure (Lecomte et al ., 1999)

(Figure 24B ). Though the positions of the functional residues are not spatially conserved,theresidueidentityrequiredtobindtoSK Ca waspositivechargedarginine wheremutationtohydrophobicleucine and evenconservative substitution to lysine affectedbindingaffinity.

A) B)

Figure 24 Functionalresiduesofscorpiontoxinstargetingsmall conductance Ca 2+ activatedK +channels. A) Superimpositionofleiurotoxin(PDB:1SCY,red)andP05 (PDB:1PNH,grey)withrmsd=1.57Å.Thetwofunctionalresidues,Rand(R/M) extendfromtheαhelix. B) Thetwofunctionalresidues,R6andR9inTsκ(PDB: 1TSK)werelocatedoutsidetheαhelicalstructure.

67 Chapter 4: Analysis of scorpion toxin data

4.2.8 Potassium channel subtypes – Large conductance Ca 2+ activated K +

channelandvoltagedependentK +channelmotifs

+ ForscorpiontoxinsthatbindtovoltagedependentK channels(KV),Goldstein andMiller(1993)haddemonstratedthatlysineatposition27incharybdotoxinfrom

Leiurus quinquestriatus hebraeus physically occlude the pore of this subtype, thus preventingtheflowofK +ions.Thisfunctionalresidue,lysineandanaromaticresidue

(tyrosineorphenylalanine)separatedby6.6 ±1.0Åformedthefunctionaldyadthat targetKVchannels(Dauplais et al .,1997).Inaddition,mutagenesisofcharybdotoxin

2+ on KV channels and large conductance Ca activated channels (BK Ca ) highlighted severalpositionswhichareimportantforbinding(ParkandMiller,1992;Goldstein andMiller,1993;Stampe et al .,1994).Theseincludepositions10,14,25,29and34 wheremutationsledtodrasticreductioninbindingaffinity.Thefunctionalresiduesfor

KV andBKCa channelsarelocatedontheβsheetsincontrasttothatforERGandSKCa channelswhicharelocatedattheαhelix.Theseresiduesalsofallwithinthefunctional dyad radius of 6.6 ± 1.0 Å, which may explain their involvement in toxinchannel interaction due to their close proximity to the functional dyad ( Figure 25 ). Only a singleresiduedifferenceinthefunctionalsurfacedistinguishesKVandBKCa channels where asparagine is specific to KV channels. This was determined from analyses of multiplesequencealignmentofscorpiontoxinsthattargetdifferentchannelsubtypes and mutant data ( Figure 26 ). Iberiotoxin that is highly specific for BKCa channel

(Galvez et al .,1990)hasglycineinsteadofasparagineatposition30.P05targetsSK Ca channel only and mutation at positions 2224 from IGD to MNG resulted in recognitionofKVchannels(Wu et al .,2002).ThisiscorroboratedbySchroeder et al .

(2002)wheretheymutatedglycinetoasparagineatposition30iniberiotoxin,which causedthemutanttoxintotargetbothK +channelsubtypes.

68 Chapter 4: Analysis of scorpion toxin data

Figure 25 Functional residues of charybdotoxin (PDB: 2CRD) determined by mutagenesis studies, which are important for binding to voltagedependent K + and largeconductanceCa 2+ activatedK +channels.Mostofthefunctionalresidues(R25, K27,M29,R34andY36),exceptS10andW14arelocatedontheflatsurfaceoftheβ sheets.S10andW14resideattheαhelix.SpatialdistancesbetweenCαatomsofthe functionalresidueswiththatofcriticalK27demonstratedthattheyarewithin6.6 ±1.0 Å.Thismayexplaintheirinvolvementininteractionduetotheircloseproximitytothe functionaldyad.

Toxin Sequence Channel Ref.

CharybdotoxinQFTN VSCTTS KECWSVCQ RLHNT SRGKCMNKK CRCYS BK Ca,K V1,2 IberiotoxinQFTDVDCS VSKECWSVCKDLF GVDRGKCMGKK CRCYQ BK Ca3 Pi4 IEAI RCGGS RDCY RPCQ KR TGC PNAKCINKTC KCYGCS Kv4 Pi2 TISCTN PKQCY PHC KK ETGY PNAKCMNRK CKCFGRKv5 PbTx3 EVDMRCKSS KECLV KCKQATG RPNG KCMNRK CKCY PRKv6 HgTx1TVI DVKCTS PKQC LPP CKAQFGIRAGAKCMNG KCKCY PHKv7 AgTx1 GVPI NVKCTGS PQC LKPCKDAGMR FGKCING KCHCT PKKv8 P05 TVCN LRR CQ LSC RSLGL LGKCIGVKCECVKHSK Ca9 * * *: .** :..*.* Figure26Multiplesequencealignmentofrepresentativescorpiontoxinswhichtarget voltagedependent K + (Kv), large and small conductance Ca 2+ activated K + channels (BK Ca, SK Ca). The box highlights the position and residue identity involved in differentiatingK VandBK Ca.References:1(Vazquez et al .,1989),2(Deutsch et al ., 1991),3(Galvez et al .,1990),4(OlamendiPortugal et al .,1998),5(Rogowski et al ., 1996),6(HuysandTytgat,2003),7(Koschak et al .,1998),8(Garcia et al .,1994),9 (Wu et al .,2002).

69 Chapter 4: Analysis of scorpion toxin data

Scorpion toxins use different residues for targeting various K + channel subtypes. The functional dyad of lysine and an aromatic residue residing at the β sheets interact with voltagedependent K + channels and large conductance Ca 2+ activated channels. Toxins which target small conductance Ca 2+ activated channels andERGchannelshavefunctionalresidueslocatedattheαhelix.

4.3 Conclusion

Giventhelargediversityofionchannels(ZuoandJi,2004;KornandTrapani,

2005), a number of different binding motifs in scorpion toxins can be defined.

Dauplais et al .(1997)reportedaconservationofafunctionaldyadmotifinK+toxins withunrelatedstructureswhilenonehasbeenreportedforNa +,Cl andCa 2+ toxins.

Thisisthefirstreportofbindingmotifsforfour K +ionchannelsubtypes(voltage dependent K + channels, large and smallconductance Ca 2+activated channels, and etheragogo channel), four binding site motifs for Na + channels and a conserved motifforCl channels.

The motifs reported here included information from mutation studies of scorpiontoxinsand3DstructureanalysesexceptforCl channelforwhichmutation studyisnotavailable.Themotifsreportedinthisworkhavebiologicalandfunctional relevance which complement motifs obtained statistically. Mutation studies help determine if conserved residues within groups of scorpion toxins are important for integrityofmolecularstructureandforfunction.Themutantdataprovideinformation onresiduesandpositionswhichareimportantforstructureandfunction,andthosethat arenot.Further,criticalinformationonphysicochemicalpropertiesofkeyresiduescan be obtained from mutant data where there may lack conservationofresiduesatkey positions in multiple sequence alignments. This is important because even semi

70 Chapter 4: Analysis of scorpion toxin data conservative substitution of residues can affect activity in scorpion toxins where a substitutionfromargininetolysineresultedina80folddecreasedactivityinLqhαIT

(Karbat et al .,2004b).Scalingtheeffectsofmutationonbindingaffinitytoacommon scale provides a semiquantitative comparison to differentiate critical residues from residuesthatplaynoroleinbindingaffinity.

One major goal of analysing 3D protein structure is to understand the relationship between the primary and tertiary structures. If this relationship were known,thenthestructureandfunctionofaproteincouldbereliablypredictedfromits amino acid sequence. In fact, Mouhat et al . (2004a) proposed that the spatial distributionofkeyfunctionalresiduesismoreimportantthanitsfoldwhenconsidering the interaction of a toxin with its ion channel target. Spatial proximity to the key functional residues is likely to influence interaction, especially in toxinchannel complexeswhichinvolvesacombinationofelectrostatic,hydrophobicandhydrogen bondinginteractions(Xu et al .,2003;Mouhat et al .,2004b).3Dstructureanalysesaid identificationofmotifswherenoncontiguousresiduesareclusteredspatiallyandhelp extraction of functional motifs where their spatial positions are mapped onto the primary sequence. These linearised motifs can then be used to compare novel sequencesforpredictionoffunction.Theinformationofspatialproximityonresidues neartocriticalfunctionalresiduefacilitatesthedesignofmutationstudies.

Thissystematicapproachofincludingmutantdataofscorpiontoxinsand3D structure analyses for extraction of motifs can serve as a model for other proteins wheremutationstudiesand3Dstructuresareavailable.Thisapproach complements motifs that are derived statistically by including biological information for a more accurateinferenceoffunctionfornewlyidentifiedscorpiontoxinsequences.

71 Chapter 4: Analysis of scorpion toxin data

Chaptersummary

• Twomainapproachesareavailableforfunctionalinferenceofnewlyidentified

scorpiontoxins.Firstistheidentificationofsimilarcharacterisedsequencesby

pairwise alignment tools such as BLAST or FASTA. However, pairwise

alignment is unable to differentiate functional residues from those with no

critical role. Second, function can be inferred from matches to patterns in

signature databases such as PROSITE. The patterns are usually derived

statisticallybutarenotnecessarilybiologicallyrelevant.

• However,biologicallyrelevantinformationoncriticalresidueandpositionis

availableinmutationstudiesbutisusuallynotusedforextractionoffunctional

motifsinscorpiontoxins.

• ThisisthefirstreportoneightfunctionallyrelevantbindingmotifstoNa +and

K+ channels which were extracted from the approach of analysing multiple

sequencealignmentsofnativescorpiontoxins,3Dstructuresandinformation

frommutationstudies.Themotifsreportedinthis work have biological and

functional relevance. Multiple sequence alignment of native toxins targeting

thesameionchannelsubtypeallowsconservedresiduestobedetermined.3D

structureanalysesaididentificationofmotifswhere noncontiguous residues

are clustered spatially. Mutation studies provide information on residues,

positions and the physicochemical properties of key residues important for

structureandfunction.

• This systematic approach can serve as a model for other proteins where

mutationstudiesand3Dstructuresareavailable.

72 Chapter 5: Functional prediction of scorpion toxin data

PartII:Chapter5 Functional prediction of bioactive toxinsinscorpionvenom ‘Onceanewtechnologyrollsoveryou,if youarenotpartofthesteamroller,you're partoftheroad.’ StewartBrand

73 Chapter 5: Functional prediction of scorpion toxin data

Scorpion toxins are important experimental tools for studies of biochemical and pharmacological properties of ion channels which have significance in the development of novel therapeutics for ion channel diseases (Mouhat et al ., 2005;

Devauxet al .,2004;Fuller et al .,2004).Identificationofnewtoxinsequencesfrom scorpionvenomsisthuscriticalforscientificresearchandmedicalapplications.The number of functionally characterised scorpion toxins is steadily growing, but the numberofnewlyidentifiedtoxinsequencesisincreasingatmuchfasterpace.These newlyidentifiedsequencesusuallyhaveprimarystructureinformationwithlimitedor nofunctionalfeaturecharacterised.Withanestimated100,000differentvariants,there is a pressing need to accurately predict functions of these sequences where bioinformatic analysis of scorpion toxins is becoming a necessary tool for their systematicfunctionalanalysis.

Here, the author reports a bioinformaticsdriven approach involving scorpion toxin structural classification, functional annotation, sequence comparison, nearest neighbouranalysisanddecisionrules,whichproduceshighlyaccuratepredictionsof scorpion toxin functional properties. The methodology reported is of importance becauseitprovidesabioinformaticsbasisforsystematicfunctionalanalysisoflarge sets of toxin sequences. The prediction tool, termed Annotate Scorpion , is the first noveltoolinthefieldofscorpiontoxinresearchwhichpredictsfunctionalproperties ofuncharacterisedscorpiontoxins.Itfacilitatesselectionofcriticalexperimentswhere comprehensive characterisation of even a single toxin involves valuable time and resources.

74 Chapter 5: Functional prediction of scorpion toxin data

5.1 Prediction of functional properties of novel scorpion toxins by nearest

neighbouranalysis,sequencecomparisonanddecisionrules

Theclassifiedgroupsandsubgroupsofscorpiontoxins,proposedinChapter3, were used as a basis for the development of a bioinformaticsdriven approach for predictionoffunctionalpropertiesofscorpiontoxinsincludingionchannelspecificity

(Na +,K +,Ca 2+ orCl ),toxinsubfamily(α,αlike,β,K +,Ca 2+ orCl ),toxinpotency

(neurotoxicornontoxic),andtargetcellspecificity(insect,crustacean,ormammal specific). This new hierarchical classification scheme, underpinned by sequence, structuralandfunctionalsimilarityofsequenceshelpsunderstandingoftheirstructural and functional properties. This method termed “ Annotate Scorpion ” combines sequence comparison, nearest neighbour analysis anddecisionrules.Thismoduleis datadrivenandthusstatisticalinitsnature.Testingofthe Annotate Scorpion showed itcouldpredictfunctionalpropertiesofscorpiontoxinswithhighaccuracy.Herethe author describes the method and the algorithm, and presents the results of the assessmentofpredictionaccuracy.

The scorpion toxin functional prediction module, Annotate Scorpion , automaticallygeneratesputativefunctionalannotationforthequerytoxinandplaces querysequenceintoanappropriatestructuralgroup.Itcombinessequencecomparison, nearest neighbour analysis and decision rules to assign a putative membership of a query sequence to a functional or structural group. This module provides multiple sequence alignmentofthetestsequencealongwiththenearestneighboursequences availableinthedatabase.Highaccuracypredictionsoftargetreceptor(91.5%),toxin action (83.3%) and toxin type (68.9%) were achieved on a test set of 52 newly characterised scorpion toxins (Tan et al ., 2005). Because it uses nearest neighbour analysis,the Annotate Scorpion toolhashighspecificity.Similarbioinformaticbased

75 Chapter 5: Functional prediction of scorpion toxin data approachofpredictingstructureandspecificfunctioncanbeappliedtoothertoxinsas demonstrated by the Analysis module in the MOLLUSK database

(http://research.i2r.astar.edu.sg/MOLLUSK/ ).

5.2 MaterialsandMethods

Twoseparatetestingswereperformedontheaccuracyofthemodulewherethe twotestsetsconsistedof52and127toxinsequencescollectedfrompublicdatabases and literature. The two separate test sets, containing newly characterised scorpion toxins, were used for evaluating the prediction accuracy of Annotate Scorpion. The primarytoxinsetwasanalysedandclassifiedintogroupsbasedonprimarysequence similarityandionchannelspecificity(asdiscussedinChapter3).Thesegroupswere used as comparison targets by the “ Annotate Scorpion ” method for prediction of anonymousproteinsequences.

5.2.1 Scorpiontoxindata

The initial primary data set of 220 sequences was extracted from the

SCORPIONdatabase.Ofthese,196completesequences representing mature toxins were compared and used for defining toxin groups (see Section 3.3.1), while 24 partiallysequencedtoxinswerenotanalysed,butwerelateraddedtothegroups.The firsttestsetof52sequenceswasobtainedfromthesecondarycollectionofscorpion toxinsequenceswhere18werefunctionallycharacterisedfortheirreceptorandtarget cell specificities, toxin action and toxin type by experimentation, whereas 34 sequenceswerepartiallycharacterised.

76 Chapter 5: Functional prediction of scorpion toxin data

Upon prediction of their functional properties, these 52 sequences were subsequently added to the initial primary data set of 220 sequences, totaling 272 sequences.Thesecondprimarydatasetconsistedoftheenlargedprimarydatasetof

272sequencesand426scorpionmutanttoxindataforpredictionoffunctionalfeatures in the second test set. The second test set of 127 sequences was obtained from a separate collection where 18 were functionally characterised for their receptor and targetcellspecificities,andtoxinactionandtoxintypebyexperimentation.

5.2.2Algorithm–nearestneighbourandrulebased

The aim of Annotate Scorpion prediction module is to generate accurate functionalannotationsforaqueryscorpiontoxin.Thepredictedfunctionalproperties are toxintype,toxinaction,ionchannelspecificity,andcellulartargetspecificity.This modulecombinessequencecomparison,nearestneighbouranalysisanddecisionrules for assigning a putative classification of a query sequence to a structurefunction group.Thelogicinvolvedperformingasimilaritysearchofaquerysequenceagainst theprimarydatasetintheSCORPION2databasebyBLASTprogram.Alargechange in score values separates sequences dissimilar to the query from similar sequences.

Thispermitsgroupingofthequerytosimilarsequences whereby nearest neighbour analysis will assign the functional and structural properties of the sequences in the grouptothequery.Thenearestneighbourrulestatesthatatestinstanceisclassified based on the classification of nearest training instances (Cover and Hart, 1967;

Dasarathy,1991).Thealgorithmuses10rulesforgroupingandpredictionofspecific functionofscorpiontoxins.

1) Thelengthofquerysequenceismaximum200aminoacids.

2) Ifthebitscoreofthenearestneighbouris<20,thenNOTSCORPIONTOXIN.

77 Chapter 5: Functional prediction of scorpion toxin data

3) Ifthequerysequenceis100%identicaltoanexistingsequenceintheprimary

dataset,thanIDENTICALtoknowntoxin.

4) Ifthequerysequenceis100%identicaltoaportionofanexistingsequencein

theprimarydataset,thenPARTIALSEQUENCE.

5) Iftheidentitytothenearestneighbouris<50%,thenNEWGROUP.

6) Ifthedifferenceofbitscoresbetweentwoconsecutiveneighboursis≥30,this

servesasacutoffpoint.Thenumberofnearestneighbours,N=5.

7) IftheNneighboursbelongtosamesubgroup,thenEXISTINGSUBGROUP.

8) IftheNneighboursequencesbelongtothesamegroupbutarefromdifferent

subgroups,thenNEWSUBGROUP.

9) IftheNneighboursbelongtodifferentgroups,thenNEWGROUP.

10)IfthegroupwhichthenearestneighbourbelongstoconsistsofMsequencesand

M<N,thenonlythetopMneighboursareconsideredinrules7,8and9.

Thethresholdof<20bitscorewasselectedfordifferentiatingsequencesthat arenotscorpiontoxinsbecausetherangeofsequenceidentitiesbetween393native scorpiontoxinswas30–100%.Anewgroupisproposed for a query if no nearest neighbourisfound(<50%identity)orthenearestneighbours(>50%identity)belong todifferentgroups.Anewsubgroupisproposedifthenearestneighboursareinthe same group but belong to different subgroups. For example, submission of TbIT1

(Pimenta et al .,2001)into Annotate Scorpion returnedthetopfivenearestneighbours of >50% identity where Tst1, Tb1 and Ts1 (SCORPION2 ID: D000008, D000009,

D000012)belongtoNa +subgroup2awhileTsNTXPandTs4(D000162,D000010) belongtoNa +subgroup2c.AnewsubgroupwasproposedforTbIT1.

78 Chapter 5: Functional prediction of scorpion toxin data

5.3 Results–Accuratepredictionoffunctionalproperties of novel scorpion

toxins

Thefunctionalpropertiespredictedbythe Annotate Scorpion modulearethe toxinsubfamily,toxinaction,ionchannelspecificityandtargetcelltype.Theaccuracy of Annotate Scorpion module upon submission of the two test sets has been summarised in Figure 27 where this work’s predictions were compared with the featuresoffunctionally characterisedscorpiontoxinsequences.Forthefirsttestset,

91%,69%,83% and44%werecorrectlypredictedforionchannelspecificity,toxin subfamily,toxinactionandtargetcelltype,respectively.Inthesecondtestset,99%,

83%,50%and40%correctpredictionswereobtainedforionchannelspecificity,toxin subfamily,toxinactionandtargetcelltype,respectively.

Inthefirsttestset,onlytwosequencescouldnotbeannotated( Table7 ).No similarsequenceswerefoundfor κhefutoxins1and2,thus Annotate Scorpion could assign neither the group, nor functional features to them. These two sequences constitute a new scorpion toxin fold as determined by NMR which consists of two parallelheliceslinkedbytwodisulfidebridgeswithoutanyβsheets(Srinivasan et al .,

2002b).12sequenceswerepredictedasmembersofnewgroups,ofwhichnine(75%) were correctly predicted as members of new groups, and three (25%) belong to existinggroups( Be Km1,BKTx,and Bm Tx3belongtoK +group1,Na +group7,and

K+group4).Newsubgroupswerepredictedfor sevensequences.Theremaining31 toxinsequenceswereclassifiedintothewelldefinedgroups.

Of52sequences,47hadknownionchannelspecificity,18forcellulartarget specificityandtoxinaction,and45fortoxintype.Ionchannelspecificitywascorrectly predictedfor43(91.5%)toxins.Tamulustoxin1and2 werewronglyclassifiedasNa + toxinsinsteadofK +toxins(4.25%misclassification).

79 Chapter 5: Functional prediction of scorpion toxin data

Accuracy of functional prediction for the first test set of 52 A) sequences

100% 91% 90% 83% Correct 80% 69% 70% Partial 60% 50% 50% 44% 40% Wrong

30% 22% 17% 20% Not 10% 4%4% 4% 4% 6% 0% 0% 0% 0% predicted 0% Ion channel Tx-type Activity Species target

Accuracy of functional prediction for the second test set of 127 B) sequences 99% 100% Correct 90% 83% 80% 70% Partial 60% 50% 50% 40% 40% 40% Wrong 30% 23% 17% 20% 14% 13% 10% Not 7% 10% 3% 0% 1% 0% 0% predicted 0% Ion channel Tx-type Activity Species target

Figure27 Accuracyoffunctionalpredictionof Annotate Scorpion moduledetermined usingtwoseparatetestsets.Thefunctionalpropertiespredictedincludeionchannel specificity,toxintype,toxinactionandcellularspeciestarget.Thiswork’spredictions ofthesepropertieswerecomparedwithfeaturesoffunctionallycharacterisedscorpion toxin sequences. ‘Correct’ represents the prediction agrees with experimentally characterisedfeature.‘Partial’representsthepredictionsuggestsatleasttwoproperties ofwhichoneagreeswithexperimentallycharacterisedfeature.‘Wrong’representsthe prediction disagrees with characterised feature and ‘Not predicted’ represents no features predicted. A) Accuracy of prediction for first test set of 52 sequences. B) Accuracyofpredictionforsecondtestsetof127sequences.

80 Chapter 5: Functional prediction of scorpion toxin data

Table7Functionalpropertiespredictedforthefirsttestsetof52newtoxinsequences. Abbreviationscorrespondtothescorpionspecies: Aah , Androctonus australis Hector; Be , Buthus eupeus ; Bm , Buthus martensii ; Bom , Buthus occitanus mardochei ; Bs , Buthus sindicus ; Bt , Buthus tamulus ; Cg , Centruroides gracilis ; Cll , Centruroides limpidus limpidus ; Hf , Heterometrus fulvipes ; Lqh , Leiurus quinquestriatus hebreaus ; Tb , Tityus bahiensis ; Tst , Tityus stigmurus ; Os , Orthochirus scrobiculosus . The Annotate Scorpion moduleassignedeachquerysequencetoagroup,andpredictedits ionchanneltype,toxintype,toxinactionandspeciesspecificity.TheGroupindicates grouporsubgroupthatcontainstoxinsequencessimilartothequery.Anewsubgroup isdenotedbythegroupnumberandanasterisk.Ifnonearestneighbourisfound,the queryisassignedasa‘New’group.Toxintypedescribesthesubfamilyofthequery sequence,whereNa +toxinsareclassifiedintoalpha( α),alphalike( α')andbeta( β) subfamilies and K +, Ca 2+ and Cl toxinshavebeenclassifiedintosinglesubfamilies each. Toxin action describes the nature of the toxin: T, neurotoxic; N, nontoxic. Abbreviationscorrespondtospeciesspecificity:M,mammal;I,insect;C,crustacean; Pu: putative annotation. Ex: experimentally determined function. Ref.: functional annotationinSwissProt.Dashes()representnoexperimentalcharacterisationorno functionalannotationinreferences.SwissProt( SP )accessionnumbersrefertodirect submission of toxin sequences. Question marks (?) represent sequences not functionallyannotatedby Annotate Scorpion .

81 Chapter 5: Functional prediction of scorpion toxin data

Toxinname Group Ion Toxin Toxin Species Ref. Channel Type Action Specificity Pu Ex/Ref. Pu Ex/Ref. Pu Ex Pu Ex Aah (Toxin1) New Na + α, α' T I,M 1 Aah (Toxin2) New Na + α, α' T I,M 1 Aah (Toxin3) New Na + α, α' T I,M 1 Aah (Toxin4) 3 Na + α, α' T I,M,C 1 Aah (Toxin5) 6 Na + α T I,M 1 Be Km1 New K+ K+ K K T T M M 2 Bm (ANEPII) 11 Na + Na + β β T I,M SP Q9BKJ1 Bm (BKTx) New Na + Na + α α T T I,M M 3 Bm 32VI 1a Na + Na + β β T T I I 4 Bm 33I 1a Na + Na + β β T T I I 4 Bm KdITAP3 11 Na + Na + β β T T,N I,M I,M 5 Bm KK1 New K+ K+ K K T M 6 Bm P01 8 K+ K+ K K T,N M SP Q9U522 Bm KK3 New K+ K+ K K T M 6 Bm Tx3 New K+ K+ K K T T M M 7 Bm TXKS1 New K+ K+ K K T,N ? 8 Bom α6a 3 Na + Na + α, α' α T I,M,C 9 Bom α6b 3 Na + Na + α, α' α T I,M,C 9 Bom α6c 3 Na + Na + α, α' α T I,M,C 9 Bom α6d 3 Na + Na + α, α' α T I,M,C 9 Bom α6e 3 Na + Na + α, α' α T I,M,C 9 Bs IT1 11 Na + Na + β β T T I,M I 10 Bs IT2 11 Na + Na + β β T T I,M I 10 Bs IT3 11 Na + Na + β β T T I,M I 10 Bs IT4 11 Na + Na + β β T T I,M I 10 Bt (Tamapin) 5 K+ K+ K K T T M M 11 Bt (Tamapin2) 5 K+ K+ K K T T M M 11 Bt (Tamulustoxin1) New Na + K+ β K T T I,M M 12 Bt (Tamulustoxin2) New Na + K+ β K T T I,M M 12 Cg 2 9c Na + Na + β T I,C 13 Cll 9 9* Na + Na + β T I,C 13 Hf Tx1 ? ? K+ ? K T ? 14 Hf Tx2 ? ? K+ ? K T ? 14 Lqh α6a 3 Na + Na + α, α' α' T I,M,C 9 Lqh α6b 3 Na + Na + α, α' α' T I,M,C 9 Lqh α6c 3 Na + Na + α, α' α' T I,M,C 9 Lqh ChTxb 1 K+ K+ K K T M 9 Lqh ChTxc 1 K+ K+ K K T M 9 Lqh ClTxa 1* Cl Cl Cl Cl T I 9 Lqh ClTxb 1a Cl Cl Cl Cl T I,M 9 Lqh ClTxc 1a Cl Cl Cl Cl T I,M 9 Lqh ClTxd 1a Cl Cl Cl Cl T I,M 9 Lqh IT1a 1* Na + Na + β β T I 9 Lqh IT1b 1* Na + Na + β β T I 9 Lqh IT1c 1* Na + Na + β β T I 9 Lqh IT1d 1* Na + Na + β β T I 9 Lqh IT213 11 Na + Na + β β T I,M 9 Lqh IT253 11 Na + Na + β β T I,M 9 Os K2 New K+ K+ K K T T M I 15 Tb2II 2b Na + Na + α, β β T,N T M I,M 16 Tb ITI 2* Na + Na + α, β β T,N T I,M I 16 Tst 2 2b Na + Na + β β T T M M 17 References: 1 (Ceard et al ., 2001), 2 (Korolkova et al ., 2001), 3 (Srinivasan et al ., 2001),4(Escoubas et al .,2000),5(Guan et al .,2001),6(Zeng et al .,2001),7(Vacher et al ., 2001), 8 (Zhu et al ., 2001), 9 (Froy et al ., 1999), 10 (Ali et al ., 2001), 11 (Pedarzani et al ., 2002), 12 (Strong et al ., 2001), 13 (Possani et al ., 2000), 14 (Srinivasan et al ., 2002b), 15 (Dudina et al ., 2001), 16 (Pimenta et al ., 2001), 17 (Becerril et al .,1996), SP directsubmissiontoSwissProt.

82 Chapter 5: Functional prediction of scorpion toxin data

Na +toxinsbelongto α, αlike and β subfamilies whereas K +, Cl and Ca 2+ specific toxins belong to respective single families. Crossreferencing with original literatureorannotationsinthedatabasesshowedthatof45testsequences,31(68.9%) hadcorrect,10(22.2%)partiallycorrect,and4(8.9%)wrongpredictionsoftoxintype.

Partially correct predictions were those where Annotate Scorpion predicted two possible types, of which one was correct. For Na +toxins,15of17toxinsequences were correctly predicted as β subfamily while two were annotated as belonging to either αor βsubfamily.Ofsix αtoxins,BKTxwascorrectlypredictedandanother fivetoxinsaseither αor αliketoxins.Three αliketoxins,Lqh α6a,6band6cwere annotatedaseither αor αliketoxins.

Toxinactiondescribesthepotencyofthetoxinwhere'neurotoxic'elicitstoxic effectuponinjectionintotargetorganismswhile'nontoxic'doesnotcausetheeffect.

Comparisontoexperimentalresultsshowedthatof18testsequences,15(83.3%)had correctpredictions,andthree(16.7%)hadpartiallycorrectpredictionsoftoxicity.Tb2

IIandTbITIwereexperimentallydeterminedtobeneurotoxicbutwerepredictedas beingeitherneurotoxicornontoxic.BmKdITAP3wasobservedtobeweaklytoxicin insectandnontoxicinmammalbutwaspredictedtobetoxic.

Of 18 experimentally characterised sequences, eight (44.4%) predictions of targetcellspecificity werecorrect.Anothereightpeptideswerepredictedtointeract withbothinsectandmammaliancells,butexperimentaltoxicitywasreportedonlyfor mammalian cells (three peptides) or insect cells (five peptides). One peptide was predicted to interact with mammalian cells, while experimental results showed specificityforbothmammalianandinsectcells.Onlyasinglepredictionwasincorrect

(Os K2). The species specificity of Bm TXKS1 was not assigned as there were no nearestneighboursand Annotate Scorpion predictedthatitbelongstoanewgroup.

83 Chapter 5: Functional prediction of scorpion toxin data

In the second test set of 127 sequences, only one sequence could not be annotated ( Table 8 ). No similar sequences were found for Toxin 6 from Buthus martensii Karsch(SwissProt:Q95P85),thus Annotate Scorpion couldassignneither thegroup,norfunctionalfeaturestoit.Thissequenceconstitutesanewscorpiontoxin group.11sequences,inclusiveofToxin6,hadprimarysequenceinformationonlyand thus predictions were not compared against them. 20 sequences were predicted as membersofnewgroups,ofwhich14(70%)werecorrectlypredictedasmembersof new groups, and six (30%) belong to existing groups (KTX1, AamH1, AamH2,

AamH3, Lqh4 and Lqhβ1 belong to K + group 1, Na + groups 4, 6, 5, 6 and 10, respectively). New subgroups were predicted for five sequences. The remaining 91 toxinsequenceswereclassifiedintothewelldefinedgroupsandsubgroups.

Of127sequences,111hadknownionchannelspecificity,30forcellulartarget specificityandtoxinaction,and71fortoxintype.Ionchannelspecificitywascorrectly predictedfor110(99.1%)toxins.OmTx3waswronglyclassifiedasNa +toxinsinstead ofK +toxins(0.9%misclassification).

Crossreferencing with original publications or annotations in the databases showedthatof71testsequences,59(83.1%)hadcorrect,10(14.1%)partiallycorrect, and 2 (2.8%) wrong predictions of toxin type. For Na + toxins, five of six toxin sequences were correctly predicted as β subfamily while one was annotated as belonging to either α or β subfamily. Of 12 α toxins, AamH1 and AmmVIII were correctly predicted, another seven toxins as either α or αlike toxins, another two toxinsaseither αor βsubfamilyandTc48bwaswronglyannotatedas βsubfamily.

BmKM7, an αlike toxin, was annotated as either α or αlike toxins. OmTx3 belongingtoK +subfamilywaswronglyannotatedas αor βsubfamily.

84 Chapter 5: Functional prediction of scorpion toxin data

Comparisonoftoxinactionwithexperimentalresultsshowedthatof30test sequences, 15 (50.0%) had correct predictions, three (10.0%) had partially correct predictionsoftoxicity,sevenwrongpredictions(23.3%)andfive(16.7%)couldnotbe predicted.KTX3wasexperimentallydeterminedtobeneurotoxicbutwaspredictedas being either neurotoxic or nontoxic. Tspep1 and Kbot1 were nontoxic but were predictedtobeneurotoxicornontoxic.Tf4,Tspep2,Tspep3,Cn12,BjαIT,Bestoxin and Altitoxin were nontoxic but were wrongly predicted as neurotoxic. The toxin actionforCn11,BotIT6,BmKIM2,BmKITa1andLqhβ1couldnotbepredicted.

Of30experimentally characterisedsequences,12(40.0%)predictionsoftarget cellspecificitywerecorrect.Anothersixpeptideswerepredictedtointeractwithboth insect and mammalian cells, but experimental toxicity was reported only for mammaliancells(threepeptides),insectcells(twopeptides)orcrustaceancells(one peptide).Onepeptidewaspredictedtotargetbothinsectandcrustaceancells,butwas reportedforinsectcellsonly.Threepeptidestargetbothinsectsandmammaliancells, butwerepredictedtointeractwithmammaliancells(onepeptide)orinsectscells(two peptides).Twopeptideshadacellulartargetpredicted correctly where experimental toxicitywasreportedfortwocellulartargets.Twopredictionswereincorrect(IsTxand

BmK37).ThecellulartargetsofIkitoxin,Dortoxin,BestoxinandAltitoxinwerenot assigned.

Theauthoralsotested Annotate Scorpion withsequencesotherthanscorpion toxinandinallcasestheresultswere‘nosimilarrecordfound’.Thishasshownthat

Annotate Scorpion is robust and highly accurate in assigning putative annotation to previouslyunseenscorpiontoxins.Thetoxinsequencesfromthetestsetshavebeen incorporatedintothepredictionmoduleastemplatesforfunctionalannotationofnew toxinsequences.

85 Chapter 5: Functional prediction of scorpion toxin data

Table 8 Functional properties predicted for the second test set of 127 new toxin sequences. Abbreviations correspond to the scorpion species: Ap, Anuroctonus phaiodactylus ; Aah, Androctonus australis Hector; Aam, Androctonus amoreuxi ; Amm, Androctonus mauretanicus mauretanicus ; Bt, Buthus tamulus ; BmK, Buthus martensii Karsch; Bot, Buthus occitanus tunetanus ; Ce, Centruroides elegans ; Cg, Centruroides gracilis ; Cn, Centruroides noxius ; Cex, Centruroides exilicauda ; Cll, Centruroides limpidus limpidus ; CsE, Centruroides sculpturatus Ewing; Hs, Heterometrus spinifer ;Iv, Isometrus vittatus ;Lqh, Leiurus quinquestriatus hebreaus ; Lqq, Leiurus quinquestriatus quinquestriatus ; Oc, Opistophthalmus carinatus ; Om, Opisthacanthus madagascariensis ; Pg, Parabuthus granulatus ; Pt, Parabuthus transvaalicus ;Tc, Tityus cambridgei ;Td, Tityus discrepans ;Tf, Tityus fasciolatus ;Ts, Tityus serrulatus ; Tt, Tityus trivittatus ; Tz, Tityus zulianus . The Annotate Scorpion moduleassignedeachquerysequencetoagroup,andpredicteditsionchanneltype, toxintype,toxinactionandspeciesspecificity.TheGroupindicatesgrouporsubgroup thatcontainstoxinsequencessimilartothequery.Anewsubgroupisdenotedbythe groupnumberandanasterisk.Ifnonearestneighbourisfound,thequeryisassigned asa‘New’ group.Toxintypedescribesthesubfamilyofthequerysequence,where Na +toxinsareclassifiedintoalpha( α),alphalike( α')andbeta( β)subfamiliesandK +, Ca 2+ and Cl toxins have been classified into single subfamilies each. Toxin action describesthenatureofthetoxin:T,neurotoxic;N,nontoxic.Abbreviationscorrespond tospeciesspecificity:M,mammal;I,insect;C,crustacean; Pu: putative annotation. Ex: experimentally determined function. Ref.: functional annotation in SwissProt. Dashes () represent no experimental characterisation or no functional annotation in references. SwissProt ( SP ) accession numbers refer to direct submission of toxin sequences. Question marks (?) represent sequences not functionally annotated by Annotate Scorpion .

86 Chapter 5: Functional prediction of scorpion toxin data

Toxinname Group Ion Toxin Toxin Species Ref. Channel Type Action Specificity Ex/ Ex/ Pu Pu Pu Ex Pu Ex Ref. Ref. AaHTX2 4 K K K K T M 1 AamH1 New Na Na α α T M 2 AamH2 New Na Na α,α’ α T M 2 AamH3 New Na Na α,α’ α T M 2 AamTx 4 K K K K T M 3 AmmTX3 4 K K K K T T M M 4 AmmVIII 6 Na Na α α T T IM M 5 Ap() 6 K K K K T M 6 Ap(Phaiodotoxin) New Na Na α,α’ N N M M 7 Ap(Phaiodotoxin2) New Na α,α’ N M 7 Ap(Phaiodotoxin3) New Na α,α’ NT M 7 BjαIT 3 Na Na α,α’ α T N M M 8 BmK(ANEPIII) 11 Na β ? I SP Q9BKJ0 SP BmK(KTX1) New K K K K T IM Q8MQL0 BmK(Toxin6) ? ? ? ? ? SP Q95P85 BmK(X29S) New K,Ca K,Ca T M SP Q7Z0F1 BmK12b 1a Cl Cl T I SP Q9BJW4 BmK37 10 K K K K T T M I 9 SP BmK38 New K K K ? M Q8MUB1 BmKAEP2 11 Na β ? IM SP Q86M31 BmKIM2 11 Na Na β β ? T I IM 10 BmKITa 11 Na β ? IM SP Q9XY87 BmKITa1 11 Na Na β ? T IM I 11 BmKK4 4 K K T IM 12 BmKK7 1 K K K K T M SP P59938 BmKM7 3 Na Na α,α’ α’ T M 13 BmKSKTx1 19 K K K K T ? 14 BmKTXPL2 New Na α ? I 15 BmKX New K,Ca,Na K,Ca,α T M 16 SP BmKα3 6 Na Na α T M Q9GUA7 Bot(Kbot1) 9 K K K K NT N M M 17 Bot(KTX3) 3 K K K K NT T M M 18 BotIT6 11 Na Na β β ? T I I 19 Bt(Neurotoxin) 3 Na α,α’ T M SP P60277 BtITx3 1a Cl Cl Cl Cl T T IC I 20 BtK2 9 K K K K NT M 21 Buthussp(Toxin) New K K ? M SP P83108 CeErg1 16b K K K K T M 22 CeErg2 16b K K K K T M 22 CeErg3 16b K K K K T M 22 CexErg1 16b K K K K T M 22 CexErg2 16b K K K K T M 22 CexErg3 16b K K K K T M 22 CexErg4 16b K K K K T M 22 CgErg1 16b K K K K T M 22 CgErg2 16b K K K K T M 22 CgErg3 16b K K K K T M 22 Cll2b 9b Na Na β T M SP P59899 Cll3 9b Na Na β T M SP Q7Z1K9 Cll4 9b Na Na β T M SP Q7Z1K8 Cll5b 9c Na Na β T IC SP Q7Z1K7 Cll5c 9c Na Na β T IC SP Q7YT61 Cll5c* 9c Na Na β T IC SP Q7Z1K6 Cll6 9c Na Na β T IC SP Q7Z1K5 Cll7 9* Na Na β T IMC SP P59865 Cll8 9c Na Na β T IC SP Q7Z1K4 CllErg1 16b K K K K T M 22 CllErg2 16b K K K K T M 22 CllErg3 16b K K K K T M 22 CllErg4 16b K K K K T M 22 Cn11 11 Na Na β β ? T I IC 23 Cn12 New Na Na β T N IMC IMC 24 CnErg2 16b K K K K T M 22 CnErg3 16b K K K K T M 22 CnErg4 16b K K K K T M 22 CnErg5 16b K K K K T M 22 CsE1x 9a Na Na β T MC 25 CsE3 9b Na Na β T M 25 CsE8 9c Na Na β T IC 25 CsE9b 9d Na Na ? ? ? 25

87 Chapter 5: Functional prediction of scorpion toxin data

CsEErg1 16b K K K K T M 22 CsEErg3 16b K K K K T M 22 CsEErg5 16b K K K K T M 22 CsEIa 9a Na Na β T MC 25 CsEKerg1 16b K K K K T M 26 CsErg2 16b K K K K T M 22 CsErg4 16b K K K K T M 22 CsEv1b 9c Na Na β T IC 25 CsEv1c 9c Na Na β T IC 25 CsEv1d 9c Na Na β T IC 25 CsEv1e 9c Na Na β T IC 25 CsEv2a* 9c Na Na β T I 25 CsEv2b 9c Na Na β T IC 25 CsEv2c 9c Na Na β T IC 25 CsEv2d 9c Na Na β T IC 25 CsEv3b 9c Na Na β T I 25 CsEv3b* 9c Na Na β T IC 25 CsEv4 9c Na Na β T IC 27 CsEv5 2* Na Na α,β α T T M IM 27 Hs(κKTX1.3) 18 K K K K ? ? 28 Iv(IsomTx1) 1b Na Na β ? I 29 Iv(IsomTx2) 1b Na Na β ? I 29 Lqh4 New Na Na α,α’ α T T M IM 30 Lqh6 5 Na Na α,α’ α T T IM IM 31 Lqh7 5 Na Na α,α’ α T T IM IM 31 Lqhβ1 New Na Na α,β β ? T IM I 32 Lqq(Insect2clone6) 11 Na Na β ? I 33 Lqq(Insect2clone8) 11 Na Na β ? I 33 Oc(Opicalcine1) 1 Ca Ca Ca Ca T M 34 Oc(Opicalcine2) 1 Ca Ca Ca Ca T M 34 OcKTx1 6 K K K K T M 35 OcKTx2 6 K K K K T M 35 OcKTx3 6 K K K K T M 35 OcKTx4 6 K K K K T M 35 OcKTx5 6 K K K K T M 35 Om(IsTx) New K K K K T T M C 36 OmTx1 New K K K K T M 37 OmTx2 New K K K K T M 37 OmTx3 New Na K α,β K ? IM 37 Pg(PBTx10) 11 K K K K ? ? 38 Pt(Altitoxin) 15 Na Na β T N ? M 39 Pt(Bestoxin) 15 Na Na β T N ? M 39 Pt(Dortoxin) 15 Na Na β T T ? M 39 Pt(Ikitoxin) 15 Na Na β β T T ? M 40 Tc30 4 K K K K T IM 41 Tc32 New K K K K T M 41 Tc48a 2* Na Na β T IM 42 Tc48b 2a Na Na β α T IM 43 Tc49b 2a Na Na β T T IM M 44 Td(Ardiscretin) 2* Na Na β T T IM IC 45 Td(Discrepin) New K K K K T M 46 Tf4 2* Na Na α,β α T N M MC 47 TsPep1 New K K NT N M M 48 TsPep2 New K K T N M M 48 TsPep3 New K K T N M M 48 TtButtoxin 12 K K K K T M 49 Tz1 2a Na Na β β T T IM M 50 References :1(Legros et al. ,2003),2(Chen et al. ,2003),3(Chen et al. ,2005),4(Vacher et al. ,2002), 5(Alami et al. ,2003),6(Bagdany et al. ,2005),7(ValdezCruz et al. ,2004a),8(Arnon et al. ,2005),9 (Xu et al. ,2004a),10(Peng et al. ,2002),11(Liu et al. ,2003),12(Zhang et al. ,2004),13(Guan et al. , 2004),14(Xu et al. ,2004b),15(ZhuandLi,2002),16(Wang et al. ,2005),17(MahjoubiBoubaker et al. ,2004),18(Meki et al. ,2000),19(Mejri et al. ,2003),20(Dhawan et al. ,2002),21(Dhawan et al. , 2003),22(Corona et al. ,2002),23(RamirezDominguez et al. ,2002),24(delRioPortilla et al. ,2004), 25(Corona et al. ,2001),26(Nastainczyk et al. ,2002),27(David et al. ,1991),28(Nirthanan et al. , 2005),29(Coronas et al. ,2003b),30(Corzo et al. ,2001),31(Hamon et al. ,2002),32(Gordon et al. , 2003),33(ZakiandMaruniak,2003),34(Zhu et al. ,2003),35(Zhu et al. ,2004c),36(Yamaji et al. , 2004),37(Chagot et al. ,2005),38(Huys et al. ,2004),39(Inceoglu et al. ,2005),40(Inceoglu et al. , 2002),41(Batista et al. ,2002a),42(Batista et al. ,2004),43(Murgia et al. ,2004),44(Batista et al. , 2002b),45(D'Suze et al. ,2004b),46(D'Suze et al. ,2004a),47(Wagner et al. ,2003),48(Pimenta et al. ,2003),49(Coronas et al. ,2003a),50(Borges et al. ,2004), SP directsubmissiontoSwissProt.

88 Chapter 5: Functional prediction of scorpion toxin data

5.4 Discussionandconclusions

Thereisaneedforanaccuratefunctionalpredictiontoolforscorpiontoxinsas morethanhalfofthedatainthetwotestsetscollectedfrompublicdatabasesandthe literature are molecular clones (52% and 69%, respectively), having only sequence information or partial functional information. The author has developed the first genericbioinformaticfunctionalpredictiontoolforaccuratefunctionalannotationof scorpion toxins that can help reduce the number of experiments conducted for characterisingnovelscorpiontoxinsequences.Thebioinformaticsbasedapproachof collecting, cleaning, annotating and classifying scorpion sequences into groups and subgroupsallowedpredictionofthefunctionsofqueryscorpionsequenceswithhigh accuracy. The initial process of cleaning the data is critical for preventing the propagationoferrors.Forexample,atthetimeofthisstudy,SwissProtdatabasehad annotatedexcitatoryanddepressanttoxinsasbelongingto αtoxinsubfamily(P01497,

P15147, P19856, P55904, P80962, P19855, P24336, P81240, P15228, P55903,

O61668 and Q9U7E5). These two groups of toxins, however, belong to βtoxin subfamily (Gordon et al ., 1998; Oren et al ., 1998). If these annotations were not corrected,thepredictionswouldbelessaccurate.

TheauthorhadclassifiedscorpiontoxinsusingBLASTandClustalWresults, which are in agreement with the phylogenetic analysis (discussed in Chapter 3).

Detailedclassificationofscorpiontoxinsequencesintowellorganisedgroupsallows better correlation of structurefunction relationships and thereafter, classification of new sequences by the prediction tool. Sequences that are similar (in primary, secondaryandtertiarystructures)oftenperformsimilarfunction.Mostscorpiontoxins share the CSH fold (Bontems et al .,1991).Byclusteringaquerysequencewithits nearest neighbours in the welldefined groups, functional properties of the nearest

89 Chapter 5: Functional prediction of scorpion toxin data neighbourscouldbeascribedtothequerysequence.Thealgorithmisrobustevenifthe query sequence could not be classified into the defined groups as it ascribed the functionalpropertiesofthefivenearestneighbourstothequery.Novelscorpiontoxin sequencesthatshowrelativelylowprimarysequencesimilarityareassignedintonew groups. If nearest neighbours do not exist, Annotate Scorpion will not make predictions.Thisrestrictionwillresultinscorpiontoxinsthathavenewfold(e.g. κ

Hefutoxins)maynotbeannotated.However,oncearepresentativetoxinisenteredin the database, it becomes a template for further predictions. The accuracy of the predictionmoduleislimitedbytheavailabilityofpresentdata.Nevertheless,asmore newvenomsequencesarecharacterisedandincludedintothedataset,theaccuracyof thestructurefunctionpredictiontoolisexpectedtoimprove.

Among the four functional properties predicted, higher accuracies were obtainedforionchannelspecificityandtoxintypeascomparedtotoxinactionand cellular target specificity. The difference in accuracies could be that the biological aspectoftoxicactionandcellularspeciestargetisfunctionallymorecomplex.Theion channelspecificityandtoxinsubtypecouldbepredictedbasedonsequencesimilarity whereastoxinactionandcellularspecificityaredependentonvariousfactorsbesides sequence similarity. The route of toxin administration affects level of toxicity in animalmodels.Forexample,noapparenteffectwasobservedinmicewhenscorpion toxin Cll9 was administered intraperitoneally but induced sleep upon intracerebroventricularroute(Corona et al .,2003).Also,allαtoxinsaretoxictomice to a similar extent when injected subcutaneously, but they differ prominently upon intracerebroventricularinjection(GordonandGurevitz,2003).Finally,theage,genetic background and gender of animal models determine toxicity of scorpion toxins

(Padilla et al .,2003).Forpredictionofcellularspecificity,scorpiontoxinshavebeen

90 Chapter 5: Functional prediction of scorpion toxin data tested on one or combination of animal modelssuch as insects (cockroach, locust), mammals (mice, rat) and crustaceans (crayfish, prawn). The specificity to other organismshasnotbeenfullydetermined.Forexample,bindingstudiescorrelatedwith toxicity towards mammals and insects have revealed that even toxins known as mammalspecific can be toxic to insects and vice versa (Gordon et al ., 1996) The animalgroupspecificityisonlyrelativeanddefinitecrossreactivityexists(Selisko et al ., 1996). The lack of cellular specificity data prevents accurate prediction of this functional property. Many scorpion toxins are generally characterised on a limited number of pharmacological targets and animal models where many other true physiologicaltargetsremaintobediscovered.

Thedetailedstructuralandfunctionalclassificationofscorpiontoxinsequences into groups and subgroups can be used for accurate prediction of ion channel specificityandtoxinsubtype,andtoalesserextentoncellulartargetandtoxinaction.

Thepredictionscaneffectivelyreducethenumberofcriticalexperimentsperformed.

Thisgenericbioinformaticdrivenapproachservesasamodelforfunctionalprediction of novel toxins from other venomous animals as demonstrated in the MOLLUSK

(http://research.i2r.astar.edu.sg/MOLLUSK/ )andsnakevenomneurotoxinsdatabases

(Siew et al .,2004).

Chaptersummary

• Experimentalcharacterisationofthelargenumberofnewlyidentifiedscorpion

toxinsisprohibitivelyexpensiveandtimeconsuming.Thus,thereisaneedto

accurately predict functions of these sequences to expedite their

characterisationbyfacilitatingselectionofcriticalexperiments.

91 Chapter 5: Functional prediction of scorpion toxin data

• Thisworkreportsonthefirstnovelpredictiontool,Annotate Scorpion ,which

predictsfunctionalpropertiesofuncharacterisedscorpiontoxins.Thealgorithm

combinessequencecomparison,nearestneighbouranalysisanddecisionrules

topredictionchannelspecificity,toxinsubfamily,toxinpotencyandcellular

specificity.

• High accuracy of predictions, particularly ion channel specificity and toxin

subfamily, was obtained by validation with two sets of experimentally

characterised scorpion toxins. Lower accuracies were obtained for toxin

potency and cellular specificity because of different biological and

experimentalfactorsinvolvedinthefunctionalpropertiesotherthansequence

similarity.

• The methodology reported here demonstrates that bioinformatics can be

applied to systematic functional analysis of large sets of scorpion toxin

sequences. This generic approach serves as a model for prediction of novel

toxinsfromothervenomousanimals.

92 Chapter 6: Data warehouse of scorpion toxins

PartIII:Chapter6 Implementation of scorpion toxin data warehouse ‘[An] argument against the Human GenomeProjectwasthatitwastrivial,it wasn’treallyscience.Itwasreferredtoas a fishing expedition, or a mindless collecting of facts. What they did not realiseishowthesedatabasesweregoing totransformhowwethinkaboutbiology andmedicine.’ LeroyHood

93 Chapter 6: Data warehouse of scorpion toxins

Largescaleanalysisofscorpiontoxindataprovidesaholisticviewofallthecurrently available data which is important for better understanding of their structurefunction relationships.However,scorpiontoxindataarereportedaslonglistsofsequencesin literaturereviews(e.g.Possani et al .,1999;Tytgat et al .,1999;Goudet et al .,2002)or are scattered across multiple public databases with very limited structural and functionalannotation.Discrepanciesbetweenrecordsrepresentingthesametoxinin various databases are also common. Mutation studies (such as sitedirected mutagenesis)ofscorpiontoxinsprovidealargeamountofdataontheirfunctionaland structuralfeaturesareavailableintheliteraturebutareusuallynotusedinlargescale analyses. Structural analyses of scorpion toxins are impeded by availability of 3D structuresinPDB(Deshpande et al .,2005)(only20%of393reportedscorpiontoxin sequencesasofNovember2005).Theaccesstoandanalysisofthegrowingnumberof scorpion toxin data scattered across multiple data sources is becoming increasingly difficult. Thus, there is a need to create a centralised repository for improved data managementtofacilitateanalysesinthefieldofscorpiontoxins.

TheauthorwasinvolvedinbuildingtheSCORPIONdatabase (Srinivasan et al .,2002a)whichcontained277nativescorpiontoxins.InSCORPION,mutanttoxin data available in the literature had not been tapped in the bioinformatics study of scorpion toxins. Values of lethal assays in animal models and binding affinities towardsvariousionchannelswerenotextractedfromtheliterature.

SCORPION2databaseisthefirstdatawarehouseofnativescorpiontoxinsand artificialmutanttoxinswithintegratedbioinformaticstoolsforsystematicanalysisof their structurefunction relationships. It is publicly available at andsupersedestheearlierSCORPIONdatabase.SCORPION2

94 Chapter 6: Data warehouse of scorpion toxins serves as a onestop repository of currently available scorpion toxin data where

SCORPION2recordsarecleanedoferrorsandhighlyenrichedwithstructurefunction information from literature. Query, extraction, prediction and structure visualisation toolshavebeenintegratedtofacilitatemanipulationoflargenumberofscorpiontoxin dataforanalysis.Inthischapter,thegeneralstepsinvolvedintheimplementationof data warehouse of scorpion toxins, the SCORPION2 and advantages of data warehouseinthefieldoftoxinresearchovergeneraldatabasesarediscussed.

6.1 Datawarehouseforinformationusageandknowledgediscovery

Thecompletionandongoingsequencingofvariousgenomeprojectsincluding threevenomousanimals(honeybee,seaurchinandduckbilledplatypus)markedthe beginning of biological research into ‘largescale science’ of venomous animals

(Collins et al ., 2003). The large volume of biological data has posed an exciting challengeforresearchersonhowtoextractinformationfromrawdataandtransform the information into knowledge. These voluminous data are scattered in diverse biological databases with limited annotation. In the period of one year, 171 new publiclyaccessibledatabaseswerecreated, totaling719biologicaldatabasesin2005

(Galperin,2005).Thesedatabasesplaykeyrolesinbiologicalresearch.

Easyaccesstodiversebiologicaldatabasesandefficientanalysesofthesedata areimportantforinterpretationofexperimentalresults,discoveryofnewknowledge andplanningresearch(Schonbach et al .,2000).However,theheterogeneousformats ofthegeographicallydiversebiologicaldatabasesimpedeaccesstoandextractionof usefulinformation.Thesedatabaseshavedifferentdataformats,databasemanagement systems and data manipulation languages, among others. Different databases also encodedataatvariouslevelsofcomplexityrangingfromlowleveldatatohighlevel

95 Chapter 6: Data warehouse of scorpion toxins information.Examplesoflowleveldataarethosethatreportexperimentalvaluesas compared to highlevel information (e.g. interpretation of data by domain experts), whicharederivedfromthelowleveldata.Dependingonthedatathattheycontain, thesedatabasesservedifferentfunctions.

Generalpurposesequencedatabases,suchasGenBank(Benson et al .,2005), aimtoexpandanddisseminatenucleotidesequenceinformation.Anotherexampleis

SwissProt (Bairoch et al ., 2004), a general database of protein sequences which focusesonbuildingahighleveloffunctionalannotationofproteins,integratingwith other databases and maintaining lowredundancy. In contrast to a general purpose database(e.g.SwissProt),whichcontainsallknownproteinsequences, amolecular biology data warehouse contains subjectorientated, integrated, nonvolatile, expert interpreted collection of biological data (e.g. a family of proteins that have similar structures and functions) (Schonbach et al ., 2000). The data warehouse involves integrating information on a specific subject from different databases, systems, and locationsintoacentraldatabaseformoreaccurateandindepthanalysisofbiological data,improvedresearchplanningandknowledgediscovery.Nonvolatiledatameans thatdataisneverremovedeventhoughmoredataareaddedi.e.dataisstableinadata warehouse 1.Expertcurationandannotationisessentialtoascertaintherelevanceand accuracy of the entries in a biological data warehouse since the types of errors and inconsistencies can be domainspecific (Schonbach et al ., 2000). The advantages of data warehouses include providing an integrated environment for consistency in naming conventions, measurement of variables and encoding data attributes

(Schonbach et al .,2000)andeliminatepotentialtechnicalandsemanticheterogeneity

(Karasavvas et al .,2004).

1http://www.sdgcomputing.com/glossary.htm

96 Chapter 6: Data warehouse of scorpion toxins

Theseadvantagesmotivatedtheauthortobuildadatawarehouseofscorpion toxinswiththeSCORPION2databaseasanexampleofabioinformaticplatformfor improvedanalysisoftheircomplexstructurefunctionrelationships.

6.2 Implementationofthedatawarehouseofscorpiontoxins,SCORPION2

HeretheauthordescribesthecreationofSCORPION2databasewhichinvolve accesstotoxindatascatteredacrossmultipledatabases,inspectionforerrors,analysis andclassificationoftoxinsequencesandtheirstructures,andthedesignanduseof predictive models for simulation of laboratory experiments (Tan et al ., 2003).

SCORPION2servesasavaluableprimaryresourcewithintegratedbioinformatictools foranalysisandpredictionofstructurefunctionrelationshipsofscorpiontoxinsthat maytypicallyinvolvequeryingmultipleresources.

6.3 Materialsandmethods

Public databases used for creation of the SCORPION2 database were:

GenBank and SwissProt for consolidation of scorpion toxin sequences, PDB for extraction of scorpion toxin PDB structures and PubMed (http://www.pubmed.gov/ ) forextractionofpublishedscorpiontoxinsequencesnotdepositedinanydatabases, mutantscorpiontoxindata,structurefunctioninformationandliterature.Thegeneral stepsofcreatingabiologicaldatawarehouseofscorpion toxins were implemented: data collection, data cleaning and data annotation. Homology models were also generatedforscorpiontoxinsthatlack3Dstructures.

97 Chapter 6: Data warehouse of scorpion toxins

6.3.1 Datacollectionofnativeandmutantscorpiontoxinsequencesandtheir3D

structures

For data collection of scorpion native toxin sequences, two strategies were employed: keyword and sequence similarity searches. A keyword search identifies sequencesbycomparingwordsorstringsofcharactersthroughthewrittendescriptions orannotatedsectionofadatabaserecordwhileasequencesimilaritysearchcompares thesequencesthemselves. Keywordsearch“scorpiontoxin”wasperformedinSwiss

Prot, GenBank and PDB databases. The search results were screened for records containing scorpion toxins only which formed an initial dataset while irrelevant records (nonscorpion toxins such as snake, bacterial and plant toxins etc.) were removed. Each sequence in the initial dataset was subsequently submitted into the proteinBLASTtool(blastp)withdefaultparameters against the nonredundant (nr) database at NCBI ( http://www.ncbi.nlm.nih.gov/BLAST/ ). The nr database removes redundant identical sequences to yield a collection of unique sequences which facilitatesdatacleaning.TheaimofsubmittingscorpiontoxinsequencesintoBLAST wastocollectscorpiontoxinsequencesthatweremissedoutbykeywordsearching becausethelatterapproachisnotexhaustive.Forexample,Opicalcines1and2from

Opistophthalmus carinatus (SwissProtID:P60252andP60253)werenotpickedup bythekeywordsearch‘scorpiontoxin’butbyBLASTsearchusingaquerysequence ofImperatoxinfrom Pandinus imperator .Themajorityofbiologicalrecordsingeneral purpose databases have limited or no annotation, in particular the nucleotide data derived from sequencing of genomes where high throughput is the priority. Thus, sequencesimilaritysearchisneededtogathertheserecords.

Sequences in the BLAST results were compared against the initial dataset whererelevantomittedsequenceswereaddedtotheinitialdataset.Theprocesswas

98 Chapter 6: Data warehouse of scorpion toxins repeated until the initial dataset was exhausted. Published novel sequences not deposited in any databases were obtained using a literature search in PubMed and subsequent extraction of sequences from publications. The purpose of performing keyword and sequence similarity searches, and extracting sequences from literature wastogenerateacomprehensivedataset,whichservedasabetterrepresentationofthe completesequenceinformation.

Data of mutant scorpion toxins were extracted solely from literature by exhaustive keyword searches (e.g. ‘scorpion toxin’, ‘chemical modification’,

‘mutation’, ‘mutant’, ‘alanine scanning’, ‘mutagenesis’, ‘analog’ and ‘chimera’) in

PubMed.Inaliterature,eachmutanttoxinsequencewasrepresentedasanindividual record. For example, if there were 15 unique mutant sequences in a literature, 15 recordsweregenerated.

6.3.2 Generationofhomologymodelsofscorpiontoxins

After the collection of scorpion toxin sequences and 3D structures, scorpion toxins that currently lack experimentally determined structures (e.g. xray crystallography or nuclear magnetic resonance) were inspected if homology models couldbegeneratedbasedonavailabilityoftemplatestructuresandpairwisealignment.

The approach began with identification of the template sequence, which has an experimentallydetermined3DstructureinPDBandahighpairwisesequenceidentity tothequerysequencealsoknownasthetargetsequence.Thecutoffsequenceidentity valuebetweenthetemplateandtargetsequenceswassetat≥30%whereRost(1999) demonstrated that 90% of the templatetarget pairs had similar structures. The alignment of the query sequence to the template structure was optimised and the homologymodelwasgeneratedusingtheautomatedhomologymodellingfeaturein

99 Chapter 6: Data warehouse of scorpion toxins the SDPMOD server (Kong et al .,2004).Thequalityofeachhomologymodelwas checkedmanually,facilitatedbyassessmentbythePROCHECKprogram(Laskowski et al .,1993)anddepositedintoSCORPION2database.

6.3.3 Datacleaning

Dataerrorsweredetectedmanuallybycomparisonofrecordsbetweenpublic databasesandreferencingtooriginalliterature.Recordswereinspectedprimarilyfor errorsintheirprimarysequences,disulfidelinkages,functionalannotationandtoxin names.Inconsistenciesinthereportedprimarysequencesbetweenoriginalreferences wereannotatedinthetoxinrecordfieldtermed‘Conflict’.Discrepanciesinstructural orfunctionalinformationwereannotatedinafieldcalled‘Comment’. Duplicatesor identical sequences belonging to the same scorpion species reported in different databases weredetected bypairwisesequencecomparisonandcombinedasasingle record withhyperlinkstoeachunderlyingdatabase.Differentnamesreferringtothe sametoxinsequenceswereconsolidatedassynonyms.

6.3.4 Dataannotation

Functionalandstructuralinformationfromoriginalliteraturewasannotatedto eachnativetoxinrecord.Examplesoffunctionalinformationincludedbindingaffinity towardsspecificionchannelsubtypes,toxicsymptomsobserveduponadministeringto animal models or suggestion of possible interactive site. Structural information annotated included lengths of signal peptides and matured toxins, disulfide connectivity and posttranslational modification (e.g. amidation). For records of mutant scorpion toxins, functional information extracted from literature included

100 Chapter 6: Data warehouse of scorpion toxins changesinbindingaffinitiestowardsspecificion channels and differences in lethal dosages in animal models compared with that of native toxin peptides. Structural information, such as disulfide connectivity and monitoring of changes in secondary structuresofmutantsequencesbycirculardichroismspectroscopy,wasalsoannotated inthemutantrecord.Theannotatedfieldsincludecitationsforthearticlesinwhichthe experimentaldatawerereported.

Aftercollection,cleaningandannotationofthedataset,databasecreationwas published on the internet using Templar ( http://research.i2r.astar.edu.sg/Templar/ ).

TemplarisasubsystemofBioWarewhichconsistsoffourdatawarehousingmodules for the retrieval, annotation, publishing and data updating of specialist molecular databases (Koh et al ., 2004a). The organisation of the compiled data was designed using data warehousing principles (Schonbach et al ., 2000) where SCORPION2 recordsweremaintainedasflatfileformattofacilitateeasydataanalysis,extractionof valueaddedinterpretationofthedataanddatabasemaintenance.

6.4 Results

ThedifferencesbetweenSCORPION2andSCORPIONhavebeensummarised in Figure 28 . SCORPION2 contains more than 800 records of native and mutant scorpiontoxins.Approximately60%ofrecordswereextractedfromliterature.Of624

3D structures available in SCORPION2, 82 were extracted from the PDB database

(Table9 )and542homologymodelsofnativeandmutanttoxinsweregeneratedusing thecomparativemodelingtool,SDPMOD(Kong et al .,2004).Tenscorpionspecies previously not present in SCORPION are: Androctonus amoreuxi, Anuroctonus phaiodactylus, Babycurus centrurimorphus, Isometrus vittatus, Opisthacanthis madagascariensis, Opistophthalmus carinatus, Parabuthus granulatus, Tityus

101 Chapter 6: Data warehouse of scorpion toxins fasciolatus, Tityus trivittatus and Tityus zulianus. Some 50 scorpion species are represented in SCORPION2. More than 400 newly added records of mutant toxin records were extracted exclusively from the literature. The number of literature references has increased to ~500. The newly added functional information includes bindingaffinity(>500records)andtoxicity(~200records).

TheSCORPION2interfaceprovideshyperlinkstothecorrespondingentriesof therelevantexternaldatabases.Errorswereminimisedtoensurehighdataqualityby checking with original literature and crossreferencing across databases. Analysis of scorpiontoxindatafrompublicdatabasesrevealedthepresenceofnumerouserrorsin thesequences,incompletedata,poorannotationanddiscrepanciesofinformationfor thesameentryfromdifferentdatabasesources.Examplesoferrorsthatwerefoundin the scorpion toxin entries in major sequence databases are: wrong links between databases, different names for the same sequence, different sequences for the same toxin, missing links between databases, toxin names from journals not used in the database,andSwissProtlinkstoPDBstructuresofpoorhomology(Figure29 ).

102 Chapter 6: Data warehouse of scorpion toxins

Statistics of SCORPION2 database

900 819 800 700 600 542 498 500 426 Ver 1 393 400 Ver 2 295 277 277 300 No. of entries of No. 200 82 103 100 50 0 0 Literature PDB Homology Mutant Native Total no. of references structures models toxins toxins toxin entries Figure28StatisticsofSCORPION2databaseonthenumberofrecords,3Dstructures andliteratureavailableasofNovember2005.

Types of errors detected during data cleaning

200 182 180 160 140 120 100 80 60 35 Number of Number of records 40 23 11 15 20 1 0 Wrong links Discrepancies in Discrepancies in No link betw een Toxin names in Value-added betw een names for same sequences databases journal not found information from databases sequences betw een in databases literature databases Figure29Numberofrecordshavingerrorsordiscrepanciessummedbycategoryof errorduringtheinitialcollectionofscorpiontoxinrecords.Thegraphalsoshowsthe numberofrecordsthatrequiredadditionalannotations.

103 Chapter 6: Data warehouse of scorpion toxins

Table 9 Asummaryof82scorpiontoxinPDBstructuresavailableinSCORPION2. The structures are listed in alphabetical order according to scorpion species and divided into four broad groups of ion channel specificity, namely K+ (grey), Na + (yellow), Ca 2+ (green) and Cl (cyan) ion channels as of November 2005. Multiple structuresofthesametoxinareavailableforsometoxins.

Scorpionspecies Toxinname PDBidentity Androctonus mauretanicus mauretanicus Kaliotoxin1 2KTX,1KTX P01 1ACW P05 1PNH Buthus eupeus BeKm1 1J5J,1LGL Buthus martensi Karsch BmKTX 1BKT BmP01 1WM7 BmP02 1DU9 BmP03 1WM8 BmTx1 1BIG BmBKTx1 1Q2K,1R1G BmKK2 1PVZ BmKK4 1S8K BmKX 1RJI BmTx2 2BMT BmTx3 1M2S Centruroides limbatus Hongotoxin1 1HLY Centruroides margaritatus 1MTX Centruroides noxius Cobatoxin1 1PJV Noxiustoxin 1SXM Ergtoxin 1NE5,1PX9 Heterometrus fulvipes Hefutoxin1 1HP9 Heterometrus spinnifer HsTx1 1QUZ Leiurus quinquestriatus hebraeus Agitoxin2 1AGT Charybdotoxin 2CRD LeiurotoxinI 1SCY LQH182 1LIR Opisthacanthus madagascariensis IsTx 1WMT OmTx1 1WQC OmTx2 1WQD OmTx3 1WQE Orthochirus scrobiculosus Osk1 1SCO Pandinus imperator Pi2 2PTA

104 Chapter 6: Data warehouse of scorpion toxins

Pandinus imperator Pi3 1C49 Pi4 1N8M Pi7 1QKY Scorpio maurus palmatus Maurotoxin 1TXM Tityus cambridgei Tc1 1JLZ Tityus serrulatus Butantoxin 1C55,1C56 Tsκ 1TSK TsTxKα 1HP2 Androctonus australis Hector AaH2 1AHO,1PTX Buthus martensii Karsch BmαTX12 1OMY BmKITAP 1T0Z BmKM1 1DJT,1SN1 BmKM2 1CHZ BmKM4 1SN4 BmKM7 1KV0 BmKM8 1SNB Centruroides exilicauda CsEV1 1VNA,1VNB CsEI 1B3C,2B3C Centruroides noxius Cn2 1CN2 Cn12 1PE4 Centruroides sculpturatus CsEV 1NRA,1NRB CsEV2 1JZA,1JZB CsEV3 2SN3 CsEV5 1I6F,1I6G,1NH5 Hottentotta judaica BjxtrIT 1BCG Leiurus quinquestriatus hebraeus LqhαIT 1LQH,1LQI Lqh3 1BMR,1FH3 Leiurus quinquestriatus quinquestriatus Lqq3 1LQQ Mesobuthus tamulus Aneurotoxin 1DQ7 Tityus serrulatus TsTxVII 1B7D,1NPI Scorpio maurus palmatus Maurocalcine 1C6W Pandinus imperator ImperatoxinA 1IE6 Leiurus quinquestriatus hebraeus Chlorotoxin 1CHL Mesobuthus eupeus InsectotoxinI5A 1SIS

105 Chapter 6: Data warehouse of scorpion toxins

6.4.1 Databasedescription

Five bioinformatics features as available in the earlier SCORPION database

(Srinivasan et al ., 2002a) are: Search Scorpion, BLAST Scorpion, Download

FASTA, Scorpion Structure and Annotate Scorpion .Anewquerytool, Activity

Scorpion wasdevelopedinSCORPION2database(Figure30 ).

Download sequences

Download Ion channel target, target cellular specificity, toxin action, classification Annotation Prediction PDB Structures Structure Query Homology 3D Structure sequence models

Keyword Search Sequence Search

Search Activity BLAST

Keyword, Species, Most similar Species, Animal model, sequences Channel, Adminstration route Figure 30 Site map of SCORPION2 database. The six oval shapes represent bioinformatictoolsavailableinthedatabase. Userscansearchforscorpiontoxinentriesusingthe SearchScorpion feature bykeywords(e.g.betatoxin,potassium,nontoxicetc.),thespecificionchannel(e.g. sodium,calciumetc.)orscorpionspecies.ThenewActivityScorpion featureallows query of scorpion toxin activity through scorpion species, routes of toxin administration or animal models using logical operators (AND, OR). For example, comparison of LD 50 values by route of toxin administration can be performed by

106 Chapter 6: Data warehouse of scorpion toxins checking ‘intracerebroventricular’, ‘subcutaneous’ and ‘AND’ options. The LD 50 values listed corroborates that toxicity in mammalian models are more toxic via intracerebroventricular than subcutaneous route (Gordon and Gurevitz, 2003). This search tool enables identification of the most active toxins in different scorpion venoms where envenomation treatment can be developed (Theakston et al ., 2003;

Gazarian et al .,2005).Knowledgeofscorpiontoxinlethalityisimportantasscorpion stingsremainsapublichealthissue(Theakston et al .,2003;Gazarian et al .,2005).The resultsof SearchScorpion and ActivityScorpion featuresaredisplayedaslistsina tabularformcontainingbriefdescriptionsoftherecordswithhyperlinkstoindividual fulldatarecords( Figure31 ).

Figure31 ThewebinterfaceoftheSCORPION2database.Theleftframeprovidesfor selectionofthevarioussearch,extractionandpredictivetools.Thelargerrightframe isanexampleoftheoutputlist(partial)ofentries that match route keyword search ‘intracerebroventricular’and‘subcutaneous’.Theaccessionnumberfieldsinthetable containhyperlinkstothefullentriesintheSCORPION2database.

107 Chapter 6: Data warehouse of scorpion toxins

The BLAST Scorpion feature enables the user to perform sequence comparison using the BLAST (Altschul et al ., 1997) algorithm. A query sequence, eitheraminoacidornucleotide,canbecomparedagainstallscorpiontoxinsequences availableintheSCORPION2database.Theuserscangettheresultsineitherstandard

BLASTsearchoutputorcolourcodedmultiplesequencealignmentgeneratedbythe

Mview program (Brown et al ., 1998). The colourcoded alignments indicate the positions of conserved and homologous amino acids in the multiple sequence alignmentgeneratedusingBLASTsearches( Figure32 ).

Figure 32 BLAST result upon submission of maurotoxin from Scorpio maurus palmatus .SequencesintheSCORPION2databasethataremostsimilartomaurotoxin arerankedindescendingorder.Positionsofaminoacidconservationarecolouredin theMviewformat.

The Download FASTA featureenablesuserstodownloadFASTAformatted files(Pearson,1990)ofaminoacidandnucleotidesequencesfromtheSCORPION2 databaseforsequenceanalysesoftheirtoxinsofinterest.TheFASTAformattedfiles areavailableaszipcompressedfiles.

108 Chapter 6: Data warehouse of scorpion toxins

The ScorpionStructure featureallowsuserstoviewandstudyavailable3D structures through either Jmol ( http://www.jmol.org/ ) or Chime

(http://www.mdli.com/downloads/ ) viewer. In addition, users can download the 3D structures in PDB format. These structures include 82 structures extracted from the

PDBdatabaseand542homologymodelsgeneratedfornativeandmutanttoxinsasof

November 2005. Information on structural template and pairwise sequence identity between template and target (ranging from 32.8 to 100%) are provided in the corresponding SCORPION2 records for each homology model. The quality of each homologymodelwasevaluatedbythePROCHECK(Laskowski et al .,1993)program and has a Ramachandran plot which can be accessed through a hyperlink. Various displayschemessuchasspacefill,wireframeandbackbone,areavailable( Figure33 ).

Figure33Visualisationofscorpiontoxin3DstructuresusingJmol.Variousschemes ofdisplayssuchaswireframe,spacefillandsticksareavailable.

109 Chapter 6: Data warehouse of scorpion toxins

The prediction tool, Annotate Scorpion , automatically generates putative functionalannotationforthescorpiontoxinbeing queried (discussed in Chapter 5).

The annotation result contains the following predicted properties of the query sequence:subfamilyofthetoxin,ionchannelspecificity,mechanismofaction,nearest neighbour analysis by sequence similarity, and the target cell type. This feature providesmultiplesequencealignment,usingClustalW(Thompson et al .,1994),ofthe testsequence alongwiththenearestneighbour sequences available in the database.

Theinbuiltintelligenceofthe AnnotateScorpion featurehelpstheannotationofthe fragments of the toxin sequences and identifies novel subfamilies of the query peptides.

6.4.2 DescriptionoftheSCORPION2records

SCORPION2 entries contain fields described in the earlier version of the database (Srinivasan et al ., 2002a): Date, Accession, References, Species, Name,

Mechanism of Action, Toxin_action, Activity, Interaction_site, Target_cell, Toxin

Type,Structure,Disulfide,Alpha_helix,Beta_strand,C_end,Sequence,Designation,

Translation, AA Alignment, Conflict, and Comment. New fields that contain functional,structural,orotherrelevantinformationhavebeenaddedtoSCORPION2.

A summary of fields in a SCORPION2 full data recordisgivenin Table 10 . The description of each field will be divided into database, functional and structural categories.

110 Chapter 6: Data warehouse of scorpion toxins

Table 10 Description of fields in a SCORPION2 record. The field values may be textual,numericoremptyandcanbegenerallydividedintodatabase,functionaland structuralcategories.NewfieldnamesintroducedinSCORPION2arein boldface . Fieldname Description Databasefieldnames DBACC Uniqueaccessionnumber“D”followsbysixdigitnumber Date Entrydateofrecord Last_updated Lastdateofupdatingwithrelevantliteratureorinformation CrossreferencestocorrespondingrecordsinGenBank,EMBL,DDBJ, Accession SwissProtandPDBdatabasesifavailable PublicationreferencesandhyperlinkstothePubMeddatabase,containing Reference theauthornames,titleandjournalreference Functionalfieldnames Species Biologicalandcommonnameofthescorpion Name Listofnamesusedintheliteratureoraspecifictoxinname Mechanismofaction Descriptionofthebiologicalmechanismofthetoxinactivity Toxinaction Natureoftoxin Activity Potencyoftoxin Targetcell Speciesspecificityoftoxin Toxintype Subfamilyoftoxin Toxineffect Symptomsobservedinanimalmodelsupontoxinadministration Bindingaffinity Bindingaffinityoftoxintowardsatargetionchannel Targetionchannel.Sourceofionchannelisenclosedinparenthesisif Channel available Interactionsite Suggestpossibleresiduesinvolvedinfunctionoftoxin Comment Anyrelevantfunctionalorstructuralobservations Lethaldosage,effectivedosageorparalyticunitat50%uponadministering LD ,ED orPU 50 50 50 toxinintoanimalmodels,respectively Injectionroute Routeofadministeringthetoxinintotheanimalmodels Source Sourceofthetoxin Origin Eithernativeormutanttoxin Rel_activity Relativetoxicityofmutanttoxin Rel_binding_affinity Relativebindingaffinityofmutanttoxin Highlightsthepositionandmutationofthesequencesuchasaminoacid Mutation substitution,insertionanddeletion Structuralfieldnames AccesstoPDBtoxinstructures,homologymodelsandotherrelated Structure structuralinformation Ramachandran AccesstoaRamachandranplotofeachhomologymodelgenerated CD_spectrum Annotatecirculardichroismspectrumofmutanttoxinsequence Cys_arrangement Cysteinemotifwithnumberofinterveningresidues(X)inparenthesis Disulfide Positionsoftheresiduesinvolvedindisulfidebridges Oneoftheeighttypesofdisulfideconnectivitypatternscurrentlyobserved Disulfide_arrangement in393nativescorpiontoxinentries NatureofCterminuswhetherfreeorposttranslationallymodifiede.g. C_end amidated,sulfoxide Sequencedesignation Canbeaprecursor,matureorapartialsequencetoxin Completeaminoacidsequenceoftoxinwiththedisulfidepattern,together Translation withthesignalpeptideifpresent AAAlignment Informationonthegroupingsofscorpiontoxins Nucleicacidsequenceoftoxinthatdescribesboththeintronsandexonsif Sequence available Observationsregardingaminoaciddiscrepanciesbetweenrecords Conflict representingthesamesequenceindifferentdatabases

111 Chapter 6: Data warehouse of scorpion toxins

Thefieldsforthedatabasecategoryareasfollow.Auniqueaccessionnumber

‘DBACC ’hasbeenassignedtoeachoftherecordsintheSCORPION2database.The formatbeginswith‘D’followsbyasixdigitnumber.Thisfieldisfollowedbythe

‘Date ’and‘ Last_updated’whichidentifiestheentrydateanddateofupdatingwith relevant information, 3D structures or literature. The field ‘ Accession ’ contains hyperlinkstocorrespondingaccessionnumbersinpublicdatabasesGenBank,EMBL,

DDBJ,SwissProt,andPDB.Thefield‘ References ’providespublicationreferences and hyperlinks to the PubMed database. It contains the author names, title and the journalreference.The ‘Comment ’fieldhasbeenreservedforanyrelevantcomments orobservations.

Functionalinformationisprovidedinthefollowingfields.Thefield‘ Species ’ givesthebiologicalandtrivialnameofthescorpion.The‘ Name ’fieldcontainsalist ofnamesusedintheliteratureoraspecificname, for example “Neurotoxin KTX2

(BmSKTx2)(BmKK3)”and“makatoxin”.Someentrieshaveonlyonename,while someareknownbymultiplenames.Abriefdescriptionofthebiologicalmechanismof

+ + thetoxinactivity(suchaswhetherthetoxininteractswithNa orK channel)canbe foundinthefield‘ MechanismofAction ’.The‘ Toxin_action ’and‘ Activity ’fields describethenatureofthetoxinanditspotency.Thesefields ‘LD 50 ’,‘ED 50 ’or ‘PU 50 ’ describe the lethal dosage, effective dosage and paralytic unit at 50% upon administeringtoxinintoanimalmodels,respectively.Theanimalmodelsareenclosed within parenthesis. ‘ Injection_route ’ is the route of administering toxins into the animal models. Information on binding affinity of a toxin towards a particular ion channel is provided in ‘ Binding_affinity ’ and ‘ Channel ’ fields, respectively. The source of the experimented ion channel, if available, is found in parenthesis in the

‘Channel ’ field. The possible amino acid residues critical for toxinchannel

112 Chapter 6: Data warehouse of scorpion toxins interactionsareprovidedinthefield‘ Interaction_site ’.Thespeciesspecificityofthe toxins is provided in the field ‘ Target_cell ’. The ‘ Toxin Type ’ field specifies the subfamily of the given toxin. Source ( venom, recombinantly expressed, chemically modified and chemically synthesized ) of the toxin entries are found in ‘ Source ’.

Mutantandwildtypetoxinentriescanbedifferentiatedbyeither Mutant or Native in

‘Origin’ .Therelativeeffectofmutationontheactivityandbindingaffinitycanbe obtainedfromthefields‘ Rel_activity ’ and‘ Rel_binding_affinity ’.The‘ Mutation ’ field highlights the position and mutation of the sequence such as amino acid substitution,insertionanddeletion.Regionsofmutationarehighlightedintheprimary sequence.

The ‘ Structure’ field gives direct access to the 82 3D structures and other relatedstructuralinformationfoundinthePDBdatabase. In addition, SCORPION2 contains homology models for 542 toxins that currently lack experimentally determined3Dstructuralinformation.Thephi(CαNbond)versuspsi(CαCbond) torsion angles for all residues in the homology model can be assessed from the

‘Ramachandran ’ field. Mutation studies which included circular dichroism spectroscopytodetermineifmutationaffectedstructuralfoldingareannotatedinthe

‘CD_spectrum ’field.Thepositionsoftheresiduesinvolvedindisulfidebridges,α helicesorβstrandformationsaredescribedinthe fields ‘ Disulfide ’, ‘ Alpha_helix ’ and ‘ Beta_strand ’, respectively. The phrase “BY SIMILARITY” in the ‘ Disulfide ’ fieldstandsforputativedisulfidebridgesdeterminedbysimilaritytoknownstructures.

‘Disulfide_arrangement ’ describes one of the eight types of disulfide connectivity patternscurrentlyobservedin393nativescorpiontoxinrecords.Informationregarding the nature of the Cterminal, whether free or posttranslationally modified can be obtainedfromthefield,‘ C_end ’.The‘ SequenceDesignation ’fielddescribeswhether

113 Chapter 6: Data warehouse of scorpion toxins a particular entry is a precursor sequence, a mature toxin or a partial sequence. The completeaminoacidsequencesofthetoxinwiththedisulfidepattern,togetherwith thesignalpeptideifpresent,aredisplayedinthe field ‘Translation’ . The disulfide bridges,thesignalpeptideandthematuretoxinare distinctly colour coded for easy identification and clear understanding. The field ‘AA Alignment ’ provides the information on the classified groups of scorpion toxins. This feature of the

SCORPION2databaseisbasedonmultiplesequencealignment of sequences and a

+ unifiedgroupingoftoxins,createdusingClustalW(discussedinChapter3).TheNa channel toxins are classified into 18 groups based on sequence similarity and the

+ position of other conserved residues. Similarly, the K channel toxins have been

classifiedinto32groups.TheCl channeltoxins,scorpiondefensinsandothershort chain neurotoxins are grouped separately. All the information on the groupings of scorpion toxins can be obtained from the field ‘ AA Alignment’ . The nucleic acid sequences of the toxins that describe both the introns and exons are shown, where available,inthe‘ Sequence ’field.Thefield ‘Conflict’ containsobservationsregarding aminoaciddiscrepanciesbetweenrecordsrepresentingthesamesequenceindifferent databases.

6.5 Discussionandconclusion

SCORPION2isanimprovedresourceofnativescorpiontoxinsandartificial mutant toxins with integrated bioinformatics tools for systematic analysis of their structurefunction relationships. Inclusion of mutant scorpion toxins provides informationonaminoacididentitiesandpositionsthatmayaffecttoxinfunctionand fold.Designedusingdatawarehousingprinciples,SCORPION2issubjectorientated which provides a knowledge base to focus on and analyse scorpion toxin data.

114 Chapter 6: Data warehouse of scorpion toxins

SCORPION2 is a platform where bioinformatic analyses (Chapters 3 and 4), developmentofextractionandpredictiontoolsforstructurefunctionrelationshipsof scorpiontoxins(Chapter5)areintegrated.Itisalsoanexampleofhowtobridgethe gap between the fastpaced accumulation of scattered data and their proper managementforbetterutilisationofthecurrentknowledge.Itisalsointendedtoallow userstomakeuseofthebioinformatictoolsandlinkthemtoexperimentaldesignin thestudyofscorpiontoxins.

The SCORPION2 is an integrated resource of scorpion toxin data (3D structures,nativeandmutanttoxinsequences)largelycompiledfrompublishedreports and public databases by keyword and sequence similarity searches. Errors were eliminated by crossreferencing of entries with original papers and other databases.

Theenlargedandcleaneddatasetservestobetterrepresentthecurrentknowledgeof their structurefunction relationships. The steps of data cleaning, consistency check anddataannotationarecrucialfortheattainmentofhighqualitydatawhichfurther facilitatedetailedanalyses.Thesestepsarethemostdifficultandmosttimeconsuming in the creation of the database. Full automation of annotation helps speed up the processbutmayintroducefalsepositivesandfalsenegatives.Manualannotationby domainexpertismoreaccurateandcompletethanautomatedannotationbutintroduces adegreeofrandomerrorsnotfoundwithautomatedannotation (Ding et al ., 2004).

Therefore, combining fast automated process with accurate expert inspection could enhance the annotation process. This ideal semiautomated environment provides a human annotator to confirm, edit or reject automatically generated annotations of records.

The unique and novel aspects of the SCORPION2 includes i) 548 newly created homology models which are not available elsewhere, and ii) the functional

115 Chapter 6: Data warehouse of scorpion toxins predictionmodulewhichenableuserstoinfernewknowledgefromexistingscorpion toxin data. Important structural and functional information can be extracted from mutant toxin records. This information coupled with the available 3D structures providesuserswithaneasyaccesstoinformationrelatedtopotentialstructurefunction relationships for uncharacterised toxins. The prediction tool helps annotation and determination of functional properties of novel peptides. The architecture of the

SCORPION2databaseallowsforeasyintegrationofbioinformatictoolsforadditional analysesofscorpiontoxindata.

SCORPION2databaseisamodelformanagementofmoleculardataoftoxins of various venomous animal species. By creating data warehouses of toxins, researchers have the access to more complete data sets for further analysis and interpretationthanisavailableingeneralpurposedatabases.Thedatawarehousesof toxins integrated with analysis tools will enable classification of the data and applicationofdataminingmethodsfordiscoveryandextractionofnewknowledge.

The largescale analysis of structurefunction relationships of various toxins could draw a more accurate inference and aid understanding of the correlation among differenttoxins.

Chaptersummary

• ThehighlightofthischapteristheimplementationofSCORPION2database.It

isthefirstdatawarehouseofnativeandmutantscorpiontoxinswithintegrated

bioinformaticstoolsforuserstoanalysetheirstructurefunction relationships

andaidresearchplanningandknowledgediscovery.Itisanimprovedresource

ofscorpiontoxindata(nucleotideandprimary sequences, 3D structures and

relevantreferences)whereusersneednotquerymultipleresources.

116 Chapter 6: Data warehouse of scorpion toxins

• ThecreationofSCORPION2involvesmultiplesteps:1)accesstotoxindata

acrossmultipledatabases,2)inspectionandsubsequentcleaningoferrors,3)

annotation of structurefunction information from literature, 4) analysis and

classificationoftoxinsequencesandtheirstructures,and5)thedesignanduse

ofpredictivemodelsforsimulationoflaboratoryexperiments.

• The success of SCORPION2 serves as an example for the management of

moleculartoxindatafromvariousvenomousanimalswherethe gapbetween

the fastpaced accumulation of scattered toxin data and their proper

managementcanbebridged.

117 Chapter 7: Limits of bioinformatics

PartIII:Chapter7 Exploringbioinformaticapproachesfor functional prediction of bioactive scorpiontoxins ‘Thebestcomputerisaman,andit’sthe only one that can be massproduced by unskilledlabor.’ WernherMagnusMaximilianvonBraun

118 Chapter 7: Limits of bioinformatics

Scorpion toxins are important pharmacological tools for probing the physiological roles of ion channels which have significant therapeutic potential for an array of medicaldisorders(RodriguezdelaVegaet al .,2003;Blank et al .,2004;Bagdanyet al ., 2005). The discovery of new scorpion toxins with different specificities and affinitiesisneededtofurthercharacterisethephysiologyofionchannels.However,to serveaseffectiveprobes,thenewtoxinsidentifiedmustnotonlybespecificbutalso bind with high affinity to targeted ion channel. Binding affinity data of native and artificialmutantscorpiontoxinstovariousionchannelsubtypesareavailableinthe literature.Thesedatacanbeusedtodevelopatool for predicting toxin strength of bindingaffinitytospecificionchannel.Accuratepredictionoftoxinbindingstrength facilitatesidentificationofhighaffinitytoxinbinderswhichimprovesefficiencyand theeconomyofexperimentation.

This is the first report of a prediction tool based on a bioinformaticdriven approachinvolvingsequencecomparison,nearestneighbouranalysis,decisionrules, scaledbindingaffinitydata,andfunctionalmotifs(discussedinChapter4)topredict strengthofbindingaffinitytoionchannelsinthefieldofvenomresearch.

7.1 MaterialsandMethods

Native and artificial scorpion toxin data were collected and enriched with bindingaffinityinformationfromliterature(discussedinSection6.3).Atestsetof26 newlyidentifiedscorpiontoxinswithexperimentallydeterminedbindingaffinitydata wasusedforevaluatingthepredictionaccuracy.Bindingaffinitydatawerescaledtoa commonscale(discussedinSection4.1.1)andclassifiedintofourcategories,namely

‘nonbinding’,‘low’,‘medium’and‘high’binding(Table11 ).Thesefourcategories

119 Chapter 7: Limits of bioinformatics wereusedforsequencecomparisonoftoxinsequences.Thefunctionalmotifsspecific toNa +andK+channels(discussedinSection4.2)wereincorporatedintoanalgorithm forpredictingionchannelspecificityandstrengthofbindingaffinity.Aflowchartof theapproachisavailableinFigure34 .

Table 11 Four categories of strength of binding affinity were introduced, namely ‘Nonbinding’,‘Low’,‘Medium’and‘High’fromthemappingofbindingaffinitytoa commonscalediscussedinSection4.1.

Class 1 2 3 4 5 6 7 8 Mapped Non Very Low Moderately Moderate Moderately High Very affinity binding low low high high Non Prediction Low Medium High binding

Collectbindingaffinitydataofnativeandmutantscorpiontoxins

Mapsodiumand Incorporatefunctional potassiumbindingaffinity motifsofsodiumand datatofourcategories potassiumtoxins

Predictionalgorithmbysequencecomparison, Submitquery nearestneighbouranalysisandrulebased

Predictionchannelspecificityand strengthofbindingaffinity

Figure 34 Flowchart of predicting ion channel specificity and strength of binding affinity.

120 Chapter 7: Limits of bioinformatics

7.1.1 Algorithmforpredictingstrengthofbindingaffinityofscorpiontoxins

Prediction of strength of binding affinity to K + and Na + ion channels was performed by modifying the algorithm described in Section 5.2.2. This modified modulealsocombinessequencecomparison,nearestneighbouranalysisanddecision rulesforassigningaputativeclassificationofaquerysequencetoastructurefunction group. The grouping of the query to similar sequences permits nearest neighbour analysistoassignthescaledbindingaffinitydataintheneighboursequencestothe query.Forexample,ifthequerymatchedneighboursequenceswith‘High’and‘Very high’mappedaffinities,itwaspredictedas‘High’affinityandsoforthwithprediction of ‘Medium’, ‘Low’ and ‘Nonbinding’ categories. The query sequence was also compared against scorpion mutant toxin data by BLAST program (Altschul et al .,

1997) where residue(s) in the query sequence that matched with mutations had the relativebindingaffinityinformationextracted.Ifresiduesinthequerydidnotmatch anymutation,itwascomparedagainstthefunctionalmotifs(discussedinSection4.2) tohighlightthepositionsandidentitiesofaminoacidsinthequerysequence,which mayaffectstrengthofbindingaffinity.

7.2 Results

Theaccuracyofpredictionswasvalidatedbysubmitting26newlyidentified scorpiontoxinswithexperimentaldata.Allpredictionsofionchannelspecificitywere correct (100%) ( Table 12 ). For strength of binding affinity, two agreed with experimentaldata(7.7%)(Meki et al .,2000;Alami et al .,2003)( Figure35,Figure

36), three were wrong (11.5%) and 21 were not predicted (80.8%). The tool also highlightedpositionsandresiduesinthequerysequencewhichmayaffectthestrength ofbindingaffinity.

121 Chapter 7: Limits of bioinformatics

Validation with experimental data has shown that this approach is not predictiveinthestrengthofbindingaffinityofscorpiontoxinstoionchannel.Thus, thereisalimittotheuseofbioinformaticsinpredictionoftoxinbindingstrengthbut notspecificitytoionchannels.

Table 12 Predicted ion channel specificity and strength of binding affinity for 26 newly identified scorpion toxins with experimental data. Toxins are arranged alphabetically.Purepresentsprediction,Exrepresentsexperimentaldata,*represents notpredicted,NB,L,MandHrepresentnonbinding,low,mediumandhighbinding, respectively.KandNarepresentpotassiumandsodiumionchannels,respectively. Toxinname Ionchannel Strength Reference Pu Ex Pu Ex AmmTX3 K K L M Vacher et al. ,2002 AmmVIII Na Na M M Alami et al. ,2003 Anuroctoxin K K * M Bagdany et al. ,2005 BmK37 K K * M Xu et al.,2004a BmKIM2 Na Na * L Peng et al. ,2002 BmKSKTx1 K K * L Xu et al. ,2004b BotIT6 Na Na * M Mejri et al. ,2003 BtK2 K K * L Dhawan et al., 2003 Cn12 Na Na * L delRioPortilla et al. ,2004 CsEKerg1 K K * M Nastainczyk et al. ,2002 Discrepin K K * M D'Suze et al. ,2004a IsTx K K * L Yamaji et al. ,2004 Kbot1 K K * M MahjoubiBoubaker et al. ,2004 KTX3 K K M M Meki et al. ,2000 Lqh6 Na Na NB M Hamon et al. ,2002 Lqh7 Na Na NB M Hamon et al. ,2002 Lqhβ1 Na Na * H Gordon et al. ,2003 Tc30 K K * L Batista et al. ,2002a Tc32 K K * M Batista et al. ,2002a Tc48a Na Na * M Batista et al. ,2004 TsPep1 K K * NB Pimenta et al. ,2003 TsPep2 K K * NB Pimenta et al. ,2003 TsPep3 K K * NB Pimenta et al. ,2003 TtButtoxin K K * L Coronas et al. ,2003a Tz1 Na Na * L Borges et al. ,2004 κKTX1.3 K K * NB Nirthanan et al. ,2005

122 Chapter 7: Limits of bioinformatics

Figure 35 Prediction of the binding affinity of KTX3 from Buthus occitanus tunetanus . It is predicted to target K + channel with medium binding affinity which agreeswithexperimentalresults(IC 50 =0.05nM)(Meki et al .,2000).Thepositions and residue identity in the query sequence that do not coincide with the functional motifsarehighlighted.

123 Chapter 7: Limits of bioinformatics

Figure 36 Prediction of the binding affinity of AmmVIII from Androctonus mauretinicus mauretinicus .ItispredictedtotargetNa +channelswithmediumbinding affinitywhichagreeswithexperimentalresults(EC50 =29nMand416nM)(Alami et al ., 2003). The positions and residue identity in the query sequence that do not coincidewiththefunctionalmotifsarehighlighted.

7.3 Discussionandconclusion

Theapplicationofbioinformaticapproachtofunctionalpredictionofbinding affinity of scorpion toxins to ion channels was explored in this chapter. There are several possible reasons for the poor performance in predicting strength of binding affinitywheremajoritywerenotpredicted.First,thenumberofbindingaffinitydata availableisinsufficient.Thepharmacologicaldataformanyscorpiontoxinsremainto

124 Chapter 7: Limits of bioinformatics bedetermined.Withnobindingaffinityinformation,nearestneighbouranalysiscannot ascribe an unknown function in the nearest neighbour to the query. Second, the sequencesimilarityofthequeryistoodiversewherenonearestneighbourisfoundas observed in the new groups proposed for the newly identified scorpion toxins

(discussed in Chapter 3). Third, toxinchannel binding processes are extremely complicated, determined by many factors such as shape and size complementary, hydrophobicityandpolarity.Also,toxinsandchannelproteinsareflexiblemolecules, andtheenergyinventorybetweentheboundandunboundstatesmustbeconsideredin aqueoussolution(GohlkeandKlebe,2002).Thisinformationcannotbecapturedby primary sequence similarity. Lastly, the algorithm may be inadequate to solve the predictionoftoxinbindingstrengthtoionchannels.

Inconclusion,thepredictionofstrengthofbindingaffinityofscorpiontoxins toionchannelscannotbeachievedwiththenearestneighbourmethod.Thelimitsof the current approach have been explored in this work and as new data become available,thesituationmightchange.

Chaptersummary

• There is a need to discover new scorpion toxins with high specificities and

affinitiesforfurtherphysiologicalcharacterisationoftheionchannels.

• Accuratepredictionoftoxinbindingstrengthaidsidentificationofhighbinders

whereweakornonbindersneednotbetested,thusimprovingtheefficacyof

experimentation.

• The limits of the current approach to predict toxin binding strength was

exploredasvalidatedbyexperimentaldata.

125 Chapter 8: General Discussion

PartIV:Chapter8 Generaldiscussion ‘Life is a continuous exercise in creative problemsolving.’ MichaelJ.Gelb

126 Chapter 8: General Discussion

Theintroductionofbioinformaticsasadisciplineinthelastquarterofthe20 th century, it has revolutionised the approach in which biological research is conducted.

Researchersareincreasinglyconductingsearchesinpublicdatabasesforcharacterised sequencesthatmatchtheirsequencesbeforeconductingexperimentstodeterminetheir function. Bioinformatics narrows down the number of essential experiments needed andthusexpeditesthe discoveryprocess. Inthiswork,theauthoremployed atop downapproachwhichservesasanemergingalternativestrategyforpredictionofthe structuralandfunctionalpropertiesofthelargepoolofuncharacterisedscorpiontoxins incontrasttotraditionalexperimentalbottomupapproach.Couplingbothapproaches havetheadvantageofreducinglaboriousandtimeconsumingbenchwork.

In this thesis, application of bioinformatics to venom research was accomplishedbydatacollection,cleaningandenrichment,classificationandanalyses, developmentofpredictiontool,andimplementationofspecialiseddatawarehouseof bioactive toxins in scorpion venom as an example. During the process of data collection and cleaning the public database records, several errors in the data were identifiedandcorrected.Examplesincludehighredundancyduetomaintenanceofthe samesequenceindifferentpublicdatabases,discrepanciesinprimarysequencesand conflicting annotation. More biological data artifacts, which affect accuracy of data mininghasbeenstudiedbyKoh et al .(2004b).Datacheckingandcorrectionarethus criticalfortheimprovementofdataquality.Interpretationofuncleandataisnormally inaccurateanderrorswillbepropagatedinsubsequentanalysiswherehighdataquality is important for accurate predictions. Further, the basic information in the records prevented application of data mining. Thus, annotation of functional and structural information from literature to records was manually performed to improve the data

127 Chapter 8: General Discussion quality.However,manualannotationdoesnotscaleuptocapacitiesneededforlarge scaleanalysisofbioactivetoxins.

Properdataclassificationisalsoneededfortheaccuracyofpredictions.There are two principal approaches taken for data classification, manual and automatic.

Manual classifications are based on human expertise, facilitated by bioinformatic analyses,toclusterdataintoparticulargroupsthatsharecommonpropertiesdefinedby domain experts. Examples include the manually curated SwissProt and PROSITE databases.Theotherapproachisautomaticclassificationswhichdependonalgorithms or models to generate matrices for similarity or distance that are then processed to determinethesegroups.Examplesofautomaticclassificationalgorithmsincludeself organisedmaps,artificialneuralnetwork,andsupportvectormachineswhichbelong tothefieldsofartificialintelligenceandmachinelearning(Kapetanovic et al .,2004).

ProDomandDOMO(Gracy andArgos,1998) amongothers, address classification more systematically with automated processes that classify entire protein sequence databases.Theadvantageofmanualclassificationisthehighqualityofclusteringbut the final classification result may be irreproducible because of differences in the experts’knowledge.Incontrast,automationisfullyreproduciblebecauseoffixedrules writtenincomputerprogramsandscalabletolargedataset.

A combination of different bioinformatic approaches was performed for the largescale analyses of toxin data. Multiple alignments of protein sequences are an effective way of identifying conserved amino acids that provide clues to functional relationships among proteins. The patterns of amino acid variability in multiple sequence alignments reveal evolutionary pressure, mutation, recombination and genetic drift that spans millions of years (Valdar,2002).Conservedresiduesamong related toxin families are usually involved in functional properties of the peptides

128 Chapter 8: General Discussion

(Gilquinet al .,2002).Closelyinterlinkedwithfunctionalpropertiesofproteinsisthe correct structural folding of proteins. A linear protein sequence must be folded correctly in spatial organisation for maintaining the optimal positions of functional residuesontheproteinmoleculeforinteraction.Thus,3Danalysishelpsunderstand spatial properties and molecular structure of toxins, which provide clues for identification of interaction sites and key structurefunction features. The key structurefunction information from mutation studiesoftoxinsisextractedfromthe literatureforanalysesandextractionoffunctionalmotifs.Therefore,acombinatorial approach to analyse toxins, as demonstrated in this work, can greatly enhanced understandingoftheirmolecularfunction.

Predictionofaproteinfunctionfromprimarysequenceandtertiarystructureis a challenging issue, because homologous proteins often have different function.

Thoughdetailedstudiesondeterminationofafunctioninisolatedbioactivetoxinare performed, researchers cannot be confident that the toxin’s full repertoires of biological activities are known. Many methods of function prediction rely on identifyingsimilarityinsequenceandstructurebetweenaproteinofunknownfunction and one or more characterised proteins (Whisstock and Lesk, 2003). Prediction of function from protein sequence based on detecting sequence similarity or matching motifs have two main weaknesses. First, a similar protein must be available for comparativeanalysiselsepredictionisnotmade.Forinstance, Annotate scorpion tool cannot predict functional properties of κhefutoxins from Heterometrus fulvipes becausenosimilarsequencewasfoundtoadoptahairpinstructureoftwoshorthelices crosslinked with two disulfide bridges (Srinivasan et al ., 2002b; Nirthanan et al .,

2005)whichisdifferentfromtheCSHfoldinmajorityofscorpiontoxins.Second,at least one of the similar proteins must be characterised in order for inference of

129 Chapter 8: General Discussion function. When this fail, new methods which are not reliant on alignments are appearing such as gene neighbour, inductive logical programming and text analysis

(Dobson et al .,2004).

The implementation of specialised data warehouse helped to systematise the approachforindepthanalysisanddiscoveryofnewknowledgeofbioactivetoxinsin scorpionvenomwhereitcapturesdomainexpertiseintheanalysisandinterpretation oftheclassifiedtoxinsequences.Theauthorenvisages this database as a model for management of molecular data of toxins of various venomous animal species. A collection of different specialist toxin databases provides researchers access to the most complete toxin data sets for further analysis, interpretation and formulating hypotheses.Theintegratedanalysistoolsinthespecialisttoxindatabaseswillenable researcherstoclassifythetoxindataandapplydataminingmethodsfordiscoveryand extraction of new knowledge. The largescale analysis of structurefunction relationshipsofvarioustoxins,supportedbybioinformatics,couldaidunderstanding ofthecorrelationamongdifferenttoxins.

Chaptersummary

• Bioinformatics, when applied appropriately by researchers, helps in

determining the key experiments to be undertaken, thus improving the

efficiencyofbiologyresearch.

• In this work, bioinformatics was successfully applied to scorpion venom

research by data collection, cleaning and enrichment, classification and

analyses, development of prediction tool, and implementation of specialised

datawarehouseofbioactivescorpiontoxins.Datacleaningandenrichmentis

essentialformaintaininghighdataqualityneededforaccurateprediction.

130 Chapter 8: General Discussion

• A combination of different bioinformatic approaches such as analyses of

information from mutation studies, multiple sequence alignments and 3D

structureswasperformedforthelargescaleanalysesofscorpiontoxindata.

• Creation of the scorpion toxin data warehouse aids systematic analysis and

discoveryofnewknowledgeofbioactivetoxinsinscorpionvenom.

131 Chapter 9: Conclusion

PartIV:Chapter9 Conclusion ‘Concern for man himself and his fate mustalwaysformthechiefinterestofall technicalendeavors,concernforthegreat unsolvedproblemsoftheorganisationof labor and the distribution of goods – in orderthatthecreationsofourmindshall beablessingandnotacursetomankind. Never forget this in the midst of your diagramsandequations.’ AlbertEinstein

132 Chapter 9: Conclusion

9. Conclusion

Thisworkstartedwithaninvestigationoftheuseofbioinformaticbasedapproachto the largescale study of structurefunction relationships of scorpion toxins. By the systematic application of bioinformatics to scorpion venom research, this work has definedthenovelfieldofvenominformatics.Becausethisthesisistohisknowledge the first research project that deals systematically with bioinformatics in venom research,theauthorhadtotakeanarrowfocus,coveringonlybioactivetoxinsfrom scorpionvenom.Theauthorfocusedontheclassification, analyses, development of predictivetoolandcreationofscorpiontoxindatawarehouseforlargescalestudyof scorpiontoxins.Bycompletingthisproject,theauthorhasadvancedtheknowledge about the bioinformaticbased approach in venom research. The author now summariseshisconclusions.

9.1 Largescaleclassification

Largescale classification of all known toxins according to their function is necessaryforclarifyingthecurrentknowledge,includinganoverviewofthefunctional repertoireofthetoxins.Suchknowledgewillfacilitatefunctionalassignmentofnewly identified bioactive toxins. Largescale classification is an important step towards effective information management where relevant biological information can be retrieved from vast amounts of toxin data. Classification of related biological sequences can be used to predict the function of an unknown sequence based on inference of homology between the unknown and the characterised sequences in a class.

133 Chapter 9: Conclusion

Here, the author has described a systematic largescale classification of 393 currently known scorpion toxin sequences into 62 groups based on ion channel specificity and primary sequence similarity, combined with multiple sequence alignmentsandphylogeneticanalyses.Withalarge repositoryoftoxinsequences,a broad perspective of the general patterns in their structure and function can be observed. This present largescale classification of different toxins relates to current knowledgeofthefieldandasmoreinformationbecomesavailableinthefuture,itis expectedthatrevisionofclassificationwillbenecessary.Thisclassificationapproach, asagenerictool,canbeappliedtolargescaleclassificationofotherbioactivetoxins andfamiliesofbioactivepeptides.

9.2 Largescaleanalysis

With the accumulation of toxin data and increasing information of their complexstructurefunctionproperties,thereisaneedtodevelopnewcomputational strategiesforextractionofinformationfromlowlevelrawdatatosupportlargescale venomresearch.Thelargescaleanalysisofstructurefunctionrelationshipsofvarious animaltoxinscoulddrawmoreaccurateinferencesandaidourunderstandingofthe correlationamongdifferenttoxins.

Theapproachtoincludestructurefunctioninformationfrommutationstudies of scorpion toxins combined with multiple sequence alignment and 3D structure analyses provide biological significance to the extraction of functionally relevant motifs,complementingmotifsobtainedstatistically.Thisworkreportsthefirstbinding motifs for four K + ion channel subtypes (voltagedependent K + channels, large and smallconductance Ca 2+activated K + channels, and etheragogo K + channel), four binding site motifs for Na + channels and a conserved motif for Cl channels. This

134 Chapter 9: Conclusion systematic approach of including mutant data of scorpion toxins and 3D structure analysesforextractionofmotifscanserveasamodelforotherbioactivetoxinswhere mutationstudiesand3Dstructuresareavailable.

9.3 Developmentoffunctionalpredictiontool

Laboratory experiments involving large number of sequences are extremely costlyandtimeconsuming.Incontrast,theadvantagesofapplyingbioinformaticsin biological research are lower cost, speed and efficiency. Identification and optimisation of key experiments can be achieved by combining conventional experimentalmethodsforstructurefunctionanalysis,suchassitedirectedmutagenesis and chemical modifications, with bioinformaticdriven structurefunction study of toxins for identification of key experiments and help optimisation of experimental design. Bioinformaticsbridgesthe gapbetween the fast data growth and the slower paceofexperimentalvalidationstudiesbyfacilitatingdataanalysesandinterpretation, andunderstandingbiologicalprocesses.Thecomputationalalgorithmsdevelopedfrom data analyses play an increasingly important role in the formation and testing of hypothesestodeterminefunctionofproteinsmorerapidly.

Database mining for discovery and extraction of new knowledge is new in toxinology, with excellent prospects to facilitate future advances of this field. In addition,thedetailedstructuralandfunctionalgroupingofproteinsequencescanbe usedforaccuratepredictionoffunctionalpropertiesofothertoxinsandotherfamilies of bioactive peptides. In this era of largescale screening using genomics and proteomics, venominformatics will become increasingly important for management and analysis of toxin data and provides a framework for efficient analysis and maximisationofknowledgeextractionfromlargeamountsofthesepharmacologically

135 Chapter 9: Conclusion importanttoxindata.

A systematic approach to the prediction of functions of bioactive toxins in scorpion venom through bioinformatics was accomplished in this work. Detailed classification of scorpion toxin sequences into wellorganised groups allows better correlation of structurefunction relationships and thereafter, classification of new sequencesbythepredictiontool.Thisworkdemonstrated high accuracy (>90%) of predicting ion channel specificity and toxin family based on sequence comparison, nearest neighbor analysis and decision rules. This generic bioinformaticdriven approach serves as a model for functional prediction of novel toxins from other venomous animals. Functional predictions of bioactive toxins help to minimise the number of experiments performed by narrowing the validation processes with the predictedfunction. Itcomplementswetlabexperimentsasdemonstratedinthewide application of bioinformatics in various fields including toxicology (Fielden et al .,

2002),drugdiscovery(Gagna et al .,2004),genetics(FishelsonandGeiger,2004)and pharmacology (Ross et al ., 2004), creating a drywet cycle. The application of bioinformaticsinthefunctionalpredictionoftoxinbindingstrengthtomolecularion channeltargetsiscurrentlylimitedasvalidatedbyexperimentaldata.

9.4 Datawarehouseofscorpiontoxins

Biologicaldatabasesnowplayacentralroleindirectingmedicalandbiological research with the huge amount of data generated by genomics and proteomics, and highthroughput technologies. The number of newly identified toxin sequences is growing fast, while their functional characterisation is lagging. These toxin data are depositedacrossdifferentpublicdatabases,usuallywithprimarysequenceinformation only.Discrepanciesbetweendatabaserecordsarecommon.Structuralandfunctional

136 Chapter 9: Conclusion information,inparticularfrommutationstudies,isavailablemainlyintheliterature.

Analyses of toxin data are growing increasingly difficult and there is a need for cleanedandwellannotateddatabasesoftoxins.

Datawarehousesoftoxinsserveasvaluableresourcesforexplorationoftoxins, allowing users to query complex biological questions that may usually involve searchingmultiplesources.Researchershavetheaccesstothemostcompletedatasets for analysis and interpretation. This thesis has demonstrated such applications with relativelysmalldatasetsofscorpiontoxindata(upto1000entries),whileworkwith larger data sets will require additional data management and data analysis tools, supported by bioinformatics (Koh et al ., 2004a). This thesis had systematically collectedandcombinedtoxinsequenceand3Dstructuredata,andhypothesisdriven resultsfromliteratureintohighlyusefulresourcesforexperimentalbiologists.More than500homologymodelsofscorpiontoxinshavebeengeneratedanddepositedin the SCORPION2 database. The 3D models of these toxins coupled with multiple sequence alignment of each group in the toxin classification, can be employed to investigatethefunctionalresiduesofeachpharmacologicalactivity.Theutilisationof knowledge, particularly of functional information, has been made more efficient by enricheddescriptionsoftoxinentriesandintegrationofbioinformatictoolstofacilitate comprehensiveanalysisofthedataandfordatamining.

9.5 Evaluationofapplicationofbioinformaticsinvenomresearch

Thefieldofvenominformatics,amarriagebetweenbioinformaticsandvenom biology,isinitsinfancy.Butalreadyithasthepotentialtorevolutionisethewaythat researchers manage venom related data and information. Venominformatics is a systematicapproachthatfacilitatesdiscoveryofnewknowledgeeitherthroughdirect

137 Chapter 9: Conclusion discovery, or through support for efficient experimentation by preselection of the most interesting toxin candidates. It bears the potential to expedite drug discovery process and accurate prediction of functional properties of novel toxins based on existingdata.Becausetoxinsarefunctionallydiverse,butbelongtoalimitednumber of structural families, they are ideal for application of data mining techniques for discovery of previously unknown relationships among data. The venominformatics lessonswillbeusefulforstudyofstructurefunctionrelationshipsofdiversetypesof bioactivetoxins.

Conclusionsummary

Theworkdescribedinthisthesisisthefirstandpioneeringworkinthefieldof venom research to the author’s knowledge. In summary, the following general conclusionscanbedrawn:

• Bioinformaticsisagrowingfieldinthepostgenomeeraandislikelytoshape

futureresearchinboththeoreticalandappliedvenomresearch.Theauthorhas

demonstrated the application of bioinformatics in the largescale study of

scorpiontoxins.

• Thelargescaleclassificationof393currentlyknownscorpiontoxinsprovides

aglobalviewoftheirstructurefunctionrelationships(describedinChapter3).

• Inclusionofinformationfrommutationstudiesofscorpion toxins, combined

with multiple sequence alignment and 3D structure analyses supports

extraction of functionally relevant binding motifs for scorpion toxins

(describedinChapter4).

• The author has demonstrated that combining bioinformatics and venom

research can increase the efficiency of the discovery process where the

138 Chapter 9: Conclusion

functional prediction tool can be used for support and planning of the

experiments (e.g. validation of the predicted function of newly identified

scorpion toxins, described in Chapter 5). This combination is a viable

methodology for acquiring new knowledge in venom research where the

growthofnewtoxinsequencesidentifiedisincreasing.

• In this work, the author has created the first data warehouse of native and

mutant scorpion toxin data that have been cleaned, enriched, classified and

analysed, and integrated with bioinformatic tools for efficient and effective

discoveryofnewknowledge.

9.6 Futureworks

Therearetwoprincipaldirectionsforthedevelopmentofbioinformaticsinthis venom research, namely the development of data warehouses and development of functional prediction methods. The development of data warehouse includes data updateontopofdatacollection,datacleaningandintegrationofbioinformatictools.

Data update focuses on adding new toxin sequences identified, new 3D structures solvedandnewinformationpublishedinliteratureintothedatawarehouse.Thenew data help to verify hypotheses made during analyses of initial dataset while new information can provide insights for further analysis. Important discoveries of sequence relationships have been missed because old or incomplete databases were used(Altschul et al .,1994).Forinstance,originaldatasubmittedbysequencersinto themajornucleotidesequencedatabaseGenBank/EMBL/DDBJ(Benson et al .,2005) arenotupdatedunlessarevisionissubmittedbythesame group(Wu et al .,2003).

Thus data remains uninformative though more recent knowledge is available. The authorproposestodevelopasystemthatcanperiodicallyperformdataupdateanda

139 Chapter 9: Conclusion platform for domain expert to screen and discard irrelevant data so as to increase efficiency.Completeanduptodatedatabasesofbiologicalknowledgearecriticalfor informationdependentbiologicalresearch(Apweiler et al .,2004).

Thefurtherdevelopmentoffunctionalpredictionmethodsinvenomresearch dependslargelyontheidentificationofbiological problems that can be transformed into computational questions. The biological problem of determining the interaction surface of toxinchannel complex can be formulated into computational algorithm wheredatafromthermodynamicmutantcyclesoftoxinchannelcomplexescouldbe analysed. Thermodynamic mutant cycles provide valuable information for studying energetic coupling between amino acids on the interacting surface (Hidalgo and

MacKinnon,1995)andthustheinteractingaminoacidscanbedeterminednotonlyon thetoxins,butalsoionchannelswherepotentialtherapeuticscanbedevelopedfrom thisinformation.Recently,thecrystalstructureofvoltagedependentK +channelwas solved(Jiang et al .,2003)whichprovidesanopportunitytoallowhomologymodeling ofotherionchannelsandperformcomputersimulationoftoxindockingontotheion channelsfordeterminingtoxinchannelinteractingresidues.

The current sequencing of three venomous animal genomes provides an opportunityforcomparativegenomicanalysisoftoxins.Computationalmethodshave beenusedtocomparebetweengenomesequencestopredictproteinproteininteraction basedontheassumptionthatthegenesoffunctionallyinteractingproteinstendtobe associated with each other on genomes (Huynen et al .,2003).Promisingresultsfor deductionofspecificfunctionsfornumerousproteinsareobtainedfromcomparative analysisofcompletegenomesbyanalysingphylogeneticprofilesofproteinfamilies, domain fusions, gene adjacency in genomes and expression methods (Galperin and

Koonin,2000;Marcotte,2000).Thefunctionalpredictionmethodusedinthisthesisis

140 Chapter 9: Conclusion suitableforfunctionalpredictionoftoxinsthatsharesequencesimilarity.Incorporation of comparative genomes can be used to predict function of toxins that lack characterisedhomologues.

Theauthoranticipatesthatbioinformaticswillincreaseinimportanceinvenom researchfield.Giventheimportanceoftoxinstoscientific,medicaland commercial applications,thenumberofapplicationsstemmingfromvenomresearchwillbehuge.

141 References

References

Aiyar, J., Withka, J.M., Rizzi, J.P., Singleton, D.H., Andrews, G.C., Lin, W., Boyd, J.,

Hanson,D.C.,Simon,M.,Dethlefs,B. et al .,1995.Topologyoftheporeregionofa

K+channelrevealedbytheNMRderivedstructuresof scorpion toxins. Neuron 15,

11691181.

Alami,M.,Vacher,H.,Bosmans,F.,Devaux,C.,Rosso,J.P.,Bougis,P.E.,Tytgat,J.,Darbon,

H. and MartinEauclaire, M.F., 2003. Characterisation of Amm VIII from

Androctonus mauretanicus mauretanicus : a new scorpion toxin that discriminates

betweenneuronalandskeletalsodiumchannels.BiochemJ375,551560.

Ali, S.A., Stoeva, S., Grossmann, J.G., Abbasi, A. and Voelter, W., 2001. Purification,

characterisation,andprimarystructureoffourdepressantinsectselectiveneurotoxin

analogs from scorpion ( Buthus sindicus ) venom. Arch Biochem Biophys 391, 197

206.

Alonso, D., Khalil, Z., Satkunanthan, N. and Livett, B.G., 2003. Drugs from the sea:

conotoxinsasdrugleadsforneuropathicpainandotherneurologicalconditions.Mini

RevMedChem3,785787.

Altschul, S.F., Boguski, M.S., Gish, W. and Wootton, J.C., 1994. Issues in searching

molecularsequencedatabases.NatGenet6,119129.

Altschul,S.F.,Madden,T.L.,Schaffer,A.A.,Zhang,J.,Zhang,Z.,Miller,W.andLipman,

D.J., 1997. Gapped BLAST and PSIBLAST: a new generation of protein database

searchprograms.NucleicAcidsRes25,33893402.

Apweiler,R.,Bairoch,A.andWu,C.H.,2004.Proteinsequencedatabases.CurrOpinChem

Biol8,7680.

Arnon,T.,Potikha,T.,Sher,D.,Elazar,M.,Mao,W.,Tal,T.,Bosmans,F.,Tytgat,J.,Ben

Arie,N.andZlotkin,E.,2005.BjalphaIT:anovelscorpionalphatoxinselectivefor

insects–uniquepharmacologicaltool.InsectBiochemMolBiol35,187195.

Attwood,T.K.,Bradley,P.,Flower,D.R.,Gaulton,A.,Maudling,N.,Mitchell,A.L.,Moulton,

142 References

G.,Nordle,A.,Paine,K.,Taylor,P.,Uddin,A.andZygouri,C.,2003.PRINTSandits

automaticsupplement,prePRINTS.NucleicAcidsRes31,400402.

Attwood,T.K.,2000.Thequesttodeduceproteinfunctionfromsequence:theroleofpattern

databases.IntJBiochemCellBiol32,139155.

Bagdany, M., Batista, C.V., ValdezCruz, N.A., Somodi, S., Rodriguez de la Vega, R.C.,

Licea,A.F.,Varga,Z.,Gaspar,R.,Possani,L.D.andPanyi,G.,2005.Anuroctoxin,a

newscorpiontoxinoftheαKTx6subfamilyishighlyselectiveforK V1.3overIK Ca 1

ionchannelsofhumanTlymphocytes.MolPharmacol67,10341044.

Bairoch,A.,Boeckmann,B.,Ferro,S.andGasteiger,E.,2004.SwissProt:jugglingbetween

evolutionandstability.BriefBioinform5,3955.

Bateman,A.,Coin,L.,Durbin,R.,Finn,R.D.,Hollich,V.,GriffithsJones,S.,Khanna,A.,

Marshall,M.,Moxon,S.,Sonnhammer,E.L.,Studholme,D.J.,Yeats,C.andEddy,

S.R.,2004.ThePfamproteinfamiliesdatabase.NucleicAcidsRes32,D138D141.

Bateman,A.,Birney,E.,Durbin,R.,Eddy,S.R.,Howe,K.L.andSonnhammer,E.L.,2000.

ThePfamproteinfamiliesdatabase.NucleicAcidsRes28,263266.

Batista,C.V.,delPozo,L.,Zamudio,F.Z.,Contreras,S.,Becerril,B.,Wanke,E.andPossani,

L.D.,2004.ProteomicsofthevenomfromtheAmazonianscorpion Tityus cambridgei

and the role of prolines on mass spectrometry analysis of toxins. J Chromatogr B

AnalytTechnolBiomedLifeSci803,5566.

Batista,C.V.,GomezLagunas,F.,RodriguezdelaVega,R.C.,Hajdu,P.,Panyi,G.,Gaspar,

R.andPossani,L.D.,2002a.TwonoveltoxinsfromtheAmazonianscorpion Tityus

+ cambridgei that block K V1.3 and Shaker B K channels with distinctly different

affinities.BiochimBiophysActa1601,123131.

Batista,C.V.,Zamudio,F.Z.,Lucas,S.,Fox,J.W.,Frau,A.,Prestipino,G.andPossani,L.D.,

2002b.Scorpiontoxinsfrom Tityus cambridgei thataffectNa +channels.Toxicon40,

557562.

Becerril,B.,Corona,M.,Coronas,F.I.,Zamudio,F.,CalderonAranda,E.S.,Fletcher,P.L.,Jr.

Martin, B.M. and Possani, L.D., 1996. Toxic peptides and genes encoding toxin

143 References

gammaoftheBrazilianscorpions Tityus bahiensis and Tityus stigmurus .BiochemJ

313,753760.

Becerril,B.,Marangoni,S.andPossani,L.D.,1997.Toxinsandgenesisolatedfromscorpions

ofthegenus Tityus .Toxicon35,821835.

Benkhadir,K.,Kharrat,R.,Cestele,S.,Mosbah,A.,Rochat,H.,ElAyeb,M.andKaroui,H.,

2004.MolecularcloningandfunctionalexpressionofthealphascorpiontoxinBotIII:

pivotalroleoftheCterminalregionforitsinteractionwithvoltagedependentsodium

channels.Peptides25,151161.

Benson, D.A., KarschMizrachi, I., Lipman, D.J., Ostell, J. and Wheeler, D.L., 2005.

GenBank.NucleicAcidsRes33,D34D38.

Blank,T.,Nijholt,I.,Kye,M.J.andSpiess,J.,2004. Small conductance Ca 2+ activated K +

channelsastargetsofCNSdrugdevelopment.CurrDrugTargetsCNSNeurolDisord

3,161167.

Bontems,F.,Roumestand,C.,Gilquin,B.,Menez,A.andToma,F.,1991.Refinedstructureof

charybdotoxin:commonmotifsinscorpiontoxinsandinsectdefensins.Science254,

15211523.

Borges, A., Alfonzo, M.J., Garcia, C.C., Winand, N.J., Leipold, E. and Heinemann, S.H.,

2004. Isolation, molecular cloning and functional characterisation of a novel beta

toxinfromtheVenezuelanscorpion, Tityus zulianus .Toxicon43,671684.

Bradbury,J.,2003.Death,whereisthysting?DrugDiscovToday8,1099.

Briggs, J.C., 1987. Antitropical distribution and evolution in the IndoWest Pacific Ocean.

SystZool36,237247.

Brown,N.P.,Leroy,C.andSander,C.,1998.MView:awebcompatibledatabasesearchor

multiplealignmentviewer.Bioinformatics14,380381.

Brusic,V.,Zeleznikow,J.andPetrovsky,N.,2000.Molecularimmunologydatabasesanddata

repositories.JImmunolMethods238,1728.

Cai,Z.,Xu,C.,Xu,Y.,Lu,W.,Chi,C.W.,Shi,Y.andWu,J.,2004.Solutionstructureof

BmBKTx1,anewBK Ca 1channelblockerfromtheChinesescorpion Buthus martensi

144 References

Karsch.Biochem43,37643771.

Catterall,W.A.,2002.Molecularmechanismsofgatinganddrugblockofsodiumchannels.

NovartisFoundSymp241,206218;discussion218232.

Ceard, B., MartinEauclaire, M. and Bougis, P.E., 2001. Evidence for a positionspecific

deletionasanevolutionarylinkbetweenlongandshortchainscorpiontoxins.FEBS

Lett494,246248.

Chagot,B.,Pimentel,C.,Dai,L.,Pil,J.,Tytgat,J.,Nakajima,T.,Corzo,G.,Darbon,H.and

Ferrat,G.,2005.Anunusualfoldforpotassiumchannelblockers:NMRstructureof

three toxins from the scorpion Opisthacanthus madagascariensis . Biochem J 388,

263271.

Chang,L.S.,Chu,Y.P.,Cheng,Y.C.,Liou,J.C.andYang,C.C.,2005.Lys64oftheAchain

is involved in the enzymatic activity and neurotoxic effect of beta.

Toxicon45,179185.

Chang, L.S., Wu, P.F., Liou, J.C., ChiangLin, W.H. and Yang, C.C., 2004. Chemical

modificationofarginineresiduesof Notechis scutatus scutatus notexin.Toxicon44,

491497.

Chen, S.W. and Pellequer, J.L., 2004. Identification of functionally important residues in

proteinsusingcomparativemodels.CurrMedChem11,595605.

Chen,T.,Folan,R.,Kwok,H.,O'Kane,E.J.,Bjourson,A.J.andShaw,C.,2003.Isolationof

scorpion ( Androctonus amoreuxi ) putative alpha neurotoxins and parallel cloning of

theirrespectivecDNAsfromasinglesampleofvenom.RegulPept115,115121.

Chen,T.,Walker,B.,Zhou,M.andShaw,C.,2005.Molecularcloning of a novel putative

potassiumchannelblockingneurotoxinfromthevenomoftheNorthAfricanscorpion,

Androctonus amoreuxi .Peptides26,731736.

Chioato,L.andWard,R.J.,2003.Mappingstructuraldeterminantsofbiologicalactivitiesin

snakevenomphospholipasesA2bysequenceanalysisandsitedirectedmutagenesis.

Toxicon42,869883.

Chuang,R.S.,Jaffe,H.,Cribbs,L.,PerezReyes,E.andSwartz,K.J.,1998.InhibitionofT

145 References

typevoltagegatedcalciumchannelsbyanewscorpiontoxin.NatNeurosci1,668

674.

Church,J.E.andHodgson,W.C.,2002.Thepharmacologicalactivityoffishvenoms.Toxicon

40,10831093.

Cline,M.,Hughey,R.andKarplus,K.,2002.Predictingreliableregionsinproteinsequence

alignments.Bioinformatics18,306314.

Cochrane,G.,Aldebert,P.,Althorpe,N.,Andersson,M.,Baker,W.,Baldwin,A.,Bates,K.,

Bhattacharyya,S.,Browne,P.,vandenBroek,A.,2006.EMBLNucleotideSequence

Database:developmentsin2005.NucleicAcidsRes34,D10D15.

Cohen,L.,Karbat,I.,Gilles,N.,Froy,O.,Corzo,G.,Angelovici,R.,Gordon,D.andGurevitz,

M., 2004. Dissection of the functional surface of an antiinsect excitatory toxin

illuminates a putative "hot spot" common to all scorpion betatoxins affecting Na +

channels.JBiolChem279,82068211.

Cohen,L.,Karbat,I.,Gilles,N.,Ilan,N.,Benveniste,M.,Gordon,D.andGurevitz,M.,2005.

Common features in the functional surface of scorpion βtoxins and elements that

confer specificity for insect and mammalian voltagegated sodium channels. J Biol

Chem280,50455053.

Collins,F.S.,Morgan,M.andPatrinos,A.,2003.TheHumanGenomeProject:lessonsfrom

largescalebiology.Science300,286290.

Conticello, S.G., Gilad, Y., Avidan, N., BenAsher,E., Levy, Z. and Fainzilber, M., 2001.

Mechanismsforevolvinghypervariability:thecaseof conopeptides. Mol Biol Evol

18,120131.

Corona, M., Coronas, F.V., Merino, E., Becerril, B., Gutierrez, R., RebolledoAntunez, S.,

Garcia, D.E. and Possani, L.D., 2003. A novel class of peptide found in scorpion

venomwithneurodepressanteffectsinperipheralandcentralnervoussystemofthe

rat.BiochimBiophysActa1649,5867.

Corona, M., Gurrola, G.B., Merino, E., Cassulini, R.R., ValdezCruz, N.A., Garcia, B.,

RamirezDominguez, M.E., Coronas, F.I., Zamudio, F.Z., Wanke, E. and Possani,

146 References

L.D., 2002. A large number of novel Ergtoxinlike genes and ERG K +channels

blockingpeptidesfromscorpionsofthegenus Centruroides .FEBSLett532,121126.

Corona,M.,ValdezCruz,N.A.,Merino,E.,Zurita,M.andPossani,L.D.,2001.Genesand

peptides from the scorpion Centruroides sculpturatus Ewing, that recognize Na +

channels.Toxicon39,18931898.

Coronas,F.V.,deRoodt,A.R.,Portugal,T.O.,Zamudio,F.Z.,Batista,C.V.,GomezLagunas,

F.andPossani,L.D.,2003a.DisulfidebridgesandblockageofShakerBK +channels

by another butantoxin peptide purified from the Argentinean scorpion Tityus

trivittatus .Toxicon41,173179.

Coronas,F.V.,Stankiewicz,M.,Batista,C.V.,Giraud,S.,Alam,J.M.,Possani,L.D.,Mebs,D.

andPelhate,M.,2003b.Primarystructureandelectrophysiologicalcharacterisationof

two almost identical isoforms of toxin from Isometrus vittatus (family: Buthidae)

scorpionvenom.Toxicon41,989997.

Corzo,G.,Villegas,E.andNakajima,T.,2001.Isolationandstructuralcharacterisationofa

peptide from the venom of scorpion with toxicity towards invertebrates and

.ProteinPeptLett8,385393.

Couraud,F.,Jover,E.,Dubois,J.M.andRochat,H.,1982.Twotypesofscorpionreceptor

sites,onerelatedtotheactivation,theothertotheinactivationoftheactionpotential

sodiumchannel.Toxicon20,916.

Cover,T.andHart,P.,1967.Nearestneighbourpatternclassification.IEEETransactionson

InformationTheory13,2127.

Curran,M.E.,1998.Potassiumionchannelsandhumandisease:phenotypestodrugtargets?

CurrOpinBiotechnol9,565572.

Dalton, S., Gerzanich, V., Chen, M., Dong, Y., Shuba, Y. and Simard, J.M., 2003.

Chlorotoxinsensitive Ca 2+ activated Cl channelintypeR2reactiveastrocytesfrom

adultratbrain.Glia42,325339.

Dasarathy, B.V., Ed., 1991. Nearest neighbour (NN) Norms: NN pattern classification

techniques.IEEEComputerSocietyPress.LosAlamitos,Calif.

147 References

Dauplais,M.,Lecoq,A.,Song,J.,Cotton,J.,Jamin,N.,Gilquin,B.,Roumestand,C.,Vita,C.,

deMedeiros,C.L.,Rowanet al .,1997.Ontheconvergentevolutionofanimaltoxins.

Conservation of a diad of functionalresidues inpotassium channelblocking toxins

withunrelatedstructures.JBiolChem272,43024309.

David,R.M.,Krishna,N.R.andWatt,D.D.,1991.Characterisationofcationicbindingsitesof

neurotoxins from venom of the scorpion ( Centruroides sculpturatus Ewing) using

lanthanidesasbindingprobes.Toxicon29,645662.

Davies,N.W.,Wiese,M.D.andBrown,S.G.,2004.Characterisationofmajorpeptidesin'jack

jumper'antvenombymassspectrometry.Toxicon43,173183. del RioPortilla, F., HernandezMarin, E., Pimienta, G., Coronas, F.V., Zamudio, F.Z.,

Rodriguez de la Vega, R.C., Wanke, E. and Possani, L.D., 2004. NMR solution

structure of Cn12,a novel peptide from the Mexican scorpion Centruroides noxius

withatypicalβtoxinsequencebutwithαlikephysiologicalactivity.EurJBiochem

271,25042516.

Deshpande,N.,Addess,K.J.,Bluhm,W.F.,MerinoOtt,J.C.,TownsendMerino,W.,Zhang,

Q.,Knezevich,C.,Xie,L.,Chen,L.,Feng,Z.et al .,2005.TheRCSBProteinData

Bank:aredesignedquerysystemandrelationaldatabasebasedonthemmCIFschema.

NucleicAcidsRes33,D233D237.

Deutsch,C.,Price,M.,Lee,S.,King,V.F.andGarcia,M.L.,1991.Characterisationofhigh

affinity binding sites for charybdotoxin in human T lymphocytes. Evidence for

associationwiththevoltagegatedK +channel.JBiolChem266,36683674.

Devaux,J.,Beeton,C.,Beraud,E.andCrest,M.2004.Ionchannelsanddemyelination:basis

of a treatment of experimental autoimmune encephalomyelitis (EAE) by potassium

channelblockers.RevNeurol(Paris)160,S16S27.

Dhawan,R.,Varshney,A.,Mathew,M.K.andLala,A.K.,2003.BTK2,anewinhibitorofthe

KV1.1potassiumchannelpurifiedfrom Indianscorpion Buthus tamulus . FEBS Lett

539,713.

Dhawan,R.D.,Joseph,S.,Sethi,A.andLala,A.K.,2002.Purificationandcharacterisationof

148 References

ashortinsecttoxinfromthevenomofthescorpion Buthus tamulus .FEBSLett528,

261266.

Ding,L.,Sabo,A.,Berkowicz,N.,Meyer,R.R.,Shotland,Y.,Johnson,M.R.,Pepin,K.H.,

Wilson, R.K. and Spieth, J., 2004. EAnnot: a genome annotation tool using

experimentalevidence.GenomeRes14,25032509.

Dobson,P.D.,Cai,Y.D.,Stapley,B.J.andDoig,A.J.,2004.Predictionofproteinfunctionin

theabsenceofsignificantsequencesimilarity.CurrMedChem11,21352142.

D'Suze,G.,Batista,C.V.,Frau,A.,Murgia,A.R.,Zamudio,F.Z.,Sevcik,C.,Possani,L.D.and

Prestipino,G.,2004a.Discrepin,anewpeptideofthesubfamilyαKTX15,isolated

fromthescorpion Tityus discrepans irreversiblyblocksK +channels(IAcurrents)of

cerebellumgranularcells.ArchBiochemBiophys430,256263.

D'Suze,G.,Sevcik,C.,Corona,M.,Zamudio,F.Z.,Batista,C.V.,Coronas,F.I.andPossani,

L.D., 2004b. Ardiscretin a novel arthropodselective toxin from Tityus discrepans

scorpionvenom.Toxicon43,263272.

Dudina, E.E., Korolkova, Y.V., Bocharova, N.E., Koshelev, S.G., Egorov, T.A., Huys, I.,

Tytgat,J.andGrishin,E.V.,2001.OsK2,anewselectiveinhibitorofK V1.2potassium

channelspurifiedfromthevenomofthescorpion Orthochirus scrobiculosus .Biochem

BiophysResCommun286,841847.

Dutertre,S.,Nicke,A.,Tyndall,J.D.andLewis,R.J.,2004.Determinationofalphaconotoxin

bindingmodesonneuronalnicotinicacetylcholinereceptors.JMolRecognit17,339

347.

Escoubas, P., Stankiewicz, M., Takaoka, T., Pelhate, M., RomiLebrun, R., Wu, F.Q. and

Nakajima,T.,2000.Sequenceandelectrophysiologicalcharacterisationoftwoinsect

selectiveexcitatorytoxinsfromthevenomoftheChinesescorpion Buthus martensi .

FEBSLett483,175180.

Everhart, D., Cartier, G.E., Malhotra, A., Gomes, A.V., McIntosh, J.M. and Luetje, C.W.,

2004. Determinants of potency on alphaconotoxin MII, a peptide antagonist of

neuronalnicotinicreceptors.Biochem43,27322737.

149 References

Fajloun, Z., Kharrat, R., Chen, L., Lecomte, C., Di Luccio, E., Bichet, D., El Ayeb, M.,

Rochat, H., Allen, P.D., Pessah, I.N. et al., 2000. Chemical synthesis and

characterisation of maurocalcine, a scorpion toxin that activates Ca 2+ release

channel/ryanodinereceptors.FEBSLett469,179185.

Fielden,M.R.,Matthews,J.B.,Fertuck,K.C.,Halgren,R.G.andZacharewski,T.R.,2002. In

silico approaches to mechanistic and predictive toxicology: an introduction to

bioinformaticsfortoxicologists.CritRevToxicol32,67112.

Fishelson,M.andGeiger,D.,2004.Optimizingexactgeneticlinkagecomputations.JComput

Biol11,263275.

Frenal,K.,Xu,C.Q.,Wolff,N.,Wecker,K.,Gurrola,G.B.,Zhu,S.Y.,Chi,C.W.,Possani,

L.D., Tytgat, J. and Delepierre, M., 2004. Exploring structural features of the

interaction between the scorpion toxin CnErg1 and ERG K + channels. Proteins 56,

367375.

Froy,O.,Sagiv,T.,Poreh,M.,Urbach,D.,Zilberberg,N.andGurevitz,M.,1999.Dynamic

diversificationfromaputativecommonancestorofscorpiontoxinsaffectingsodium,

potassium,andchloridechannels.JMolEvol48,187196.

Fry, B.G., 2005. From genome to "venome": molecular origin and evolution of the snake

venomproteomeinferredfromphylogeneticanalysisoftoxinsequencesandrelated

bodyproteins.GenomeRes15,403420.

Fry,B.G.andWuster,W.,2004.Assemblinganarsenal:origin and evolutionofthe snake

venom proteome inferred from phylogenetic analysis of toxin sequences. Mol Biol

Evol21,870883.

Fry,B.G.,Wuster,W.,Kini,R.M.,Brusic,V.,Khan,A.,Venkataraman,D.andRooney,A.P.,

2003.Molecularevolutionandphylogenyofelapidsnakevenomthreefingertoxins.J

MolEvol57,110129.

Fuller,M.D.,Zhang,Z.R.,Cui,G.,Kubanek,J.andMcCarty,N.A.,2004.InhibitionofCFTR

channels by a peptide toxin of scorpion venom. Am J Physiol Cell Physiol 287,

C13281341.

150 References

Gagna,C.E.,Winokur,D.andClarkLambert,W.,2004.Cellbiology,chemogenomicsand

chemoproteomics.CellBiolInt28,755764.

Galperin, M.Y., 2005. The Molecular Biology Database Collection: 2005 update. Nucleic

AcidsRes33,D5D24.

Galperin, M.Y. and Koonin, E.V., 2000. Who's your neighbour? New computational

approachesforfunctionalgenomics.NatBiotechnol18,609613.

Galvez, A., GimenezGallego, G., Reuben, J.P., RoyContancin, L., Feigenbaum, P.,

Kaczorowski, G.J. and Garcia, M.L., 1990. Purification and characterisation of a

unique,potent,peptidylprobeforthehighconductancecalciumactivatedpotassium

channelfromvenomofthescorpion Buthus tamulus .JBiolChem265,1108311090.

Gao, Y.D. and Garcia, M.L., 2003 Interaction of agitoxin2, charybdotoxin, and iberiotoxin

with potassium channels: selectivity between voltagegated and MaxiK channels.

Proteins52,146154.

Gattiker,A.,Gasteiger,E.andBairoch,A.,2002.ScanProsite:areferenceimplementationofa

PROSITEscanningtool.ApplBioinformatics1,107108.

Garcia,M.L.,GarciaCalvo,M.,Hidalgo,P.,Lee,A.andMacKinnon,R.,1994.Purification

andcharacterisationofthreeinhibitorsofvoltagedependentK +channelsfrom Leiurus

quinquestriatus var. hebraeus venom.Biochem33,68346839.

Gazarian,K.G.,Gazarian,T.,Hernandez,R.andPossani,L.D.2005.Immunologyofscorpion

toxins and perspectives for generation of antivenom vaccines. Vaccine 23, 3357

3368.

Giangiacomo, K.M., Ceralde, Y. and Mullmann, T.J., 2004. Molecular basis of alphaKTx

specificity.Toxicon43,877886.

Gilles, N., Blanchet, C., Shichor, I., Zaninetti, M., Lotan, I., Bertrand, D. and Gordon, D.,

1999.Ascorpionalphaliketoxinthatisactiveoninsectsandmammalsrevealsan

unexpected specificity and distribution of sodium channel subtypes in rat brain

neurons.JNeurosci19,87308739.

Gilles, N., Gurevitz, M. and Gordon, D., 2003. Allosteric interactions among ,

151 References

brevetoxin,andscorpiontoxinreceptorsoninsectsodiumchannelsraiseanalternative

approachforinsectcontrol.FEBSLett540,8185.

Gilquin, B., Racape, J., Wrisch, A., Visan, V., Lecoq, A., Grissmer, S., Menez, A. and

Gasparini,S.,2002.StructureoftheBgKKV1.1complexbasedondistancerestraints

identifiedbydoublemutantcycles.MolecularbasisforconvergentevolutionofK V1

channelblockers.JBiolChem277,3740637413.

Gohlke,H.andKlebe,G.,2002.Approachestothedescriptionandpredictionofthebinding

affinityofsmallmoleculeligandstomacromolecularreceptors.AngewChemIntEd

Engl41,26442676.

Goldstein,S.A.andMiller,C.,1993.MechanismofcharybdotoxinblockofavoltagegatedK +

channel.BiophysJ65,16131619.

Gordon, D. and Gurevitz, M., 2003. The selectivity of scorpion alphatoxins for sodium

channelsubtypesisdeterminedbysubtlevariationsattheinteractingsurface.Toxicon

41,125128.

Gordon,D.,Ilan,N.,Zilberberg,N.,Gilles,N.,Urbach,D.,Cohen,L.,Karbat,I.,Froy,O.,

Gaathon, A., Kallen, R.G. et al ., 2003. An 'Old World' scorpion betatoxin that

recognizesbothinsectandmammaliansodiumchannels.EurJBiochem270,2663

2670.

Gordon, D., MartinEauclaire, M.F., Cestele, S., Kopeyan, C., Carlier, E., Khalifa, R.B.,

Pelhate, M. and Rochat, H., 1996. Scorpion toxins affecting sodium current

inactivationbindtodistincthomologousreceptorsitesonratbrainandinsectsodium

channels.JBiolChem271,80348045.

Gordon, D., Savarin, P., Gurevitz, M. and ZinnJustin, S., 1998. Functional anatomy of

scorpiontoxinsaffectingsodiumchannels.JToxicolToxinRev17,131159.

Gottlieb, P.A., Suchyna, T.M., Ostrow, L.W. and Sachs, F., 2004. Mechanosensitive ion

channelsasdrugtargets.CurrDrugTargetsCNSNeurolDisord3,287295.

Goudet,C.,Chi,C.W.andTytgat,J.,2002.Anoverviewoftoxinsandgenesfromthevenom

oftheAsianscorpion Buthus martensi Karsch.Toxicon40,12391258.

152 References

Gracy,J.andArgos,P.,1998.DOMO:anewdatabase of aligned protein domains. Trends

BiochemSci23,495497.

Grant, M.A., Morelli, X.J. and Rigby, A.C., 2004. Conotoxins and structural biology: a

prospectiveparadigmfordrugdiscovery.CurrProteinPeptSci5,235248.

Guan,R.,Wang,C.G.,Wang,M.andWang,D.C.,2001. A depressant insect toxin with a

novelanalgesiceffectfromscorpion Buthus martensii Karsch.BiochimBiophysActa

1549,918.

Guan,R.J.,Xiang,Y.,He,X.L.,Wang,C.G.,Wang,M.,Zhang,Y.,Sundberg,E.J.andWang,

D.C., 2004. Structural mechanism governing cis and trans isomeric states and an

intramolecular switch for cis /trans isomerization of a nonproline peptide bond

observedincrystalstructuresofscorpiontoxins.JMolBiol341,11891204.

Hains,P.G.,Ramsland,P.A.andBroady,K.W.,1999.ModelingofacanthoxinA1,aPLA2

enzyme from the venom of the common death adder ( Acanthophis antarcticus ).

Proteins35,8088.

Hamon, A., Gilles, N., Sautiere, P., Martinage, A., Kopeyan, C., Ulens, C., Tytgat, J.,

Lancelin, J.M. and Gordon, D., 2002. Characterisation of scorpion alphalike toxin

groupusingtwonewtoxinsfromthescorpion Leiurus quinquestriatus hebraeus .EurJ

Biochem269,39203933.

Harrison,R.A.,2004.DevelopmentofvenomtoxinspecificbyDNAimmunisation:

rationale and strategies to improve therapy of viper envenoming.

Vaccine22,16481655.

Hassani,O.,Mansuelle,P.,Cestele,S.,Bourdeaux,M.,Rochat,H.andSampieri,F.,1999.

Role of lysine and tryptophan residues in the biological activity of toxin VII (Ts

gamma)fromthescorpion Tityus serrulatus .EurJBiochem260,7686.

He,X.L.,Li,H.M.,Zeng,Z.H.,Liu,X.Q.,Wang,M.andWang,D.C.,1999.Crystalstructures

oftwoalphalikescorpiontoxins:nonproline cis peptidebondsandimplicationsfor

newbindingsiteselectivityonthesodiumchannel.JMolBiol292,125135.

He,Y.Y.,Lee,W.H.andZhang,Y.,2004.Cloningandpurificationofalphaneurotoxinsfrom

153 References

kingcobra( Ophiophagus hannah ).Toxicon44,295303.

Hidalgo,P.andMacKinnon,R.,1995.RevealingthearchitectureofaK +channelporethrough

mutantcycleswithapeptideinhibitor.Science268,307310.

Hulo,N.,Sigrist,C.J.,LeSaux,V.,LangendijkGenevaux,P.S.,Bordoli,L.,Gattiker,A.,De

Castro,E.,Bucher,P.andBairoch,A.,2004.RecentimprovementstothePROSITE

database.NucleicAcidsRes32,D134D137.

Huynen,M.A.,Snel,B.,vonMering,C.andBork,P.,2003.Functionpredictionandprotein

networks.CurrOpinCellBiol15,191198.

Huys, I., OlamendiPortugal, T., GarciaGomez, B.I., Vandenberghe, I., Van Beeumen, J.,

Dyason, K., Clynen, E., Zhu, S., van der Walt, J., Possani, L.D. et al ., 2004. A

subfamilyofacidicalphaK+toxins.JBiolChem279,27812789.

Huys, I. and Tytgat,J., 2003. Evidence for afunctionspecific mutation in the neurotoxin,

parabutoxin3.EurJNeurosci17,17861792.

Inceoglu,A.B.,Hayashida,Y.,Lango,J.,Ishida,A.T.andHammock,B.D.,2002.Asingle

chargedsurfaceresiduemodifiestheactivityofikitoxin,abetatypeNa +channeltoxin

from Parabuthus transvaalicus .EurJBiochem269,53695376.

Inceoglu, B., Lango,J., Pessah, I.N. and Hammock, B.D., 2005. Three structurally related,

highlypotent,peptidesfromthevenomof Parabuthus transvaalicus possessdivergent

biologicalactivity.Toxicon45,727733.

Ivanovski,G.,Petan,T.,Krizaj,I.,Gelb,M.H.,Gubensek,F.andPungercar,J.,2004.Basic

amino acid residues in the betastructure region contribute, but not critically, to

presynapticofammodytoxinA.BiochimBiophysActa1702,217225.

Jiang,Y.,Lee,A.,Chen,J.,Ruta,V.,Cadene,M.,Chait,B.T.andMacKinnon,R.,2003.X

raystructureofavoltagedependentK +channel.Nature423,3341.

Jiqun, S., Xiuling, X., Zhijian, C., Wanhong, L., Yingliang, W., Shunyi, Z., Xianchun, Z.,

Dahe,J.,Xin,M.,Hui,L.et al .,2004.Molecularcloning,genomicorganizationand

functionalcharacterisationofanewshortchainpotassiumchanneltoxinlikepeptide

BmTxKS4 from Buthus martensii Karsch(BmK).JBiochemMolToxicol18,187

154 References

195.

Jouirou,B.,Mosbah,A.,Visan,V.,Grissmer,S.,M'Barek,S.,Fajloun,Z.,VanRietschoten,J.,

Devaux, C., Rochat, H., Lippens, G. et al ., 2004. Cobatoxin 1 from Centruroides

noxius scorpion venom: chemical synthesis, threedimensional structure in solution,

pharmacologyanddockingonK +channels.BiochemJ377,3749.

Jover,E.,Bablito,J.andCouraud,F.,1984.Bindingofbetascorpiontoxin:aphysicochemical

study.Biochem23,11471152.

Kapetanovic, I.M., Rosenfeld, S. and Izmirlian, G., 2004. Overview of commonly used

bioinformaticsmethodsandtheirapplications.AnnNYAcadSci1020,1021.

Karasavvas, K.A., Baldock, R. and Burger, A., 2004. Bioinformatics integration and agent

technology.JBiomedInform37,205219.

Karbat, I., Cohen, L., Gilles, N., Gordon, D. and Gurevitz, M., 2004a. Conversion of a

scorpion toxin agonist into an antagonist highlights an acidic residue involved in

voltagesensortrappingduringactivationofneuronalNa +channels.FasebJ18,683

689.

Karbat,I.,Frolow,F.,Froy,O.,Gilles,N.,Cohen,L.,Turkov,M.,Gordon,D.andGurevitz,

M.,2004b.Molecularbasisofthehighinsecticidalpotencyofscorpionalphatoxins.J

BiolChem279,3167931686.

Karp,P.D.,Paley,S.andZhu,J.,2001.Databaseverification studies of SWISSPROT and

GenBank.Bioinformatics17,526532.

Kini,R.M.,2002.Molecularmouldswithmultiplemissions: functional sites in threefinger

toxins.ClinExpPharmacolPhysiol29,815822.

Kobayashi,Y.,Takashima,H.,Tamaoki,H.,Kyogoku,Y.,Lambert,P.,Kuroda,H.,Chino,

N., Watanabe, T.X., Kimura, T., Sakakibara, S. et al. , 1991. The cystinestabilized

alphahelix:acommonstructuralmotifofionchannelblockingneurotoxicpeptides.

Biopolymers31,12131220.

Koh,J.L.,Krishnan,S.P.T.,Hong,S.H.,Tan,P.T.,Khan,A.M.,Lee,M.L.andBrusic,V.,

2004a. Bioware: A framework for bioinformatics data retrieval, annotation and

155 References

publishing. ACM SIGIR Workshop on Search and Discovery in Bioinformatics,

Sheffield,UK.

Koh,J.L.Y., Lee, M.L. and Brusic,V.,2004b. Aclassification of biological data artifacts.

ICDTWorkshoponDatabaseIssuesinBiologicalDatabases(DBiBD),89thJanuary

2005,Edinburgh,Scotland,UK,5357.

Kohling,R.,2002.Voltagegatedsodiumchannelsinepilepsy.Epilepsia43,12781295.

Kong, L., Lee, B.T., Tong, J.C., Tan, T.W. and Ranganathan, S., 2004. SDPMOD: an

automatedcomparativemodelingserverforsmalldisulfidebondedproteins.Nucleic

AcidsRes32,W356W359.

Kordis,D.andGubensek,F.,2000.Adaptiveevolution of animal toxin multigenefamilies.

Gene261,4352.

Korn,S.J.andTrapani,J.G.,2005.Potassiumchannels.IEEETransNanobioscience4,2133.

Korolkova, Y.V., Bocharov, E.V., Angelo, K., Maslennikov, I.V., Grinenko, O.V., Lipkin,

A.V., Nosyreva, E.D., Pluzhnikov, K.A., Olesen, S.P., Arseniev, A.S. et al ., 2002.

NewbindingsiteoncommonmolecularscaffoldprovidesHERGchannelspecificity

ofscorpiontoxinBeKm1.JBiolChem277,4310443109.

Korolkova,Y.V.,Kozlov,S.A.,Lipkin,A.V.,Pluzhnikov,K.A.,Hadley,J.K.,Filippov,A.K.,

Brown,D.A.,Angelo,K.,Strobaek,D.,Jespersen,T.et al.,2001.AnERGchannel

inhibitorfromthescorpion Buthus eupeus .JBiolChem276,98689876.

Korolkova,Y.V.,Tseng,G.N.andGrishin,E.V.,2004.Uniqueinteractionofscorpiontoxins

withthehERGchannel.JMolRecognit17,209217.

Koschak,A.,Bugianesi,R.M.,Mitterdorfer,J.,Kaczorowski,G.J.,Garcia,M.L.andKnaus,

H.G., 1998. Subunit composition of brain voltagegated potassium channels

determined by hongotoxin1, a novel peptide derived from Centruroides limbatus

venom.JBiolChem273,26392644.

Kumar, S., Tamura, K. and Nei, M., 2004. MEGA3: Integrated software for Molecular

EvolutionaryGeneticsAnalysisandsequencealignment.BriefBioinform5,150163.

Kyte,J.andDoolittle,R.F.,1982.Asimplemethodfordisplayingthehydropathiccharacterof

156 References

aprotein.JMolBiol157,105132.

Lacinova, L., 2004. Pharmacology of recombinant lowvoltage activated calcium channels.

CurrDrugTargetsCNSNeurolDisord3,105111.

Laskowski,R.A.,MacArthur,M.W.,Moss,D.S.andThornton,J.M., 1993. PROCHECK: a

programtocheckthestereochemicalqualityofproteinstructures.JApplCrystallogr

26,283291.

Lassmann, T. and Sonnhammer, E.L., 2002. Quality assessment of multiple alignment

programs.FEBSLett529,126130.

Lecomte,C.,Ferrat,G.,Fajloun,Z.,VanRietschoten,J.,Rochat,H.,MartinEauclaire,M.F.,

Darbon, H. and Sabatier, J.M., 1999. Chemical synthesis and structureactivity

relationships of Ts kappa, a novel scorpion toxin acting on apaminsensitive SK

channel.JPeptRes54,369376.

Legros,C.,Ceard,B.,Vacher,H.,Marchot,P.,Bougis,P.E.andMartinEauclaire,M.F.,2005.

ExpressionofthestandardscorpionalphatoxinAaHIIandAaHIImutantsleadingto

theidentificationofsomekeybioactiveelements.BiochimBiophysActa1723,9199.

Legros, C., Bougis, P.E. and MartinEauclaire, M.F., 2003. Characterisation of the genes

encoding Aa1 isoforms from the scorpion Androctonus australis .Toxicon 41, 115

119.

Legros, C., Ceard, B., Bougis, P.E. and MartinEauclaire, M.F., 1998. Evidence for a new

classofscorpiontoxinsactiveagainstK +channels.FEBSLett431,375380.

Lewis,R.J.,2004.Conotoxinsasselectiveinhibitorsofneuronalionchannels,receptorsand

transporters.IUBMBLife56,8993.

Lewis,R.J.andGarcia,M.L.,2003.Therapeuticpotentialofvenompeptides.NatRevDrug

Discov2,790802.

Li,R.A.andTomaselli,G.F.,2004.Usingthedeadly muconotoxinsasprobesofvoltage

gatedsodiumchannels.Toxicon44,117122.

Liu,H.L.andLin,J.C.,2004.MoleculardockingofthescorpiontoxinTc1tothestructural

model of the voltagegated potassium channel K V1.1 from human Homo sapiens . J

157 References

BiomolStructDyn21,639650.

Liu,L.H.,Bosmans,F.,Maertens,C.,Zhu,R.H.,Wang,D.C.andTytgat,J.,2005.Molecular

basisofthemammalianpotencyofthescorpionalphaliketoxin,BmKM1.FasebJ

19,594596.

Liu,Z.,Yang,G.,Li,B.,Chi,C.andWu,X.,2003.Cloning,coexpressionwithanamidating

enzyme, and activity of the scorpion toxin BmK ITa1 cDNA in insect cells. Mol

Biotechnol24,2126.

LopezGonzalez, I., OlamendiPortugal, T., De la VegaBeltran, J.L., Van der Walt, J.,

Dyason,K.,Possani,L.D.,Felix,R.andDarszon,A.,2003.Scorpiontoxinsthatblock

Ttype Ca 2+ channels in spermatogenic cells inhibit the sperm acrosome reaction.

BiochemBiophysResCommun300,408414.

Loret,E.P.,Mansuelle,P.,Rochat,H.andGranier,C.,1990.Neurotoxinsactiveoninsects:

aminoacidsequences,chemicalmodifications,andsecondarystructureestimationby

circulardichroismoftoxinsfromthescorpion Androctonus australis Hector.Biochem

29,14921501.

Lourenco, W.R., 1994. Diversity and endemism in tropical versus temperate scorpion

communities.Biogiographica70,155160.

Lu,Z.andMacKinnon,R.,1997.Purification,characterisation,andsynthesisofaninward

rectifierK +channelinhibitorfromscorpionvenom.Biochem36,69366940.

Maertens,C.,Wei,L.,Tytgat,J.,Droogmans,G.andNilius,B.,2000.Chlorotoxindoesnot

inhibit volumeregulated, calciumactivated and cyclic AMPactivated chloride

channels.BrJPharmacol129,791801.

MahjoubiBoubaker,B.,Crest,M.,Khalifa,R.B.,Ayeb,M.E.andKharrat,R.,2004.Kbot1,a

threedisulfidebridgestoxinfrom Buthus occitanus tunetanus venomhighlyactiveon

bothSKandK Vchannels.Peptides25,637645.

Marcotte, E.M., 2000. Computational genetics: finding protein function by nonhomology

methods.CurrOpinStructBiol10,359365.

MartinEauclaire,M.F.andCouraud,F.,1995.Scorpionneurotoxins:effectsandmechanisms.

158 References

In: Chang, L.W., Dyer, R.S. (Eds.), Handbook of Neurotoxicology, Marcell and

Dekker,NewYork,pp.683–716.

Maslennikov,I.V.,Shenkarev,Z.O.,Zhmak,M.N.,Ivanov,V.T.,Methfessel,C.,Tsetlin,V.I.

and Arseniev, A.S., 1999. NMR spatial structure of alphaconotoxin ImI reveals a

common scaffold in snail and snake toxins recognizing neuronal nicotinic

acetylcholinereceptors.FEBSLett444,275280.

McPherson,A.,1999.Crystallizationofbiologicalmacromolecules.NewYork,ColdSpring

HarborLaboratoryPress.

Mejri,T.,Borchani,L.,SrairiAbid,N.,Benkhalifa,R.,Cestele,S.,Regaya,I.,Karoui,H.,

Pelhate, M., Rochat, H. and El Ayeb, M., 2003. BotIT6: a potent depressant insect

toxinfrom Buthus occitanus tunetanus venom.Toxicon41,163171.

Meki,A.,Mansuelle,P.,LarabaDjebari,F.,Oughideni,R.,Rochat,H.andMartinEauclaire,

M.F.,2000.KTX3,thekaliotoxinfrom Buthus occitanus tunetanus scorpionvenom:

one ofan extensive family of peptidyl ligandsof potassium channels. Toxicon 38,

105111.

Miljanich, G.P., 2004. Ziconotide: neuronal calcium channel blocker for treating severe

chronicpain.CurrMedChem11,30293040.

MorenoMurciano,M.P.,Monleon,D.,Calvete,J.J.,Celda,B.andMarcinkiewicz,C.,2003.

Amino acid sequence and homology modeling of obtustatin, a novel nonRGD

containingshortdisintegrinisolatedfromthevenomof Vipera lebetina obtusa .Protein

Sci12,366371.

Mouhat,S.,Jouirou,B.,Mosbah,A.,DeWaard,M.andSabatier,J.M.,2004a.Diversityof

foldsinanimaltoxinsactingonionchannels.BiochemJ378,717726.

Mouhat,S.,Mosbah,A.,Visan,V.,Wulff,H.,Delepierre,M.,Darbon,H.,Grissmer,S.,De

Waard,M.andSabatier,J.M.,2004b.The'functional'dyadofscorpiontoxinPi1isnot

itselfaprerequisitefortoxinbindingtothevoltagegatedK V1.2potassiumchannels.

BiochemJ377,2536.

Mouhat,S.,Visan,V.,Ananthakrishnan,S.,Wulff,H.,Andreotti,N.,Grissmer,S.,Darbon,

159 References

H., De Waard,M. and Sabatier,J.M., 2005.K + channel types targeted by synthetic

OSK1,atoxinfrom Orthochirus scrobiculosus scorpionvenom.BiochemJ385,95

104.

Mourier,G.,Dutertre,S.,FruchartGaillard,C.,Menez,A.andServent,D.,2003.Chemical

synthesis of MT1 and MT7 muscarinic toxins: critical role of Arg34 in their

interactionwithM1muscarinicreceptor.MolPharmacol63,2635.

Murgia,A.R.,Batista,C.V.,Prestipino,G.andPossani,L.D.,2004.Aminoacidsequenceand

function of a new alphatoxin from the Amazonian scorpion Tityus cambridgei .

Toxicon43,737740.

Nastainczyk,W.,Meves,H.andWatt,D.D.,2002.Ashortchainpeptidetoxinisolatedfrom

Centruroides sculpturatus scorpion venom inhibits etheragogorelated gene K +

channels.Toxicon40,10531058.

Nirthanan,S.,Pil,J.,AbdelMottaleb,Y.,Sugahara,Y.,Gopalakrishnakone,P.,Joseph,J.S.,

Sato, K. and Tytgat, J., 2005. Assignment of voltagegated potassium channel

blockingactivitytokappaKTx1.3,anontoxichomologueofkappahefutoxin1,from

Heterometrus spinifer venom.BiochemPharmacol69,669678.

OlamendiPortugal, T., GomezLagunas, F., Gurrola, G.B. and Possani, L.D., 1998. Two

similar peptides from the venom of the scorpion Pandinus imperator , one highly

effectiveblockerandtheotherinactiveonK +channels.Toxicon36,759770.

OlamendiPortugal,T.,Garcia,B.I.,LopezGonzalez,I.,VanDerWalt,J.,Dyason,K.,Ulens,

C., Tytgat, J., Felix, R., Darszon, A., and Possani, L.D., 2002. Two new scorpion

toxins that target voltagegated Ca 2+ and Na + channels. Biochem Biophys Res

Commun299,562568.

Oren,D.A.,Froy,O.,Amit,E.,KleinbergerDoron,N.,Gurevitz,M.andShaanan,B.,1998.

Anexcitatoryscorpiontoxinwithadistinctivefeature:anadditionalalphahelixatthe

Cterminusanditsimplicationsforinteractionwithinsectsodiumchannels.Structure

6,10951103.

Padilla,A.,Govezensky,T.,Possani,L.D.andLarralde,C.,2003.Experimentalenvenoming

160 References

ofmicewithvenomfromthescorpion Centruroides limpidus limpidus :differencesin

mortalityandsymptomswithandwithouttherapyrelatingtodifferencesin

age,sexandstrainofmouse.Toxicon41,959965.

Park,C.S.andMiller,C.,1992.Mappingfunctiontostructureinachannelblockingpeptide:

electrostaticmutantsofcharybdotoxin.Biochem31,77497755.

Pearson, W.R., 1990. Rapid and sensitive sequence comparison with FASTP and FASTA.

MethodsEnzymol183,6398.

Pearson, W.R., 2000. Flexible sequence similarity searching with the FASTA3 program

package.MethodsMolBiol132,185219.

Pedarzani,P.,D'Hoedt,D.,Doorty,K.B.,Wadsworth,J.D.,Joseph,J.S.,Jeyaseelan,K.,Kini,

R.M., Gadre, S.V., Sapatnekar, S.M., Stocker, M. et al ., 2002. Tamapin, a venom

peptide from the Indian red scorpion ( Mesobuthus tamulus ) that targets small

conductanceCa 2+ activatedK +channelsandafterhyperpolarizationcurrentsincentral

neurons.JBiolChem277,4610146109.

Peng,F., Zeng,X.C.,He,X.H.,Pu,J.,Li,W.X.,Zhu, Z.H. and Liu, H., 2002. Molecular

cloning and functional expression of a gene encoding an antiarrhythmia peptide

derivedfromthescorpiontoxin.EurJBiochem269,44684475.

Peng,L.S.,Zhong,X.F.,Huang,Y.S.,Zhang,Y.,Zheng,S.L.,Wei,J.W.,Wu,W.Y.andXu,

A.L., 2003. Molecular cloning, expression and characterisation of three short chain

alphaneurotoxins from the venom of sea snake – Hydrophiinae Hydrophis

cyanocinctus Daudin.Toxicon42,753761.

Pimenta,A.M.,Legros,C.,AlmeidaFde,M.,Mansuelle,P.,DeLima,M.E.,Bougis,P.E.and

MartinEauclaire,M.F.,2003.Novelstructuralclassoffourdisulfidebridgedpeptides

from Tityus serrulatus venom.BiochemBiophysResCommun301,10861092.

Pimenta,A.M.,MartinEauclaire,M.,Rochat,H.,Figueiredo,S.G.,Kalapothakis,E.,Afonso,

L.C. and De Lima, M.E., 2001. Purification, aminoacid sequence and partial

characterisationoftwotoxinswithantiinsectactivityfromthevenomoftheSouth

Americanscorpion Tityus bahiensis (Buthidae).Toxicon39,10091019.

161 References

Poirot,O.,O'Toole,E.andNotredame,C.,2003.Tcoffee@igs:Awebserverforcomputing,

evaluatingandcombiningmultiplesequencealignments.NucleicAcidsRes31,3503

3506.

Possani,L.D.,Becerril,B.,Delepierre,M.andTytgat,J.,1999.Scorpiontoxinsspecificfor

Na +channels.EurJBiochem264,287300.

Possani,L.D.,Merino,E.,Corona,M.,Bolivar,F.andBecerril,B.,2000.Peptidesandgenes

codingforscorpiontoxinsthataffectionchannels.Biochimie82,861868.

Pragl, B., Koschak, A., Trieb, M., Obermair, G., Kaufmann, W.A., Gerster, U., Blanc, E.,

Hahn, C., Prinz, H., Schutz, G. et al ., 2002. Synthesis, characterisation, and

application of cydye and alexadyelabeled hongotoxin 1 analogues. The first high

affinityfluorescenceprobesforvoltagegatedK +channels.BioconjugChem13,416

425.

Rajendra,W.,Armugam, A.andJeyaseelan,K.,2004.Toxinsinantinociceptionandanti

inflammation.Toxicon44,117.

RamirezDominguez,M.E.,OlamendiPortugal,T.,Garcia,U.,Garcia,C.,Arechiga,H.and

Possani,L.D.,2002.Cn11,thefirstexampleofascorpiontoxinthatisatrueblocker

ofNa +currentsincrayfishneurons.JExpBiol205,869876.

Ramos,O.H.andSelistredeAraujo,H.S.,2004.Comparativeanalysisofthecatalyticdomain

of hemorrhagic and nonhemorrhagic snake venom metallopeptidases using

bioinformatictools.Toxicon44,529538.

Ranganathan,R.,Lewis,J.H.andMacKinnon,R.,1996.SpatiallocalizationoftheK +channel

selectivityfilterbymutantcyclebasedstructureanalysis.Neuron16,131139.

Rodriguez de la Vega, R.C., Merino, E., Becerril, B. and Possani, L.D., 2003. Novel

interactionsbetweenK +channelsandscorpiontoxins.TrendsPharmacolSci24,222

227.

RodriguezdelaVega,R.C.andPossani,L.D.,2005.Overviewofscorpiontoxinsspecificfor

Na + channels and related peptides: biodiversity, structurefunction relationships and

evolution.Toxicon46,831844.

162 References

RodriguezdelaVega,R.C.andPossani,L.D.,2004.Currentviewsonscorpiontoxinsspecific

forK +channels.Toxicon43,865875.

Rogowski, R.S., Collins, J.H., O'Neill, T.J., Gustafson, T.A., Werkman, T.R., Rogawski,

M.A.,Tenenholz,T.C.,Weber,D.J.andBlaustein,M.P.,1996.Threenewtoxinsfrom

thescorpion Pandinus imperator selectivelyblockcertainvoltagegatedK +channels.

MolPharmacol50,11671177.

Ross,J.S.,Schenkein,D.P.,Kashala,O.,Linette,G.P.,Stec,J.,Symmans,W.F.,Pusztai,L.

andHortobagyi,G.N.,2004.Pharmacogenomics.AdvAnatPathol11,211220.

Rost,B.,1999.Twilightzoneofproteinsequencealignments.ProteinEng12,8594.

Sabatier,J.M.,Fremont,V.,Mabrouk,K.,Crest,M.,Darbon,H.,Rochat,H.,VanRietschoten,

J.andMartinEauclaire,M.F.,1994.LeiurotoxinI,ascorpiontoxinspecificforCa 2+

activatedK +channels.Structureactivityanalysisusingsyntheticanalogs.IntJPept

ProteinRes43,486495.

Sabatier,J.M.,Zerrouk,H.,Darbon,H.,Mabrouk,K.,Benslimane,A.,Rochat,H.,Martin

Eauclaire,M.F.andVanRietschoten,J.,1993.P05,anewleiurotoxinIlikescorpion

toxin: synthesis and structureactivity relationships of the alphaamidated analog, a

ligandofCa 2+ activatedK +channelswithincreasedaffinity.Biochem32,27632770.

Saitou,N.andNei,M.,1987.Theneighbourjoiningmethod:anewmethodforreconstructing

phylogenetictrees.MolBiolEvol4,406425.

Santos, A.D., McIntosh, J.M., Hillyard, D.R., Cruz, L.J. and Olivera, B.M., 2004. The A

superfamily of conotoxins: structural and functional divergence. J Biol Chem 279,

1759617606.

Schonbach,C.,KowalskiSaunders,P.andBrusic,V.,2000.Datawarehousinginmolecular

biology.BriefBioinform1,190198.

Schroeder, N., Mullmann, T.J., Schmalhofer, W.A., Gao, Y.D., Garcia, M.L. and

Giangiacomo, K.M., 2002. Glycine 30in iberiotoxin is a critical determinant of its

specificityformaxiKversusK Vchannels.FEBSLett527,298302.

Schwede,T.,Kopp,J.,Guex,N.,andPeitsch,M.C., 2003. SWISSMODEL: an automated

163 References

proteinhomologymodelingserver.NucleicAcidsRes31,33813385.

Selisko,B.,Garcia,C.,Becerril,B.,Delepierre,M.andPossani,L.D.,1996.Aninsectspecific

toxin from Centruroides noxius Hoffmann. cDNA, primary structure, three

dimensionalmodelandelectrostaticsurfacepotentialsincomparisonwithothertoxin

variants.EurJBiochem242,235242.

Servant, F., Bru, C., Carrere, S., Courcelle, E., Gouzy, J., Peyruc, D. andKahn, D., 2002.

ProDom:automatedclusteringofhomologousdomains.BriefBioinform3,246251.

Shakkottai,V.G.,Regaya,I.,Wulff,H.,Fajloun,Z.,Tomita,H.,Fathallah,M.,Cahalan,M.D.,

Gargus,J.J.,Sabatier,J.M.andChandy,K.G.,2001.Designandcharacterisationofa

highly selective peptide inhibitor of the small conductance calciumactivated K +

channel,SKCa 2.JBiolChem276,4314543151.

Siew,J.P.,Khan,A.M.,Tan,P.T.,Koh,J.L.,Seah,S.H.,Koo,C.Y.,Chai,S.C.,Armugam,A.,

Brusic, V. and Jeyaseelan, K., 2004. Systematic analysis of snake neurotoxins'

functionalclassificationusingadatawarehousingapproach.Bioinformatics20,3466

3480.

Smith,C.G.andVane,J.R.,2003.Thediscoveryofcaptopril.FasebJ17,788789.

Srinivasan,K.N.,Nirthanan,S.,Sasaki,T.,Sato,K.,Cheng,B.,Gwee,M.C.,Kini,R.M.and

Gopalakrishnakone, P., 2001. Functional site of bukatoxin, an alphatype sodium

channel neurotoxin from the Chinese scorpion ( Buthus martensi Karsch) venom:

probableroleofthe(52)PDKVP(56)loop.FEBSLett494,145149.

Srinivasan,K.N.,Gopalakrishnakone,P.,Tan,P.T.,Chew,K.C.,Cheng,B.,Kini,R.M.,Koh,

J.L.,Seah,S.H.andBrusic,V.,2002a.SCORPION,amoleculardatabaseofscorpion

toxins.Toxicon40,2331.

Srinivasan,K.N.,Sivaraja,V.,Huys,I.,Sasaki,T.,Cheng,B.,Kumar,T.K.,Sato,K.,Tytgat,

J.,Yu,C.,San,B.C.et al .,2002b.kappaHefutoxin1,anoveltoxinfromthescorpion

Heterometrus fulvipes withuniquestructureandfunction.Importanceofthefunctional

diadinpotassiumchannelselectivity.JBiolChem277,3004030047.

Stampe,P.,KolmakovaPartensky,L.andMiller,C.,1994.IntimationsofK +channelstructure

164 References

fromacompletefunctionalmapofthemolecularsurfaceofcharybdotoxin.Biochem

33,443450.

Strong,P.N.,Clark,G.S.,Armugam,A.,DeAllie,F.A.,Joseph,J.S.,Yemul,V.,Deshpande,

J.M., Kamat, R., Gadre, S.V., Gopalakrishnakone, P. et al ., 2001. Tamulustoxin: a

novel potassium channel blocker from the venom of the Indian red scorpion

Mesobuthus tamulus .ArchBiochemBiophys385,138144.

Sun,X.,Chen,X.,Zhang,Z.,Wang,H.,Bianchi,F.J.,Peng,H.,Vlak,J.M.andHu,Z.,2002.

Bollworm responses to release of genetically modified Helicoverpa armigera

nucleopolyhedrovirusesincotton.JInvertebrPathol81,6369.

Sun, Y.M., Bosmans, F.,Zhu, R.H., Goudet, C., Xiong, Y.M., Tytgat, J. and Wang, D.C.,

2003.Importanceoftheconservedaromaticresiduesinthescorpionalphaliketoxin

BmKM1:thehydrophobicsurfaceregionrevisited.JBiolChem278,2412524131.

Szolajska,E.,Poznanski,J.,Ferber,M.L.,Michalik,J.,Gout,E.,Fender,P.,Bailly,I.,Dublet,

B.andChroboczek,J.,2004.Poneratoxin,aneurotoxinfromantvenom.Structureand

expressionininsectcellsandconstructionofabioinsecticide. EurJ Biochem 271,

21272136.

Tan,P.T.,Ranganathan,S.andBrusic,V.,2006a.Extractionoffunctionalpeptidemotifsin

scorpiontoxins.JPeptSci12,420427.

Tan, P.T., Veeramani, A., Srinivasan, K.N., Ranganathan, S. and Brusic, V., 2006b.

SCORPION2: a databasefor structurefunction analysisofscorpiontoxins.Toxicon

47,356363.

Tan,P.T.,Srinivasan,K.N.,Seah,S.H.,Koh,J.L.,Tan,T.W.,Ranganathan,S.andBrusic,V.,

2005. Accurate prediction of scorpion toxin functional properties from primary

structures.JMolGraphModel24,1724.

Tan,P.T.,Khan,A.M.andBrusic,V.,2003.Bioinformaticsforvenomandtoxinsciences.

BriefBioinform4,5362.

Theakston,R.D.,Warrell,D.A.andGriffiths,E.,2003. Report of aWHO workshop onthe

standardizationandcontrolofantivenoms.Toxicon41,541557.

165 References

Thompson, J.D., Higgins, D.G. and Gibson, T.J., 1994. CLUSTAL W: improving the

sensitivity of progressive multiple sequence alignment through sequence weighting,

positionspecificgappenaltiesandweightmatrixchoice.NucleicAcidsRes22,4673

4680.

Tsetlin,V.I. and Hucho, F., 2004. Snake andsnail toxins acting on nicotinic acetylcholine

receptors:fundamentalaspectsandmedicalapplications.FEBSLett557,913.

Tytgat,J.,Chandy,K.G.,Garcia,M.L.,Gutman,G.A.,MartinEauclaire,M.F.,vanderWalt,

J.J.andPossani,L.D.,1999.Aunifiednomenclatureforshortchainpeptidesisolated

fromscorpionvenoms:alphaKTxmolecularsubfamilies.TrendsPharmacolSci20,

444447.

Vacher, H., Alami, M., Crest, M., Possani, L.D., Bougis, P.E. and MartinEauclaire, M.F.,

2002. Expanding the scorpion toxin alphaKTX 15 family with AmmTX3 from

Androctonus mauretanicus .EurJBiochem269,60376041.

Vacher,H.,RomiLebrun,R.,Mourre,C.,Lebrun,B.,Kourrich,S.,Masmejean,F.,Nakajima,

T., Legros, C., Crest, M., Bougis, P.E. et al ., 2001. A new class of scorpion toxin

binding sitesrelatedto an Atype K + channel:pharmacological characterisation and

localizationinratbrain.FEBSLett501,3136.

Valdar,W.S.,2002.Scoringresidueconservation.Proteins48,227241.

ValdezCruz,N.A.,Batista,C.V.,Zamudio,F.Z.,Bosmans,F.,Tytgat,J.andPossani,L.D.,

2004a.Phaiodotoxin,anovelstructuralclassofinsecttoxinisolatedfromthevenom

oftheMexicanscorpion Anuroctonus phaiodactylus .EurJBiochem271,47534761.

ValdezCruz,N.A.,Davila,S.,Licea,A.F.,Corona,M.,Zamudio,F.,GarciaValdes,J.,Boyer,

L.andPossani,L.D.,2004b.Biochemical,geneticandphysiologicalcharacterisation

ofvenomcomponentsfromtwospeciesofscorpions:Centruroides exilicauda Wood

and Centruroides sculpturatus Ewing.Biochimie86,387396.

Valentin,E.andLambeau,G.,2000.Whatcanvenom phospholipases A 2 tell us about the

functional diversity of mammalian secreted phospholipases A 2?Biochimie82,815

831.

166 References

Vazquez,J.,Feigenbaum,P.,Katz,G.,King,V.F.,Reuben,J.P.,RoyContancin,L.,Slaughter,

R.S., Kaczorowski, G.J. and Garcia, M.L., 1989. Characterisation of high affinity

bindingsitesforcharybdotoxininsarcolemmalmembranesfrombovineaorticsmooth

muscle.Evidenceforadirectassociationwiththehighconductancecalciumactivated

potassiumchannel.JBiolChem264,2090220909.

Visan,V.,Fajloun,Z.,Sabatier,J.M.andGrissmer,S.,2004.Mappingofmaurotoxinbinding

sitesonhK V1.2,hKV1.3,andhIK Ca 1channels.MolPharmacol66,11031112. vonSegesser,L.K.,Mueller,X.,Marty,B.,Horisberger,J.andCorno,A.,2001.Alternatives

tounfractionatedheparinforanticoagulationincardiopulmonarybypass.Perfusion16,

411416.

Wagner,S.,Castro,M.S.,Barbosa,J.A.,Fontes,W.,Schwartz,E.N.,Sebben,A.,Rodrigues

Pires, O., Jr., Sousa, M.V. and Schwartz, C.A., 2003. Purification and primary

structuredeterminationofTf4,thefirstbioactivepeptideisolatedfromthevenomof

theBrazilianscorpion Tityus fasciolatus .Toxicon41,737745.

Wang,C.G.,Cai,Z.,Lu,W.,Wu,J.,Xu,Y.,Shi,Y.andChi,C.W.,2005.Anovelshortchain

peptideBmKXfromtheChinesescorpion Buthus martensi Karsch,sequencing,gene

cloningandstructuredetermination.Toxicon45,309319.

Wang,C.G.,Gilles,N.,Hamon,A.,LeGall,F.,Stankiewicz,M.,Pelhate,M.,Xiong,Y.M.,

Wang, D.C. and Chi, C.W., 2003. Exploration of the functional site of a scorpion

alphaliketoxinbysitedirectedmutagenesis.Biochem42,46994708.

Whisstock,J.C.andLesk,A.M.,2003.Predictionofproteinfunctionfromproteinsequence

andstructure.QRevBiophys36,307340.

White,J.,2000.Bitesandstingsfromvenomousanimals:aglobaloverview.TherDrugMonit

22,6568.

Wickenden, A.D., 2002a.K +channelsastherapeuticdrugtargets.PharmacolTher94,157

182.

Wickenden, A.D., 2002b. Potassium channels as antiepileptic drug targets.

Neuropharmacology43,10551060.

167 References

Wu,J.J.,He,L.L.,Zhou,Z.andChi,C.W.,2002.Geneexpression,mutation,andstructure

functionrelationshipofscorpiontoxinBmP05activeonSK Ca channels.Biochem41,

28442849.

Wu, C.H., Huang, H., Yeh, L.S. and Barker, W.C., 2003. Protein family classification and

functionalannotation.ComputBiolChem27(1),3747.

Wu, Y., Cao, Z., Yi, H.,Jiang, D., Mao,X., Liu,H. and Li, W., 2004. Simulation of the

interaction between ScyTx and small conductance calciumactivated potassium

channelbydockingandMMPBSA.BiophysJ87,105112.

Wulff,H.,Beeton,C.andChandy,K.G.,2003.Potassiumchannelsastherapeutictargetsfor

autoimmunedisorders.CurrOpinDrugDiscovDevel6,640647.

Xu,C.Q.,Brone,B.,Wicher,D.,Bozkurt,O.,Lu,W.Y.,Huys,I.,Han,Y.H.,Tytgat,J.,Van

Kerkhove,E.andChi,C.W.,2004a.BmBKTx1,anovel Ca 2+ activated K + channel

blockerpurifiedfromtheAsianscorpion Buthus martensi Karsch.JBiolChem279,

3456234569.

Xu,C.Q.,He,L.L.,Brone,B.,MartinEauclaire,M.F.,VanKerkhove,E.,Zhou,Z.andChi,

C.W.,2004b.Anovelscorpiontoxinblockingsmallconductance Ca 2+ activated K+

channel.Toxicon43,961971.

Xu,C.Q.,Zhu,S.Y.,Chi,C.W.andTytgat,J.,2003.TurretandporeblockofK +channels:

whatisthedifference?TrendsPharmacolSci24,446448;authorreply448449.

Yamaji,N.,Dai,L.,Sugase,K.,Andriantsiferana,M.,Nakajima,T.andIwashita,T.,2004.

Solution structure of IsTX. A male scorpion toxin from Opisthacanthus

madagascariensis (Ischnuridae).EurJBiochem271,38553864.

Yao,J.,Chen,X.,Li,H.,Zhou,Y.,Yao,L.,Wu,G.,Zhang,N.,Zhou,Z.,Xu,T.,Wu,H. et

al ., 2005. BmP09, a "long chain" scorpion peptide blocker of BK channels. J Biol

Chem280,1481914828.

Yu, K., Fu, W., Liu, H., Luo, X., Chen, K.X., Ding, J., Shen, J. and Jiang, H., 2004.

Computational simulations of interactions of scorpion toxins with the voltagegated

potassiumionchannel.BiophysJ86,35423555.

168 References

Zaki, T.I. and Maruniak,J.E., 2003.Three polymorphic genes encoding a depressant toxin

fromtheEgyptianscorpion Leiurus quinquestriatus quinquestriatus .Toxicon41,109

113.

Zamudio,F.Z.,Conde,R.,Arevalo,C.,Becerril,B.,Martin,B.M.,Valdivia,H.H.andPossani,

L.D., 1997. The mechanism of inhibition of ryanodine receptor channels by

imperatoxinI,aheterodimericproteinfromthescorpion Pandinus imperator .JBiol

Chem272,1188611894.

Zeng,X.C.,Peng,F.,Luo,F.,Zhu,S.Y.,Liu,H.andLi,W.X.,2001.Molecularcloningand

characterisation of four scorpion K +toxinlike peptides:a newsubfamily of venom

peptides(alphaKTx14)andgenomicanalysisofamember.Biochimie83,883889.

Zhang, N., Wu, G., Wu, H., Chalmers, M.J. and Gaskell, S.J., 2004. Purification,

characterisationandsequencedeterminationofBmKK4, a novelpotassium channel

blockerfromChinesescorpion Buthus martensi Karsch.Peptides25,951957.

Zhu,S.,Bosmans,F.andTytgat,J.,2004a.Adaptiveevolutionofscorpionsodiumchannel

toxins.JMolEvol58,145153.

Zhu,X.,Zamudio,F.Z.,Olbinski,B.A.,Possani,L.D.andValdivia,H.H.,2004b.Activation

ofskeletalryanodinereceptorsbytwonovelscorpiontoxinsfrom Buthotus judaicus .J

BiolChem279,2658826596.

Zhu,S.,Huys,I.,Dyason,K.,Verdonck,F.andTytgat,J.,2004c.Evolutionarytraceanalysis

ofscorpiontoxinsspecificforKchannels.Proteins54,361370.

Zhu,S.,Darbon,H.,Dyason,K.,Verdonck,F.andTytgat,J.,2003.Evolutionaryoriginof

inhibitorcystineknotpeptides.FasebJ17,17651767.

Zhu,S.andLi,W.,2002.Precursorsofthreeuniquecysteinerichpeptidesfromthescorpion

Buthus martensii Karsch.CompBiochemPhysiolBBiochemMolBiol131,749756.

Zhu, S.Y., Li, W.X. and Zeng, X.C., 2001. Precursor nucleotide sequence and genomic

organizationofBmTXKS1,anewscorpiontoxinlikepeptidefrom Buthus martensii

Karsch.Toxicon39,12911296.

Zilberberg,N.,Froy,O.,Loret,E.,Cestele,S.,Arad,D.,Gordon,D.andGurevitz,M.,1997.

169 References

Identification of structural elements of a scorpion alphaneurotoxin important for

receptorsiterecognition.JBiolChem272,1481014816.

Zlotkin, E., Kadouri, D., Gordon, D., Pelhate, M., Martin, M.F. and Rochat, H., 1985. An

excitatory and a depressant insect toxin from scorpion venom both affect sodium

conductanceandpossessacommonbindingsite.ArchBiochemBiophys240,877

887.

Zuo,X.P.andJi,Y.H.,2004.Molecularmechanismofscorpionneurotoxinsactingonsodium

channels:insightintotheirdiverseselectivity.MolNeurobiol30,265278.

170 Author’s publications

Author’sPublications Tan, P.T ., Ranganathan, S. and Brusic. V., 2006. Extraction of functional peptide motifsinscorpiontoxins.JPeptideSci12,420427. Tan, P.T .,Veeramani,A.,Srinivasan,K.N.,Ranganathan,S.andBrusic.V.,2006. SCORPION2: A database for structurefunction analysis of scorpion toxins. Toxicon47,356363. Tan,P.T. ,Srinivasan,K.N.,Seah,S.H.,Koh,J.L.Y.,Tan,T.W.,Ranganathan,S.and Brusic, V.,2005.Accuratepredictionofscorpiontoxin functional properties fromprimarystructures . JMolGraphModel24,1724. Tan, P.T. , Khan, A.M. and Brusic, V., 2003. Bioinformatics for venom and toxin sciences.BriefBioinform4,5362 Siew, J.P., Khan, A.M., Tan, P.T., Koh, J.L., Seah, S.H., Koo, C.Y., Chai, S.C., Armugam, A., Brusic, V. and Jeyaseelan, K., 2004. Systematic analysis of snakeneurotoxinsfunctionalclassificationusingadatawarehousingapproach. Bioinformatics20,34663480. Lenffer,J.,Lai,P.,ElMejaber,W.,Khan,A.M.,Koh,J.L., Tan,P.T. ,Seah,S.H.and Brusic, V., 2004. CysView: protein classification based on cysteine pairing patterns.NucleicAcidsRes.32,W350W355. Koh, J.L.Y., Krishnan, S.P.T., Seah, S.H., Tan, P.T., Khan, A.M., Lee, M.L. and Brusic, V., 2004. BioWare: A framework for bioinformatics data retrieval, annotationandpublishing.ACMSIGIRWorkshoponSearch&Discoveryin Bioinformatics,Sheffield,UK,July2004. Srinivasan, K.N., Gopalakrishnakone, P., Tan, P.T. , Chew, K.C., Cheng, B., Kini, R.M.,Koh,J.L.Y.,Seah,S.H.andBrusic,V.,2002.SCORPION,amolecular databaseofscorpiontoxins.Toxicon40,2331.

171 Apendix 1

Appendix1

The62groupsof393scorpionnativetoxinsequenceswereclassifiedbasedon ion channel specificity and primary sequence similarity using BLAST, multiple sequence alignment and phylogenetic analysis. The scorpion toxin sequences were broadlyclassifiedintofourbroadgroups,namelyNa +,K+,Ca 2+ andCl .Eachbroad groupwasthenclassifiedintogroupsandsubgroupsbyBLASTandmultiplesequence alignment,andverifiedbyphylogeneticanalysis.135K+toxinswereclassifiedinto32 groups, 222 Na + toxins were classified into 18 groups, while 8 Ca 2+ toxins were classifiedintofourgroups.K +group6wasfurtherclassifiedintothreesubgroupseach.

Na +groups2and9wereclassifiedintofoursubgroupseach,Na +group1intothree subgroupsandNa +group12intotwosubgroups.19Cl toxinswereclassifiedintotwo subgroups. Scorpine and nine scorpion toxin sequence with no annotated molecular targetwereassignedtothe‘defensin’and‘orphan’groups,respectively.

Legend : First column shows the accession numbers in SCORPION2. Second column shows aligned scorpiontoxinsequences.Dashesareintroducedforaligningsequences.Maturedtoxinisrepresentedby green single amnio acid notations, signal peptide by brown and fragment region by red. Conserved residues within groups/subgroups are highlighted in grey. Pairs of disulfide bridges are shown by differenthighlights.

Sodiumchanneltoxins Group01 Subgroup01a

DBACC

D000003 KKNGYAVDSSGKVSECLLNNYCNNICTKVYYATSGYCCLLSCYCFGLDDDKAVLKIKDATKSYCDVQIN

D000184 KKNGYAVDSSGKVSECLLNNYCNINCTKVYYATSGYCCLLSCYCFGLDDDKAVLKIKDATKSYCDVQIIN

D000136 MKFFLIFLVIFPIMGVLG KKNGYAVDSSGKVSECLLNNYCNNICTKVYYATSGYCCLLSCYCFGLDDDKAVLKIKDATKSYCDVQIIG

D000165 MKFFLIFLVIFPIMGVLG KKNGYAVDSSGKVAECLFNNYCNNECTKVYYADKGYCCLLKCYCFGLADDKPVLDIWDSTKNYCDVQIIDLS

Subgroup01b

DBACC

D000233 MKFFLLFLVVLPIMGVLGKKNGYAVDSKGKAPECFLSNYCNNECTKVHYADKGYCCLLS CYCFGLNDDKKVLEISGTTKKYCDFTIIN

172 Apendix 1

D000234 MKFFLLFLVVLPIMGVLGKKNGYAVDSKGKAPECFLSNYCNNECTKVHYADKGYCCLLS CYCFGLNDDKKVLEISDTTKKYCDFTIIN

D000006 KKNGYAVDSSGKAPECLLSNYCYNECTKVHYADKGYCCLLS CYCVGLSDDKKVLEISDARKKYCDFVTIN

D000007 KKNGYAVDSSGKAPECLLSNYCYNECTKVHYAEKGYCCLLS CYCVGLSDDKKVLEISDARKKYCDFVTIN

D000824 KKNGYAVDSSGKAPECLLSNYCNNECTKVHYADKGYCCLLS CYCFGLSDDKKVLEISDTRKKYCDYTIIN

D000825 KKNGYAVDSSGKAPECLLSNYCNNECTKVHYADKGYCCLLS CYCFGLSDDKKVLDISDTRKKYCDYTIIN

D000005 MKFLLLFLVVLPIMGVLGKKNGYAVDSSGKAPECLLSNYCYNECTKVHYADKGYCCLLS CYCFGLNDDKKVLEISDTRKSYCDTPIIN

D000139 KKDGYAVDSSGKAPECLLSNYCYNECTKVHYADKGYCCLLS CYCFGLNDDKKVLEISDTRKSYCDTPIIN

D000004 MKFLLLFLVVLPIMGVFGKKNGYAVDSSGKAPECLLSNYCNNECTKVHYADKGYCCLLS CYCFGLNDDKKVLEISDTRKSYCDTTIIN

D000125 KKNGYAVDSSGKAPECLLSNYCNNQCTKVHYADKGYCCLLS CYCFGLNDDKKVLEISDTRKSYCDTTIIN

D000137 MKFLLLFLVVLPIMGVLGKKNGYAVDSSGKAPECLLSNYCNNECTKVHYADKGYCCLLS CYCFGLNDDKKVLEISDTRKSYCDTTIIN

D000235 MKFFLLFLVVLPIMGVLGKKNGYAVDSKGKAPECFFSNYCNNECTKVHYAEKGYCCLLS CYCVGLNDDKKVMEISDTRKKICDTTIIN

D000236 MKFFLLFLVVLPIMGVLGKKNGFAVDSNGKAPECFFDHYCNSECTKVYYAEKGYCCTLS CYCVGLDDDKKVLDISDTRKKLCDFTLFN

Subgroup01c

DBACC

D000001 MKFFLMCLIIFPIMGVLG KKNGYPLDRNGKTTECSGVNAIAPHYCNSECTKVYYAESGYCCWGA CYCFGLEDDKPIGPMKDITKKYCDVQIIPS

D000002 MKFFLMCLIIFPIMGVLG KKNGYPLDRNGKTTECSGVNAIAPHYCNSECTKVYYAKSGYCCWGA CYCFGLEDDKPIGPMKDITKKYCDVQIIPS

Group02 Subgroup02a

DBACC

D000009 MKGMILFISCLLLIGIVVEC KEGYLMDHEGCKLSCFIRPSGYCGSECKIKKGSSGYCAWPA CYCYGLPNWVKVWDRATNKCGKK

D000012 MKGMILFISCLLLIGIVVEC KEGYLMDHEGCKLSCFIRPSGYCGRECGIKKGSSGYCAWPA CYCYGLPNWVKVWDRATNKCGKK

D000008 MKGMILFISCLLLIDIVVGGKEGYLMDHEGCKLSCFIRPSGYCGRECTLKKGSSGYCAWPA CYCYGLPNWVKVWDRATNKCGKK

Subgroup02b

DBACC

D000021 KEGYAMDHEGCKFSCFPRPAGFCDGYCKTHLKASSGYCAWPA CYCYGVPSNIKVWDYATNKC

D000246 KEGYAMDHEGCKFSCFIRPSGFCDGYCKTHLKASSGYCAWPA CYCYGVPSNIKVWDYATNKC

D000017 KEGYAMDHEGCKFSCFIRPAGFCDGYCKTHLKASSGYCAWPA CYCYGVPDHIKVWDYATNKC

D000222 KEGYAMDHEGCKFSCFIRPAGFCDGYCKTHLKASSGYCAWPA CYCYGVPDHIKVWDYATNKC

173 Apendix 1

Subgroup02c

DBACC

D000010 GREGYPADSKGCKITCFLTAAGYCNTECTLKKGSSGYCAWPA CYCYGLPESVKIWTSETNKC

D000162 MKRMILFISCLLLIDIVVG GREGYPADSKGCKITCFLTAAGYCNTECTLKKGSSGYCAWPA CYCYGLPDSVKIWTSETNKCGKK

D000172 GKEGYPTDKRGCKLTCFFT

D000245 GKEGYPVDSRGCKVTCFFTGAGYCDKECKLKKASSGYCAWPA CYCYGLPDSVPVYDNASNKCB

D000631 GKEGYPADSKGCKVTCFFTGVGYCDTECKLKKASSGYCAWPA CYCYGLPDSASVWDSATNKC

D000632 KDGYPVDSKGCKLSCVANNYCDNQCKMKKASGGHCYAMS CYCEGLPENAKVSDSATNIC

D000626 LKDGYPTNSKGCKISGCLPGENKFCLNECQ

Subgroup02d

DBACC

D000694 MTRFVLFICCFFLIGMVVECKDGYLVGNDGCKYSCFTRPGTYCANECSRVKGKDGYCYAWMA CYCYSMPNWVKTWDRATNRCGR GK

D000696 KDGYLVGNDGCKYNCLTRPGHYCANECSRVKGKDGYCYAWMA CYCYSMPDWVKTWSRSTNRCGR

D000695 KKEGYLVGNDGCKYGCITRPHQYCVHECELKKGTDGYCAYWLA CYCYNMPDWVKTWSSATNKCK

D000822 NKDGYLMEGDGCKMGCLTRKKASYCVDQCKEVGGKDGYCYAWLS CYCYNMPDSVEIWDSKNNKCGK

D000697 MKGMIMLISCLMLIDVVVESKNGYIIEPKGCKYSCFWGSSTWCNRECKFKKGSSGYCAWPA CWCYGLPDNVKIFDYYNNKCGK

Group03

DBACC

D000031 GRDAYIADSENCTYTCALNPYCNDLCTKNGAKSGYCQWAGRYGNACWCIDLPDKVPIRISGSCR

D000209 MNYLIVISFALLLMTGVES GRDAYIAKKENCTYFCALNPYCNDLCTKNGAKSGYCQWAGRYGNACWCIDLPDKVPIRIPGPCIGR

D000047 GRDAYIADSENCTYFCGSNPYCNDVCTENGAKSGYCQWAGRYGNACYCIDLPASERIKEPGKCG

D000210 MNYLIVISFALLLMTSVES GRDAYIADSENCTYFCGSNPYCNDLCTENGAKSGYCQWAGRYGNACWCIDLPDKVPIRIPGPCRGR

D000027 ERDGYIVQLHNCVYHCGLNPYCNGLCTKNGATSGSYCQWMTKWGNACYCYALPDKVPIKWLDPKCY

D000076 AEIKVRDGYIVYPNNCVYHCGLDPYCNDLCTGA

D000025 MNHLVMISLALLLLLGVES VRDAYIAKNYNCVYECFRDAYCNELCTKNGASSGYCQWAGKYGNACWCYALPDNVPIRVPGKCHRK

D000029 VRDAYIAKNYNCVYECFRDSYCNDLCTKNGASSGYCQWAGKYGNACWCYALPDNVPIRVPGKCH

D000030 VRDAYIAQNYNCVYFCMKDDYCNDLCTKNGASSGYCQWAGKYGNACWCYALPDNVPIRIPGKCHS

D000228 VRDAYIAQNYNCVYTCFKDAHCNDLCTKNGASSGYCQWAGKYGNACWCYALPDNVPIRIPGKCHRK

D000164 LLMTGVES GRDAYIAKNYNCVYHCFRDDYCNGLCTENGADSGYCYLAGKYGNACWCINLPDDKPIRIPGKCHRR

D000223 MNHLVMISLALLLMT GVES GRDAYIAQNYNCVYHCALNPYCNDLCTKNGAKSGYCQWFGSSGNACWCIDLPDNVPIKVPGKCHRK

174 Apendix 1

D000224 MNHLVMISLALLLMT GVES GRDAYIAQNYNCVYHCALNPYCNDLCTKNGAKSGYCQWFGSNGNACWCIDLPDSVPIKVPGKCHRK

D000225 MNHLVMISLALLLMT GVES GRDAYIAQNYNCVYHCFVNPYCNDLCTKNGAESGYCQWFTSSGNACWCINLPDSVPIKIPGKCHRK

D000075 VRDAYIAQNYNCVYTCFKNDYCNDICTKNGAXXGYC

D000226 VRDAYIAQNYNCVYDCARDAYCNDLCTKNGAKSGYCEWFGPHGDACWCIDLPNNVPIKVEGKCHRK

D000227 VRDAYIAQNYNCVYDCARDAYCNELCTKNGAKSGHCEWFGPHGDACWCIDLPNNVPIKVEGKCHRK

D000229 VRDAYIAQNYNCVYHCGRDAYCNELCSKNGAKSRTRGGYCHWFGPHGDACWCIDLPNNVPIKVEGKCHRK

D000230 VRDAYIAQNYNCVYACARDAYCNDLCTKNGARSGLFATFGPHGDACWCIALPNNVPLKVQGKCHRK

D000211 MNYLVMISFAFLLMTGVES VRDAYIAQNYNCVYHCARDAYCNELCTKNGAKSGSCPYLGEHKFACYCKDLPDNVPIRVPGKCHRR

D000267 MNYLVMISLALLIAGVDS ARDAYIAKNDNCVYECFQDSYCNDLCTKNGAKSGTCDWIGTYGDACLCYALPDNVPIKLSGECHR

D000028 GRDAYIAQPENCVYECAKNSYCNDLCTKNGAKSGYCQWLGRWGNACYCIDLPDKVPIRIEGKCHF

D000144 GRDAYIAQPENCVYECAKSSYCNDLCTKNGAKSGYCQWLGRWGNACYCIDLPDKVPIRIEGKCHFA

D000068 GRDAYIAQPENCVYECAQNSYCNDLCTKNGATSGYCQWLGKYGNACWCKDLPDNVPIRIPGKCHF

D000143 GRDAYIAQPENCVYECAKNSYCNDLCTKNGAKSGYCQWLGKYGNACWCEDLPDNVPIRIPGKCHF

D000045 ARDAYIAKPHNCVYECYNPKGSYCNDLCTENGAESGYCQILGKYGNACWCIQLPDNVPIRIPGKCH

D000040 MNYLVMISFALLLMTGVES VRDAYIAKPENCVYHCATNEGCNKLCTDNGAESGYCQWGGRYGNACWCIKLPDRVPIRVPGKCHR

D000156 MNYLVMISFALLLMTGVES VRDAYIAKPENCVYHCATNEGCNKLCTDNGAESGYCQWGGKYGNACWCIKLPDDVPIRVPGKCHR

D000126 MNYLVMISFALLLMKGVES VRDAYIAKPENCVYHCAGNEGCNKLCTDNGAESGYCQWGGRYGNACWCIKLPDDVPIRVPGKCHR

D000042 VRDAYIAKPENCVYECATNEYCNKLCTDNGAESGYCQWVGRYGNACXCIKLPDRVPIRVWGKCHG

D000037 MNYLVMISFALLLMKGVES VRDAYIAKPENCVYECGITQDCNKLCTENGAESGYCQWGGKYGNACWCIKLPDSVPIRVPGKCQR

D000038 MNYLVMISFALLLMTGVES VRDAYIAKPHNCVYECARNEYCNDLCTKNGAKSGYCQWVGKYGNGCWCIELPDNVPIRVPGKCHR

D000213 MNYLVMISFALLLMTGVES VRDAYIAKPHNCVYECARNEYCNDLCTKNGAKSGYCQWVGKYGNGCWCKELPDNVPIRVPGKCHR

D000142 VRDAYIAKPHNCVYECARNEYCNNLCTKNGAKSGYCQWSGKYGNGCWCIELPDNVPIRVPGKCH

D000141 VRDAYIAKPHNCVYECARNEYCNDLCTKDGAKSGYCQWVGKYGNGCWCIELPDNVPIRIPGNCH

D000155 MNYLVMVSFALLLMTGVES VRDGYIALPHNCAYGCLLNEFCNDLCTKNGAKIGYCNIQGKYGNACWCIELPDNVPIRVPGRCHPS

D000634 VRDGYIALPHNCAYGCLNNEYCNNLCTKDGAKIGYCNIVGKYGNACWCIQLPDNVPIRVPGRCHPA

D000611 GEDGYIADGDNCTYICTFNNYCHALCTDKKGDSGACDWWVPYGVVCWCEDLPTPVPIRGSGKCR

Group04

DBACC

D000069 MNYLVMISLALLLMTGVES VRDGYIVDSKNCVYHCVPPCDGLCKKNGAKSGSCGFLIPSGLACWCVALPDNVPIKDPSYKCHS R

D000070 MNYLIMFSLALLLVIGVES GRDGYIVDSKNCVYHCYPPCDGLCKKNGAKSGSCGFLVPSGLACWCNDLPENVPIKDPSDDCHK R

D000071 MNYLVMISLALLLMIGVES KRDGYIVYPNNCVYHCVPPCDGLCKKNGGSSGSCSFLVPSGLACWCKDLPDNVPIKDTSRKCTR

175 Apendix 1

D000072 MNYLVMISLALLLMIGVES KRDGYIVYPNNCVYHCIPPCDGLCKKNGGSSGSCSFLVPSGLACWCKDLPDNVPIKDTSRKCTR

D000625 MNYLVMISLALLLMIGVES VRDGYIVYPHNCVYHCIPSCDGLCKENGATSGSCGYIIKVGIACWCKDLPENVPIYDRSYKCYR

Group05

DBACC

D000074 MSSLMISTAMKGKAPYRQ VRDGYIAQPHNCAYHCLKISSGCDTLCKENGATSGHCGHKSGHGSACWCKDLPDKVGIIVHGEKCHR

D000078 GVRDGYIAQPHNCVYHCFPGSGGCDTLCKENGATQGSSCFILGRGTACWCKDLPDRVGVIVDGEKCH

D000621 VRDGYIAQPENCVYHCIPDCDTLCKDNGGTGGHCGFKLGHGIACWCNALPDNVGIIVDGVKCHK

D000622 VRDGYIAKPENCAHHCFPGSSGCDTLCKENGGTGGHCGFKVGHGTACWCNALPDKVGIIVDGVKCH

D000073 VRDGYIAQPENCVYHCFPGSSGCDTLCKEKGGTSGHCGFKVGHGLACWCNALPDNVGIIVEGEKCHS

D000077 GRDGYIAQPENCVYHCFPGSSGCDTLCKEKGATSGHCGFLPGSGVACWCDNLPNKVPIVVGGEKCH

D000628 MNYLVMISLALLFMIGVES ARDGYIAQPNNCVYHCIPLSPGCDKLCRENGATSGKCSFLAGSGLACWCVALPDNVPIKIIGQKCTR

Group06

DBACC

D000023 GVRDAYIADDKNCVYTCGSNSYCNTECTKNGAESGYCQWLGKYGNACWCIKLPDKVPIRIPGKCR

D000665 GVRDAYIADDKNCVYTCGANSYCNTECTKNGAESGYCQWFGKYGNACWCIKLPDKVPIRIPGKCR

D000817 MNYLVVICFALLLMTVVES GRDAYIADNLNCAYTCGSNSYCNTECTKNGAVSGYCQWLGKYGNACWCINLPDKVPIRIPGACRGR

D000627 MNYLITISLALLLMTGVASG VRDGYIADAGNCGYTCVANDYCNTECTKNGAESGYCQWFGRYGNACWCIKLPDKVPIKVPGKCNGR

D000014 MNYLVMISLALLFVTGVES VKDGYIVDDVNCTYFCGRNAYCNEECTKLKGESGYCQWASPYGNACYCYKLPDHVRTKGPGRCHGR

D000179 IKDGYIVDDVNCTYFCGRNAYCNEECTKLKGESGYCQWASPYGNACYCYKLPDHVRTKGPGRCR

D000019 VKDGYIVDDRNCTYFCGRNAYCNEECTKLKGESGYCQWASPYGNACYCYKVPDHVRTKGPGRCN

D000688 MNYLVMISLALLFMTGVES LKDGYIVNDINCTYFCGRNAYCNELCIKLKGESGYCQWASPYGNSCYCYKLPDHVRTKGPGRCND R

D000022 LKDGYIVDDRNCTYFCGTNAYCNEECVKLKGESGYCQWVGRYGNACWCYKLPDHVRTVQAGRCRS

D000026 LKDGYIVDDKNCTFFCGRNAYCNDECKKKGGESGYCQWASPYGNACWCYKLPDRVSIKEKGRCN

D000032 LKDGYIIDDLNCTFFCGRNAYCDDECKKKGGESGYCQWASPYGNACWCYKLPDRVSIKEKGRCN

D000268 MNYLVMISLALLFMTGVES KKDGYIVDDKNCTFFCGRNAYCNDECKKKGAESGYCQWASPYGNACYCYKLPDRVSTKKKGGCNGR

D000151 MNYMVIISLALLVMTGVES VKDGYIADDRNCPYFCGRNAYCDGECKKNRAESGYCQWASKYGNACWCYKLPDDARIMKPGRCNGG

D000163 MNYLVFFSLALLLMTGVES VKDGYIADDRNCPYFCGRNAYCDGECKKNRAESGYCQWASKYGNACWCYKLPDDARIMKPGRCNGG

D000203 MNYLVFFSLALLLMTGVES VRDGYIADDKNCAYFCGRNAYCDDECKKNGAESGYCQWAGVYGNACWCYKLPDKVPIRVPGKCNGG

D000614 MNYLVFFSLALLLMTGVES VRDGYIADDKNCAYFCGRNAYCDDECKKKGAESGYCQWAGVYGNACWCYKLPDKVPIRVPGKCNGG

D000212 MNYLVFFSLALLVMTGVES VRDGYIADDKNCAYFCGRNAYCDDECKKNGAESGYCQWAGVYGNACWCYKLPDKVPIRVPGKCNGG

176 Apendix 1

D000150 MNYLVFFSLALLLMTGVGS VRDGYIADDKNCPYFCGRNAYCDDECKKNGAESGYCQWAGVYGNACWCYKLPDKVPIRVPGKCNGG

D000063 VRDGYIADDKDCAYFCGRNAYCDEECKKGAESGKCWYAGQYGNACWCYKLPDWVPIKQKVSGKCN

D000244 VRDGYIADDKNCAYFCGRNAYCDEECIINGAESGYCQQAGVYGNACWCYKLPDKVPIRVSGECQQ

D000039 ARDAYIADDRNCVYTCALNPYCDSECKKNGADGSYCQWLGRFGNACWCKNLPDDVPIRKIPGEECR

Group07

DBACC

D000015 MVVVCLLTAGTEG KKDGYPVEYDNCAYICWNYDNAYCDKLCKDKKADSGYCYWVHIL CYCYGLPDSEPTKTNGKCKS GKK

D000016 LVVVCLLTAGTEG KKDGYPVEYDNCAYICWNYDNAYCDKLCKDKKADSGYCYWVHIL CYCYGLPDSEPTKTNGKCKS GKK

D000020 KKDGYPVEADNCAFVCFGYDNAYCDKLCGDKKADSGYCYWVHIL CYCYGLPDNEPTKTNGKC

D000024 KKDGYPVEGDNCAFACFGYDNAYCDKLCKDKKADDGYCVWSPD CYCYGLPEHILKEPTKTSGRC

D000011 KKDGYPVDSGNCKYECLKDDYCNDLCLERKADKGYCYWGKVS CYCYGLPDNSPTKTSGKCNPA

D000180 KIDGYPVDYWNCKRICWYNNKYCNDLCKGLKADSGYCWGWTLS CYCQGLPDNARIKRSGRCRA

Group08

DBACC

D000050 GRDGYVVKNGTNCKYSCEIGSEYEYCGPLCKRKNAKTGYCYAFA CWCIDVPDDVKLYGDDGTYCSS

D000055 ARDGYIVHDGTNCKYSCEFGSEYKYCGPLCEKKKAKTGYCYLFA CWCIEVPDEVRVWGEDGFMCWS

Group09 Subgroup09a

DBACC

D000013 MNSLLMITACFVLIGTVWA KDGYLVDAKGCKKNCYKLGKNDYCNRECRMKHRGGSYGYCYGFG CYCEGLSDSTPTWPLPNKTCSGK

D000121 KDGYLVDAKGCKKNCYKLGKNDYCNRECRMKHRGGSYGYCYGFG CYCEGLSDSTPTWPLTNKTC

D000018 MNSLLIITACLVLIGTVWA KDGYLVDVKGCKKNCYKLGENDYCNRECKMKHRGGSYGYCYGFG CYCEGLSDSTPTWPLPNKRCGGK

D000035 KDGYLVEKTGCKKTCYKLGENDFCNRECKWKHIGGSYGYCYGFG CYCEGLPDSTQTWPLPNKTC

D000603 MNSLLMITACLVLIGTVWA KDGYLVEKTGCKKTCYKLGENDFCNRECKWKHIGGSYGYFYGFG CYCEGLPDSTQTWPLPNKTCGKK

D000638 MNSLLMITACLVVIGTVWA KEGYLVDVKGCKKNCWKLGDNDYCNRECKWKHIGGSYGYCYGFG CYCEGLPDSTQTWPLPNKTCGKK

Subgroup09b

DBACC

D000059 KEGYIVNYHTGCKYTCAKLGDNDYCLRECK

D000616 MNSLLIIAACLALIGTVWAKEGYIVNYHTGCKYECFKLGDNDYCLRECKLRHGKGSGGYCYAFGCWCTHLYEQAVVWPLPKKKCNGK

177 Apendix 1

D000134 KEGYIVNYHDGCKYECYKLGDNDYCLRECKLRVGKGAGGYCYAFACWCTHLYEQA

D000673 MNSLLMITACLALIGTVWAKEGYIVNYHDGCKYECYKLGDNDYCLRECKLRYGKGAGGYCYAFGCWCTHLYEQAVVWPLPKKRCNGK

D000133 KEGYIVNYYTGCKFACAKLGDNDYCLRECKARYGKGAGGYCYAFGCWCTHLYEQAVVWPLPK

D000674 MNSLLMITACLAVIGTVWAKEGYIVNYYDGCKYACLKLGENDYCLRECKARYYKSAGGYCYAFACWCTHLYEQAVVWPLPNKTCYGK

D000181 KEGYLVNSYTGCKFECFKLGDNDYCLRECRQQYGKGSGGYCYAFGCWCTHLYEQAVVWPLPNKTCN

D000182 KEGYLVNSYTGCKFECFKLGDNDYCKRECKQQYGKSSGGYCYAFGCWCTHLYEQAVVWPLPNKTCN

D000044 KEGYLVNSYTGCKYECLKLGDNDYCLRECRQQYGKSGGYCYAFACWCTHLYEQAVVWPLPNKTCN

D000051 KEGYLVSKSTGCKYECLKLGDNDYCLRECKQQYGKSSGGYCYAFACWCTHLYEQAVVWPLPNKTCN

D000041 LLIITACLALIGTVWAKEGYLVDKNTGCKYECLKLGDNDYCLRECKQQYGKGAGGYCYAFACWCTHLYEQAIVWPLPNKRCSGK

D000123 KEGYLVNHSTGCKYECYKLGDNDYCLRECK

D000176 KEGYLVNHSTGCKYECYKLGDNDYCLRECKQQYGKGAGGYCYAFGCWCTHLYEQAVVWPLPKKTCN

D000612 MNSLLMITACLALVGTVWAKEGYLVNHSTGCKYECYKLGDNDYCLRECKQQYGKGAGGYCYAFGCWCTHLYEQAVVWPLPKKTCNGK

D000177 KEGYLVNHSTGCKYECFKLGDNDYCLRECKQQYGKGAGGYCYAFGCWCNHLYEQAVVWPLPKKTCN

D000057 KEGYIVNLSTGCKYECYKLGDNDYCLRECKQQYGKGAGGYCYAFGCWCTHLYEQAVVWPLPKKTCT

D000056 KEGYLVNHSTGCKYECFKLGDNDYCLRECRQQYGKGAGGYCYAFGCWCTHLYEQAVVWPLPNKTCS

D000043 MNSLLMITACLALVGTVWAKEGYLVNSYTGCKYECFKLGDNDYCLRECKQQYGKGAGGYCYAFGCWCTHLYEQAVVWPLKNKTCNGK

D000054 KEGYLVELGTGCKYECFKLGDNDYCLRECKARYGKGAGGYCYAFGCWCTQLYEQAVVWPLKNKTCR

D000122 KKDGYLVNKYTGCKVNCYKLGENKFCNRE

D000633 MNSLLMITACLVLFGTVWAKEGYLVNTYTGCKYICWKLGENKYCIDECKEIGAGYGYCYGFGCYCEGFPENKPTWPLPNKTCGRK

Subgroup09c

DBACC

D000053 KEGYLVNKSTGCKYGCFWLGKNENCDKECKAKNQGGSYGYCYSFA CWCEGLPESTPTYPLPNKSCS

D000672 MNSLLMITACLAEIGTVWA KEGYLVNKSTGCKYGCFWLGKNENCDKECKAKNQGGSYGYCYSFA CWCEGLPDSTPTYPLPNKSCSKK

D000666 MNSLLMITACLVLFGTVWA KEGYLVNKSTGCKYGCFWLGKNENCDMECKAKNQGGSYGYCYSFA CWCEGLPDSTPTYPLPNKSCSKK

D000671 MNSLLIITACLVLFVWA KEGYLVNKSTGCKYGCFWLGKNENCDMECKAKNQGGSYGYCYSFA CWCEGLPDSTPTYPLPNKSCSKK

D000046 MNSLLMITACLVLFGTVWA KEGYLVNKSTGCKYGCFWLGKNEGCDKECKAKNQGGSYGYCYAFG CWCEGLPESTPTYPLPNKTCSKK

D000048 KEGYLVKKSDGCKYDCFWLGKNEHCNTECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSC

D000608 MNSLLIITACFALVGTVWA KEGYLVKKSDGCKYDCFWLGKNEHCDTECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCGKK

D000600 MNSLLMITACFALVGTVWA KEGYLVKKSDGCKYDCFWLGKNEHCDLECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCGKK

D000609 MNSLLMITACFALVGTVWA KEGYLVKKSDGCKYDCFWLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCGKK

D000669 TVSA KEGYLVKKSNGCKYECFKLGENEHCDTECKAPNQGGSYGYCDTFE CWCEGLPESTPTWPLPNKSCGKK

178 Apendix 1

D000599 MNSLLIITVCLFLIGTVWA KEGYLVNKSTGCKYDCFWLGKNEHCDLECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCGKK

D000058 MNSLLIITACLFLIGTVWA KEGYLVNKSTGCKYGCLKLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCSRK

D000604 MNSLLIITACLFLIGTVWA KEGYLVNKSTGCKYGCLKLGENEGCDKECKAENQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCSRK

D000605 MNSLLIITACFALVGTVWA KEGYLVNKSTGCKYGCLKLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCSRK

D000606 MNSLLMITACLFLIGTVWA REGYLVNKSTGCKYGCLKLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCSRK

D000607 MNSLLMITACLFLIGTVWA KEGYLVNKSTACKYGCLKLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCSRK

D000052 KEGYLVKKSDGCKYGCLKLGENEGCDTECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSC

D000602 MNSLLITACLFLIGTVWAK EGYLVNKSTGCKYGYLKLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCGKK

D000601 MNSLLMITACLFLIGTVWA KEGYLVNKSTGCKYGCLKLGENEGCDKECKAKNQGGSYGYCYAFA CWCEGLPESTPTYPLPNKSCGKK

D000049 MNSLLMITACLFLIGTVWA KEGYLVNKSTGCKYGCLLLGKNEGCDKECKAKNQGGSYGYCYAFG CWCEGLPESTPTYPLPNKSCSKK

D000138 ITACLVLIGTVCA KEGYLVNKSTGCKYNCLILGENKNCDMECKAKNQGGSYGYCYKLA CWCEGLPESTPTYPIPGKTCRTK

D000630 KEGYMVNKSTGCSYSCPKTGESVYCDKECKAKNQGGSYGFCQYSN CWCEGLPESTPTWPLDDKPCD

D000635 MNSLLMITACLVLFGTVWS EKGYLVHEDTGCRYKCTFSGENSYCDKECKSQGGDSGICQSKA CYCQGLPEDTKTWPLIGKLCGRK

D000670 MNSLLMIIGCLVLIGTVWT KEGYLVNMKTGCKYGCYELGDNGYCDRKCKAESGNYGYCYTVG CWCEGLPNSKPTWPLPGKSCSGK

D000252 KDGYLVNKSTGCKYSCIENINDSHCNEECISSIRKGSYGYCYKFY CYCIGMPDSTQVYPIPGKTCSTE

Subgroup09d

DBACC

D000253 MNSLLMITACLILIGTVWAEDGYLFDKRKRCTLACIDKTGDKNCDRNCKKEGGSFGHCSYSACWCKGLPGSTPISRTPGKTCKK

D000636 MNSLLMITTCLILIGTVLAEDGYLFDKRKRCTLECIDKTGDKNCDRNCKNEGGSFGKCSYFACWCKGLPGITPISRTPGKTCKI

Group10

DBACC

D000159 MKIIIFLIVCSFVLIGVKADNGYLLNKYTGCKIWCVINNESCNSECKLRRGNYGYCYFWKLA CYCEGAPKSELWAYETNKCNGKM

D000685 MKIIIFLIVSSLMLIGVKTDNGYLLNKATGCKVWCVINNASCNSECKLRRGNYGYCYFWKLA CYCEGAPKSELWAYATNKCNGKL

D000160 MKTVIFLIVSSLLLIGVKTDNGYLLDKYTGCKVWCVINNESCNSECKIRGGYYGYCYFWKLA CFCQGARKSELWNYNTNKCNGKL

D000062 EHGYLLNKYTGCKVWCVINNEECGYLCNKRRGGYYGYCYFWKLA CYCQGARKSELWNYKTNKCDL

D000090 EDGYLLNRDTGCKVSCGTCRYCND

Group11

DBACC

D000036 MKLLLLLIVSASMLIESLVNA DGYIRKRDGCKLSCLFGNEGCNKECKSYGGSYGYCWTWGLA CWCEGLPDEKTWKSETNTCG

179 Apendix 1

D000598 MKLLLLLIVSASMLIESLVNA DGYIRKRDGCKVSCLFGNEGCDKECKSYGGSYGYCWTWGLA CWCEGLPDEKTWKSETNTCG

D000597 MKLLLLLIVSASMLIESLVNA DGYIRKRDGCKVSCLFGNEGCDKECKAYGGSYGYCWTWGLA CWCEGLPDDKTWKSETNTCG

D000033 DGYIRKRDGCKVSCLFGNEGCDKECKAYGGSYGYCWTWGLA CWCEGLPDDKTWKSETNTCG

D000034 DGYIRRRDGCKVSCLFGNEGCDKECKAYGGSYGYCWTWGLA CWCEGLPDDKTWKSETNTCG

D000081 DGYIRRRDGCKVSCLFGNEGCDKECKAYGGSYGYCWTWGLA CWCEGLPDDKTWKSETNTCG

D000124 MKLLLLLIVSASMLIESLVNA DGYIKRRDGCKVACLIGNEGCDKECKAYGGSYGYCWTWGLA CWCEGLPDDKTWKSETNTCGGKK

D000231 MKLLLLLIVSASMLIESLVNA DGYIKRRDGCKVACLVGNEGCDKECKAYGGSYGYCWTWGLA CWCEGLPDDKTWKSETNTCGGKK

D000060 MKLLLLLVISASMLLECLVNA DGYIRKKDGCKVSCIIGNEGCRKECVAHGGSFGYCWTWGLA CWCENLPDAVTWKSSTNTCGRKK

D000146 MDGYIRGSNGCKVSCLWGNEGCNKECRAYGASYGYCWTWGLA CWCEGLPDDKTWKSESNTCG

D000272 DGYIRGSNGCKVSCLWGNEGCNKECRAYGASYGYCWTWGLA CWCEGLPDDKTWKSESNTCG

D000683 MKLFLLLLISASMLIDGLVNA DGYIRGSNGCKVSCLWGNEGCNKECRAYGASYGYCWTWGLA CWCQGLPDDKTWKSESNTCGGKK

D000148 MKLFLLLVISASMLIDGLVNA DGYIRGSNGCKVSCLWGNEGCNKECKAFGAYYGYCWTWGLA CWCQGLPDDKTWKSESNTCGGKK

D000675 MKLFLLLVISASMLIDGLVNA DGYIRGSNGCKVSCLWGNEGCNKECKAFGAYYGYCWTWGLA CWCEGLPDDKTWKSESNTCGGKK

D000149 MDGYIRGSNGCKISCLWGNEGCNKECKGFGAYYGYCWTWGLA CWCEGLPDDKTWKSESNTCG

D000676 MKLFLLLVFFASMLIDGLVNA DGYIRGSNGCKISCLWGNEGCNKECKGFGAYYGYCWTWGLA CWCEGLPDDKTWKSESNTCGGKK

D000623 MKLSLLLVISASMLIDGLVNA DGYIRGSNGCKISCLWGNEGCNKECKGFGAYYGYCWTWGLA CWCEGLPDDKTWKSESNTCGGKK

D000678 MKLFLLLVISASMLIDGLVNA DGYIRGSNGCKVSCLWGNEGCNKECGAYGASYGYCWTWGLA CWCEGLPDDKTWKSESNTCGGKK

D000271 MKLSLLLVISASMLIDGLVNA DGYIRGSNGCKVSCLWGNDGCNKECRAYGASYGYCWTWGLA CWCEGLPDDKTWKSESNTCGGKK

D000080 DGYIRGSDNCKVSCLLGNEGCNKECRAYGASYGYCWTVKLAQDCEGLPDT

D000158 MKLFLLLVISASMLIDGLVNA DGYIRGSNGCKVSCLLGNEGCNKECRAYGASYGYCWTWKLA CWCEGLPDDKTWKSESNTCGGKK

D000204 DGYIKGKSGCRVACLIGNQGCLKDCRAYGASYGYCWTWGLA CWCEGLPDNKTWKSESNTCG

D000145 MKLFLLLVIFASMLNDGLVNA DGYIRGSDGCKVSCLWGNDFCDKVCKKSGGSYGYCWTWGLA CWCEGLPDNEKWKYESNTCGSKK

D000256 DGYILMRNGCKIPCLFGNDGCNKECKAYGGSYGYCWTYGLA CACEGQPEDKKHLNYHKKTC

D000061 DGYIRGGDGCKVSCVIDHVFCDNECKAAGGSYGYCWGWGLA CWCEGLPADREWKYETNTCG

D000220 DGYIKRHDGCKVTCLINDNYCDTECKREGGSYGYCYSVGFA CWCEGLPDDKAWKSETNTCD

D000232 MKLLLLLIITASMLIEGLVNA DVYIRRHDGCKISCTVNDKYCDNECKSEGGSYGYCYAFG CWCEGLPNDKAWKSETNTCGGKK

D000221 DGYIKGYKGCKITCVINDDYCDTECKAEGGTYGYCWKWGLA CWCEDLPEDKRWKPETNTC

D000157 DGYPKQKDGCKYSCTINHKFCNSVCKSNGGDYGYCWFWGLA CWCEGLPDNKMWKYETNTCG

D000663 DGYPKQKNGCKYDCIINNKWCNGICKMHGGYYGYCWGWGLA CWCEGLPEDKKWWYETNKCGR

D000637 ARDGYPVDEKGCKLSCLINDKWCNSACHSRGGKYGYCYTGGLA CYCEAVPDNVKVWTYETNTC

D000257 DGYIKKSKGCKVSCVINNVYCNSMCKSLGGSYGYCWTYGLA CWCEGLPNAKRWKYETKTCK

D000258 DGYILNSKGCKVSCVVSIVYCNSMCKSSGGSYGYCWTWGLA CWCEGLPNSKRWTSSKNKCN

180 Apendix 1

D000259 DGYIKGNKGCKVSCVINNVFCNSMCKSSGGSYGYCWSWGLA CWCEGLPAAKKWLYAATNTCG

Group12 Subgroup12a

DBACC

D000130 KDGYLMEPNGCKLGCLTRPAKYCWXEE

D000131 KDGYLVGTDGCKYGCFTRPGHFCANEECL

D000132 KDGYLMGADGCKLCVLTAPYDY CAC E

Subgroup12b

DBACC

D000064 ADGYVKGKSGCKISCFLDNDLCNADCKYYGGKLNSWCIPDKSGY CWCPNKGWNSIKSETNTC

Group13

DBACC

D000153 MKAALLLVIFSLMLIGVLT KKSGYPTDHEGCKNWCVLNHSCGILCEGYGGSGYCYFWKLA CWCDDIHNWVPTWSRATNKCRA K

Group14

DBACC

D000264 MNYLVMISFALLLVIGVES VRDGYFVEPDNCVVHCMPSSEMCDRGCKHNGATSGSCKAFSKGGNACWC KGLR

D000266 MNYLVMISFALLLVIGVES VRDGYFVEPDNCVIYCMPSSEVCDRGCKHNGATSGTCKEFSKGGNVCWC KGLR

D000265 MNYLVMISFALLLVIGVES VRDGYFVEPDNCLVYCMPSPEICDRGCKRYGATSGFCKEFSKGENFCWC KGLR

Group15

DBACC

D000833 ADVPGNYPLDKDGNTYTCLELGENKDCQKVCKLHGVQYGYCYAFS CWC KEYLDDKDSV

D000834 ADVPGNYPLDKDGNTYTCLELGENKDCQKVCKLHGVQYGYCYAFF CWC KELDDKDVSV

D000701 ADVPGNYPLDKDGNTYKCLKLGENKDCQKVCKLHGVQYGYCYAFE CWC KEYLDDKDSV

D000277 ADVPGNYPLDKDGNTYKCFLLGGNEECLNVCKLHGVQYGYCYASK CWC EYLEDDKDSV

D000684 ADVPGNYPLDKDGNTYKCFLLGENEECLNVCKLHGVQYGYCYASK CWC EYLEDDKDSV

181 Apendix 1

Group16

DBACC

D000819 MKTIPLLFLLFIYFECDG KFIRHKDESFYECGQLIGYQQYCVDACQAHGSKEKGYCKGMAPFGLPGG CYCPKLPSNRVKMCFGALESKCA

D000821 KFIRHKDESFYECGQLIGYQQYCVNACQAHGSKEKGYCKGMAPFGLPGG CYCPKLPSNRVKMCFGALESKCA

D000820 KFIRHKDESFYECGQSIGYQQYCVDACQAHGSKEKGYCKAMAPFGLPGG CYCPKLPSNRVKMCFGALESKCA

Group17

DBACC

D000681 MVKMQVIFIAFIAVIA CSMVYGDSLSPWNEGDTYYGCQRQTDEFCNKICKLHLASGGSCQQPAPFVKL CTC QGIDYDNSFFFGALEKQCPKLRE

Group18

DBACC

D000702 RDGYPLASNGCKFGCSGLGENNPTCNHVCEKKAGSDYGYCYAWT CYCEHVAEGTVLWGDSGTGPCRS

Potassiumchanneltoxins:Alpha,Beta,GammaandDeltaKtx AlphaKTx01

DBACC

D000100 MKILSVLLLALIICSIVGWSEA QFTNVSCTTSKECWSVCQRLHNTSRGKCMNKKCRCYS

D000241 MKILSVLLLALIICSIVGWSEA QFTDVSCTTSKECWSVCQRLHNTSIGKCMNKKCRCYS

D000242 QFTNVSCTTSKECWSVCEKLYNTSRGKCMNKKCRCYS

D000102 QFTQESCTASNQCWSICKRLHNTNRGKCMNKKCRCYS

D000117 MKISFLLLLAIVICSIGWTEA QFTNVSCSASSQCWPVCKKLFGTYRGKCMNSKCRCYS

D000097 QFTDVDCSVSKECWSVCKDLFGVDRGKCMGKKCRCYQ

D000118 MKISFLLLALVICSIGWSEA QFTDVKCTGSKQCWPVCKQMFGKPNGKCMNGKCRCYS

D000195 KFIDVKCTTSKECWPPCKAATGKAAGKCMNKKCKCQ

AlphaKTx02

DBACC

D000197 KVIDVKCTSPKQCLPPCKAQFGD

D000198 TVIDVKCTSPKQCLPPCAKQ

182 Apendix 1

D000194 TVIDVKCTSPKQCLPPCKAQFGIRAGAKCMNGK CKCYPH

D000067 TVIDVKCTSPKQCLPPCKEIYGRHAGAKCMNGK CKC

D000066 TIINVKCTSPKQCLPPCKAQFGQSAGAKCMNGK CKCYPH

D000196 TFINVKCTSPKQCLPACKEKFGXAAGKCMNGKCK

D000065 ITINVKCTSPQQCLRPCKDRFGQHAGGKCINGK CKCYP

D000089 TIINVKCTSPKQCSKPCKELYGSSAGAKCMNGK CKCYNN

D000175 TIINEKCFATSQCWTPCKKAIGSLQSKCMNGK CKCYNG

AlphaKTx03

DBACC

D000083 GVPINVSCTGSPQCIKPCKDAGMRFGKCMNRKCHCTPK

D000084 GVPINVPCTGSPQCIKPCKDAGMRFGKCMNRKCHCTPK

D000200 GVPINVKCRGSPQCIQPCRDAGMRFGKCMNGKCHCTPQ

D000082 GVPINVKCTGSPQCLKPCKDAGMRFGKCINGKCHCTPK

D000085 GVEINVKCSGSPQCLKPCKDAGMRFGKCMNRKCHCTPK

D000086 GVIINVKCKISRQCLEPCKKAGMRFGKCMNGKCHCTPK

D000087 MKVFSAVLIILFVCSMIIGINA VRIPVSCKHSGQCLKPCKDAGMRFGKCMNGKCDCTPK

D000617 VGIPVSCKHSGQCIKPCKDAGMRFGKCMNRKCDCTPK

D000119 MKVFFAVLITLFICSMIIGIHG VGINVKCKHSGQCLKPCKDAGMRFGKCINGKCDCTPKG

D000173 MKVFFAVLITLFISSMIIGIHG VGINVKCKHSGQCLKPCKDAGMRFGKCINGKCDCTPKG

AlphaKTx04

DBACC

D000091 VFINVKCRGSPECLPKCKEAIGKAAGKCVN

D000092 VFINVKCRGSPECLPKCKEAFGKAAGKCVN

D000093 VFINVKCRGSPECLPKCKEAIGKAAGKCMN

D000088 VFINAKCRGSPECLPKCKEAIGKAAGKCMNGK CKCYP

D000147 VFINVKCTGSKQCLPACKAAVGKAAGKCMNGK CKCYT

D000699 VFINVKCRGSKECLPACKAAVGKAAGKCMNGK CKCYP

D000216 EVDMRCKSSKECLVKCKQATGRPNGKCMNRK CKCYPR

D000095 MKVLYGILIIFILCS MFYLSQE VVIGQRCYRSPDCYSACKKLVGKATGKCTNGR CDC

183 Apendix 1

AlphaKTx05

DBACC

D000107 TVCNLRRCQLSCRSLGLLGKCIGVK CECVKH

D000168 MHNYYKIVLIMVAFFAVIITFSNIQVEG AVCNLKRCQLSCRSLGLLGKCIGDK CECVKH GK

D000103 AFCNLRMCQLSCRSLGLLGKCIGDK CECVKH

D000248 AFCNLRRCELSCRSLGLLGKCIGEE CKCVPY

D000249 AFCNLRRCELSCRSLGLLGKCIGEE CKCVPH

AlphaKTx06a

DBACC

D000190 ASCRTPKDCADPCRKETGCPYGKCMNRK CKCNRC

D000818 QKECTGPQHCTNFCRKNKCTHGKCMNRK CKCFNCK

D000094 VSCTGSKDCYAPCRKQTGCPNAKCINKS CKCYGC

D000096 LVKCRGTSDCGRPCQQQTGCPNSKCINRM CKCYGC

AlphaKTx06b

DBACC

D000191 IEAIRCGGSRDCYRPCQKRTGCPNAKCINKT CKCYGCS

D000192 DEAIRCTGTKDCYIPCRYITGCFNSRCINKS CKCYGCT

D000592 MNAKFILLLVLTTMMLLPDTKGAEVIRCSGSKQCYGPC KQQTGCTNSKCMNKV CKCYGCG

D000593 MNAKFILLLLVVTTTTLLPDAKGAEIIRCSGTRECYAPCQKLTGCLNAKCMNKA CKCYGCV

D000595 MNAKFILLLLVVTTTMLLPDTQGAEVIKCRTPKDCADPCRKQTGCPHGKCMNRT CRCNRCG

D000596 MNAKFILLLLVVATTMLLPDTQGAEVIKCRTPKDCAGPCRKQTGCPHGKCMNRT CRCNRCG

D000594 MNAKFILLLLVVTTTILLPDTQGAEVIKCRTPKDCADPCRKQTGCPHAKCMNKT CRCHRCG

AlphaKTx06c

DBACC

D000829 MKVAYLLVLFTIMMLANDASL VHTNIPCRGTSDCYEPCEKKYNCARAKCMNRH CNCYNNCPW R

AlphaKTx07

DBACC

D000098 TISCTNEKQCYPHCKKETGYPNAKCMNRK CKCFGR

184 Apendix 1

D000099 RGSVDYKDDDDK TISCTNPKQCYPHCKKETGYPNAKCMNRK CKCFGR

AlphaKTx08

DBACC

D000113 VSCEDCPDHCSTQKARAKCDNDK CVCEPI

D000114 VSCEDCPDHCSTQKARAKCDNDK CVCEPK

D000112 VSCEDCPEHCSTQKAQAKCDNDK CVCEPI

D000169 MSRLYAIILIALVFNVVMTITPDMKVEA ATCEDCPEHCATQNARAKCDNDK CVCEPK

D000263 MSRLYAIILIALVFNVIMTIIPDMKVEA ATCEDCPEHCATQNARAKCDNDK CVCEPK

AlphaKTx09

DBACC

D000174 MIVLFTLVLIVLAMNVTMAIISDPVVEA VGCEECPMHCKGKNANPTCDDGVCNCNV

D000693 VGCEECPMHCKGKHAVPTCDDGVCNCNV

D000167 MSRLFTLVLIVLAMNVMMAIISDPVVEA VGCEECPMHCKGKNANPTCDDGVCNCNV

D000170 MSRLFTLVLIVLAMNVMMAIISDPVVEA VGCEECPMHCKGKNAKPTCDDGVCNCNV

D000115 VGCEECPMHCKGKNAKPTCDNGVCNCNV

D000116 VGCEEDPMHCKGKQAKPTCCNGVCNCNV

D000687 VGCAECPMHCKGKMAKPTCENEVCKCNIGKKD

AlphaKTx10

DBACC

D000129 MEGIAKITLILLFLFVTMHTFA NWNTEA AVCVYRTCDKDCKRRGYRSGKCINNA CKCYPY GK

D000208 VACVYRTCDKDCTSRKYRSGKCINNA CKCYPY

AlphaKTx11

DBACC

D000214 DEEPKESCSDEMCVIYCKGEEYSTGVCDGPQK CKCSD

D000215 DEEPKETCSDEMCVIYCKGEEYSTGVCDGPQK CKCSD

D000686 DEEPKETCSDDMCVIYCKGEEFSTGACDGPQK CKCS

185 Apendix 1

AlphaKTx12

DBACC

D000193 WCSTCLDLACGASRECYDPCFKAFGRAHGKCMNNK CRCYT

D000700 WCSTCLDLACGASRECYDPCFKAFGRAHGKCMNNK CRCYT

AlphaKTx13

DBACC

D000202 ACGSCRKKCKGSGKCINGR CKCY

AlphaKTx14

DBACC

D000243 MKIFFAILLILAVCSMAIWTVNG TPFAIKCATDADCSRKCPGNPSCRNGF CACT

D000682 MKIFFAILLILAVCSMAIWTVNG TPFAIKCATNADCSRKCPGNPPCRNGF CACT

D000274 MKIFFAILLILAVCSMAIWTVNG TPFAIKCATDADCSRKCPGNPPCRNGF CACT

D000262 MKIFFAILLILAVCSMAIWTVNG TPFEVRCATDADCSRKCPGNPPCRNGF CACT

AlphaKTx15

DBACC

D000183 MKFSSIILLTLLICSMSIFGNCQIETNKKCQGGSCASVCRRVIGVAAGKCINGRCVCYP

D000689 QIETNKKCQGGSCASVCRKVIGVAAGKCINGRCVCYP

D000624 MKFSSIILLTLLICSMSIFGNCQVETNKKCQGGSCASVCRRVIGVAAGKCINGRCVCYP

D000618 MKFSSIILLTLLICSMSIFGNCQVQTNVKCQGGSCASVCRREIGVAAGKCINGKCVCYRN

D000835 MKFSSIILLTLLICSMSIFGNGQVQTNKKCKGGSCASVCAKEIGVAAGKCINGRCVCYP

D000269 MKFSSIILLTLLICSMSKFGNCQVETNVKCQGGSCASVCRKAIGVAAGKCINGRCVCYP

D000826 QIDTNVKCSGSSKCVKICIDRYYNTRGAKCINGRCTCYP

AlphaKTx16

DBACC

D000152 MKIFSILLVALIICSISICTEA FGLIDVKCFASSECWTACKKVTGSGQGKCQNNQCRCY

D000620 MKIFSILLVALIICSISICTEA FGLIDVKCFASSECWIACKKVTGSVQGKCQNNQCRCY

D000201 DLIDVKCISSQECWIACKKVTGRFEGKCQNRQCRCY

D000101 GLIDVRCYDSRQCWIACKKVTGSTQGKCQNKQCRCY

186 Apendix 1

D000108 GLIDVRCYDSSQCE

AlphaKTx17

DBACC

D000276 MKFIIVLILISVLIATIVPVNEA QTQCQSVRDCQQYCLTPDRCSYGT CYCKTT GK

AlphaKTx18

DBACC

D000698 TGPQTTCQAAMCEAGCKGLGKSMESCQGDT CKCKA

AlphaKTx19

DBACC

D000613 AACYSSDCRVKCVAMGFSSGKCINSK CKCYK

AlphaKTx20

DBACC

D000273 ACGPGCSGSCRQKGDRIKCINGS CHC YP

AlphaKTx21

DBACC

D000270 MNRLTTIILMLIVINVIMDDISESKVAA GIVCKVCKIICGMQGKKVNICKAPIK CKCKK G

AlphaKTx22

DBACC

D000250 RCHFVVCTTDCRRNSPGTYGECVKKEKGKE CVCKS

D000251 RCHFVICTTDCRRNSPGTYGECVKKEKGKE CVCKS

AlphaKTx23

DBACC

D000830 DPCYEVCLQQHGNVKECEEACKHPVE

D000831 DPCYEVCLQQHGNVKECEEACKHPVEY

187 Apendix 1

D000832 NDPCEEVCIQHTGDVKACEEACQ

AlphaKTx24

DBACC

D000166 CQNECCGISSLRERNYCANLVCIN CFC QGRTYKI CRC FFSIHAIR

AlphaKTx25

DBACC

D000677 MQKLFIVFVLFCILRLDAEVDG KTATFCTQSICQESCKRQNKNGRCVIEAEGSLIYHL CKC Y

BetaKTx

BetaKTx01

DBACC

D000187 MQRNLVVLLFLGMVALSSC GLREKHFQKLVKYAVPEGTLRTIIQTAVHKLGKTQFGCPAYQGYCDDHCQDIKKEEGFCHGFK CKC GIPMGF

D000189 MQRNLVVLLFLGMVALSSC GLREKHVQKLVKYAVPVGTLRTILQTVVHKVGKTQFGCPAYQGYCDDHCQDIKKEEGFCHGFK CKC GIPMGF

D000188 RKLALLLILGMVTLASC GLREKHVQKLVALIPNDQLRSILKAVVHKVAKTQFGCPAYEGYCNDHCNDIERKDGECHGFK CKC AKD

BetaKTx02

DBACC

D000186 MMKQQFFLFLAVIVMISSVIEA GRGKEIM KNIKEKLTEVKDKMKHSWNKLTSMSEYACPVIEKWCEDHCAAKKAIGKCEDTE CKC LKLRK

BetaKTx03

DBACC

D000823 DNGYLLNKYTGCKIWCVINNESCNSECKLRRGNYGYCYFWKLA CYCEGAPKSELWAYETNKCNGKM

GammaKTx

GammaKTx01

DBACC

D000247 MKISFVLLLTLFICSIGWSEA RPTDIKCSESYQCFPVCKSRFGKTNGRCVNGFCDCF

D000619 MKISFVLLLTLFICSIGWSEA RPTDIKCSASYQCFPVCKSRFGKTNGRCVNGLCDCF

188 Apendix 1

GammaKTx02

DBACC

D000647 DRDSCVDKSKCSKYGYYGQCDECCKKAGDRAGNCVYFKCKCNP

D000653 DRDSCVDKSKCSKYGYYGQCDKCCKKAGDRAGNCVYFKCKCNQ

D000657 DRDSCVDKSKCAKYGYYGQCDECCKKAGDRAGNCVYLKCKCNQ

D000655 DRDSCVDKSRCGKYGYYGQCDDCCKKAGDRAGTCVYYKCKCNP

D000659 DRDSCVDKSRCGKYGYYGQCDECCKKAGDRAGTCVYYKCKCNP

D000658 DRDSCVDKSQCGKYGYYGQCDECCKKAGERVGTCVYYKCKCNP

D000654 DRDSCVDKSKCGKYGYYGQCDECCKKAGDRAGTCVYYKCKCNP

D000662 ERDSCVEKSKCGKYGYYGQCDECCKKAGDRAGTCVYYKCKCNP

D000650 DRDSCVDKSKCGKYGYYHQCDECCKKAGDRAGNCVYYKCKCNP

D000645 DRDSCVDKSKCGKYGYYGQCDECCKKAGDRAGICEYYKCKCNP

D000651 DRDSCVDKSKCAKYGYYYQCDECCKKAGDRAGTCEYFKCKCNP

D000656 DRDSCVDKSQCAKYGYYYQCDECCKKAGDRAGTCEYFKCKCNP

GammaKTx03

DBACC

D000648 GRDSCVNKSRCAKYGYYSQCEVCCKKAGHKGGTCDFFKCKCKV

D000649 DRDSCVDKSRCAKYGYYGQCEVCCKKAGHRGGTCDFFKCKCKV

D000644 DRDSCVDKSRCAKYGYYQQCEICCKKAGHRGGTCEFFKCKCKV

D000661 DRDSCVDKSRCAKYGYYGQCEVCCKKAGHNGGTCMFFKCMCVNSKMN

D000639 DRDSCVDKSRCAKYGYYQECTDCCKKYGHNGGTCMFFKCKCA

D000641 DRDSCVDKSRCAKYGHYQECTDCCKKYGHNGGTCMFFKCKCA

D000218 MKVLILIMIIASLMIMGVEM DRDSCVDKSRCAKYGYYQECQDCCKNAGHNGGTCMFFKCKCA

D000219 DRDSCVDKSRCAKYGYYQECQDCCKNAGHNGGTQMFFKCKAP

D000660 DRDSCVDKSKCGKYGYYQECQDCCKNAGHNGGTCVYYKCKCNP

D000642 DRDSCVDKSRCSKYGYYQECQDCCKKAGHNGGTCMFFKCKCA

D000640 DRDSCVDKSRCAKYGYYQECQDCCKKAGHSGGTCMFFKCKCA

D000643 DRDSCVDKSRCAKYGYYQECQDCCKKAGHNGGTCMFFKCKCA

D000646 DRDSCVDKSRCQKYGNYAQCTACCKKAGHNKGTCDFFKCKCT

D000652 DRDSCVDKSRCQKYGPYGQCTDCCKKAGHTGGTCIYFKCKCGAESGR

189 Apendix 1

DeltaKTx

DeltaKTx01

DBACC

D000260 GHACYRNCWREGNDEETCKERC

D000261 GHACYRNCWREGNDEETCKERCG

D000816 GFGCYRSCWKAGHDEETCKRECS

Calciumchanneltoxins Group01

DBACC

D000610 MKPSLIIVTFIVVFMAISCVAA DDEQETWIEKR GDCLPHLKRCKENNDCCSKKCKRRGTNPEKRCR

D000703 MKPSLIIVTFIVVFMTISCVAA DDEQETWIEKR GDCLPHLKRCKENNDCCSKKCKRRGANPEKRCR

D000178 GDCLPHLKLCKENKDCCSKKCKRRGTNIEKRCR

D000185 GDCLPHLKRCKADNDCCGKKCKRRGTNAEKRCR

Group02

DBACC

D000135 MHTPKHAIQRISKEEMEFFEGRCERMGEADE TMWGTKWCGSGNEATDISELGYWSNLDSCCRTHDHCDNIPSGQTKYGLTNEGK

YTMMN CKC ETAFEQCLRNVTGGMEGPAAGFVRKTYFDLYGNGCYNVQCPSQRRLARSEECPDGVATYTGEAGYGAWAINKLNG

Group03

DBACC

D000180 KIDGYPVDYWNCKRICWYNNKYCNDLCKGLKADSGYCWGWTLS CYCQGLPDNARIKRSGRCRA

Group04

DBACC

D000827 VGCEECPAHCKGKNAKPTCDDGV CNCNV

D000828 VGCEECPAHCKGKNAIPTCDDGV CNCNV

190 Apendix 1

Chloridechanneltoxins Group01 Subgroup01a

DBACC

D000111 CGPCFTTDPYTESKCATCCGGRGKCVGPQ CLCNRI

D000171 CGPCFTKDPETEKKCATCCGGIGRCFGPQ CLCNRGY

D000238 MKFLYGIAFIAVFLTVMIVTDIEA CGPCFTTDHQTEQKCAECCGGIGKCYGPQ CLCRG

D000240 MKFLYGIAFIAVFLTVMIVTDIEA CGPCFTTDHQTEQKCAECCGGIGKCYGPQ CLCNR G

D000239 MKFLYGIAFIAVFLTVMIATHIEA CGPCFTTDRQMEQKCAECCGGIGKCYGPQ CLCRG

D000104 MCMPCFTTDPNMAKKCRDCCGGNGKCFGPQ CLCNR

D000206 MCMPCFTTDHNMAKKCRDCCGGNGKCFGPQ CLCNR

D000207 MCMPCFTTDPNMANKCRDCCGGGKKCFGPQ CLCNR

D000110 MCMPCFTTRPDMAQQCRACCKGRGKCFGPQ CLCGYD

D000154 MKFLYGIVFIALFLTVMFATQTDG CGPCFTTDANMARKCRECCGGIGKCFGPQ CLCNRI

D000615 MKFLYGIVFIALFLTVMFATQTDG CGPCFTTDANMARKCRECCGGNGKCFGPQ CLCNRE

D000199 RCKPCFTTDPQMSKKCADXCGGXKX

Subgroup01b

DBACC GenBank

D000105 RCSPCFTTDQQMTKKCYDCCGGKGKGKCYGPQCICAPY

D000109 S06667 RCKPCFTTDPQMSKKCADCCGGKGKGKCYGPQCLC

D000106 A48850 MCMPCFTTDHQMARKCDDCCGGKGRGKCYGPQCLCR

D000237 MKFLYGIVFIALFLTVMIATHTEA MCMPCFTTDHQMARKCDDCCGGKGRGKCYGPQCLCRG

D000205 P60268 MCMPCFTTDHQTARRCRDCCGGRGRKCFGQCLCGYD

D000140 RCGPCFTTDPQTQAKCSECCGRKGGVCKGPQCICGIQ

D000629 AF481880 MKFLYGTILIAFFLTVMIATHSEA RCPPCFTTNPNMEADCRKCCGGRGYCASYQCICPG G

Defensinscorpiontoxins Group01

DBACC GenBank

D000217 AJ292361 MNSKLTALIFLGLIAIAYC GWINEEKIQKKIDERMGNTVLGGMAKAIVHKMAKNEFQCMANMDMLGNCEK

HCQTSGEKGYCHGTK CKC GTPLSY

191 Apendix 1

ShortChainNeurotoxins Group01

DBACC GenBank

D000128 VTMGYIKDGDGKKIAKKKNKNGRKHVEIDLNKVG

Group02

DBACC GenBank

D000664 VSIGIKCDPSIDLCEGQCRIRYFTGYCSGDT CHC S

Group03

DBACC GenBank

D000667 AY147405 MKIFFAVLVILVLFSMLIWTAYG TPYPVNCKTDRDCVMCGLGISCKNGYCQSCTR

D000668 AY125328 MKIFFAVLVILVLFSMLIWTAYG TPYPVNCKTDRDCVMCGLGISCKNGYCQGCTR

Group04

DBACC GenBank

D000679 AF159979 MEIKYLLTVFLVLLIVSDHCQAFLFSLIPSAISGLISAFKGRRKRDLNG

Group05

DBACC GenBank

D000680 AF159977 ENLGEDCENLCKQQKATDGFCRQPH CFC TDMPDNYATRPDTVDPIM

Group06

DBACC

D000691 TVKCGGCNRKCCAGGCRSGKCINGKCQCY

D000692 TVKCGGCNRKCCPGGCRSGKCINGKCQCY

D000690 KPKCGLCRYRCCSGGCSSGKCVNGACDCS

192 Appendix 2

Appendix2

The20naturalaminoacidshavedifferentphysicochemical properties which includebothphysical(e.g.lengthofRgroupsidechain,branchorsinglechainetc.) andchemical(e.g.hydrophobicity,chargedoruncharged groups etc.) characteristics

(Figure 37 ). A single amino acid residue can be described by various overlapping physical ( Table 13 ) and chemical properties. Studying their physicochemical properties can help in analysis of structurefunction relationships, particularly in mutation studies. For example, neutralisation of K27 in Kaliotoxin resulted in low binding affinity towards K + channels which demonstrated that positive charged is importantfortoxinchannelinteraction(Aiyar et al. ,1995).

193 Appendix 2

Small P G A C S N V L T Q I D E W K M H Polar F Y R

Hydrophobic

Legend: Charged Aromatic Positive Tiny Negative Aliphatic

Figure37 Venndiagramdisplayingtheinterrelationshipsofthe20naturallyoccurring amino acids based on their physicochemical properties. Glycine whose sidechain groupisahydrogenatomcanfitintoeitherhydrophobicorhydrophilicenvironment. Prolineistheonlycyclicaminoacidwhichusuallycausekinksinpolypeptidechains. AdaptedfromTaylor,1986.

194 Appendix 2

Table13Physicalpropertiesofthe20Lαaminoacids.*alsocalledaspartaticacid,# alsoknownasglutamicacid,‡molecularweightsgivenarethoseoftheneutral,free aminoacids;residueweightsareobtainablebysubtractionofoneequivalentofwater molecule(18g/mol).§measurestherelativehydrophobicityamongaminoacidswhere positivevalueindicateshydrophobicitywhilenegativevaluerepresentshydrophilicity basedon(KyteandDoolittle,1982). Nameofα Symbol Mol. pI Hydropathy aminoacid 3Letter 1Letter Wt.‡ value Index§ Alanine Ala A 89.09 6.00 1.8 Arginine Arg R 174.20 11.15 4.5 Asparagine Asn N 132.12 5.41 3.5 Aspartate* Asp D 133.10 2.77 3.5 Cysteine Cys C 121.15 5.02 2.5 Glutamine Gln Q 146.15 5.65 3.5 Glutamate # Glu E 147.13 3.22 3.5 Glycine Gly G 75.07 5.97 0.4 Histidine His H 155.16 7.47 3.2 Isoleucine Ile I 131.18 5.94 4.5 Leucine Leu L 131.18 5.98 3.8 Lysine Lys K 146.19 9.59 3.9 Methionine Met M 149.21 5.74 1.9 Phenylalanine Phe F 165.19 5.48 2.8 Proline Pro P 115.13 6.30 1.6 Serine Ser S 105.09 5.68 0.8 Threonine Thr T 119.12 5.64 0.7 Tryptophan Trp W 204.23 5.89 0.9 Tyrosine Tyr Y 181.19 5.66 1.3 Valine Val V 117.15 5.96 4.2

195