Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 218

Cyclotides evolve

Studies on their natural distribution, structural diversity, and activity

SUNGKYU PARK

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6192 ISBN 978-91-554-9604-3 UPPSALA urn:nbn:se:uu:diva-292668 2016 Dissertation presented at Uppsala University to be publicly examined in B/C4:301, BMC, Husargatan 3, Uppsala, Friday, 10 June 2016 at 09:00 for the degree of Doctor of Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty examiner: Professor Mohamed Marahiel (Philipps-Universität Marburg).

Abstract Park, S. 2016. Cyclotides evolve. Studies on their natural distribution, structural diversity, and activity. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 218. 71 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9604-3.

The cyclotides are a family of naturally occurring peptides characterized by a cyclic cystine knot (CCK) structural motif, which comprises a cyclic head-to-tail backbone featuring six conserved cysteine residues that form three disulfide bonds. This unique structural motif makes cyclotides exceptionally resistant to chemical, thermal and enzymatic degradation. They also exhibit a wide range of biological activities including insecticidal, cytotoxic, anti-HIV and antimicrobial effects. The cyclotides found in Violaceae exhibit considerable sequence and structural diversity, which can be linked to their evolutionary history and that of their host plants. To clarify the evolutionary link between sequence diversity and the distribution of individual cyclotides across the genus Viola, selected known cyclotides were classified using signature sequences within their precursor proteins. By mapping the classified sequences onto the phylogenetic system of Viola, we traced the flow of cyclotide genes over evolutionary history and were able to estimate the prevalence of cyclotides in this genus. In addition, the structural diversity of the cyclotides was related to specific features of the sequences of their precursor proteins, their evolutionary selection and expression levels. A number of studies have suggested that the biological activities of the cyclotides are due to their ability to interact with and disrupt biological membranes. To better explain this behavior, quantitative structure-activity relationship (QSAR) models were developed to link the cyclotides’ biological activities to the membrane-interactive physicochemical properties of their molecular surfaces. Both scalar quantities (such as molecular surface areas) and moments (such as the distributions of specific properties over the molecular surface) were systematically taken into account in the development of these models.
This approach allows the physicochemical properties of cyclotides to be geometrically interpreted, facilitating the development of guidelines for drug design using cyclotide scaffolds. Finally, an optimized microwave-assisted Fmoc-SPPS procedure for the total synthesis of cyclotides was developed. Microwave irradiation is used to accelerate and improve all the key steps in cyclotide synthesis, including the assembly of the peptide backbone by Fmoc-SPPS, the cleavage of the protected peptide, and the introduction of a thioester at the C-terminal carboxylic acid to obtain the head-to-tail cyclized cyclotide backbone by native chemical ligation.

Keywords: cyclotide, cyclic cystine knot, evolution, peptide synthesis, chemical ligation, QSAR, Viola, Violaceae, phylogeny

Sungkyu Park, Department of Medicinal Chemistry, Division of Pharmacognosy, Box 574, Uppsala University, SE-75123 Uppsala, Sweden.

© Sungkyu Park 2016

ISSN 1651-6192 ISBN 978-91-554-9604-3 urn:nbn:se:uu:diva-292668 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-292668)

Creativity is Just Connecting Things

Steve Jobs (1955–2011)

Dedicated to numerous occasions in our history!

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Sungkyu Park, Ki-Oug Yoo, Erik Jacobsson, Pravech Ajawatanawong, Anders Backlund, Johan Rosengren, Inseok Doo, Ulf Göransson. (2016) Insight into cyclotide evolution from transcriptomic analyses and studies on their distribution in violets. Manuscript

II Sungkyu Park, Adam Strömstedt, Ulf Göransson. (2014) Cyclotide structure-activity relationships: Qualitative and quantitative approaches linking cytotoxicity and anthelmintic activity to the clustering of physicochemical forces. PloS One. 9(3):e91430

III Adam Strömstedt, Sungkyu Park, Robert Burman, Ulf Göransson. (2016) Exposing the true bacterial potency of cyclotides: explained by lipid selectivity, structural characteristics and correlating antimicrobial activities. Manuscript

IV Sungkyu Park, Gunasekera Sunithi, Teshome Aboye, Ulf Göransson. (2010) An efficient approach for the total synthesis of cyclotides by microwave assisted Fmoc-SPPS. International Journal of Peptide Research and Therapeutics, 16(3):167-176

Reprints were made with permission from the respective publishers.

Contents

1. Introduction
1.1. The cyclic cystine knot (CCK): a unique structural motif
1.2. All living organisms produce proteins, but what about proteins with CCK?
1.3. Cyclotide precursors in flowering plants are very diverse
1.4. Cyclotides are classified on the basis of their cyclotide domain sequences
1.5. Cyclotides have diverse biological activities
1.6. Pharmacognostic approach in cyclotide research
2. Research aims
3. Evolution and distribution of cyclotides in Violets (Paper I)
3.1. Phylogenetic studies on the family Violaceae
3.2. Previous studies of cyclotides on family Violaceae
3.3. Morphological diversity of Violets
3.4. Discovery of new cyclotide precursor sequences
3.5. Nomenclature of cyclotides and precursors
3.6. Classification of cyclotide precursors using their sequence signatures
3.7. Distribution of cyclotides in Violets
4. Structure-activity relationships of cyclotides (Paper II and III)
4.1. Physicochemical properties and molecular descriptors
4.2. Physicochemical properties as scalars and moments
4.3. Lipophilic index values can be computed for unnatural amino acids
4.4. The exposed surface ratio measures the extent to which a residue’s physicochemical properties affect those of the peptide as a whole
4.5. Procedure for computing the electrostatic moment
4.6. Use of QSAR to analyze membrane activity and selectivity
5. Microwave-assisted total synthesis of cyclotides (Paper IV)
6. Concluding remarks and future perspectives
7. Popular scientific summary
8. Acknowledgements
9. References
Appendix I

Abbreviations

CCK cyclic cystine knot
CD cyclotide domain
cyO2 cycloviolacin O2
CTR C-terminal repeat
DCM dichloromethane
DMF N,N-dimethylformamide
TFA trifluoroacetic acid
ER endoplasmic reticulum
Fmoc 9-fluorenylmethoxycarbonyl
kB1 kalata B1
MS mass spectrometry
NMR nuclear magnetic resonance
NTPD N-terminal prodomain
NTPP N-terminal propeptide
NTR N-terminal repeat
PDB protein data bank
PE phosphatidylethanolamine
PC phosphatidylcholine
QSAR quantitative structure-activity relationship
SASA solvent-accessible surface area
SPPS solid phase peptide synthesis
EM electrostatic moment/HBD amphipathic moment
ES positively charged surface area
LM lipophilic moment
LS lipophilic surface area
+ LS the exclusive lipophilicity

Side chains of amino acids and their physicochemical properties

The 20 naturally occurring amino acids are classified into five categories according to the physicochemical properties of their side chains: hydrophilic (~), hydrophobic ($), flexible (G), rigid (P), and disulfide-forming (C). The hydrophilic residues are further divided into positively charged (+), negatively charged (=) and uncharged (*) groups. Similarly, the hydrophobic residues ($) are divided into aromatic (#) and alkyl (<) groups.

1. Introduction

1.1. The cyclic cystine knot (CCK): a unique structural motif

The cyclotides are proteins of around 30 amino acids characterized by their common cyclic cystine knot (CCK) structural motif (Craik, Daly et al. 1999), which comprises a cyclic head-to-tail backbone featuring six conserved cysteine residues that form three disulfide bonds (Figure 1).

Figure 1. Schematic representation of the three-dimensional structure of a cyclotide, showing the CCK framework. The cysteine residues are labelled with Roman numerals. The cyclic cystine knot (CCK) motif is defined by the head-to-tail cyclic backbone and three disulfide bonds. The closure of the cyclic backbone is indicated by the dashed line linking the peptide’s C- and N-terminal residues (CE and NE) in loop 6. The cystine knot consists of the three disulfide bonds, which are shown in yellow. Two of these disulfide bridges (CI-IV and CII-V) and their connecting backbone segments (loops 1 and 4) define an embedded ring within the peptide’s structure; the third disulfide bridge (CIII-VI) extends through this embedded ring.
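The connectivity described in Figure 1 can be sketched in a few lines of Python. The example below is only an illustration: it assumes a linear, one-letter sequence containing exactly six cysteines, numbers them I to VI in order of appearance, pairs them as I-IV, II-V and III-VI, and extracts the six backbone loops, with loop 6 wrapping around the head-to-tail closure. The function names and the sequence `TOY_SEQUENCE` are hypothetical and do not correspond to any real cyclotide.

```python
def cysteine_positions(sequence: str) -> list[int]:
    """Return the 1-based positions of all cysteine residues."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == "C"]


def cystine_knot_pairs(sequence: str) -> list[tuple[int, int]]:
    """Pair the six cysteines as I-IV, II-V and III-VI (1-based positions)."""
    cys = cysteine_positions(sequence)
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    return [(cys[k], cys[k + 3]) for k in range(3)]


def backbone_loops(sequence: str) -> list[str]:
    """Return loops 1-6, the segments between successive cysteines.

    Loop 6 joins the residues after Cys VI to those before Cys I,
    mimicking the head-to-tail closure of the cyclic backbone.
    """
    cys = cysteine_positions(sequence)
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # cys[k] is 1-based, so it is also the 0-based index of the next residue.
    loops = [sequence[cys[k]:cys[k + 1] - 1] for k in range(5)]
    loops.append(sequence[cys[5]:] + sequence[:cys[0] - 1])
    return loops


# Hypothetical toy sequence, not a real cyclotide.
TOY_SEQUENCE = "AACGGGCSSSSCTTCKCLLLLCNN"

print(cystine_knot_pairs(TOY_SEQUENCE))
print(backbone_loops(TOY_SEQUENCE))
```

In this sketch, loops 1 and 4 are the segments between Cys I and II and between Cys IV and V, which together with the I-IV and II-V bridges form the embedded ring described in the caption.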

1.2. All living organisms produce proteins, but what about proteins with CCK?

All living organisms in all kingdoms of life produce proteins (Figure 2). To date, cyclotides (i.e. proteins with the CCK motif) have been identified only in plants. However, proteins having only one of the two defining elements of the CCK motif, either a cyclic backbone without a cystine knot (Trabi and Craik 2002) or a linear backbone with a cystine knot (Zhu, Darbon et al. 2003), are found in a wide range of organisms distributed over all the kingdoms of life.

Figure 2. Schematic phylogenetic tree showing the five kingdoms of life: bacteria (Monera), fungi, animals, plants and protists. The LUCA, or last universal common ancestor of life, is an unknown ancestral organism that had the biosynthetic pathways required to produce proteins using information encoded in genomic DNA.

The angiosperms, or flowering plants, are divided into three classes: the dicots, the monocots, and the basal angiosperms (see e.g. Bremer B 2009). Several true cyclotides, i.e. proteins with head-to-tail cyclic backbones and a CCK motif, have been found in eudicots but not in monocots, and no genes encoding true cyclotides have been identified in any of the 11 whole genome sequences available for monocots (Figure 3) (Mulvenna, Mylne et al. 2006, Zhang, Hua et al. 2015). However, so-called linear cyclotides, which have acyclic backbones but do contain a cystine knot motif and extensive sequence homology with true cyclotides, are found in both eudicots and monocots.

Because of this distribution of linear and cyclic cyclotides among the angiosperms, it has been suggested that the modern cyclotides (both linear and cyclic) evolved from linear ancestors (Mulvenna, Mylne et al. 2006, Gruber, Elliott et al. 2008). However, the evolutionary origin of the linear cyclotides remains unknown.

To date, cyclotides have been discovered in eudicot families including the Rubiaceae (coffee) (Gran 1973), Violaceae (violets) (Hashempour, Koehbach et al.), Fabaceae (legume) (Poth, Colgrave et al. 2011), Solanaceae (potato) (Poth, Mylne et al. 2012) and Cucurbitaceae (cucurbit) (Craik, Daly et al. 2004), and in the monocot family of the Poaceae (grass) (Nguyen, Lian et al. 2013). However, these proteins have not yet been detected in any basal angiosperms.

Figure 3. Evolution and distribution of cyclotides in flowering plants.

1.3. Cyclotide precursors in flowering plants are very diverse

Cyclotides are synthesized from precursor proteins that are expressed ribosomally and then undergo post-translational processing involving enzymatic cleavage and cyclization of the cyclotide domain, which is partially mediated by asparaginyl endopeptidase (AEP) (Jennings, West et al. 2001, Mulvenna, Foley et al. 2005, Mylne, Colgrave et al. 2011, Harris, Durek et al. 2015). The precursor sequences are multi-domain proteins whose domain architecture varies between families and cyclotide types. However, from the N- to the C-terminus, they typically feature an endoplasmic reticulum (ER) targeting signal, an N-terminal propeptide (NTPP), an N-terminal repeat (NTR), a cyclotide domain (CD), and a C-terminal repeat (CTR) (Figure 4). The ER and cyclotide domains generally exhibit substantial sequence homology between plant families from taxonomically different orders. The sequence between the ER and cyclotide domains is known as the N-terminal prodomain (NTPD), and is subdivided into N-terminal propeptide (NTPP) and N-terminal repeat (NTR) regions.
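The typical domain order described above can be written down as a small data structure. The following Python sketch is an illustration only: it models a precursor with a single cyclotide domain, derives the NTPD as the NTPP followed by the NTR, and reports the 1-based span of each domain in the full sequence. The class name, field names and the placeholder sequences in `toy` are hypothetical and are not taken from any real precursor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclotidePrecursor:
    """Single-domain precursor, listed from the N- to the C-terminus."""
    er_signal: str          # ER targeting signal
    ntpp: str               # N-terminal propeptide
    ntr: str                # N-terminal repeat
    cyclotide_domain: str   # CD, the part that becomes the mature cyclotide
    ctr: str                # C-terminal repeat

    @property
    def ntpd(self) -> str:
        """N-terminal prodomain: the region between the ER signal and the CD."""
        return self.ntpp + self.ntr

    def full_sequence(self) -> str:
        return self.er_signal + self.ntpd + self.cyclotide_domain + self.ctr

    def domain_spans(self) -> dict[str, tuple[int, int]]:
        """Return 1-based inclusive (start, end) positions for each domain."""
        spans = {}
        start = 1
        for name, segment in (
            ("ER", self.er_signal),
            ("NTPP", self.ntpp),
            ("NTR", self.ntr),
            ("CD", self.cyclotide_domain),
            ("CTR", self.ctr),
        ):
            end = start + len(segment) - 1
            spans[name] = (start, end)
            start = end + 1
        return spans


# Placeholder segments, chosen only to make the spans easy to follow.
toy = CyclotidePrecursor("MKK", "AAAA", "TT", "CGGC", "SL")
print(toy.full_sequence())
print(toy.domain_spans())
```

Precursors with multiple cyclotide domains (Figure 4A, bottom) would need a repeated unit rather than single fields; that extension is left out of this sketch.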


Figure 4. Typical architecture of a cyclotide precursor. A. The architecture of cyclotide precursors with (bottom) and without (top) multiple cyclotide domains. B. Sequence alignment of selected full cyclotide precursor proteins from violets. Vok1 is the precursor of kalata S (the first and third cyclotide domains) and kalata B1 (the second cyclotide domain) from Viola odorata. VbCP 25 and VbCP 26 are the precursor sequences of Viba 31 and Viba 32, respectively, from Viola baoshanensis.

1.4. Cyclotides are classified on the basis of their cyclotide domain sequences

The cyclotides were initially classified into two main subfamilies on the basis of a single structural trait: the presence or absence of a conceptual 180° twist in the cyclic backbone caused by a conserved cis-Pro residue in loop 5 (Figure 5) (Craik, Daly et al. 1999). Cyclotides that contain this twist are referred to as Möbius cyclotides; those without it are referred to as bracelet cyclotides. Some loops (loops 1 and 4) have high sequence similarity between the subfamilies while others (loops 2 and 3) are conserved only within individual subfamilies.

After the introduction of this structural classification system, many new cyclotides were discovered (e.g. kalata B8, a structural hybrid; Violacin A, a linear cyclotide; and MCoTI-I, a trypsin inhibitor). Collectively these peptides exhibit substantial sequence diversity, and some of them could not be satisfactorily classified using the structural criterion outlined above. These unclassifiable peptides, which exhibit sequence characteristics of both the Möbius and bracelet subfamilies, were termed hybrid cyclotides (Daly, Clark et al. 2006). A third minor subfamily, known as the trypsin inhibitors (TI), has been discovered in gourd plants (Hernandez, Gagnon et al. 2000). These peptides contain the CCK motif but do not otherwise exhibit any sequence homology with the other subfamilies (Craik, Daly et al. 2004). In addition, linear cyclotide derivatives that exhibit substantial sequence homology with typical cyclotides but lack cyclic backbones have been reported (Ireland, Colgrave et al. 2006, Nguyen, Lian et al. 2013).

Figure 5. Classification of cyclotides on the basis of their structures and sequences. A. Multiple sequence alignment of diverse cyclotide sequences. B. Schematic representation of the bracelet and Möbius topologies. C. Schematic representation of the trans- and cis-Proline conformations in a tripeptide (G-P-A). In the peptide backbone, the ω angle may only take values of 180° (trans) or 0° (cis).

1.5. Cyclotides have diverse biological activities

In plants, cyclotides appear to function as defense agents. Accordingly, they have been shown to exhibit insecticidal (Jennings, West et al. 2001), anthelmintic (Colgrave, Kotze et al. 2008, Huang, Colgrave et al. 2010), antifouling (Göransson, Sjögren et al. 2004) and molluscicidal (Plan, Saska et al. 2008) activities. In addition, cyclotides with uterotonic (Gran, Sandberg et al. 2000), antineurotensin (Witherup, Bogusky et al. 1994), antibacterial (Tam, Lu et al. 1999, Ovesen, Brandt et al. 2011), anti-HIV (Gustafson, Sowder et al. 1994), anticancer (Lindholm, Göransson et al. 2002) and immunosuppressive (Grundemann, Koehbach et al. 2012) activities have been discovered. Multiple studies have suggested that these diverse activities have a common mechanistic origin, relating to the cyclotides’ ability to interact with and disrupt biological membranes (Henriques, Huang et al., Simonsen, Sando et al. 2008, Huang, Colgrave et al. 2010, Burman, Herrmann et al. 2011).

1.6. Pharmacognostic approach in cyclotide research

Pharmacognosy focuses on the identification and development of medicines from natural sources (Larsson, Backlund et al. 2008). The word ‘pharmacognosy’ literally means the knowledge of drugs, from the two Greek words pharmakon (drug) and gnosis (knowledge). A more modern definition describes it as a ‘molecular science that explores naturally occurring structure-activity relationships with drug potential’. The meaning of pharmacognosy has thus evolved into an interdisciplinary field of science that spans a wide range of subjects. A recently proposed model of pharmacognostic research is shown in Figure 6.

The cyclotides were discovered through traditional pharmacognostic strategies. In 1965, Finn Sandberg, professor of pharmacognosy at Uppsala University, first reported the traditional use of Oldenlandia affinis in the Central African Republic (Sandberg 1965). Local women ingested the plant as an herbal infusion to facilitate childbirth. Some years later, one of the cyclotides, kalata B1, was identified as a main active component by Dr. Lorents Gran (Gran 1973, Sletten and Gran 1973). He brought the plant from Congo to his home country, Norway, and determined the cyclotide protein sequence. The name ‘kalata B1’ was derived from the native name of the plant, ‘Kalata-Kalata’.


Figure 6. Model illustrating the interdisciplinary nature of pharmacognostic research. The work presented in this thesis was conducted between the organism, data, chemical structure and biological activity cornerstones. Figure reprinted with permission from the author (Larsson, Backlund et al. 2008).

Afterwards, cyclotide research was extended into other research fields. The structure of kalata B1 was first determined by nuclear magnetic resonance (NMR) analysis (Saether, Craik et al. 1995). Until then, nothing had been known about the presence of a cyclic backbone in cyclotides. Around this time, three different groups independently reported the occurrence of cyclotides from different plant origins with different biological activities: the hemolytic violapeptide I from Viola arvensis (Schöpke, Hasan et al. 1993), the HIV-inhibitory circulins from Chassalia parvifolia (Gustafson, Sowder et al. 1994) and the neurotensin binding inhibitor cyclopsychotride A from Psychotria longipes (Witherup, Bogusky et al. 1994). The number of known proteins with the CCK structural motif grew over the following years. Consequently, the name ‘cyclotides’, derived from cyclo-peptides, was suggested as a collective name for proteins with the CCK structural motif (Craik, Daly et al. 1999).

2. Research aims

The work presented in this thesis was part of a research project on plant proteins, focusing on disulfide-rich small proteins, at the Division of Pharmacognosy, Department of Medicinal Chemistry, Uppsala University. The long-term aims of the project are to determine the overall distribution of disulfide-rich small proteins in nature, to understand the mechanisms of action behind their biological activities, and to explore them for drug development.

The objectives of this thesis were:

1. To explain the evolution of the cyclotides by considering their natural distribution together with their sequence and structural variation (Paper I). To this end, the cyclotides were classified on the basis of the full sequences of their precursor proteins and their domain architecture, and the classified groups were analyzed in relation to their structural traits. To explain the selective forces whose effects have produced the cyclotides we see today, the structural properties of the cyclotides were analyzed in relation to the sizes of the classified groups and their expression levels. Finally, to estimate the distribution of cyclotides in the genus Viola, we traced the flow of cyclotide genes through its phylogenetic tree.

2. To develop a quantitative structure-activity relationship (QSAR) model for the cyclotides in order to facilitate future drug development (Paper II), and to use molecular descriptors to explain the general membrane-binding activity and selectivity of the cyclotides (Paper III). To this end, the activity of cyclotides was explained using geometrically interpretable molecular descriptors that could be used to draw up guidelines for drug design, for example by highlighting specific residues and physicochemical properties that could be modified to enhance activity. In addition, the application of these simple molecular descriptors to non-natural amino acids was examined.

3. To develop an optimized protocol for the total synthesis of cyclotides using solid phase peptide synthesis (SPPS) (Paper IV). This involved optimizing microwave-assisted reactions for Fmoc-SPPS of the peptide backbone, the cleavage of the protected peptide, and the introduction of a thioester at the C-terminal carboxylic acid to obtain the head-to-tail cyclized cyclotide backbone by native chemical ligation.

3. Evolution and distribution of cyclotides in Violets

Paper I explains the evolution of the cyclotides on the basis of their natural distribution across the Violaceae and their sequence and structural diversity. In this study, the family Violaceae and the genus Viola were chosen as an ideal model for explaining cyclotide evolution for two reasons: among the flowering plants, a large number of cyclotide precursor sequences have been described from Viola, and its phylogenetic system is currently well described. To find the evolutionary link between the sequence diversity of the cyclotides and their distribution across the Violaceae, the cyclotides were first classified according to the sequences of their precursor proteins. The classified sequences were then mapped onto the phylogenetic tree of the Violaceae in order to trace the flow of cyclotide genes and estimate their distribution within this family.

3.1. Phylogenetic studies on the family Violaceae

The Violaceae are a medium-sized family including about 22 genera and 1,000–1,100 species worldwide (Tokuoka 2008, Wahlert, Marcussen et al. 2014). Three of these 22 genera (Viola, Rinorea, and Hybanthus) account for 98% of the species in the family. Viola is the largest genus in the family, having 580–620 species in total (Ballard, Sytsma et al. 1998, Yockteng 2003, Marcussen, Jakobsen et al. 2012). Representatives of this genus are distributed worldwide in the temperate regions and montane habitats in the tropics.

There are well-established phylogenetic systems for the Violaceae family, and especially for the genus Viola. In particular, there is an infrageneric system describing the phylogenetic relationships within the genus and dividing it into sections, series, and species. The phylogenetic study of the Violaceae began in the 19th century, and was initially based on the analysis of morphological characteristics (Gingins 1823, Becker 1925, Clausen 1927, Clausen 1931, Clausen 1964). Subsequently, its phylogeny was refined using molecular phylogenetic techniques (Ballard, Sytsma et al. 1998, Marcussen, Heier et al. 2015).

Gingins originally divided Viola into five sections based on the morphology of the stigma (Gingins 1823). Becker later divided Viola into 14 sections, 28 subsections, and 7 series (Becker 1925). Becker’s classification system was extensively revised by Clausen to account for color, pistil, shape, chromosome number, and geographical distribution (Clausen 1927, Clausen 1931, Clausen 1964). Afterwards, Ballard proposed eight sections based on morphology, natural hybridization, chromosome numbers, and the disposition of previously recognized infrageneric groups and unassigned species (Ballard 1996). Recent advances suggest at least 17 phylogenetic lineages referable to sections (Fan, Chen et al. 2015, Marcussen, Heier et al. 2015).

A molecular phylogenetic study was performed using internal transcribed spacer (ITS) DNA sequences for 44 taxa representing many infrageneric groups (Ballard, Sytsma et al. 1998). Recently, a species-level phylogeny for Viola was reconstructed, inferring at least 16–21 allopolyploidisation events in the history of the genus (Marcussen, Heier et al. 2015) (Figure 7). Hence the species-level phylogeny of Viola is really a network and not a tree, as indicated by ITS alone. As will be shown later, this is reflected in the distribution of cyclotides. All in all, the phylogeny conforms well with the delimitation of sections, many of which were based solely on morphology and chromosome numbers. However, morphology is too variable to be used to infer relationships above section level.

Figure 7. Most parsimonious HOLM network for the 16 provisional sections of Viola. Around 600 Viola species are currently known to exist; they are classified into 16 sections. All of these species originated from a single ancestral species via processes of allopolyploid speciation that began around 31 Mya. Reprinted with the journal’s permission (Marcussen, Heier et al. 2015).

3.2. Previous studies of cyclotides on family Violaceae

Typically, cyclotide sequences from plants have been identified by protein sequencing (Schöpke, Hasan et al. 1993, Claeson, Göransson et al. 1998, Göransson, Luijendijk et al. 1999) and by analysis of cDNA (Simonsen, Sando et al. 2005, Burman, Gruber et al. 2010). However, despite many attempts, it has not been possible to isolate proteins corresponding to the intact precursor sequences. With the development of next generation sequencing (NGS) technology, the first transcriptomes (all mRNA sequences from a specific organism) have now been sequenced. The presence and sequence of the precursor proteins can thus be determined by examining the expression of their mRNA sequences in planta. Sequences are considered to be confirmed at the protein level and the mRNA level if they are based on protein sequencing and transcriptome analysis, respectively.

Protein-level analysis commonly involves using LC-MS to screen the distribution of cyclotides in plants. Cyclotides’ LC-MS elution times and molecular masses are often indicative of their structure: Möbius and bracelet cyclotides elute late from hydrophobic reversed-phase C18 columns because their molecular surfaces have many exposed hydrophobic residues. Conversely, hybrid cyclotides elute early because they lack such hydrophobic residues. In addition, the molecular mass provides information on the peptide’s approximate length.

Recently, Burman et al. performed a protein-level cyclotide screen covering the majority of the genera in the family Violaceae, representing 1/6 of all species (Figure 8) (Burman, Yeshak et al. 2015). In that study, the occurrence of cyclotides was mapped across the Violaceae on a large scale. The authors concluded that cyclotides are prevalent in all species belonging to the Violaceae and that they show large structural diversity, in part through chemical modifications (e.g. glycosylation).

Cyclotides have been detected in every Viola species analyzed to date. Protein-level analyses typically reveal the presence of 20–40 different cyclotides in individual Viola species (Figure 9), while transcriptome-level analyses generally reveal 10–100 cyclotides; this large variation could be due to experimental factors such as DNA degradation.


Figure 8. Cyclotide screening in Violaceae. A. Classification of Violaceae in accordance with the phylogenetic system. The genera containing isolated cyclotides are indicated by bullet points. B. Geographical origins of herbarium specimens sampled. Redrawn from (Burman, Yeshak et al. 2015).


Figure 9. LC-MS chromatogram and fingerprint of Viola mandshurica and Viola orientalis. The labels ‘H’, ‘M’, and ‘L’ indicate high, moderate, and low abundance, respectively. Compounds from plant extracts elute from the hydrophobic column at different times and are detected by MS. LC-MS fingerprints are then generated from the eluents’ retention times and signal intensities. To filter out non-cyclotides, the plant extracts were reduced and alkylated, so that peptides containing three disulfide bonds could be identified. The fingerprints compare LC-MS results for untreated plant extracts to those obtained after alkylation. In total, 41 and 24 cyclotides were found in V. mandshurica and V. orientalis, respectively.
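The comparison of untreated and alkylated extracts rests on a simple mass calculation, which can be sketched as follows. The example is hedged: the alkylating agent is not specified here, so the sketch assumes iodoacetamide (carbamidomethylation, about +57.021 Da per cysteine), with each reduced cysteine also gaining one hydrogen (about +1.008 Da). Under that assumption a peptide with three disulfide bonds gains roughly 348.18 Da. The function names, the tolerance and the example masses are hypothetical.

```python
# Monoisotopic masses in Da. The choice of iodoacetamide is an assumption;
# the caption above does not name the alkylating agent.
HYDROGEN = 1.007825
CARBAMIDOMETHYL = 57.021464
SHIFT_PER_CYSTEINE = HYDROGEN + CARBAMIDOMETHYL


def expected_shift(n_disulfides: int) -> float:
    """Mass gained when n disulfide bonds are reduced and fully alkylated."""
    return 2 * n_disulfides * SHIFT_PER_CYSTEINE


def disulfides_from_shift(untreated_mass, treated_mass, tolerance=0.05):
    """Infer the number of disulfide bonds from an observed mass shift.

    Returns None when the shift does not match an even number of
    alkylated cysteines within the given tolerance (in Da).
    """
    shift = treated_mass - untreated_mass
    n_cys = round(shift / SHIFT_PER_CYSTEINE)
    if n_cys < 0 or n_cys % 2 != 0:
        return None
    if abs(shift - n_cys * SHIFT_PER_CYSTEINE) > tolerance:
        return None
    return n_cys // 2


# Hypothetical masses for illustration only.
print(round(expected_shift(3), 4))
print(disulfides_from_shift(3000.0, 3348.1757))
print(disulfides_from_shift(3000.0, 3100.0))
```

A signal whose mass increases by the three-disulfide shift after treatment is a cyclotide candidate; signals with other shifts can be set aside. The retention time and approximate peptide length would still need to be checked separately.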

3.3. Morphological diversity of Violets

‘Violets’ is the common name for all species belonging to the large genus Viola, whose members exhibit substantial morphological diversity. In general, Viola species belonging to the same infrageneric section have several morphological characters in common, notably in the differentiation of the growth axis, style shape, flower color and base chromosome number (x). For instance, the Viola species belonging to the Chamaemelanium section are mostly rhizomatous with lateral aerial stems, yellow-flowered with a capitate and papillose style, and with x=6 chromosomes. Those belonging to the Plagiostigma section are mostly rhizomatous with or without lateral stolons (“stemless”), white- or blue-flowered with a glabrous and laterally-distally edged style, and with x=12. Those from the Melanium section are rhizomatous and apically stemmed with large stipules, multicolored flowers with a papillose style and a wide range of chromosome numbers from x=2 to 64. Species of section Viola are mostly rhizomatous with or without lateral aerial stems or stolons, blue-flowered with a non-edged or papillose style, with x=10. Furthermore, most Viola species are rhizomatous with herbaceous stems but some sections have woody stems (Ballard and Sytsma 2000).

The Viola species examined in this work are listed along with their infrageneric sections in Table 1. Viola is the largest genus within the Violaceae family, and its substantial genetic diversity is reflected in both morphological diversity and the diversity of its cyclotide sequences. In general, speciation is a consequence of genomic changes. As part of the gene pool of the Violaceae, the cyclotide-encoding genes will have evolved along with the speciation of their host plants.

Paper I presents a transcriptomic analysis of five Viola species: V. acuminata (sect. Viola), V. verecunda, V. albida var. takahashii and V. mandshurica (sect. Plagiostigma), and V. orientalis (sect. Chamaemelanium). The species names of these violets are derived from either their morphology or the location in which they were first discovered. Thus, the name ‘acuminata’ reflects this species’ sharply angled leaves, while V. albida was named for its white flowers. Becker discovered V. mandshurica in Manchuria (满洲) in China, and V. orientalis is prevalent in the Eastern world.

The morphological features of the leaves of V. albida var. takahashii have been analyzed rigorously to determine its place in the phylogeny of the violets. It was suggested to be a natural hybrid between V. chaerophylloides and V. albida because its leaves exhibit characteristic features of both species (Kim 1986). However, this proposed hybrid origin was later revised on the basis of leaf shape, independent endemic distribution and high fertility (Jang 2012). These three species show large variation in their leaf shapes and form a species complex (Figure 10).

Some Viola species look similar but are in fact different species. For example, V. mandshurica and V. yedoensis are clearly distinguishable species (Yoo and Jang 2013). Identification based on the winged petiole alone can be misleading, because both species have winged petioles and petiole morphology varies between the cleistogamic and chasmogamic periods in the Violaceae family. However, there are two distinctive differences based on the presence of hairs on 1) the lateral sepal and 2) the leaves and petiole (Figure 10). In contrast to V. yedoensis, V. mandshurica has hairs on the lateral sepal (flowers), which is the most clearly detectable difference. Conversely, whereas V. mandshurica does not display hairs on the leaves and petiole, V. yedoensis has hairs regularly distributed over them.

Interestingly, these two Viola species have been used as medicinal plants in China and Korea. Their use is described in Dongui Bogam (東醫寶鑑) (Heo 1610) and Bencao Gangmu (本草綱目) (Li 1593). Jahwajijung (紫花地丁), the name of these Viola species, is derived from their sepal color (purple; 紫花) and from the shape of the root and the way it anchors strongly in the soil, like a nail (地丁). In oriental medicine, its taste and pharmacological character (性味) are described as bitter, pungent and cold (苦辛 and 寒·無毒). As a medicinal plant, it has been used to alleviate fever (淸熱利濕) and as a detoxifying and resolvent agent for obstinate swelling (解毒消腫). It has also been used to treat jaundice (黃疸), skin rash (疔瘡), pink eye (目赤), shigellosis (痢疾), mycobacterial cervical lymphadenitis or scrofula (瘰癧), vomiting/diarrhea (復瀉), venomous snake bites (毒蛇咬傷) and pressure ulcers/sores (背發無名諸腫).

Viola orientalis belongs to the diploid (x=6) section Chamaemelanium, whose ancestor, by repeated hybridization with the now-extinct MELVIO lineage and subsequent polyploidisation, gave rise to the remaining north-temperate lineages of the genus Viola. The South American sections Andinium and Leptidium are successive sister lineages to the rest of the genus Viola. Members of sect. Andinium are mainly Andean rosette plants with leaves in many rows, multicolored flowers, a style with lateral and apical protrusions (“crest”), and x=7. Members of sect. Leptidium are subshrubs, many with nectar-less pollen-flowers (Freitas and Sazima 2003), with variously colored flowers and a simple tapering style, and probably x=14 (Marcussen, Heier et al. 2015).


Figure 10. Morphological diversity in Viola. (Upper) a. Viola orientalis, b. Viola acuminata, c. Viola mandshurica, d. V. albida var. takahashii, e. Viola verecunda and f. Viola coronifera. (Middle) a-f. Morphological variations of the leaf in Viola albida var. takahashii. (Lower) Morphological differences between Viola yedoensis (a-d) and Viola mandshurica (e-h). a. and e. whole plants, b. and f. flowers, c. and g. leaves, d. and h. peduncles. The figure was reprinted with the authors’ permission: picture f. (upper) is from Watson (Watson and Flores 2011) and the remaining pictures are from Yoo (Yoo and Jang 2013).

3.4. Discovery of new cyclotide precursor sequences

Paper I reported the discovery of 158 new cyclotide precursor sequences (Table 1). Of these, 138 were discovered during a transcriptomic analysis of the five Viola species described above (V. albida var. takahashii, V. mandshurica, V. orientalis, V. verecunda and V. acuminata), while the remaining 20 were obtained by combining the transcriptomic data generated during this study with transcriptome data for V. canadensis from the 1kp project (www.onekp.com).

3.5. Nomenclature of cyclotides and precursors

Each of the discovered cyclotides was assigned a three-component name (Table 1). The first component is derived from the binomial Latin name of the host plant species in which the corresponding precursor sequence was discovered, and can be valta (Viola albida var. takahashii), viman (Viola mandshurica), vorie (Viola orientalis), viver (Viola verecunda), or vacum (Viola acuminata). The third component of the name specifies the molecular species of the cyclotide in question, and the second is a number specifying which of the cyclotides having that molecular species and host species is referred to. Thus, the name vacum2-HS4 refers to the second cyclotide from V. acuminata whose molecular species is HS4. The molecular species are named after their NTR signature sequences, i.e. the residues present at positions -9 and -8 in the sequence of the cyclotide precursor. Three-component names were also assigned to the cyclotide precursor sequences, using a similar system: the first component is the same as in the cyclotides, the second is the sequence’s numerical rank (which is independent of that for the corresponding cyclotide), and the third specifies the precursor’s molecular species.

This naming convention is based on that used in previous publications, and in all cases where a precursor had previously been assigned a name, that name was retained. However, the prefix “prc-” was added to the names of precursor sequences that had not previously been named. Thus, prc-viul A is the name of the precursor of the viul A cyclotide from Viola uliginosa.
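The three-component cyclotide names can be taken apart mechanically. The following is a minimal sketch, assuming that a molecular species is always written as two capital letters followed by a number (as in HS4 or NL2) and that only the five host prefixes listed above are in play; the function name and the regular expression are illustrative and not part of Paper I.

```python
import re

# Host-species prefixes listed in the text (first name component).
HOST_PREFIXES = {
    "valta": "Viola albida var. takahashii",
    "viman": "Viola mandshurica",
    "vorie": "Viola orientalis",
    "viver": "Viola verecunda",
    "vacum": "Viola acuminata",
}

# Assumed layout: <host prefix><rank>-<molecular species>, e.g. vacum2-HS4.
NAME_PATTERN = re.compile(r"^(?P<host>[a-z]+)(?P<rank>\d+)-(?P<species>[A-Z]{2}\d+)$")


def parse_cyclotide_name(name):
    """Split a three-component cyclotide name into its parts."""
    match = NAME_PATTERN.match(name)
    if match is None or match.group("host") not in HOST_PREFIXES:
        raise ValueError(f"not a recognised three-component name: {name!r}")
    return {
        "host": HOST_PREFIXES[match.group("host")],
        "rank": int(match.group("rank")),
        "molecular_species": match.group("species"),
    }


print(parse_cyclotide_name("vacum2-HS4"))
```

Names retained from earlier publications (for example viul A) do not follow this layout and are deliberately rejected by the sketch.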

Table 1. Nomenclature of cyclotides and precursors.

Precursor name            Cyclotide name     Species                                                     Section
vima                      viman              Viola mandshurica W.Becker [37] a                           PLA
valt                      valta              Viola albida Palib. var. takahashii (Nakai) Kitag. [29] a   PLA
vive                      viver              Viola verecunda A.Gray [23] a                               PLA
VbCP                      Viba               Viola baoshanensis W.S.Shu, W.Liu & C.Y.Lan [42] b          PLA
voc                       vodo               Viola odorata L. [5] c                                      VIO
prc-Viul                  Viul               Viola uliginosa Bess. [12] d                                VIO
vacu                      vacum              Viola acuminata Ledeb. [23] a                               VIO
Vaf, Val                  N.A.               Viola adunca Sm. [2] e                                      VIO
prc-vitri, prc-tricyclon  vitri, tricyclon   Viola tricolor L. [44] f                                    MEL
vori                      vorie              Viola orientalis W.Becker [26] a                            CHA
Vbc                       vibi               Viola biflora L. [6] g                                      CHA
vica                      vican              Viola canadensis L. [19] h                                  CHA
prc-Vpf, prc-Vpl          Vpf, Vpl           Viola pinetorum Greene [2] e                                CHA

The numbers in square brackets after the names of the plant species indicate the number of cyclotide precursor sequences derived from that species. Viola sections are denoted by the first three letters of their names: Plagiostigma by PLA, Chamaemelanium by CHA, Viola by VIO, and Melanium by MEL. Within sect. Plagiostigma, V. mandshurica and V. albida belong to subsect. Patellares (Boiss.) Rouy & Foucaud, V. verecunda to subsect. Bilobatae (W.Becker) W.Becker and V. baoshanensis to subsect. Diffusae (W.Becker) Chang. Within sect. Viola, V. odorata belongs to subsect. Viola, and V. acuminata, V. adunca and V. uliginosa to subsect. Rostratae (W.Becker) W.Becker. No subsections are applicable to sect. Chamaemelanium.
a Species whose transcriptomes were analyzed in Paper I: V. mandshurica, V. albida var. takahashii, V. verecunda, V. acuminata and V. orientalis.
b V. baoshanensis (Zhang, Liao et al. 2009, Zhang, Li et al. 2015).
c V. odorata (Dutton, Renda et al. 2004, Ireland, Colgrave et al. 2006).
d V. uliginosa (Slazak, Jacobsson et al. 2015).
e V. pinetorum and V. adunca (Kaas and Craik 2010).
f V. tricolor (Mulvenna, Sando et al. 2005, Hellinger, Koehbach et al. 2015).
g V. biflora (Herrmann, Burman et al. 2008).
h The transcriptome data for V. canadensis were obtained from the 1kp project (www.onekp.com), and the corresponding cyclotide precursor sequences were determined in this work.

3.6. Classification of cyclotide precursors using their sequence signatures

A sequence signature is defined as a region in an alignment or group of alignments where a specific sequence pattern is observed in the protein sequences from all members of one or more taxa (organisms) but not in other taxa (Gupta 1998). A pattern may be a particular substitution or a set of specific insertions or deletions (i.e. indels). To date, protein sequence signatures have mainly been used to deduce phylogenetic relationships among distantly related organisms. In general, the sequence signature approach has the advantage of not being subject to the limitations of the methods and assumptions used to generate phylogenetic trees. For example, it is not sensitive to the reliability of sequence alignment, differences in evolutionary rates, or the decision to include or exclude specific sequence regions from the phylogenetic analysis.

In contrast to the protein sequences used as phylogenetic markers, individual Viola species have many (between 20 and 100) cyclotide precursor sequences, which vary quite widely. In Paper I, an approach based on signature sequences was used to classify the cyclotide precursors in terms of the cyclotides’ structural traits (Figure 11). As noted in the preceding sections, the first structural classification of the cyclotides was based on the twist of the cyclic backbone, i.e. the presence or absence of a cis-Pro residue in loop 5 (Craik, Daly et al. 1999). However, many other cyclotides were subsequently discovered that cannot be classified in this way, such as structural hybrids and linear cyclotides. A more versatile system of classification was therefore developed based on the full sequences of the cyclotides’ precursor proteins. It was assumed that the cyclotide domain sequences have undergone evolutionary changes together with other domains in their precursor sequences, so the evolutionary relationships between cyclotides can be evaluated more accurately by considering the precursor sequences in their entirety than by focusing exclusively on the cyclotide domains.

We therefore classified the precursor sequences using sequence signatures from the prodomain comprising the NTPP and NTR domains. The prodomain contains a large indel in the NTPP region, located between positions -56 and -38 in the upstream region in the consensus sequence of the precursors, where the N-terminal cleavage site of the cyclotide domain is defined as position zero. This indel coincides with the peptides’ structural classification as Möbius or bracelet cyclotides: the insertion region was found in the precursor sequences of prototypical bracelet cyclotides, and the deletion region in those of prototypical Möbius cyclotides (i.e. cyO2 and kB1 from the bracelet and Möbius structural subfamilies, respectively). This correlation was also observed in other cyclotide precursor sequences whose cyclotide domains exhibited high sequence similarity with the prototypical members of those classes. Based on these observations, it was suggested that this indel in the NTPP domain can be used to classify cyclotide precursor sequences into the Möbius or bracelet lineages.

Within each of these lineages, it was found that the sequence signatures of both the NTPP and NTR can be related to sequence traits that determine which structural subfamily (i.e. linear or cyclic, and prototypical or hybrid) the corresponding cyclotide belongs to (Figure 11). Moreover, within each structural subfamily, it was found that the cyclotide precursors could be classified on the basis of their sequence homology in the NTPP and NTR domains. That is to say, minor sequence variations within each structural subfamily could be related to the sequence homology in these two regions. This finding was used to classify the precursors within each lineage into two additional taxonomic orders, named the molecular series and molecular species. Precursors exhibiting sequence homology in both the NTPP and NTR domains were assigned the same molecular species (Figure 12), while precursors having the same sequence signatures in the NTR domain alone were assigned to the same molecular series.
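The first two levels of this classification can be illustrated with a short sketch that uses the gap positions and NTR patterns listed in the caption of Figure 11. It assumes that each precursor is supplied as a mapping from consensus position to residue (with '-' or a missing key standing for an alignment gap), and, because some of the published patterns overlap, it checks the linear signature before the cyclic one; both choices are assumptions made for illustration and do not reproduce the procedure of Paper I.

```python
GAP = "-"

# Gap positions that define the Mobius lineage in the NTPP indel region
# (Figure 11): [-56, -54] and [-50, -38], both inclusive.
MOBIUS_GAPS = list(range(-56, -53)) + list(range(-50, -37))

# Residues allowed at positions -9 and -8 (NTR), transcribed from Figure 11.
NTR_SIGNATURES = {
    ("Mobius", "linear"): ("Y", "Y"),
    ("Mobius", "cyclic"): ("YFH", "SAY"),
    ("bracelet", "linear"): ("QEPK", "DN"),
    ("bracelet", "cyclic"): ("HNSTGKP", "LNSFA"),
}


def classify_lineage(aligned):
    """Mobius if every defining NTPP position is a gap, otherwise bracelet."""
    if all(aligned.get(pos, GAP) == GAP for pos in MOBIUS_GAPS):
        return "Mobius"
    return "bracelet"


def classify_backbone(aligned, lineage):
    """Assign 'linear' or 'cyclic' from the residues at positions -9 and -8."""
    res9, res8 = aligned[-9], aligned[-8]
    # Assumption: the linear pattern is tested first because patterns overlap.
    for kind in ("linear", "cyclic"):
        allowed9, allowed8 = NTR_SIGNATURES[(lineage, kind)]
        if res9 in allowed9 and res8 in allowed8:
            return kind
    return "unassigned"


def classify_precursor(aligned):
    lineage = classify_lineage(aligned)
    return lineage, classify_backbone(aligned, lineage)


# Toy alignments (hypothetical residues, not real precursor sequences).
mobius_like = {-9: "Y", -8: "S"}
bracelet_like = {pos: "A" for pos in range(-56, -37)}
bracelet_like.update({-9: "H", -8: "L"})
print(classify_precursor(mobius_like), classify_precursor(bracelet_like))
```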


Figure 11. Classification of precursors as lineages in relation to their structural subfamilies. In the indel region of the NTPP [-56, -38], the Möbius lineage contains defining sequence gaps at positions [-56, -54] and [-50, -38]. Within a lineage, the sequence-traits of the prodomain are associated with specific structural subfamilies: in the Möbius lineage, the sequence (Y^/F^/H)-9-(S^/A/Y)-8 at positions [-9,-8] is associated with cyclic cyclotide precursors (where ‘^’ indicates high occurrence of the corresponding residue), and the sequence Y-9-Y-8 is associated with linear cyclotide precursors (denoted ‘Li’). In addition, a sequence insertion at positions [-32, -31] occurs in the structural hybrids (CyH), whereas there is a deletion at this position in the prototypical Möbius cyclotide precursors (CyP). Within the bracelet lineage, the sequence (H^/N^/S/T/G/K/P)-9-(L/N/S/F/A)-8 is associated with cyclic cyclotide precursors (Cy), and (Q/E/P/K)-9-(D/N)-8 is associated with linear cyclotide precursors (Li). In addition, linear cyclotide precursors contain a distinctive signature sequence in the NTPP, namely sequence insertions at positions [-49, -48] and [-39] of the form (P^/A/L)-49-(N^/A)-48 and (D/E)-39, together with a deletion between these two insertions.

Figure 12. Classification of precursors into two molecular series (NL and GA), which are further classified into molecular species. The molecular species associated with the NL molecular series are NL2, NL3 and KL1; those associated with the GA series are GA1, GA2 and GP1. Within a given molecular series, the sequences at positions [-56, -38] of the NTPP exhibit strong similarity. However, the sequences in this region differ between molecular series. Red arrows indicate sequence differences between molecular species within the same molecular series. The molecular species are named after their characteristic NTR sequences at positions [-9,-8].

3.7. Distribution of cyclotides in Violets

Previously reported studies had only examined cyclotide precursors from four sections of the genus Viola (Herrmann, Burman et al. 2008, Hellinger, Koehbach et al. 2015, Slazak, Jacobsson et al. 2015, Zhang, Li et al. 2015). Paper I integrated data for all of the previously reported precursor sequences and analyzed the distribution of the precursors in relation to infrageneric sections. This revealed that many molecular species are evenly or sporadically distributed across these four sections (Figure 13). The distribution of molecular species among these sections appears to be related to the gene flow of cyclotide precursors, mediated by the sections’ speciation. For example, molecular species found in all four sections were presumably formed prior to the speciation of the common ancestor. Alternatively, they could have been formed after the common ancestor’s speciation and then flowed into each section separately by hybridization during the genus’ early diversification. However, the presence of molecular species that are specific to different sections demonstrates that speciation is not exclusively due to hybridization.

The genus Viola currently includes approximately 600 species. A huge number of cyclotide precursors (~60000 in total), corresponding to 100 per individual species (Hellinger, Koehbach et al. 2015), is estimated to be distributed across the genus. However, around 31 Mya, the genus was only represented by one species, whose gene pool contained only one suite of cyclotide genes. The vast multiplicity of cyclotide precursor genes found today must have been derived from this early common ancestor. The sequence variation between molecular species may have resulted from genomic changes associated with the speciation of the ancestral species into the 600 species known today.

Paper I explains the cyclotide distribution across four sections of Viola, which collectively account for 61-65% of all known Viola species. Further insights into the distribution of cyclotides across the genus will be obtained in the near future by studying the distribution of cyclotides in other sections (e.g. Rubellium and Andinium) that diverged from the four studied herein at a relatively early stage in the evolution of Viola.


Figure 13. Distribution of molecular species across individual Viola sections, and their phylogenetic relationships. A. Proportion of molecular species found across the different sections within Viola. The four different sections are indicated, and the percentages indicate the proportion of molecular species found in each section. The outer rings show results obtained by counting only complete sequences, while the inner rings show results based on both complete and partial sequences. The left-hand figure shows results calculated based on the numbers of precursor sequences, and the right-hand figure shows results based on the numbers of molecular species. B. Phylogenetic relationships between sections and genera of Violaceae used in the analyses conducted within this work. Viola sections are abbreviated as follows: Plagiostigma as PLA, Viola as VIO, Melanium as MEL, Chamaemelanium as CHA, Rubellium as RUB, and Andinium as AND. Dotted lines indicate hybridization between sections. Genera and sections studied using transcriptomic methods are indicated with asterisks (*). The total number of species within Viola is estimated to be 580-620, most of which (61-65%) belong to these four sections. The first ancestor of the genus (α) is dated to 31 Mya, and the common ancestor of the four studied sections (β) is dated to 24 Mya (Marcussen et al., 2015). The phylogeny of Viola is based on the work of Marcussen et al. (2010, 2015) and that of Violaceae is based on the work of Wahlert et al. (2014).

4. Structure-activity relationships of cyclotides (Papers II and III)

It appears clear that cyclotides are involved in host defence, and that membrane interaction is one of the mechanisms involved. Paper II explains the membrane activity of cyclotides using geometrically interpretable descriptors. To do this, the key physicochemical properties (the lipophilic and electrostatic surfaces) are systematized into scalars (surface area) and moments (the distribution of the surfaces over the molecular surface). Furthermore, we analyzed individual residues’ structural contributions to the physicochemical properties of the protein as a whole using the ‘exposed surface ratio’ of each residue. This systematic approach was then extended to explain membrane selectivity in Paper III.

4.1. Physicochemical properties and molecular descriptors

As of the time of writing, over 200 molecular descriptors have been developed (MOE). However, these descriptors are all based on a much more limited set of fundamental physicochemical properties. Physicochemical properties are extensions of well-known physical and chemical properties.

i) The physical properties relevant to the development of molecular descriptors are derived from the distribution of atomic mass within the molecular structure (i.e. atomic coordinates and connectivity). The molecular structure defines the molecule’s geometric shape (and hence quantities such as the molecular surface area and volume), the atomic connectivity determines molecular flexibility (Lowell H. Hall 1991), and the mass distribution within the molecular structure determines the radius of gyration (Lobanov, Bogatyreva et al. 2008) and eccentricity (Taylor and Aszodi 2004).

ii) The chemical properties can be divided into two classes: hydrophobic and hydrophilic. The hydrophilic properties are then further subcategorized into negative/positive charge and uncharged polarity. These properties are defined by the molecule’s electronic distribution and provide information on the attractive or repulsive forces between molecules as well as the driving forces of chemical reactions.
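As an example of a physical descriptor derived purely from the mass distribution, the radius of gyration can be computed from atomic coordinates and masses. The sketch below uses the standard mass-weighted definition (root-mean-square distance of the atoms from the center of mass); the two-atom input is a toy case and not data from this work.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )


def radius_of_gyration(coords, masses):
    """Root-mean-square, mass-weighted distance from the center of mass."""
    com = center_of_mass(coords, masses)
    total = sum(masses)
    weighted_sq = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)


# Toy input: two atoms of equal mass placed 2 length units apart.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0]))
```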

Physical and chemical properties are two sides of the same coin when it comes to describing molecules. In practical terms, chemical properties are measured by studying physical properties, and analyses of physicochemical properties provide useful descriptions of molecular interactions. For example, the hydrophobic surface area of the molecule provides information on both hydrophobic interactions as a type of chemical force and the hydrophobic “strength” of the molecule.

Lipophilicity is a physicochemical property that is related to hydrophobicity and polarity (van de Waterbeemd, Karajiannis et al. 1994). It is determined by measuring a compound’s octanol/water partition coefficient (logP), and reflects the compound’s preference for octanol over water. A typical membrane lipid consists of two fragments: a long hydrophobic (non-polar) tail and a polar hydrophilic head. Taking octanol as a representative lipid, the tail would be the octyl chain, and the head the hydroxyl group. The partitioning between water and octanol is described by the equation logP = a∙V + Λ, where V is a volume-related term describing the molecule’s expulsion from the aqueous environment and the Λ term represents the sum of all the polar interactions affecting the partition coefficients (‘a’ is a positive constant). Thus, the strength of the hydrophobic interactions is proportional to ‘V’. Because logP has a positive linear correlation with the volume parameter of nonpolar groups, the side chains of amino acids with bulky n-alkyl chains such as Ile and Val display high lipophilicity and hydrophobicity. In contrast, aryl side chains with pi-conjugation such as Trp and Tyr exhibit high lipophilicity but only intermediate hydrophobicity.

4.2. Physicochemical properties as scalars and moments

A given physicochemical property (e.g. the molecular surface area that exhibits a particular chemical property) can be measured either as a scalar quantity having only a magnitude (which in this case would just be the area) or as a moment having both magnitude and direction (the direction in this case being the offset of the area from the center of the molecule).

These scalar and moment quantities are independent properties (Figure 14): if the surface area showing the relevant physicochemical property doubles, the magnitude of the moment may be doubled, halved, or unchanged, depending on the distribution of the ‘new’ area on the molecular surface.
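The independence of the two quantities can be checked numerically. In the minimal sketch below, a surface is represented as a list of patches, each with an area and a unit direction from the molecular center, and the moment is taken to be the length of the area-weighted sum of those directions; this representation and this definition of the moment are simplifying assumptions for illustration, not the descriptors computed in Paper II.

```python
import math


def unit(theta_deg):
    """Unit vector in the xy-plane at the given angle from the x-axis."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t), 0.0)


def surface_scalar(patches):
    """Scalar descriptor: total area of all patches."""
    return sum(area for area, _ in patches)


def surface_moment(patches):
    """Moment descriptor: length of the area-weighted sum of directions."""
    vec = [sum(area * d[k] for area, d in patches) for k in range(3)]
    return math.sqrt(sum(v * v for v in vec))


base = [(1.0, unit(0.0))]
# Doubling the area on the same side doubles the moment.
same_side = base + [(1.0, unit(0.0))]
# Doubling the area on the far side (cos(angle) = -0.875) halves the moment.
far_side = base + [(1.0, unit(math.degrees(math.acos(-0.875))))]

for label, patches in (("base", base), ("same side", same_side), ("far side", far_side)):
    print(label, surface_scalar(patches), round(surface_moment(patches), 3))
```

Placing the added patch at 120° from the original one would leave the moment unchanged, which covers the third case mentioned above.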


Figure 14. Examples showing that doubling the (scalar) surface area showing a physicochemical property of interest (represented by the gray area) can either double (B) or halve (C) the corresponding moment.

Paper II systematically characterizes key physicochemical properties of cyclotides (their lipophilic, hydrophobic and positively charged surface areas) in terms of both the sum of areas on their surfaces exhibiting the selected properties, and the distribution of those areas on the molecular surface relative to the molecule’s center (i.e. the moments).

4.3. Lipophilic index values can be computed for unnatural amino acids.

Because some of the cyclotides examined in Paper II contain chemically modified amino acids, it was necessary to create a lipophilicity scale that can describe such residues. The lipophilicity of the side chains was therefore predicted using their logP values, and it was shown that the developed approach can in principle be applied to any unnatural amino acid (Appendix I).

4.4. The exposed surface ratio measures the extent to which a residue’s physicochemical properties affect those of the peptide as a whole.

Paper III introduces the concept of the exposed surface ratio of residues’ side chains, and its effects on protein structure. This quantity is defined as the ratio of the exposure of the side chains, i.e. their solvent-accessible surface area (SASA) in the cyclotide structure, to that of a tripeptide (G-X-G) in an extended conformation (φ = ψ = 180°). The exposed SASA of the amino acid (X) in G-X-G is taken as a reference, and it is assumed that the exposed surface ratio is higher than 1.0 when the side chain is oriented towards the exterior of the peptide in convex loops. However, depending on the size of the residues in its vicinity, the exposed surface ratio of a given residue may be less than 1 even if it is in a convex loop.
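The definition reduces to a simple quotient once the two SASA values are available. The sketch below assumes that both values have already been computed with an external tool and are supplied in the same units; the residue labels and numbers are hypothetical and serve only to show the calculation.

```python
def exposed_surface_ratio(sasa_in_structure, sasa_in_gxg):
    """Side-chain SASA in the folded peptide divided by the G-X-G reference."""
    if sasa_in_gxg <= 0:
        raise ValueError("the G-X-G reference SASA must be positive")
    return sasa_in_structure / sasa_in_gxg


# Hypothetical side-chain SASA values: (in structure, in extended G-X-G).
example_residues = {
    "exposed Trp": (150.0, 120.0),  # ratio above 1: pushed outwards
    "buried Val": (30.0, 120.0),    # ratio below 1: shielded by neighbours
}

for label, (folded, reference) in example_residues.items():
    print(label, exposed_surface_ratio(folded, reference))
```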

The exposed surface ratio is a useful concept because it relates to the extent to which the physicochemical properties of each individual residue affect those of the protein as a whole, and can be used to evaluate the effects of conformational changes in individual residues on the overall protein structure. The steps involved in computing and using the exposed surface ratio are explained below:

i) The 3D conformations of cyclotides are typically determined by NMR (Rosengren, Daly et al. 2003), X-ray crystallography (Wang, Hu et al. 2009) or homology modelling (Svangård, Göransson et al. 2003). The conformations obtained in these ways are expected to differ, but the magnitude of the difference between pairs of conformations can vary. The magnitudes of the differences between conformations are reflected in the variation of the exposed surface ratios of the side chains (Figure 15). The data presented in this figure clearly show that the variation in the exposed surface ratio for cyO2 is greater than that for kB1, indicating that the side chains of the former protein move more and are more conformationally flexible. However, compared to most proteins, all cyclotides are very rigid because the cyclic cystine knot (CCK) motif constrains their conformational flexibility.


Figure 15. Superimposed conformations derived from NMR structures. The left hand image shows superpositions of twenty conformations of the cyclotides kB1 (pdb code: 1nb1) (top) and cyO2 (2knm) (bottom). The exposed surface ratios of the residues in each protein are shown on the right hand side.

ii) For each individual conformation of each cyclotide, the contribution of each residue to the physicochemical properties of the protein as a whole can be estimated. This is illustrated for the lipophilic intensity in Figure 16. Evaluating the lipophilic intensity reveals the functions of residues in relation to the protein’s structure. However, to obtain a more detailed understanding, we compared the lipophilic intensities of the cyclotides in their native fold to a hypothetical unfolded conformation in which φ = ψ = 180°. This showed that the cystine knot motif enhances the cyclotides’ overall lipophilicity by increasing the exposed surface area of lipophilic residues on the molecular surface. Because the cystines are deeply buried in the protein core, they do not interact directly with the membrane. However, the CCK motif forces the backbone of the loops to curve outwards, causing some lipophilic residues in convex loops to be oriented towards the exterior of the peptide with high solvent-accessible surface areas, despite the energetically unfavorable nature of such conformations. Such residues are found in loop 5 of kB1 and loop 2 of cyO2, both of which are four amino acids long, have ends whose positions are constrained by the cystine knot, and feature a tryptophan residue (W23 and W10 in kB1 and cyO2, respectively) with a high lipophilic surface area that is directly adjacent to a proline residue. In kB1, the surface area of W23 is lower than in the linear conformation because a large proportion of the residue’s surface is hidden by the side chain of V10 in loop 2.

Figure 16. Lipophilic surface area (lipophilic intensity) profiles for each residue on the native folds of kB1 (1nb1) and cyO2 (2knm).

4.5. Procedure for computing the electrostatic moment.

To explain the distribution of electrostatic forces over the surfaces of the cyclotides, a new procedure was introduced that is outlined below using the example of the tripeptide Gly-Ack-Gly, where Ack is acetylated lysine (Figure 17). The acetylation means that the side chain amino group of lysine is chemically modified and does not bear a positive charge.

A. Calculation of electrostatic potentials. The electrostatic potential is determined using the Poisson-Boltzmann Equation (PBE) with a 0.5 Å grid spacing. At a given point r = (x, y, z), the electrostatic potential, u(r), represents the negative of the work that must be done to move a unit charge from infinity to a position r within the electric field. Each set of coordinates is assigned an electrostatic potential value of the form [x, y, z, u(r)].

B. Calculation of the center of the positively charged surfaces. The location of the peptide’s hydrogen bond donor (HBD) center (PC) is defined as the center of the volume enclosed by the isosurface that interacts with an aqueous oxygen probe derived from the SPC water model (Berendsen, Postma et al. 1981) at -1.6 kcal/mol. This point was taken to represent the center of the molecular surface that displays a positive charge. At a given position r, the hydrogen bond potential, HBP(r), is defined as the interaction energy with the aqueous oxygen probe: $\mathrm{HBP}(\mathbf{r}) = q_O \cdot u(\mathbf{r}) + v_O$. Here, $q_O$ is -0.82 and represents the partial charge of the aqueous oxygen probe, while $v_O$ is the van der Waals potential of the oxygen atom. The HBP(r) term thus combines the Coulombic interaction ($q_O \cdot u(\mathbf{r})$) with a pairwise summation of the van der Waals interactions ($v_O$) in the form of a Lennard-Jones potential: $v_O = \sum_i \left( \frac{A_i}{|\mathbf{r}-\mathbf{r}_i|^{12}} - \frac{B_i}{|\mathbf{r}-\mathbf{r}_i|^{6}} \right)$, where $A_i$ and $B_i$ are OPLS van der Waals parameters (Jorgensen, Maxwell et al. 1996) and $\mathbf{r}_i$ is the coordinate of the i-th atom of the protein. After the hydrogen bonding potentials have been assigned to the grid points [x, y, z, HBP(r)], the Cartesian coordinates (x, y, z) at which HBP(r) < -1.6 kcal/mol are selected to define the isocontour surface.

C. Calculation of the center of hydrophobic surfaces. The hydrophobic center (HC) is defined as the center of the grid surface reactive to a DRY probe (Goodford 1985) at -0.2 kcal/mol. The DRY probe calculates the hydrophobic energy at each grid point: $E = \sum E_{vdw} + S - \sum E_{hb}$, where $S$ is the entropy, which is taken to be a constant (-0.848), $E_{vdw}$ is the van der Waals interaction energy, and $E_{hb}$ is the energetic cost of disrupting the hydrogen bonding network in the protein hydration shell.

D. Calculation of the electrostatic moment. The HBD amphipathic moment measures the distance from the center of the hydrophobic surface to the center of the hydrogen bond donor surface.
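Steps B to D can be summarized in a short sketch. It assumes that the electrostatic potential u(r) from step A and the DRY probe energies from step C are already available on a grid, that the Lennard-Jones term has the usual 12-6 form, and that the center of an isosurface can be approximated by the unweighted mean of the grid points below the cutoff; the function names and the toy grids are hypothetical and do not reproduce the software used in this work.

```python
import math

Q_O = -0.82  # partial charge of the aqueous oxygen probe (value from the text)


def lennard_jones(r, atoms):
    """v_O(r): sum over atoms of A_i/d^12 - B_i/d^6, with d = |r - r_i|."""
    total = 0.0
    for position, a_i, b_i in atoms:
        d = math.dist(r, position)
        total += a_i / d**12 - b_i / d**6
    return total


def hbp(r, u_r, atoms):
    """Hydrogen bond potential: Coulombic term plus van der Waals term."""
    return Q_O * u_r + lennard_jones(r, atoms)


def select_below(grid, cutoff):
    """Keep the coordinates of grid points whose value is below the cutoff."""
    return [xyz for xyz, value in grid if value < cutoff]


def centroid(points):
    """Unweighted mean position (assumed stand-in for the isosurface center)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def amphipathic_moment(hbp_grid, dry_grid, hbp_cutoff=-1.6, dry_cutoff=-0.2):
    """Distance between the hydrophobic center (HC) and the HBD center (PC)."""
    pc = centroid(select_below(hbp_grid, hbp_cutoff))
    hc = centroid(select_below(dry_grid, dry_cutoff))
    return math.dist(hc, pc)


# Toy grids of ((x, y, z), energy in kcal/mol); values are illustrative only.
toy_hbp_grid = [((0.0, 0.0, 0.0), -2.0), ((2.0, 0.0, 0.0), -2.0), ((5.0, 5.0, 5.0), 0.0)]
toy_dry_grid = [((1.0, 3.0, 0.0), -0.5), ((1.0, 5.0, 0.0), -0.5), ((9.0, 9.0, 9.0), 0.1)]
print(amphipathic_moment(toy_hbp_grid, toy_dry_grid))
```

On a uniform grid the unweighted mean of the selected points approximates the center of the enclosed volume; a production implementation would need the real PBE and probe calculations in place of the toy grids.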


Figure 17. Illustration of the procedure used to compute EM. A. Chemical structure of Gly-Ack-Gly (left), and the corresponding electrostatic potentials (right). The N-terminal amine group bears a positive charge and the C-terminal carboxyl group bears a negative charge. B. PC is the positively charged center, and the distance from the center of mass (MC) to the positively charged center (PC) is the electrostatic moment. C. The hydrophobic surface is represented by the green mesh. HC is the hydrophobic center, and is the hydrophobic vector. D. The length of is the electrostatic moment.

The lipophilic and electrostatic moments computed as described above are frequently but not invariably positively correlated (Figure 18). This is because both are mainly determined by the amino acid composition in naturally occurring cyclotides. The most important variables in this context are the abundance of hydrophobic residues and positively charged residues. Generally, lipophilic residues are confounded with hydrophobic residues, and lipophobic residues are confounded with both positively charged residues and negatively charged residues. However, in naturally occurring cyclotide structures, the physicochemical variation due to negatively charged residues is not very large: 70% of known cyclotides (=200/285) contain no more than one negatively charged residue (Glu) in their sequences, and that negatively charged residue is buried inside the protein structure in most cases.

Figure 18. Data for 285 naturally occurring cyclotides indexed in CyBase April 2015 (www.cybase.org.au). Cyclotides whose lipophilic and electrostatic moments are positively correlated are represented by points located inside the dotted ellipse. The cyclotides’ structures were predicted with the CycloMod tool (Kaas and Craik 2010).

4.6. Use of QSAR to analyze membrane activity and selectivity

In Paper II, the model was used to describe the QSAR of the cytotoxicity and anthelmintic activity of cyclotides. With its extended scope, Paper III presents an analysis of the correlation between cyclotides’ structures and their antibacterial and membrane activities. Unfortunately, the availability of activity data for cyclotides in general is limited and the data that can be obtained are sparsely distributed, with many undetermined IC50 values. In addition, the activities of cyclotides have been investigated inconsistently, so even for those cyclotides that have been tested, there is only partial information on their activity in different cell lines. Therefore, I interpreted the structure-activity relationships for these proteins using hierarchical clustering: the cyclotides were grouped using the newly established molecular descriptors (Figure 19), and their membrane activities were associated with individual groups. In the dendrogram shown in Figure 19, the cyclotides are clustered into five groups: MCoT (■), Cter (♦), cyO (▲), Tric (▼) and kB (●).
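The grouping step can be illustrated with a small agglomerative clustering routine operating on the four descriptors (LS, LM, ES, EM). The sketch assumes Euclidean distances and average linkage, neither of which is specified in the text, and the descriptor values and peptide labels are hypothetical; in practice the descriptors would normally be standardized before clustering.

```python
import math


def average_linkage(cluster_a, cluster_b, data):
    """Mean pairwise Euclidean distance between members of two clusters."""
    total = sum(math.dist(data[i], data[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(data, n_groups):
    """Merge the two closest clusters repeatedly until n_groups remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical (LS, LM, ES, EM) descriptor vectors for four peptides.
names = ["pepA", "pepB", "pepC", "pepD"]
descriptors = [
    (1.0, 1.0, 0.0, 0.0),
    (1.1, 0.9, 0.0, 0.1),
    (5.0, 5.0, 3.0, 3.0),
    (5.2, 4.9, 3.1, 2.9),
]

groups = agglomerate(descriptors, n_groups=2)
print([[names[i] for i in group] for group in groups])
```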

Figure 19. Dendrogram of 18 cyclotides and their physicochemical properties, evaluated using four molecular descriptors: the lipophilic surface area and lipophilic moment (LS, LM), and positively charged surface area and electrostatic moment (ES, EM). The cyclotides were clustered into five groups: MCoT (■), Cter (♦), cyO (▲), Tric (▼) and kB (●). Negatively charged surface regions are colored in red, positively charged regions in blue, and the hydrophobic regions in green.

Grouping the cyclotides in this way made it possible to explain their different selectivities for phosphatidylethanolamine (PE)- and phosphatidylcholine (PC)-containing membranes. For a comprehensive interpretation of membrane selectivity in relation to the five groups, I introduced a 2D plot of each cyclotide's affinities for the two membrane types, as shown in Figure 20A. I defined the PE selectivity of each group (i.e. its affinity for PE membranes relative to PC membranes) as its ratio of PE/PC membrane activity, i.e. the slope of the line from the origin (0,0) to the group's grid center. The total membrane lytic activity of each group was defined as the absolute value of this coordinate (i.e. the length of the line). It should be noted that the PE selectivity of a given cyclotide group need not relate directly to its total membrane lytic activity. For example, the kB group is more PE-selective than the cyO group, but the total membrane activity of the latter exceeds that of the former.
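The two quantities defined above can be written out as a short sketch. It assumes that each group member is a point (PC activity, PE activity), that the group's center is the mean of its members (one possible reading of "grid center"), and that PC activity is plotted on the horizontal axis so that the slope equals PE/PC. The group names and all values are hypothetical and serve only to show that a steeper line need not be a longer one.

```python
from math import hypot


def group_center(members):
    """Mean (PC, PE) activity of a group; an assumed definition of the center."""
    pc_values, pe_values = zip(*members)
    return sum(pc_values) / len(members), sum(pe_values) / len(members)


def pe_selectivity(center):
    """Slope of the line from the origin to the center, i.e. PE/PC."""
    pc, pe = center
    if pc == 0:
        raise ValueError("PC activity of zero gives an undefined slope")
    return pe / pc


def total_lytic_activity(center):
    """Length of the line from the origin to the center."""
    return hypot(*center)


# Hypothetical (PC activity, PE activity) points for two made-up groups.
steep_group = [(1.0, 3.0), (1.0, 5.0)]
long_group = [(5.0, 8.0), (7.0, 8.0)]

steep_center = group_center(steep_group)
long_center = group_center(long_group)

# The steep group is more PE-selective, yet the long group is more lytic overall.
print(pe_selectivity(steep_center) > pe_selectivity(long_center))
print(total_lytic_activity(long_center) > total_lytic_activity(steep_center))
```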


Figure 20. Analysis of membrane selectivity for different cyclotides. A. Selectivity for binding to PE-containing (POPE:POPG) and PC-containing (POPC:POPG) membranes. Members of the five cyclotide groups are marked with symbols: MCoT (■), Cter (♦), cyO (▲), Tric (▼) and kB (●). The PE selectivity of each group is the gradient of the line from zero to the group's center, and the group's total membrane lytic activity is the length of the line. Thus, kB has a greater PE selectivity than cyO (the slope of line α is steeper than that of line β), but the total membrane lytic activity of cyO is greater than that of kB (line β is longer than line α). B. Plot of the PE selectivity and total membrane lytic activity for the five cyclotide groups.

5. Microwave-assisted total synthesis of cyclotides (Paper IV)

With the sequence diversity and its importance for activity described, a common problem in natural product chemistry remains: access to pure compounds for testing. Some cyclotides can be isolated in large amounts (e.g. kB1), whereas others are only found at low abundance. For that reason, solid phase peptide synthesis (SPPS) has become a valuable tool for studies on cyclotides.

Paper IV describes methods for the microwave-assisted Fmoc-SPPS of cyclotides. The protocol presented in this paper is based on a strategy that combines optimized microwave-assisted chemical reactions for Fmoc-SPPS of the peptide backbone, cleavage of the protected peptide, and introduction of a thioester at the C-terminal carboxylic acid to obtain the head-to-tail cyclized cyclotide backbone by native chemical ligation. To exemplify the utility of this protocol in the synthesis of a wide array of different cyclotide sequences, representative members of three cyclotide subfamilies were prepared: the Möbius cyclotide kalata B1, the bracelet cycloviolacin O2, and the trypsin inhibitor MCoTI-II. In addition, a "one-pot" reaction promoting both cyclization and oxidative folding of the crude peptide thioester was developed for kalata B1 and MCoTI-II.


Figure 21. Synthetic strategy for the preparation of cyclotides. A. Partial precursor sequences of kB1 and cyO2 (upper) and the cyclotide sequences of kB1 and cyO2 used for synthesis (lower). For native chemical ligation, Cys 5 and Gly 6 are often used as the N- and C-termini, respectively. The choice of Gly as the C-terminus prevents epimerization during thioesterification and cyclization. B. Strategy for cyclotide synthesis.

Peptide chains of approximately 30 amino acids were prepared by sequential Fmoc-SPPS coupling reactions. Each of these reactions involves several intermediate steps and washes with various reagents, which are easily performed in SPPS because the growing peptide chains are immobilized on a resin. Microwave irradiation was used to enhance the efficiency of these coupling reactions because, as peptide chains grow on the resin, they are prone to aggregation and so become less accessible to reagents.

The elongated peptide chains were cleaved from the resin under mild conditions. In traditional Fmoc-SPPS, the peptide chains can be cleaved under strong or mild conditions. Strong conditions involve the use of a strong acid (e.g. 95% TFA) to cleave the peptides from the resin, which also removes the commonly used protecting groups. Mild conditions utilize a dilute or weak acid (e.g. 1% TFA or 10% AcOH in DCM); DCM is used as the solvent to avoid over-weakening the acid, as would happen in water. Mild conditions were used because it was necessary to preserve the protection of all of the side chain functional groups that could potentially undergo thioesterification, and thus to prevent side reactions. The C-terminal carboxylic acid group must be converted into the corresponding thioester prior to the cyclization step, but glutamic acid and aspartic acid residues contain carboxylic acid moieties in their side chains, which would interfere with the thioesterification if they became unprotected.

We optimized the thioesterification reaction by testing a range of different solvents (DMF proved to be optimal) and microwave conditions. The original protocol for this reaction utilizes PyBOP, p-acetamidothiophenol and DIPEA in DCM (von Eggelkraut-Gottanka, Klose et al. 2003). Generally, microwave irradiation does not lower the activation energy of chemical reactions but instead accelerates the reaction by increasing the molecules' ability to overcome this energetic barrier, often resulting in shorter reaction times than can be achieved with conventional heating (Hayes 2002).
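One way to read the statement about the activation energy is through the Arrhenius equation, k = A·exp(−Ea/RT): with Ea unchanged, a higher reaction temperature alone raises the rate constant. The sketch below assumes purely thermal, Arrhenius-type behavior; the activation energy (50 kJ/mol) and the two temperatures are hypothetical values chosen for illustration and are not taken from Paper IV.

```python
from math import exp

R = 8.314462618  # molar gas constant in J/(mol*K)


def arrhenius_rate_ratio(ea_j_per_mol, t_low_k, t_high_k):
    """Ratio k(t_high)/k(t_low) for a fixed activation energy and prefactor."""
    return exp(ea_j_per_mol / R * (1.0 / t_low_k - 1.0 / t_high_k))


# Hypothetical example: Ea = 50 kJ/mol, heating from 25 °C to 75 °C.
ratio = arrhenius_rate_ratio(50_000.0, 298.15, 348.15)
print(f"rate constant increases roughly {ratio:.0f}-fold")
```

The activation energy appears as a constant in both rate constants, so the acceleration in this picture comes entirely from the temperature term.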

A buffered one-pot protocol was developed to achieve cyclization and oxidative folding in a single process. The cyclization occurs via native chemical ligation (NCL), which proceeds quickly in the presence of unprotected cysteines that form intramolecular interactions (Tam and Lu 1998, Hackeng, Griffin et al. 1999). Because the p-acetamidothiophenol thioester formed at the C-terminus during the thioesterification step carries a good leaving group, the cyclization step does not require any preliminary activating process (Blanco-Canosa and Dawson 2008).

6. Concluding remarks and future perspectives

Over the last few decades, a large body of information on the cyclotides has been accumulated. These proteins have been structurally characterized, their sequences have been determined by direct analysis and by studying the corresponding mRNA, and their biological activity has been explored. In addition, the structures of more than 30 cyclotides have been determined by NMR, including naturally occurring and artificially mutated examples. In total, over 300 cyclotide sequences have been determined.

During my doctoral studies, I aimed to integrate the results of these studies and to relate the membrane activity of the cyclotides to their structures (Papers II and III). In addition, I sought to explain why the cyclotides came to be distributed with such large sequence diversity among flowering plants (Paper I). I therefore attempted to relate their natural distribution to their evolutionary history.

I believe that the work presented in Papers II and III will facilitate the development of drugs with CCK scaffolds. Paper II explains the membrane activity of cyclotides using geometrically interpretable descriptors. The physicochemical properties required to explain the cyclotides' membrane activities are systematically analyzed in terms of scalars (surface area) and moments (the distribution of specific properties over the molecular surface). Furthermore, it is shown that the 'exposed surface ratio of each residue' provides information about individual residues' structural contribution to the physicochemical properties of the protein as a whole. Such information could potentially be useful in the development of guidelines for drug design by highlighting residues whose modification could profoundly change the protein's properties to yield enhanced activity.

In addition, an optimized Fmoc-SPPS protocol for the synthesis of both natural cyclotides and variants incorporating non-natural amino acids was developed (Paper IV). Over 200 non-natural amino acids are commercially available, and their incorporation into synthetic cyclotides could greatly enhance the sequence diversity of these proteins. Because the molecular descriptors developed in Paper II are applicable to both natural and non-natural amino acids, it becomes possible to efficiently design diverse cyclotide sequences to fit within a tightly defined physicochemical space. By combining this ability with guidance from the QSAR model, it should be possible to reduce the cost (in terms of both money and time) of identifying potent new drug candidates.

Despite the successes achieved in this work, Paper II highlighted a number of challenges that remain to be addressed. Cyclotides oligomerize when dissolved in water at high concentrations (Nourse, Trabi et al. 2004, Rosengren, Daly et al. 2013), and some of them retain their ability to interact with membranes after doing so. However, the QSAR model (Paper II) explains the membrane interactions of the cyclotides in terms of the monomeric contact surface, and therefore cannot be applied to the analysis of oligomers. There is thus a need to develop more general descriptors and QSAR equations.

In addition, I think it would be interesting to study flexibility as a physicochemical property in relation to membrane activity. We excluded flexibility (Lowell H. Hall 1991) from our initial analyses for two reasons: i) existing molecular descriptors only consider flexibility relating to the connectivity of covalent bonds, and cannot describe conformational changes in the hydrogen bond network; and ii) the existing flexibility descriptors are collinear with molecular weight. However, the variation of the exposed surface ratio determined using NMR solution structure data reflects the conformational flexibility of individual residues. In my opinion, the thermal vibrations associated with this conformational flexibility could play an important role in membrane disruption. This hypothesis is somewhat supported by the observation that cyO2 has greater conformational flexibility than kB1 and also has a higher membrane activity. In this regard, I speculate that enhancing the lipophilicity of loop 2 in cyO2 and of loop 5 in kB1 would improve their general membrane activity. In those loops, the position of the tryptophan residue allows its side chain to adopt flexible conformations; in addition, replacing it with large lipophilic residues would increase the lipophilic moment and lipophilic surface area. The lipophilicities of chemically modified amino acids are listed in Appendix I.

Although QSAR models provide mathematical descriptions of the correlation between structure and activity, they remain influenced by the subjective opinions and choices of the individuals who develop them. For example, we select the physicochemical properties to be included in the models on the basis of our knowledge and understanding of the systems we study. To establish better QSAR models, it will be necessary to perform more fundamental research so as to reduce the guesswork and intuition involved in this process of selecting variables.

In Paper I, we estimated the distribution of cyclotides in four infrageneric sections of the genus Viola. Because the cyclotide precursors exhibit considerable sequence diversity, they were classified based on their sequence signatures. This classification made it possible to trace the flow of the cyclotide genes through the phylogenetic tree of the genus Viola. In my opinion, similar approaches could be used to explain the distribution and evolution of other proteins and peptides found throughout the kingdoms of life.

In addition, we explained the structural diversity of cyclotides in terms of the evolution of their precursor sequences. Their structural diversity was further examined by relating their structures to their expression levels. To date, cyclotides have mainly been classified on the basis of structural traits such as the presence or absence of a cis-Pro residue in loop 5, as in the case of the Möbius and bracelet cyclotides. More recently, it has been shown that the structural diversity of the cyclotides is greater than was previously recognized, with the discovery of hybrid and linear cyclotides. Paper I explains how this structural variation is related to the sequences of the cyclotide precursors, and can be linked to the evolution of the Violaceae and natural selection.

However, many challenges remain to be addressed in relation to Paper I. Flowering plants exhibit immense biodiversity, and we still do not know how the known diversity of cyclotides is related to this more general diversity. Genomic changes occur in gene pools during the process of speciation, and the genes encoding cyclotides are likely to be as strongly affected by such changes as every other type of gene. Further research in this area could help to explain the sporadic distribution of cyclotides among the angiosperms.

In addition, the known cyclotide sequences can be studied more extensively to identify previously unrecognized biological functions. Although the cyclotides' biological activity is generally believed to stem from their membrane interactions, recent studies have shown that some cyclotides bind to receptors (Koehbach, O'Brien et al. 2013). Paper I describes many new cyclotides, some of which may have different functions from the known membrane-active proteins.

In Paper I, I aimed to explain the evolution of the cyclotides using knowledge from several different fields of study, in the spirit of consilience (Wilson 1998, Choi 2011). For example, to explain the distribution of cyclotides in the violets, we used information on the phylogenetic tree of the genus Viola derived from plant . In addition, to integrate the structural diversity of the known cyclotides, we classified their precursor sequences using knowledge from studies on protein evolution. Finally, to explain their natural selection and evolutionary history, we studied their physicochemical properties in relation to the acting selective forces, using knowledge from physical chemistry. This clearly shows how scientific evidence from independent and unrelated sources can converge on strong conclusions about complex biological issues such as the evolution of the cyclotides.

Over the last few decades, academic research has become increasingly specialized and focused on ever-more specific topics. As a result, large amounts of knowledge have accumulated within individual fields but have not been well communicated to or utilized by workers in other, tangentially related fields. I believe that we may quite easily be able to find new solutions to longstanding problems by breaking down these barriers between different disciplines, and that the adoption of the principle of consilience will be particularly advantageous in protein science. I hope that my thesis work will serve as one step towards this goal.

7. Popular science summary

We human beings share our planet with a vast diversity of living organisms that are divided across five great kingdoms of life: bacteria, protists, fungi, plants and animals. All living organisms contain DNA, and produce proteins based on the information it encodes. One class of proteins, known as the cyclotides, is found only in flowering plants and has two unusual structural features: a cyclic backbone and a so-called cyclic cystine knot (CCK). This raises many questions: what is the origin of this structural motif? Why do plants synthesize these proteins? And is there any way we can use their intriguing properties to improve human lives? My thesis work aims to answer these questions.

Within the plant kingdom, some plants look quite similar while others look very different. For example, some plants flower and others don't. Similarly, some plants have net-like veins while others have parallel veins. These differences in shape (or morphological features) are due to genetic differences that arise during the speciation of plants, i.e. the divergence of an ancestral species into two or more new species. To date, cyclotides have only been found in flowering plants. Moreover, the emergence of the CCK structural motif is probably related to genomic changes resulting from the speciation events that led to the evolution of flowering plants. Violets are one of the genera known to produce cyclotides. In my thesis work, I investigated the evolution of cyclotides and related it to their distribution among the violets. Typically, a single Viola species will express 20-100 cyclotides, with different species expressing different cyclotides. At present, around 600 violet species are known to exist around the world. However, all of these 600 species originate from a single ancestral species that existed approximately 30 million years ago. Thus, the diversity of the existing cyclotides has increased dramatically over the last 30 million years, in parallel with the speciation of the members of the Viola genus.

Cyclotides are synthesized by plants for host defense as a result of evolutionary selection. In the same way as humans rely on their immune systems to defend themselves against bacteria or other infectious agents, plants produce defensins to protect themselves against herbivores and pathogens. For thousands of years, humans have used herbal medicines based on crude or processed plant materials to treat various illnesses. Unfortunately, because herbal medicines contain many different chemical compounds, it is not generally clear how they work. However, in recent years, advances in science and technology have made it possible to isolate individual compounds from traditional medicines, determine their structures, and assess their biological functions. Some chemical compounds that play very important roles in herbal medicine can be produced in purer forms and used in modern medicine. However, this can present several difficulties. First, the amount of high purity material that can be obtained from plants is often low. Second, the natural compound may be less selective than one would like (and may therefore have unwanted side effects). Third, the unmodified natural compound may have poor pharmacokinetics. It is therefore very common for medicinal chemists to produce drugs that are similar to but subtly different from these natural compounds to make them into better drugs. Chemical synthesis thus gives us access to a wider diversity of compounds for drug development than we could otherwise obtain.

The CCK motif of the cyclotides makes them very structurally resilient: they retain their biological activity even after being heated in boiling water, while most proteins lose their native structure and activity upon much less aggressive heating or when exposed to strong acids such as stomach acid. For example, insulin is an extremely important protein drug, but because of its fragility, it must be administered by injection rather than by oral consumption. The resilience of the CCK motif therefore makes it a very attractive scaffold for drug development. However, the synthesis of the CCK motif is challenging. As part of my thesis work, I developed an efficient microwave-assisted procedure for preparing this motif, which will greatly facilitate the study of CCK-based drug candidates.

Many of the cyclotides are quite toxic, which raises another question: how can we modulate their toxicity to make them suitable for medicinal uses? Good drugs typically have very selective toxicity, and do not cause side effects. For example, antibiotics should kill bacteria very efficiently without harming human cells. Similarly, cancer drugs should aggressively inhibit the growth of cancer cells without harming normal cells. Cyclotides exert their defensive effects in plants by selectively disrupting the cell membranes of invading organisms. Their interactions with membranes are due to physicochemical forces such as electrostatic and hydrophobic interactions, and they can interact with membranes from many different organisms because all biological membranes are composed of various kinds of lipids. However, cyclotides with similar structures show different affinities for specific types of membranes. Another part of this thesis sought to explain the relationships between cyclotides' structures and their activity towards specific types of biological membrane. The physicochemical properties of the cyclotides, such as their weight, shape, hydrophobic area, hydrophilic area, positively charged area, negatively charged area, and so on, were computed from their structures and then correlated with their activities. The resulting information was then used to express the relationship between structure and activity mathematically, in a way that will hopefully be useful in the design of new cyclotides with medicinal uses.

8. Acknowledgements

I would like to express my sincere gratitude to many people. Without their help, this work could not possibly have been completed.

First of all, Professor Ulf Göransson, my supervisor. I deeply appreciate your superb leadership and guidance throughout my Ph.D. studies. Thank you for believing in me, especially during the times when I did not believe in myself. You always listened to my crude ideas and helped me to formulate them for others within a broad academic scope. Without your help and your trust in me, I could not have connected my thesis work to other fields of study. I have been impressed by your prudent attitude towards science and everyday decisions, even in tough situations. I would like to keep the moral attitude that I have learned from you for the rest of my life.

My assistant supervisor, Prof. Lars Bohlin, for giving me the opportunity to work in pharmacognosy, and for your constant support and care. Prof. Anders Backlund, thanks for introducing me to many academic topics, such as chemoinformatics and plant phylogeny, with profound philosophy. Also, for your kind advice on academic and administrative matters.

I am also grateful to the present and former members of the Division: Dr Sunithi Gunasekera, thanks for your kind supervision of the synthesis work and for your academic comments on many occasions. Dr Adam Strömstedt, thanks for your comments on my work. Erik Jacobsson, for the Q-TOF maintenance work; I also sincerely appreciate your kind help in the lab. You are always ready to help with a smile. Camilla Eriksson, many thanks for helping me with the synthesis work, and for your kindness in our daily discussions. I hope you succeed in your research. Dr Stefan Svahn, thank you for always being available to discuss the PhD blues and the smart tactics from your military experience. Dr Robert Burman, I have been impressed by all your hard work in the lab, and thanks for helping me with HPLC and for your guidance with the Q-TOF. Dr Teshome Leta Aboye, for kindly supervising the peptide synthesis, and for sharing office space with me without complaining about any of my habits. Dr Mariamawit Yeshak, for your good friendship; I wish you and your family the best of luck. The ULLA summer school in Parma with you is a wonderful memory. Dr Anna Koptina, it was always a good time sitting next to you at the official and unofficial parties. Dr Per Claeson, for your kind advice on the thesis and for examining the midterm presentation.

Sohaib Zafar Malik (صھيب), my officemate for many years. I sincerely appreciate your help in reading my manuscripts and fixing errors. I enjoyed our daily discussions, with your sharp wit and philosophy. Taj Muhammad Khan (تاج), I have been impressed by your patience in many tough situations, and by your kind smiles to many friends. Also, I sincerely appreciate your help in fixing many experimental tools in the lab. Javed Hussain (جاويد), thanks for your sincere friendship, and I hope you achieve your future goals in academia soon. Dr Hesham El-seedi (ھشام), thanks for your help with academic work. Many thanks to all of you, my Arabic colleagues in the division, for your kind friendship, for treating me warmly, and for your endless help with everyday matters.

Dr Christina Wedén, for your kind help, always with a smile, in many division activities. You have a very beautiful mind towards science and colleagues. Elisabet Vikeved, for your help with everyday matters; it has been fun to chat with you. Astrid Henz, for continuing the work on natural products.

I am also grateful to my other colleagues at the division: Kuei-Hung Lai (Momo), I enjoyed chatting with you at the conference, and meeting your friends' friends. Lu Yang (杨璐), thanks for your kind smiles, which have been a big support in the lab. Dr Błażej Ślązak, I have been inspired by your kind cooperation and your help for others and me. Also, I appreciate your kind academic advice on the manuscript and thesis.

Prof. Curt Pettersson, for your kindness and for your behind-the-scenes support of my Ph.D. I sincerely appreciate your final decisions on many occasions.

I would like to extend my sincere gratitude to the administrative and financial staff members of our faculty who have been kind enough to help and advise in their respective roles. Maj Blad, for your kind support, and Kerstin Ståhlberg and Gunilla Eriksson, for much help in handling financial invoices.

Dr Pravech Ajawatanawong (pPing), for guiding me in Python programming and introducing me to the world of protein evolution. I have been impressed by your broad knowledge of biology, and your huge passion for your research work. Also, the mung bean and coriander soups that you served me in the cold Swedish winter were a great comfort.

I am also indebted to my many student colleagues for providing a stimulating and fun-filled environment. Deepa Patel, Viola Hassinen, Christian Chag. It was a great time working with all of you, and thanks for your kind help in my work. While working with all of you, I felt rejuvenated in my attitude towards science and life. Anna Joo, for your encouraging and kind comments over fika.

I would also like to thank Drs John Watson and Anita for allowing the picture of Viola to be reprinted in my thesis. Also, Dr David Blackwell, for revising many of my scientific texts.

I cannot forget to mention my Chinese friends: Xiaoyu Wang (王潇宇), we cooked many Chinese dishes from Swedish ingredients. I love your sense of humor, and chatting with you makes me happy and more confident. Jingzhi Hu (胡劲之), I have been inspired by your way of simplifying many complicated things, and by your kindness to many friends and me. Choi Tao (崔涛), thanks for your close friendship and for teaching me Chinese. In some sense, I sometimes felt that I have a larger Chinese vocabulary than others :) Sun Yu (孙峪), Yanling Cai and Di Wu, and Molly Lee, for our long friendships. Even though we are all busy with our own lives and could not be in contact so often, all of you are very good friends in my life. Also, it was a great time when we went through many occasions together, and I would like to give all of you a million thanks for our long friendship!

Ernesto, Freja and Wesley, a million thanks for your great friendship and our daily discussions. I enjoyed talking with all of you, and it has been a great support for me in completing my Ph.D. research. Takashi Willebrand, thanks for the warm-hearted meals at your place. I enjoyed our science talks. I wish you and your family the best of luck. I also thank your mom, 夕起子, for treating me kindly. Jesper and Mikaela, I would like to express my sincere appreciation for your kind help, many times over, with the publication of the thesis. Without your kind help, the thesis could not have been published in time with good content.

I would like to say many thanks to the Korean friends whom I met in Uppsala and Stockholm. Dr Kiwoong Nam, thanks for your friendship over more than 10 years since we met in the army. Also, I really appreciate your encouraging me to learn mathematics and biology from a purely academic perspective. Dr YoungKeun Kwak, thanks for your many encouraging words when I was so frustrated and confused with my study life. I sincerely appreciate your friendship over many years in Sweden. Dr Sojeong Ka, for treating me kindly. I was impressed by your kind-hearted and humble personality. Dr Yoon, I appreciate your warm regard for me, and your teaching me how to see friendships in a positive way. Your advice was more valuable than you could ever imagine. Dr Duck Young Kim, for explaining such profound and complicated physics in easy and simple terms. Your explanations inspired me a lot, as when I read the Feynman lectures. Prof. Hwi-yeol Yun, for daily discussions on everything from academic to everyday matters. I really miss the time when we shared lunch in BMC Bikupan. JungMin An, for your friendship and for fun chats. For me, the memories of time shared with all of you have been a great comfort and a big support in enduring a long time in Sweden.

Back to home...

I am extremely indebted to Prof. Sang Myun Park, for giving me the opportunity to learn at Ajou University, and for your kind understanding, encouragement and personal attention. Also, Dr Jiyoung Park, for your kind guidance in the molecular lab.

I am happy to acknowledge my debt to Prof. Kyewon Koh, for teaching me complicated mathematics in comprehensible language. Your great teaching skills, passion and enthusiasm certainly stimulated my interest in all of the sciences with mathematical logic all those years ago. Also, to my high school mathematics teacher, Hansoo Park, for introducing me to the world of mathematics. Your kind advice was a huge help for me in enduring the tough times of high school study.

Prof. Kyoung Jae Won, I have been impressed by your endless challenges in academia and life. You are not only a great mentor but also a good role model in my life. I appreciate the time you took to advise me on my academic career before I started my Ph.D.

Sgt. Andrew Derrik, for your warm-hearted supervision of your team members and me. All of my military friends and I will not forget your kindness. In pace requiescat. 1SG Edward Kalk, it was a great time serving in the army under your supervision as an assistant to 2MP operation. Your care for me was a huge support in enduring the tough military life in the 2nd Infantry Div. Hojin Park, Hong Jinseok, Seongkyu Song and many others. Rough Rider, Hooah!

Yi Eun Chung, thank you doesn't seem sufficient, but it is said with appreciation for your support, encouragement and precious friendship. It is always a pleasure to chat with you and to find new restaurants wherever we go. It is always impressive to see your informal concerts on the old piano. JeongHo Min, it was a great time, with alcohol absorbed into the brain, in the old days. Without the alcohol consumption with you, I could not have had enough empty space to be filled with new scientific knowledge. YongTae Kim, for your kindness towards me and your advice on everyday matters. Thanks to your care, I have great memories of my undergraduate studies. Jonghun Park, Bonsung Goo and Gilwoong Park, thanks to all of you for our long friendship since high school. I wish your families good luck.

Last but not least, I especially thank my mom, dad, sister and her husband. They have been my best friends all my life, and they form the backbone and origin of my happiness. It has been a great support in my life to know that I have a family to rely on when times are rough. Their love and support without complaint motivate me to work harder and have enabled me to complete this Ph.D. thesis.

Special Acknowledgements

There is not enough room to express my thanks to you, Dr Johan Rosengren, for generously giving your time to teach me protein NMR. I have never forgotten what you taught me about approaching science from multiple perspectives, and I hope we can develop drugs in the near future. Also, I feel highly indebted to you for providing accommodation in Brisbane. I am thankful to Dr Richard Clark for constant guidance and encouragement to complete the synthesis work at Queensland University. Also, I have been impressed by your steady stance in tough chemistry experiments, and by your prudent attitude towards science and leadership. I would also like to express my appreciation to all members of the group: Kathryn Jaskson, Angela Song and Han Siean Lee for guidance in the lab work, David Armstrong and Taylor Smallwood for kind help with NMR, and Dr Johannes Koehbach for helping me with transcriptome analysis. I owe them my sincere gratitude for their generous and timely help.

Prof. Ki-Oug Yoo, thank you for your supervision while I stayed in South Korea to collect plant material. You enabled me to take the first step from speculative reasoning into a world full of living organisms. You taught me what science should be, and how to be a good scientist. My sincere thanks also go to all the members of his research group who offered me their time when I collected plant material. Especially to Dr Kyeong-Sik Cheon, for valuable academic discussions, and to Yong-Ho Park and Jong-Soo Kang, for helping me to collect the plant material with their expertise.

Dr Thomas Marcussen, this acknowledgement is not enough to thank you for your valuable advice on my work, for spending your precious time reading this thesis and the manuscripts, and for giving me valuable suggestions. You taught me how to interpret the evolution of Viola on a broad, worldwide scale. Without your insightful suggestions, I could not have made a proper interpretation of cyclotide evolution in relation to Viola phylogeny.

9. References

Ballard HE (1996). Phylogenetic relationships and infrageneric groups in Viola (Violaceae) based on morphology, chromosome number, natural hybridization and internal transcribed spacer (ITS) sequences. Ph.D. Dissertation, University of Wisconsin.

Ballard HE, Sytsma KJ (2000). Evolution and biogeography of the woody Hawaiian violets (Viola, Violaceae): Arctic origins, herbaceous ancestry and bird dispersal. Evolution 54(5): 1521-1532.

Ballard HE, Sytsma KJ, Kowal RR (1998). Shrinking the violets: Phylogenetic relationships of infrageneric groups in Viola (Violaceae) based on internal transcribed spacer DNA sequences. Systematic Botany 23(4): 439-458.

Becker W (1925). Viola. In: Engler A, Prantl K eds. Die Natürlichen Pflanzenfamilien, Leipzig: Verlag von Wilhelm Engelmann.

Berendsen HJC, Postma JPM, Hermans J (1981). Intermolecular Forces, Reidel Publishing Company.

Blanco-Canosa JB, Dawson PE (2008). An efficient Fmoc-SPPS approach for the generation of thioester peptide precursors for use in native chemical ligation. Angew Chem Int Ed Engl 47(36): 6851-6855.

Bremer B, Bremer K, Chase MW et al (2009). An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal of the Linnean Society 161: 105-121.

Burman R, Gruber CW, Rizzardi K, Herrmann A, Craik DJ, Gupta MP, Göransson U (2010). Cyclotide proteins and precursors from the genus Gloeospermum: filling a blank spot in the cyclotide map of Violaceae. Phytochemistry 71(1): 13-20.

Burman R, Herrmann A, Tran R, Kivela JE, Lomize A, Gullbo J, Göransson U (2011). Cytotoxic potency of small macrocyclic knot proteins: Structure-activity and mechanistic studies of native and chemically modified cyclotides. Org Biomol Chem 9(11): 4306-4314.

Burman R, Yeshak MY, Larsson S, Craik DJ, Rosengren KJ, Göransson U (2015). Distribution of circular proteins in plants: large-scale mapping of cyclotides in the Violaceae. Front Plant Sci 6: 855.

Choi J (2011). Consilience of the table, Myoung Jinchulpansa.

Claeson P, Göransson U, Johansson S, Luijendijk T, Bohlin L (1998). Fractionation Protocol for the Isolation of Polypeptides from Plant Biomass. J Nat Prod 61(1): 77-81.

Clausen J (1927). Chromosome number and the relationship of species in the genus Viola. Annals of Botany 41: 677–714.

Clausen J (1931). Cyto-genetic and taxonomic investigations in Melanium violets. Hereditas 15: 219–304.

Clausen J (1964). Cytotaxonomy and distributional ecology of western North American violets. Madroño 17: 173–204.

Colgrave ML, Kotze AC, Ireland DC, Wang CK, Craik DJ (2008). The anthelmintic activity of the cyclotides: natural variants with enhanced activity. Chembiochem 9(12): 1939-1945.

Craik DJ, Daly NL, Bond T, Waine C (1999). Plant cyclotides: A unique family of cyclic and knotted proteins that defines the cyclic cystine knot structural motif. J Mol Biol 294(5): 1327-1336.

Craik DJ, Daly NL, Mulvenna J, Plan MR, Trabi M (2004). Discovery, structure and biological activities of the cyclotides. Curr Protein Pept Sci 5(5): 297-315.

Daly NL, Clark RJ, Plan MR, Craik DJ (2006). Kalata B8, a novel antiviral circular protein, exhibits conformational flexibility in the cystine knot motif. Biochem J 393(Pt 3): 619-626.

Dutton JL, Renda RF, Waine C, Clark RJ, Daly NL, Jennings CV, Anderson MA, Craik DJ (2004). Conserved structural and sequence elements implicated in the processing of gene-encoded circular proteins. J Biol Chem 279(45): 46858-46867.

Fan Q, Chen S, Wang L, Chen Z, Liao W (2015). A new species and new section of Viola (Violaceae) from Guangdong, China. Phytotaxa 197: 15–26.

Freitas L, Sazima M (2003). Floral biology and pollination mechanisms in two Viola species from nectar to pollen flowers? Ann Bot 91(3): 311-317.

Gingins F (1823). Mémoires sur la Famille des Violacees. Mémoires de la Société de physique et d'histoire naturelle de Genève, Genève: Paschoud.

Goodford PJ (1985). A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules. J. Med. Chem. 28(7): 849-857.

Göransson U, Burman R, Gunasekera S, Stromstedt AA, Rosengren KJ (2012). Circular proteins from plants and fungi. J Biol Chem 287(32): 27001-27006.

Göransson U, Luijendijk T, Johansson S, Bohlin L, Claeson P (1999). Seven novel macrocyclic polypeptides from Viola arvensis. J Nat Prod 62(2): 283-286.

Göransson U, Sjögren M, Svangård E, Claeson P, Bohlin L (2004). Reversible antifouling effect of the cyclotide cycloviolacin O2 against barnacles. J Nat Prod 67(8): 1287-1290.

Gran L (1973). On the effect of a polypeptide isolated from "Kalata-Kalata" (Oldenlandia affinis DC) on the oestrogen dominated uterus. Acta Pharmacol Toxicol (Copenh) 33(5): 400-408.

Gran L, Sandberg F, Sletten K (2000). Oldenlandia affinis (R&S) DC. A plant containing uteroactive peptides used in African traditional medicine. J Ethnopharmacol 70(3): 197-203.

Gruber CW, Elliott AG, Ireland DC, Delprete PG, Dessein S, Göransson U, Trabi M, Wang CK, Kinghorn AB, Robbrecht E, Craik DJ (2008). Distribution and evolution of circular miniproteins in flowering plants. Plant Cell 20(9): 2471-2483.

Grundemann C, Koehbach J, Huber R, Gruber CW (2012). Do plant cyclotides have potential as immunosuppressant peptides? J Nat Prod 75(2): 167-174.

Gupta RS (1998). Protein phylogenies and signature sequences: A reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol Mol Biol Rev 62(4): 1435-1491.

Gustafson KR, Sowder RC, Henderson LE, Parsons IC, Kashman Y, Cardellina JH, McMahon JB, Buckheit LK, Pannell LK, Boyd MR (1994). Circulins A and B. Novel human immunodeficiency virus (HIV)-inhibitory macrocyclic peptides from the tropical tree Chassalia parvifolia. J Am Chem Soc 116(20): 9337–9338.

Hackeng TM, Griffin JH, Dawson PE (1999). Protein synthesis by native chemical ligation: expanded scope by using straightforward methodology. Proc Natl Acad Sci U S A 96(18): 10068-10073.

Harris KS, Durek T, Kaas Q, Poth AG, Gilding EK, Conlan BF, Saska I, Daly NL, van der Weerden, Craik DJ, Anderson MA (2015). Efficient backbone cyclization of linear peptides by a recombinant asparaginyl endopeptidase. Nat Commun 6: 10199.

Hashempour H, Koehbach J, Daly NL, Ghassempour A, Gruber CW (2013). Characterizing circular peptides in mixtures: sequence fragment assembly of cyclotides from a violet plant by MALDI-TOF/TOF mass spectrometry. Amino Acids 44(2): 581-595.

Hayes BL (2002). Microwave Synthesis: Chemistry at the Speed of Light, CEM Pub.

Hellinger R, Koehbach J, Soltis DE, Carpenter EJ, Wong GK, Gruber CW (2015). Peptidomics of circular cysteine-rich plant peptides: analysis of the diversity of cyclotides from Viola tricolor by transcriptome- and proteome-mining. J Proteome Res.

Henriques ST, Huang YH, Rosengren KJ, Franquelim HG, Carvalho FA, Johnson A, Sonza S, Tachedjian G, Castanho MA, Daly NL, Craik DJ (2011). Decoding the membrane activity of the cyclotide kalata B1: the importance of phosphatidylethanolamine phospholipids and lipid organization on hemolytic and anti-HIV activities. J Biol Chem 286(27): 24231-24241.

Heo J (1610). Dongui Bogam. Korea.

Hernandez JF, Gagnon J, Chiche L, Nguyen TM, Andrieu JP, Heitz A, Trinh Hong T, Pham TT, Nguyen LD (2000). Squash trypsin inhibitors from Momordica cochinchinensis exhibit an atypical macrocyclic structure. Biochemistry 39(19): 5722-5730.

Herrmann A, Burman R, Mylne JS, Karlsson G, Gullbo J, Craik DJ, Clark RJ, Göransson U (2008). The alpine violet, Viola biflora, is a rich source of cyclotides with potent cytotoxicity. Phytochemistry 69(4): 939-952.

Huang YH, Colgrave ML, Clark RJ, Kotze AC, Craik DJ (2010). Lysine-scanning Mutagenesis Reveals an Amendable Face of the Cyclotide Kalata B1 for the Optimization of Nematocidal Activity. Journal of Biological Chemistry 285(14): 10797-10805.

Ireland DC, Colgrave ML, Nguyencong P, Daly NL, Craik DJ (2006). Discovery and characterization of a linear cyclotide from Viola odorata: implications for the processing of circular proteins. J Mol Biol 357(5): 1522-1535.

Jang SK (2012). Phylogenetic study of Viola section Pinnatae Wang. Ph.D thesis, Kangwon National University.

Jennings C, West J, Waine C, Craik DJ, Anderson M (2001). Biosynthesis and insecticidal properties of plant cyclotides: the cyclic knotted proteins from Oldenlandia affinis. Proc Natl Acad Sci U S A 98(19): 10614-10619.

Jorgensen WL, Maxwell DS, Tirado-Rives J (1996). Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 118(45): 11225–11236.

Kaas Q, Craik DJ (2010). Analysis and classification of circular proteins in CyBase. Biopolymers.

Kim KS (1986). Studies of comparative morphology on the Korean Viola species. Ph.D thesis, Sung Kyun Kwan University.

Koehbach J, O'Brien M, Muttenthaler M, Miazzo M, Akcan M, Elliott AG, Daly NL, Harvey PJ, Arrowsmith S, Gunasekera S, Smith TJ, Wray S, Göransson U, Dawson PE, Craik DJ, Freissmuth M, Gruber CW (2013). Oxytocic plant cyclotides as templates for peptide G protein-coupled receptor ligand design. Proc Natl Acad Sci U S A.

Larsson S, Backlund A, Bohlin L (2008). Reappraising a decade old explanatory model for pharmacognosy. Phytochemistry Letters 1: 131–134.

Li S (1593). Bencao Gangmu. China.

Lindholm P, Göransson U, Johansson S, Claeson P, Gullbo J, Larsson R, Bohlin L, Backlund A (2002). Cyclotides: a novel type of cytotoxic agents. Mol Cancer Ther 1(6): 365-369.

Lobanov MY, Bogatyreva NS, Galzitskaya OV (2008). Radius of gyration as an indicator of protein structure compactness. Molekulyarnaya Biologiya 42(4): 623-628.

Hall LH, Kier LB (1991). The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling. Reviews in Computational Chemistry 2.

Marcussen T, Heier L, Brysting AK, Oxelman B, Jakobsen KS (2015). From gene trees to a dated allopolyploid network: insights from the angiosperm genus Viola (Violaceae). Syst Biol 64(1): 84-101.

Marcussen T, Jakobsen KS, Danihelka J, Ballard HE, Blaxland K, Brysting AK, Oxelman B (2012). Inferring species networks from gene trees in high-polyploid North American and Hawaiian violets (Viola, Violaceae). Syst Biol 61(1): 107-126.

Molecular Operating Environment (MOE), 2012.10. Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7.

Mulvenna JP, Foley FM, Craik DJ (2005). Discovery, structural determination, and putative processing of the precursor protein that produces the cyclic trypsin inhibitor sunflower trypsin inhibitor 1. J Biol Chem 280(37): 32245-32253.

Mulvenna JP, Mylne JS, Bharathi R, Burton RA, Shirley NJ, Fincher GB, Anderson MA, Craik DJ (2006). Discovery of cyclotide-like protein sequences in graminaceous crop plants: ancestral precursors of circular proteins? Plant Cell 18(9): 2134-2144.

Mulvenna JP, Sando L, Craik DJ (2005). Processing of a 22 kDa precursor protein to produce the circular protein tricyclon A. Structure 13(5): 691-701.

Mylne JS, Colgrave ML, Daly NL, Chanson AH, Elliott AG, McCallum EJ, Jones A, Craik DJ (2011). Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nat Chem Biol 7(5): 257-259.

Nguyen GK, Lian Y, Pang EW, Nguyen PQ, Tran TD, Tam JP (2013). Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. J Biol Chem 288(5): 3370-3380.

Nourse A, Trabi M, Daly NL, Craik DJ (2004). A comparison of the self-association behavior of the plant cyclotides kalata B1 and kalata B2 via analytical ultracentrifugation. J Biol Chem 279(1): 562-570.

Ovesen RG, Brandt KK, Göransson U, Nielsen J, Hansen HC, Cedergreen N (2011). Biomedicine in the environment: cyclotides constitute potent natural toxins in plants and soil bacteria. Environ Toxicol Chem 30(5): 1190-1196.

Plan MR, Saska I, Cagauan AG, Craik DJ (2008). Backbone cyclised peptides from plants show molluscicidal activity against the rice pest (golden apple snail). J Agric Food Chem 56(13): 5237-5241.

Poth AG, Colgrave ML, Philip R, Kerenga B, Daly NL, Anderson MA, Craik DJ (2011). Discovery of cyclotides in the Fabaceae plant family provides new insights into the cyclization, evolution, and distribution of circular proteins. ACS Chem Biol 6(4): 345-355.

Poth AG, Mylne JS, Grassl J, Lyons RE, Millar AH, Colgrave ML, Craik DJ (2012). Cyclotides associate with leaf vasculature and are the products of a novel precursor in petunia (Solanaceae). J Biol Chem 287(32): 27033-27046.

Rosengren KJ, Daly NL, Harvey PJ, Craik DJ (2013). The self-association of the cyclotide kalata B2 in solution is guided by hydrophobic interactions. Biopolymers 100(5): 453-460.

Rosengren KJ, Daly NL, Plan MR, Waine C, Craik DJ (2003). Twists, knots, and rings in proteins: structural definition of the cyclotide framework. J Biol Chem 278(10): 8606-8616.

Saether O, Craik DJ, Campbell ID, Sletten K, Juul J, Norman DG (1995). Elucidation of the primary and three-dimensional structure of the uterotonic polypeptide kalata B1. Biochemistry 34(13): 4147-4158.

Sandberg F (1965). Étude sur les plantes médicinales et toxiques d'Afrique équatoriale. Paris, Cahiers de la Maboké. 12 Rue de Buffon. Tome III, Fascicule 1: 27.

Schöpke T, Hasan AMI, Kraft R, Otto A, Hiller K (1993). Haemolytisch aktive Komponenten aus Viola tricolor L. und Viola arvensis Murray. Sci Pharm 61: 145-153.

Simonsen SM, Sando L, Ireland DC, Colgrave ML, Bharathi R, Göransson U, Craik DJ (2005). A continent of plant defense peptide diversity: cyclotides in Australian Hybanthus (Violaceae). Plant Cell 17(11): 3176-3189.

Simonsen SM, Sando L, Rosengren KJ, Wang CK, Colgrave ML, Daly NL, Craik DJ (2008). Alanine scanning mutagenesis of the prototypic cyclotide reveals a cluster of residues essential for bioactivity. J Biol Chem 283(15): 9805-9813.

Slazak B, Jacobsson E, Kuta E, Göransson U (2015). Exogenous plant hormones and cyclotide expression in Viola uliginosa (Violaceae). Phytochemistry 117: 527-536.

Sletten K, Gran L (1973). Some molecular properties of kalatapeptide B-1. Medd Nor Farm Selsk 35: 69-82.

Svangård E, Göransson U, Smith D, Verma C, Backlund A, Bohlin L, Claeson P (2003). Primary and 3-D modelled structures of two cyclotides from Viola odorata. Phytochemistry 64(1): 135-142.

Tam JP, Lu YA (1998). A biomimetic strategy in the synthesis and fragmentation of cyclic protein. Protein Sci 7(7): 1583-1592.

Tam JP, Lu YA, Yang JL, Chiu KW (1999). An unusual structural motif of antimicrobial peptides containing end-to-end macrocycle and cystine-knot disulfides. Proc Natl Acad Sci U S A 96(16): 8913-8918.

Taylor WR, Aszodi A (2004). Protein Geometry, Classification, Topology and Symmetry: A Computational Analysis of Structure, CRC Press.

Tokuoka T (2008). Molecular phylogenetic analysis of Violaceae (Malpighiales) based on plastid and nuclear DNA sequences. J Plant Res 121(3): 253-260.

Trabi M, Craik DJ (2002). Circular proteins: no end in sight. Trends Biochem Sci 27(3): 132-138.

van de Waterbeemd H, Karajiannis H, Tayar N (1994). Lipophilicity of amino acids. Amino Acids 7(2): 129-145.

von Eggelkraut-Gottanka R, Klose A, Beck-Sickinger AG, Beyermann M (2003). Peptide α-thioester formation using standard Fmoc-chemistry. Tetrahedron Letters 44(17): 3551-3554.

Wahlert GA, Marcussen T, Paula-Souza J, Ballard HE (2014). A Phylogeny of the Violaceae (Malpighiales) Inferred from Plastid DNA Sequences: Implications for Generic Diversity and Intrafamilial Classification. Systematic Botany 39(1): 239-252.

Wang CK, Hu SH, Martin JL, Sjögren T, Hajdu J, Bohlin L, Claeson P, Göransson U, Rosengren KJ, Tang J, Tan NH, Craik DJ (2009). Combined X-ray and NMR analysis of the stability of the cyclotide cystine knot fold that underpins its insecticidal activity and potential use as a drug scaffold. J Biol Chem 284(16): 10672-10683.

Watson JM, Flores AR (2011). Study and rehabilitation of some endemic Argentinian taxa in the genus Viola L. (Violaceae), and lectotypification of a Peruvian species. Gayana Bot. 68(2): 297-308.

Wilson EO (1998). Consilience: the unity of knowledge, Vintage Books.

Witherup KM, Bogusky MJ, Anderson PS, Ramjit H, Ransom RW, Wood T, Sardana M (1994). Cyclopsychotride A, a biologically active, 31-residue cyclic peptide isolated from Psychotria longipes. J Nat Prod 57(12): 1619-1625.

Yockteng RH, Mansion G, Dajoz I, Nadot S (2003). Relationships among pansies (Viola section Melanium) investigated using ITS and ISSR markers. Plant Systematics and Evolution 241(3-4): 153-170.

Yoo KO, Jang SK (2013). Illustrated book of Korean Violaceae. Seoul, South Korea, Jisungsa Publ. Co.

Zhang J, Hua Z, Huang Z, Chen Q, Long Q, Craik DJ, Baker AJ, Shu W, Wand B, Liao B (2015). Two Blast-independent tools, CyPerl and CyExcel, for harvesting hundreds of novel cyclotides and analogues from plant genomes and protein databases. Planta 241(4): 929-940.

Zhang J, Li J, Huang Z, Yang B, Zhang X, Li D, Craik DJ, Baker AJ, Shu W, Liao B (2015). Transcriptomic screening for cyclotides and other cysteine-rich proteins in the metallophyte Viola baoshanensis. J Plant Physiol 178: 17-26.

Zhang J, Liao B, Craik DJ, Li JT, Hu M, Shu WS (2009). Identification of two suites of cyclotide precursor genes from metallophyte Viola baoshanensis: cDNA sequence variation, alternative RNA splicing and potential cyclotide diversity. Gene 431(1-2): 23-32.

Zhu S, Darbon H, Dyason K, Verdonck F, Tytgat J (2003). Evolutionary origin of inhibitor cystine knot peptides. FASEB J 17(12): 1765-1767.

Appendix I

Name              Full name                                  LogP (Norm) a   SASA b
Abu               2-Aminobutyric acid                         0.73           110.8
Aib               α-Aminoisobutyric acid                      0.054          125.6
Ath               Anthrenealanine                             0.609          321.6
Ahx               6-Aminohexanoic acid                        0.037          219.5
Bal               β-(Benzothien-3-yl)alanine                  0.396          239.4
Bip               Biphenylalanine                             0.533          320.0
Cha               β-Cyclohexylalanine                         0.365          213.4
Chg               α-Cyclohexylglycine                         0.378          187.6
Dap               Diaminopimelic acid                        -0.369          215.0
Dip               Diphenylalanine                             0.501          315.6
HmF               Homophenylalanine                           0.346          225.6
Hse               Homoserine                                  0.081          113.2
Lys(Dnp) (Dfk)    N-ε-Dinitrophenyl-lysine                    0.345          375.6
Tmk               Lys(Me3)                                   -0.461          247.0
Nle               Norleucine                                  0.356          202.7
Orn               Ornithine                                  -0.126          210.1
1-Nal             -                                           0.436          245.0
2-Nal             -                                           0.440          260.2
Tff               Methyl-4-trifluoromethoxy-phenylalanine     0.407          272.2
Dff               Methyl-2,6-difluoro-phenylalanine           0.294          200.7
Tfb               4-(Trifluoromethyl)benzyl bromide           0.815          460.2
Bmn               2-(Bromomethyl)naphthalene                  0.896          486.5
Oic               -                                           0.282          206.2
Tic               -                                           0.283          210.6
4NF               4-Nitro-phenylalanine                       0.253          232.1
2ff               2-Trifluoromethyl-Phe                       0.388          232.1
1-ISF             1-Isopropyl-phenylalanine                   0.490          285.4
4Mxf              4-Methoxyl-Phe                              0.268          225.4
4-AnF             Aminophenylalanine                          0.118          205.3
Dhf               (3,4)-Dihydroxyl-Phe                        0.111          209.9
4Flf              4-Fluoro-phenylalanine                      0.284          203.8
4-Clf             4-Fluorophenylalanine                       0.321          216.8
4Idf              4-Iodo-phenylalanine                        0.424          243.6
2mw               D-2-Methyl-tryptophan                       0.248          256.9
L-2-Pal (2pf)     3-(2-Pyridyl)-L-alanine                     0.681          185.0
L-3-Pal (3pf)     3-(3-Pyridyl)-L-alanine                     0.614          181.5

a. Scaled parameter = (raw parameter + 3.800) / 6.457, where 3.800 is the absolute value of the raw parameter logP(Arg+) and 6.457 is the sum of the absolute values of the raw parameters logP(Arg+) and logP(Ile). Normalized parameter = scaled parameter - scaled parameter of logP(Gly).

b. The SASA of amino acid (X) is the exposed solvent accessible surface area of the side chain of X in Gly-X-Gly when the tripeptide has ψ and φ angles of 180°, and the contribution of the Cα atom in the side chain of X is excluded.

To the author’s knowledge, these definitions of the lipophilic index and the surface areas of the residues are original to this thesis.
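The scaling in footnote a can be written out as a short calculation. The sketch below is only an illustration of that footnote: the function names are hypothetical, the raw logP of Gly is not given in this appendix and is therefore passed in as an argument, and the reading that raw logP(Arg+) equals -3.800 (so that Arg+ maps to 0 and Ile to 1) is an assumption inferred from the footnote's wording.

```python
# Constants taken from footnote a of Appendix I.
ARG_OFFSET = 3.800   # assumed to be |raw logP(Arg+)|
SCALE_RANGE = 6.457  # |raw logP(Arg+)| + |raw logP(Ile)|


def scale_logp(raw_logp: float) -> float:
    """Scaled parameter = (raw parameter + 3.800) / 6.457."""
    return (raw_logp + ARG_OFFSET) / SCALE_RANGE


def normalize_logp(raw_logp: float, raw_logp_gly: float) -> float:
    """Normalized parameter = scaled parameter - scaled logP(Gly).

    raw_logp_gly is a required input because the appendix does not
    list the raw value for glycine.
    """
    return scale_logp(raw_logp) - scale_logp(raw_logp_gly)


if __name__ == "__main__":
    # Under the stated assumption, Arg+ scales to 0 and Ile to about 1.
    print(scale_logp(-3.800))
    print(scale_logp(6.457 - 3.800))
```

Because the same offset appears in both terms, the normalized value reduces to (raw parameter - raw logP(Gly)) / 6.457, so glycine itself always normalizes to zero and residues more lipophilic than glycine come out positive.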

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 218 Editor: The Dean of the Faculty of Pharmacy

A doctoral dissertation from the Faculty of Pharmacy, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-292668 2016