
Plant derived cyclic peptides: from discovery to biotechnological applications

Haiou Qu BSc, MSc

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in 2019

Institute for Molecular Bioscience

Abstract

Plant-derived cysteine-rich cyclic peptides are a class of small peptides that range from 14 to 37 amino acids in size and are characterised by a head-to-tail cyclized backbone. Their unique cyclized and disulfide-constrained structures make them exceptionally stable molecules. Members include the six-cysteine containing cyclotides, which have been detected in species spanning five plant families, and the two-cysteine PawS-derived cyclic peptides found in sunflower. The sequences between cysteine residues are tolerant to residue substitutions in both of these examples, thus making them promising scaffolds for grafting and stabilizing bioactive epitopes. Indeed, a number of pharmaceutical protein-engineering applications based on cyclic peptides have been demonstrated in recent years.

Chemical synthesis has been the dominant approach for producing cysteine-rich cyclic peptides, with native chemical ligation being the method most commonly used to generate the cyclic backbone. However, production using this approach can be expensive, especially at scale, and not all cyclic peptides are amenable to synthesis and correct folding. Furthermore, the large amounts of chemical reagents required for synthesis have an environmental impact. To overcome these issues, an alternative strategy is to produce cyclic peptides in planta, as some plant species are naturally efficient at cyclic peptide production. Other advantages of developing plant-based production systems include reduced production costs, reduced chemical waste, and the possibility of developing innovative orally delivered drugs by packaging peptides in edible plant products. The overall goal of this thesis is to explore the potential of plant-based production of cyclic peptides. Chapter 1 provides a comprehensive background of plant-derived cysteine-rich cyclic peptides and the current synthesis approaches.

The first aim was to develop rice as a production system to produce cyclic peptides (Chapter 2). Rice was selected as a candidate biofactory host as it has been proven to be efficient for the recombinant production of complex proteins, especially for those with disulfide bonds. Moreover, rice does not naturally produce cyclic peptides, which eliminates the possible interference of native cyclic peptides during separation and purification. A stable transformation platform was set up to produce cyclic peptides in rice suspension cells and seeds. Transgenes encoding prototypical cyclic peptides and engineered analogues were co-expressed with asparaginyl endopeptidases (AEPs) which are enzymes required for peptide backbone cyclization. The yields and structures of rice-derived cyclic peptides and transcript expression levels of AEPs were characterized.

The second aim was to investigate the diversity and cyclization capability of cyclotide-like peptides in monocots (Chapter 3). To date, no native cyclic peptides have been identified in any monocot species, and only genes encoding linear cyclotide-like peptides have been identified. The essential residues known to be required for backbone cyclization are missing in these precursors. To investigate this further, monocot lineages were explored for cyclotide-like gene sequences using transcriptome analysis of monocots spanning the breadth of the taxonomic group. When expressing cyclotide-like genes in planta, a pyroGlu modification at the N-terminus was observed. Additionally, it was observed that some monocot cyclotide-like genes could be engineered with minimal residue changes to allow backbone cyclization both in vitro and in planta. These results will aid efficient cyclic peptide production in monocot cereal plants (e.g. rice, maize).

The third aim was to define the plasticity of seeds for the production of diverse cyclic peptides (Chapter 4). Some cyclic peptides are naturally produced in seeds, which suggests that plant seeds may provide a beneficial environment for the production of heterologous cyclic peptides. To investigate this hypothesis, a number of cyclic peptide genes were expressed in Arabidopsis seeds. Furthermore, the cyclization efficiencies of three different sunflower AEPs were determined. To circumvent the low efficiency of co-transformation of cyclic peptide precursors and AEP genes, a homozygous AEP transgenic line exhibiting high expression was created. Gene-stacking experiments that introduce cyclic peptide genes into this stable AEP-expressing line should shed light on the production of cyclic peptides in seeds.

Throughout my PhD, a range of plants, tissues and cell types have been investigated for their plasticity to produce cyclic peptides. Rice suspension cells and seeds were developed for continuous production and stable long-term storage of cyclic peptides, respectively. Arabidopsis seeds were developed as a simple platform for the seed production of cyclic peptides. Furthermore, the diversity of monocot cyclotide-like genes was investigated, as well as their capability to be engineered for backbone cyclization using a transient leaf expression system. Taken together, these studies provide valuable information on the selection of biofactory hosts, tissue specificity and the genetic modifications required to produce cyclic peptides efficiently in plants.

Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly-authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, financial support and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my higher degree by research candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the policy and procedures of The University of Queensland, the thesis be made available for research and study in accordance with the Copyright Act 1968 unless a period of embargo has been approved by the Dean of the Graduate School.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis and have sought permission from co-authors for any jointly authored works included in the thesis.

Publications included in this thesis

Haiou Qu, Bronwyn J. Smithies, Thomas Durek, and David J. Craik. Synthesis and protein engineering applications of cyclotides. Australian Journal of Chemistry, 2017, 70(2):152-161. Partially incorporated in Chapter 1

Contributor: Statement of contribution
Haiou Qu: Wrote and edited paper (70%); Prepared figures (20%)
Bronwyn J. Smithies: Wrote and edited paper (30%); Prepared figures (40%)
Thomas Durek: Refined writing (50%)
David J. Craik: Refined writing (50%); Prepared and edited the figures (40%)

Submitted manuscripts included in this thesis

No manuscripts submitted for publication

Other publications during candidature

Research article

Mark A. Jackson, Kuok Yap, Aaron G. Poth, Edward K. Gilding, Joakim E. Swedberg, Simon Poon, Haiou Qu, Thomas Durek, Karen Harris, Marilyn A. Anderson and David J. Craik, 2019. Rapid and scalable plant based production of a potent plasmin inhibitor peptide. Frontiers in Plant Science 10, 602.

Conference abstracts

Haiou Qu, Mark A. Jackson, Edward K. Gilding and David J. Craik. Developing rice as a biofactory for cyclic therapeutic peptides. EMBL Australia PhD Symposium, Nov 2016, Adelaide, Australia. Oral presentation.

Haiou Qu, Mark A. Jackson, Edward K. Gilding and David J. Craik. Developing rice as a production system for cyclic therapeutic peptides. ComBio, Oct 2016, Brisbane, Australia. Poster and short talk presentation.

Haiou Qu, Mark A. Jackson, Edward K. Gilding and David J. Craik. Developing rice as a production system for cyclic therapeutic peptides. 3rd Congress of the International Society for Plant Molecular Farming, June 2018, Helsinki, Finland. Poster presentation.

Haiou Qu, Edward K. Gilding, Mark A. Jackson, Kuok Yap, Olivier J. Cheneval and David J. Craik. Surveying a diverse set of monocots for cyclotides: are these cyclic peptides restricted to dicots? IMB EMCRA Mini-Symposium, Dec 2018, Brisbane, Australia. Poster presentation.

Haiou Qu, Edward K. Gilding, Mark A. Jackson, Kuok Yap, Olivier J. Cheneval and David J. Craik. Surveying a diverse set of monocots for cyclotides. 4th International Conference on Circular Proteins and Peptides, Nov 2018, Kawasaki, Japan. Poster and short talk presentation.

Contributions by others to the thesis

Dr. Mark A. Jackson and Dr. Edward K. Gilding contributed to the research project design, discussion and editing of the thesis. Dr. Jackson contributed to the design of primers for the PCR analysis (Chapters 2 and 4) and provided some vectors for leaf expression and Arabidopsis seed expression system (Chapters 3 and 4). Dr. Gilding contributed to transcriptome analysis (Chapter 3). Kuok Yap carried out the recombinant enzyme production (Chapter 3) and assisted with peptide absolute quantification (Chapters 2 and 3). Olivier J. Cheneval carried out peptide synthesis (Chapter 3). Peta Harvey carried out NMR analysis (Chapter 2). Professor David Craik contributed to the revision of this thesis.

Statement of parts of the thesis submitted to qualify for the award of another degree

No works submitted towards another degree have been included in this thesis.

Research Involving Human or Animal Subjects

No animal or human subjects were involved in this research.

Acknowledgements

I have been supported and inspired during my PhD by the following people and would like to take this chance to express my appreciation and gratitude.

To my supervisors Professor David Craik, Dr Mark Jackson and Dr Edward Gilding, thanks for providing great support and guidance in my research. David, you are always ready to help and full of encouragement. With your kind and generous help, I was able to make the trip to visit Deyun’s lab to learn hands-on skills for rice transformation, and had the opportunity to attend an international conference to present my work on monocot cyclotide-like peptides. Mark, you are the best scientist with ‘a method to the madness’ mind! Thank you for your patient daily guidance and elaborate revision for my writing as well as leading me to have independent and creative thinking in research. Also big cuddles to your lovely babies, Mackenzie (big girl already) and Jamie (cutest curly hair boy) for all the nice coffee time we had. To Ed, you are the Aloha Friday costume king! Thank you for your training in bioinformatics and providing novel suggestions for my research. I appreciate your effort in negotiating for my glasshouse setup and managing the seed ordering.

A big thank you to the whole Craik group who provide a great environment to do research as well as enjoyable entertainment outside the lab. To Dr Thomas Durek, thank you for your contribution to a review article with great chemical knowledge. To Mr Kuok Yap, thank you for training me in MS and HPLC as well as providing assistance for in vitro cyclization assays. To Mr Olivier Cheneval, thank you for the peptide synthesis and for your effort in the lab management. Also, thank you for organizing fun games and hikes outside the lab. To Dr Peta Harvey, thank you for your help with NMR and PyMOL training. To Dr Crystal Huang, you always feed me with tasty snacks, thank you for your company during after-work hours. To Dr Lai Yue Chan, thank you for your encouragements and having lovely ice cream time together with your baby girl. To Dr Aaron Poth, you are an easy going person to talk with, thank you for sharing opinions of doing research and career directions. Dr Joakim Swedberg, thank you for sharing your opinions of being a postdoctoral fellow and organizing an awesome camping trip.

To Ms Felicitas Vernen, great to have your company during my whole PhD, thank you for showing me a wonderful world outside the lab. To Ms Bronwyn Smithies, you are the best office neighbour, thank you for your mental support and the great workshop trip we had together. To Dr Joanna Akello Isakunye Agwa, you are my role model for being an independent researcher, thank you for your company during morning paddles. To Ms Georgianna Oguis, you are the best senior, thank you for showing me how things work as a PhD student and for the scientific discussions. To Ms Junqiao Du, thank you for your help with HPLC and being good company in the lab. To Ms Kirsten McMahon, thank you for helping me out with software issues. To Ms Qingdan Du, thank you for your help with MS-MS and feeding me with tasty food. To Choi Yi Li, thank you for the talks over dinner about future careers and life in general. To Mr Huawu Yin, thank you for sharing your thinking about research with great humour. Thank you to Dr Dorien Van Lysebetten, it has been great to meet you and share a great time during your visit.

To Dr. Deyun Qiu, thank you for the kind offer to visit your lab and for providing useful advice about research and career directions. Dr Amanda Carozzi, thank you for your consistent support for all my milestones and assistance in finalizing my PhD. Dr Robyn Craik, thank you for managing the travel funding and proofreading my thesis. Dr Martin Wynne, thank you for spending days and nights carefully proofreading my thesis.

Finally, a big thank you to my family! To Li Pan, you are always there for me, thank you for your warm embrace and endless love. To Bo Qu, you are my strong backing, thank you for taking care of all matters at home. To Amy Woods and Ken Woods, thank you for having me during the first month in Australia and providing great support for my life abroad. To Shengtong Zhao, thank you for sharing accommodation and the car with me and providing professional Switch games and internet support. To Ling Ding, thank you for your encouragement and great company, together with your lovely babies during Christmas holidays.

Financial support

This research was supported by a UQ International Scholarship.

Keywords

cyclic peptides, in planta cyclization, biofactory, monocot cyclotide-like peptides, seed-specific production

Australian and New Zealand Standard Research Classifications (ANZSRC)

ANZSRC code: 060702, Plant cell and Molecular Biology, 50%
ANZSRC code: 060101, Analytical Biochemistry, 30%
ANZSRC code: 060408, Genomics, 20%

Fields of Research (FoR) Classification

FoR code: 1001, Agricultural Biotechnology, 50%
FoR code: 0601, Biochemistry and Cell Biology, 30%
FoR code: 0604, Genomics, 20%

Table of Contents

Abstract
Declaration by author
Publications included in this thesis
Submitted manuscripts included in this thesis
Other publications during candidature
Contributions by others to the thesis
Statement of parts of the thesis submitted to qualify for the award of another degree
Research Involving Human or Animal Subjects
Acknowledgements
Financial support
Keywords
Australian and New Zealand Standard Research Classifications (ANZSRC)
Fields of Research (FoR) Classification
List of Figures
List of Tables
List of Abbreviations used in the thesis

Chapter 1 Introduction and literature review
1.1. Plant derived cysteine-rich cyclic peptides
1.1.1. Distribution and diversity
1.1.2. Bioactivity
1.1.3. Grafting
1.1.4. Precursors of cysteine-rich cyclic peptides
1.1.5. Cyclization involved enzymes and in planta processing pathway
1.2. Synthesis approaches of cyclic peptides
1.2.1. Chemical synthesis
1.2.2. Chemo-enzymatic (semi-) synthesis
1.2.3. Bacteria-based cyclic peptide expression system
1.3. Plant-based production systems
1.3.1. Expression systems
1.3.2. Plant pharmaceutical industry
1.3.3. Plant-based system for cyclic peptides
1.4. Aims and scope of this thesis
1.5. References

Chapter 2 Developing rice as a biofactory for cyclic peptide production
2.1. Overview
2.2. Materials and methods
2.2.1. Expression vector design and cloning
2.2.2. Agrobacterium-mediated stable transformation in rice
2.2.3. Fluorescence microscopy
2.2.4. DNA and RNA extraction and PCR
2.2.5. Peptide extraction and structure confirmation
2.2.6. Data analysis
2.3. Results
2.3.1. Promoter analysis
2.3.2. kB1 expression in rice suspension cells
2.3.3. Endogenous OsVPEs expression in callus
2.3.4. Structural characterization of rice-derived kB1
2.3.5. Expression of grafts based on SFTI-1 in rice callus
2.3.6. Prototypical cyclic peptides expression in rice seeds
2.4. Discussion
2.4.1. kB1 and grafted cyclic peptides in rice suspension cells
2.4.2. Expression of cyclic peptides in rice seeds
2.4.3. Rice as a monocot biofactory to produce cyclic peptides
2.5. References
2.6. Supplementary sequences

Chapter 3 Investigating the diversity and cyclization potential of acyclic peptides from monocots
3.1. Overview
3.2. Materials and methods
3.2.1. Database search and transcriptome analysis
3.2.2. Promoter analysis of rL1
3.2.3. Phytohormone treatment of rice seedlings
3.2.4. Cyclotide-like gene expression in fox millet
3.2.5. Cyclotide-like gene expression in N. benthamiana
3.2.6. In vitro cyclization assay
3.3. Results
3.3.1. Cyclotide-like genes appear restricted to the family
3.3.2. Promoter analysis reveals a possible defence response role for rL1
3.3.3. rL1 expression under phytohormone treatments
3.3.4. mLA and mLB expression in S. italica
3.3.5. Transient expression of rL1 and pL1 peptides in N. benthamiana leaf
3.3.6. pL1 is cyclisable in vitro and in planta using cyclization efficient AEPs
3.3.7. mLA is cyclisable in planta
3.4. Discussion
3.4.1. Distribution of cyclotide-like genes in monocot lineages
3.4.2. Expression of monocot cyclotide-like genes in their native plants
3.4.3. Pyroglutamyl modification of linear cyclotide-like peptides
3.4.4. Cyclization of cyclotide-like peptides in vitro and in planta
3.5. References
3.6. Supplementary sequences

Chapter 4 Understanding the plasticity of seeds for the expression of grafted cyclic peptides
4.1. Overview
4.2. Materials and methods
4.2.1. Expression of AEPs in N. benthamiana
4.2.2. Seed-specific expression of cyclic peptides in Arabidopsis
4.2.3. DNA and RNA extraction and PCR
4.2.4. Peptide extraction and MALDI-TOF-MS analysis
4.3. Results
4.3.1. Sunflower AEPs are poor ligases in Nicotiana benthamiana
4.3.2. Expression of cyclic peptides in Arabidopsis seeds
4.3.3. Expression of kB1 and its analogues in Arabidopsis seeds
4.3.4. Expression of SFTI-1 and its analogues in Arabidopsis seeds
4.3.5. Expression of a plasmin inhibitor based on SFTI-1 in Arabidopsis seeds
4.3.6. Obtaining stable OaAEP1b transgenic lines
4.4. Discussion
4.4.1. Cyclization efficiency of sunflower AEPs
4.4.2. Co-transformation in floral dip transformation
4.4.3. Expression of cyclic peptides in Arabidopsis seeds
4.5. References

Chapter 5 Conclusions and future directions
5.1. Conclusions
5.2. Future directions
5.2.1. Transformation and production in plants
5.2.2. Cyclotide-like peptides in monocots and the monocot bioreactor
5.2.3. Cyclization efficient AEPs
5.3. References

List of Figures

Figure 1.1 Model of the CCK motif.
Figure 1.2 Classification and structure of cysteine-rich cyclic peptides produced in plants.
Figure 1.3 Cyclotides and cyclotide-like peptides producing families.
Figure 1.4 Process of grafting bioactive epitopes into CCK scaffolds.
Figure 1.5 The precursor gene structure of cysteine-rich cyclic peptides.
Figure 1.6 Alignment of the MLA and LAD regions in ligase-type AEPs.

Figure 2.1 Map of vectors and constructs designed for rice expression systems.
Figure 2.2 Agrobacterium-mediated transformation of rice.
Figure 2.3 Expression of GFP using constitutive and endosperm specific promoters.
Figure 2.4 Expression of cyclic kB1 in rice suspension cells.
Figure 2.5 Oak1 expression in rice suspension cells between UO #4 and UOPOa #9.
Figure 2.6 Alignment of LAD and MLA in OsVPEs.
Figure 2.7 Transcript analysis of OsVPEs and OaAEP1b in suspension cells.
Figure 2.8 Endogenous OsVPEs expression in rice.
Figure 2.9 Structure characterization of suspension cell produced kB1.
Figure 2.10 Expression of Oak[D14N]SFTI-1_GLDN and OakSFTI-1_GLDN in rice callus.
Figure 2.11 Expression of Oak[T4Y,I7R]SFTI-1 in rice callus.
Figure 2.12 Expression of Os_OakSFTImcrB and Os_OakSFTImcrF in rice callus.
Figure 2.13 PCR test of transgenic callus.
Figure 2.14 Phenotype, seed weight and seed length of transgenic seeds.
Figure 2.15 MALDI spectra of kB1 and SFTI-1 in rice seeds.

Figure 3.1 Schematic diagram of Agro-infiltration in N. benthamiana.
Figure 3.2 Alignment of cyclotide-like sequences from Poaceae with the Oak1 precursor.
Figure 3.3 Cyclotide-like sequence from C. esculenta and alignment with endochitinase CH25 in S. polyrhiza.
Figure 3.4 Promoter analysis of rL1.
Figure 3.5 Optimization of RT-PCR primers to amplify the rL1 gene.
Figure 3.6 Expression of mLA and mLB in S. italica.
Figure 3.7 Expression of rL1 and pL1 peptide precursor genes in N. benthamiana.
Figure 3.8 Cyclization of pL1 with recombinant OaAEP1b in vitro.
Figure 3.9 Cyclization of pL1 in planta.
Figure 3.10 Cyclization of mLA in planta.
Figure 3.11 Novel cyclotide-like sequences with Gln at N-termini.
Figure 3.12 Alignment of acyclotides from Violaceae with mLA.

Figure 4.1 Map of the pOH123 vector.
Figure 4.2 Schematic diagram of floral dip transformation and Basta screening in Arabidopsis.
Figure 4.3 Testing the efficiency of AEPs for cyclization of SFTI-1 in planta.
Figure 4.4 Alignment of the LAD and MLA regions in HaAEPs.
Figure 4.5 Co-transformation approach for the expression of cyclic peptide precursors and OaAEP1b in Arabidopsis seeds.
Figure 4.6 Expression of Oak1 in Arabidopsis seeds.
Figure 4.7 SFTI-1 and [D14N]SFTI-1 expression in Arabidopsis seeds.
Figure 4.8 Expression of an SFTI-1 engineered plasmin inhibitor in Arabidopsis seeds.
Figure 4.9 Selection of OaAEP1b expressing Arabidopsis transgenic lines.

List of Tables

Table 1.1 Summary of published therapeutic applications of cyclic peptide frameworks

Table 2.1 Expression cassettes used in suspension cell and seed expression system
Table 2.2 Initiation medium
Table 2.3 Selection medium
Table 2.4 Regeneration II medium
Table 2.5 Liquid infection medium
Table 2.6 Regeneration I medium
Table 2.7 Liquid suspension medium
Table 2.8 Primers used in Chapter 2
Table 2.9 Similarity and identity among AEPs

Table 3.1 Query sequences of cyclotides and acyclotides for use with tblastn assembled monocot transcriptomes
Table 3.2 Classification of monocot species studied in cyclotide-like sequences blasting
Table 3.3 Primers used in Chapter 3
Table 3.4 tblastn results of cyclotide-like peptides in monocots
Table 3.5 List of cis regulatory elements in promoter of rL1

Table 4.1 Primers used in Chapter 4
Table 4.2 Summary of transgenic lines by co-infiltration

List of Abbreviations used in the thesis

ABA: abscisic acid
AEP: asparaginyl endopeptidase or -dependent endopeptidase
CCK: cyclic cystine knot motif
cGMP: current Good Manufacturing Practice
CREs: cis regulatory elements
CTPP: C-terminal propeptide
ER: endoplasmic reticulum
GFP: green fluorescent protein
HspT: heat shock protein terminator
kB1: kalata B1
LAD: ligase-activity determinants
MALDI-TOF-MS: matrix-assisted laser desorption ionization-time of flight mass spectrometry
MCoTI: momordica trypsin inhibitor
MCR: melanocortin receptor
MLA: marker of ligase activity
NMR: nuclear magnetic resonance
NTPP: N-terminal propeptide
NTR: N-terminal repeat
ORF: open reading frame
PCR: polymerase chain reaction
PDP: PawS-derived peptide
PLCP: papain-like cysteine protease
RP-HPLC: reversed-phase high-performance liquid chromatography
SA: salicylic acid
SFTI-1: sunflower trypsin inhibitor-1
SP: endoplasmic reticulum signal
SPE: solid phase extraction
SPPS: solid phase peptide synthesis
SRA: sequence read archive
TFA: trifluoroacetic acid
UTR: untranslated sequences
VPE: vacuolar processing enzyme

Chapter 1 Introduction and literature review

Plant-derived cysteine-rich cyclic peptides are a class of small peptides characterised by a head-to-tail cyclized backbone. Their unique cyclized and disulfide-constrained structures make them exceptionally stable molecules. The sequences between cysteine residues are tolerant to residue substitutions, which makes them promising scaffolds for grafting and stabilizing bioactive epitopes. Indeed, a number of pharmaceutical protein-engineering applications based on cyclic peptides have been demonstrated in recent years.

This chapter provides a comprehensive background of plant-derived cysteine-rich cyclic peptides and the current synthesis approaches. The characteristics and applications of cyclic peptides, current synthesis approaches and plant-based production systems are described. In the last 20 years, cysteine-rich cyclic peptides have been widely explored and developed as promising drug scaffolds and potential crop protection agents. Chemical synthesis and semi-synthetic approaches have been developed to achieve the production of cyclic peptides at laboratory scale. At the same time, plant expression systems have been under development to enable recombinant protein and peptide production in leaf, seed and cell culture expression systems. A number of pharmaceuticals have been produced in these plant-based systems.

1.1. Plant derived cysteine-rich cyclic peptides

Part of this section has been published in ‘Synthesis and protein engineering applications of cyclotides. Australian Journal of Chemistry, 2017, 70(2):152-161.’

Cyclic peptides represent a unique class of peptides with improved stability over their linear counterparts due to the presence of a head-to-tail cyclized backbone. One class of plant-derived cyclic peptides is the cyclotides (Craik et al., 1999), which range from 28 to 37 amino acids in length and carry six conserved cysteine residues that form a characteristic cyclic cystine knot (CCK) motif (Figure 1.1). The term ‘knot’ is used because one of the disulfide bonds (CysIII-CysVI) is threaded through a ring formed by the other two disulfide bonds (CysI-CysIV and CysII-CysV) and their connecting backbone segments. The backbone regions between successive cysteine residues are referred to as loops and their tolerance to substitutions makes cyclotides valuable protein engineering templates (Henriques & Craik, 2012). When folded correctly, two loops (1 and 4) reside in the core of the cyclotide, and four loops (2, 3, 5 and 6) are exposed on the surface, as illustrated in Figure 1.1. This unique cyclized knot structure makes cyclotides exceptionally resistant to thermal, enzymatic and chemical degradation (Colgrave & Craik, 2004).
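The loop nomenclature described above can be illustrated with a minimal sketch that splits a head-to-tail cyclic sequence at its cysteine residues. The function name `cyclic_loops` is hypothetical, and the kB1 sequence used here is not given in this chapter; it is quoted from the cyclotide literature as an assumption. The sketch only applies the definitions stated above (loops are the backbone segments between successive cysteines, and the knot pairs CysI-CysIV, CysII-CysV and CysIII-CysVI); it does not predict folding.

```python
# Assumed kB1 sequence (not stated in this chapter); written linearly,
# but treated as a head-to-tail cyclic backbone.
KB1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"


def cyclic_loops(sequence: str) -> list[str]:
    """Return the inter-cysteine loops of a cyclic peptide, loop 1 first."""
    n = len(sequence)
    cys = [i for i, aa in enumerate(sequence) if aa == "C"]
    loops = []
    for k, start in enumerate(cys):
        end = cys[(k + 1) % len(cys)]
        # Number of residues between successive Cys, wrapping around the ring.
        length = (end - start - 1) % n
        loops.append("".join(sequence[(start + 1 + j) % n] for j in range(length)))
    return loops


# Cystine knot connectivity as described in the text (Cys I-IV, II-V, III-VI),
# expressed as 1-based residue positions in the linear sequence above.
cys_positions = [i + 1 for i, aa in enumerate(KB1) if aa == "C"]
knot_pairs = [(cys_positions[i], cys_positions[i + 3]) for i in range(3)]

if __name__ == "__main__":
    for number, loop in enumerate(cyclic_loops(KB1), start=1):
        print(f"loop {number}: {loop}")
    print("disulfide pairs:", knot_pairs)
```

Because the backbone is cyclic, loop 6 wraps from the last cysteine back to the first, which is why the indices are taken modulo the sequence length.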


Figure 1.1 Model of the CCK motif. The grey ribbon represents the cyclic backbone; cysteines and disulfide bonds are indicated by yellow circles and bridges, respectively. The CCK motif contains six loops, with loops 1 and 4 situated in the cystine knot core.

Cyclotides are classified into three subfamilies: Möbius, bracelet and trypsin inhibitors (Figure 1.2). The defining structural feature of Möbius cyclotides is a Pro residue in loop 5 that adopts a cis Xaa-Pro peptide bond, thereby introducing a 180° twist into the backbone (Jennings et al., 2005). This cis-Pro is absent in bracelet cyclotides; as such, the peptide backbone displays an all-trans configuration. Cyclotides of the trypsin inhibitor class have low sequence similarity with the other two subfamilies (Möbius and bracelet) and are classified by their function as potent inhibitors of trypsin. Indeed, they are more closely related in sequence to non-cyclic squash trypsin inhibitors found in the Cucurbitaceae plant family than to Möbius or bracelet cyclotides (Hernandez et al., 2000).

In addition to the cyclotides, there is another class of cysteine-rich cyclic peptides from the Asteraceae plant family. The best known peptide from this class is the sunflower trypsin inhibitor-1 (SFTI-1) peptide, which was discovered in sunflower (Helianthus annuus) seeds (Luckett et al., 1999). SFTI-1 is a 14 amino acid cyclic peptide that contains two conserved cysteines that form a single disulfide bond (Figure 1.2). Biosynthetically, the domain that encodes SFTI-1 is buried within a seed storage albumin precursor, PawS1 (preproalbumin with SFTI-1) (Mylne et al., 2011). Like SFTI-1, PawS-derived peptides (PDPs) are also encoded in albumin precursors; they comprise fewer than 20 amino acids and possess one disulfide bond (Elliott et al., 2014). They generally have a central Pro-Pro and two conserved cysteines that form a single disulfide bond.


Figure 1.2 Classification and structure of cysteine-rich cyclic peptides produced in plants. Peptides classified as cyclotides all contain a characteristic CCK motif, consisting of three disulfide bonds (yellow) and six cysteine residues (green). Cyclotides are further classified as Möbius, bracelet or trypsin inhibitors, represented here by kalata B1 (kB1), cycloviolacin O2 (CyO2) and Momordica trypsin inhibitor II (MCoTI-II), respectively. The Möbius subgroup of cyclotides contains a characteristic backbone twist due to the presence of a cis-Pro residue (red) in loop 5. Smaller cysteine-rich cyclic peptides found in sunflower seeds are termed PawS-derived peptides (PDPs). PDPs generally have a central Pro-Pro (blue) and a single disulfide bond (yellow), represented here by the sunflower trypsin inhibitor 1 (SFTI-1).

1.1.1. Distribution and diversity

Unlike other plant peptide families such as the defensins, cyclotides are not universally produced across all plants. So far, they have only been discovered among plant species from five dicot plant families, namely the Violaceae (violet family), Rubiaceae (coffee family), Fabaceae (pea family), Solanaceae (potato family) and Cucurbitaceae (cucurbit family), as schematically illustrated in Figure 1.3 (Chiche et al., 2004, Gruber et al., 2008, Nguyen et al., 2011a, Poth et al., 2011b, Poth et al., 2012). Interestingly, linear cyclotide-like peptides, termed panitides, were discovered in a monocot grass, Panicum laxum, from the Poaceae family (Nguyen et al., 2013a).

Within the Violaceae, cyclotides have been detected in all species examined so far, which makes the Violaceae a major reservoir of cyclotide diversity (Burman et al., 2010). In a recent study of the distribution and occurrence of cyclotides in the Violaceae, up to 25 unique cyclotides were found in each species tested, and the total number of unique cyclotides was estimated to be between 5,000 and 25,000 (Burman et al., 2015). Another important cyclotide-rich family is the Rubiaceae. Since kB1 was first reported in Oldenlandia affinis, 88 unique cyclotides have been characterised from this species (Koehbach, 2015). Cyclotides are also present within the legume species Clitoria ternatea, commonly known as butterfly pea (Nguyen et al., 2011a), and within petunia in the Solanaceae family (Poth et al., 2012). In Momordica cochinchinensis (gac fruit) seeds, cyclotides are abundant and act as potent trypsin inhibitors (Chan et al., 2013).

Figure 1.3 Families that produce cyclotides and cyclotide-like peptides. The native cyclotide-producing families, Violaceae (violet family), Rubiaceae (coffee family), Fabaceae (pea family), Solanaceae (potato family) and Cucurbitaceae (cucurbit family), are all dicot plants. They are represented by violet, O. affinis, butterfly pea, petunia and gac fruit, respectively. Only linear cyclotide-like peptides have been discovered in the monocot plant family Poaceae (grass family), represented by P. laxum.

Some cyclotide-like genes have been found in the Poaceae family by mining nucleotide databases, including in economically important crops such as rice (Oryza sativa), maize (Zea mays), wheat (Triticum aestivum) and barley (Hordeum vulgare) (Mulvenna et al., 2006, Porto et al., 2016, Salehi et al., 2017). These cyclotide-like genes are expressed in a tissue-specific and developmentally specific manner. Salehi et al. reported that two cyclotide-like genes, Zmcyc1 and Zmcyc5, were also regulated by wounding, fungal disease and stress-related hormones (Salehi et al., 2017). Related to the Poaceae cyclotide-like genes, linear cyclotides, termed acyclotides, have also been discovered in the native cyclotide-producing families, including the Violaceae (Ireland et al., 2006), the Rubiaceae and the Cucurbitaceae (Plan et al., 2007, Mylne et al., 2010, Gerlach et al., 2010, Nguyen et al., 2011b, Nguyen et al., 2012, Du et al., 2019). Acyclotides share high sequence similarity with conventional cyclotides, including the conserved cysteines, but lack the residues required for enzyme-mediated cyclization. These discoveries expanded the variety of cyclotides and also confirmed the importance of an Asp/Asn residue at the C-terminus for peptide cyclization. Currently, nine acyclotides, designated panitides, have been reported in the monocot Panicum laxum (Nguyen et al., 2013b). Panitide L3 has been studied in the most detail and was shown to contain the same cystine knot arrangement as cyclotides and acyclic variants found in dicot plants. This finding suggests that cyclotides or acyclotides existed before the divergence of the dicot and monocot lineages.

PDPs have been discovered in the Asteraceae among the subfamilies Helianthinae, Zinniinae, Ecliptinae, and Galinsoginae (Elliott et al., 2014). Based on the evolution of the PawS gene in the Millereae and Heliantheae, PDPs are estimated to be present in more than 4700 species (Panero, 2007). Unlike most PDPs, which form a β-hairpin motif, PDP-4 and SFTI-L1 (an SFTI-1-like peptide) lack this motif (Elliott et al., 2014). Interestingly, some PDPs (e.g. PDP-10, -11 and -19) have an open peptide backbone instead of a cyclized backbone. Moreover, PDPs from the Heliantheae share a similar inhibitory loop with Bowman-Birk inhibitors, but the evolutionary processes leading to their existence are still unknown.

Presently, approximately 400 unique cyclic peptide sequences are reported in the CyBase database (http://cybase.org.au) (Kaas & Craik, 2010). Their sequences highlight the diversity possible in the backbone loops between the conserved cysteines of the three-disulfide knotted structure. This diversity demonstrates the tolerance of cyclic peptides to sequence substitution, which allows a range of foreign sequences to be grafted into cyclic peptide scaffolds. These cyclic peptide grafting applications are discussed in Section 1.1.3.

1.1.2. Bioactivity

The natural variation seen in the loop sequences of cysteine-rich cyclic peptides accounts for their wide range of biological activities. Cyclotides were originally discovered in O. affinis, where they are associated with the plant's use as a traditional herbal tea to accelerate childbirth in Africa (Gran, 1970). The molecular basis of this uterotonic bioactivity was recently established: kalata B7 modulates the human oxytocin and vasopressin G protein-coupled receptors and induces strong contractility in human uterine smooth muscle cells (Koehbach et al., 2013, Fahradpour et al., 2017). A range of other bioactivities have been demonstrated for a variety of naturally occurring cyclotides, including pesticidal (Jennings et al., 2001, Jennings et al., 2005, Colgrave et al., 2008b, Plan et al., 2008, Colgrave et al., 2008a), anti-HIV (Daly et al., 2004, Chen et al., 2005, Wang et al., 2008, Ireland et al., 2008), antimicrobial (Pränting et al., 2010, Fensterseifer et al., 2015), immunosuppressive (Grundemann et al., 2012, Hellinger et al., 2014), haemolytic (Daly & Craik, 2000) and cytotoxic activities (Herrmann et al., 2008, Tang et al., 2010, Ding et al., 2014). SFTI-1 is the smallest naturally occurring potent inhibitor; it blocks bovine trypsin (Ki = 100 pM) (Luckett et al., 1999), human matriptase (Ki = 0.92 nM) (Long et al., 2001) and insect trypsins from cotton bollworm (Elliott et al., 2014).

The natural function of cyclotides in plants is presumed to be host defence, based on the range of pesticidal activities reported to date. Examples include insecticidal activity against the cotton pests Helicoverpa punctigera and Helicoverpa armigera (Jennings et al., 2001, Jennings et al., 2005), molluscicidal activity against the rice pest golden apple snail Pomacea canaliculata (Plan et al., 2008), and anthelmintic activity against the sheep gastrointestinal nematode parasites Haemonchus contortus and Trichostrongylus colubriformis (Colgrave et al., 2008a, Colgrave et al., 2008b). These various toxicities suggest that extracts of cyclotide-producing species have potential applications as bio-pesticides. A bio-pesticide produced from a cyclotide-producing species, termed Sero-X, was recently commercialised by Innovate AG Pty Ltd. The primary mode of action of cyclotides in host defence is thought to involve interaction with, and disruption of, biological membranes (Huang et al., 2009b, Henriques et al., 2011).

The anti-HIV and antimicrobial activities of cyclic peptides are potentially exploitable for pharmaceutical applications. However, one barrier is that some cyclic peptides have haemolytic or cytotoxic activities (Ireland et al., 2008). These undesired activities can be eliminated by replacing or inserting certain amino acids in the sequence. This is further discussed in Section 1.1.3.

1.1.3. Grafting

All cyclotides share similar three-dimensional structures and contain six conserved cysteines. However, they are remarkably tolerant to residue substitutions in the backbone loops. With this tolerance to substitutions and their exceptional stability, cyclotides have attracted attention as stabilising scaffolds for the insertion (or ‘grafting’) of bioactive peptide epitopes. Engineered cyclotides potentially have a wider range of targets and mechanisms of action than native cyclotides, as they can be designed or tuned to interact with specific targets. Generally, the grafting process involves a bioactive peptide sequence replacing one or more of the natural loops in the CCK scaffold to form a cyclic hybrid molecule with tailor-made properties (Figure 1.4). The bioactive epitope can come from a range of sources, including natural or synthetic peptides, protein fragments, or phage-derived sequences.

Figure 1.4 Process of grafting bioactive epitopes into CCK scaffolds. The bioactive epitope (dashed box) could be a peptide (left), a peptide sequence derived from a larger protein (middle), or a sequence selected from a phage library after multiple rounds of selection (right). The active sequence can be grafted into the CCK scaffold by replacing one of the backbone loops.

One aim of using cyclotide scaffolds is to lock a foreign epitope into a pharmacologically relevant, bioactive conformation for a desired target while increasing the biological stability of the epitope (Craik et al., 2006, Henriques & Craik, 2010). As mentioned in Section 1.1.2, some native cyclotides have haemolytic or cytotoxic activities that are undesirable in pharmaceutical applications. In most cases, engineered cyclotides eliminate or reduce such undesired native activities. For example, two kB1 analogues with substituted residues in loop 5 both coincidentally lost the haemolytic activity of native kB1 (Clark et al., 2006). Additionally, a recent study showed that insertion of an extra linker (e.g. GGGT) into loop 6 of kB1 can reduce haemolytic activity (Jia et al., 2014).

To date, most grafting applications have focused on extracellular targets. Recently, some cyclotides, e.g. kB1, MCoTI-I and MCoTI-II, have been reported to penetrate cell membranes, which provides new opportunities to address intracellular targets (Greenwood et al., 2007, Contreras et al., 2011, Cascales et al., 2011, D’Souza et al., 2014, Henriques et al., 2015). kB1 can enter cells via both endocytosis and direct membrane translocation (Henriques et al., 2015). Cellular uptake of MCoTI-I follows multiple endocytic pathways (Contreras et al., 2011). For MCoTI-II, cellular uptake can be improved by increasing the overall positive charge (D’Souza et al., 2014). In recent years, there has been great interest in using these cell-penetrating cyclotides to target intracellular receptors. In one example, an intracellularly targeted cancer therapeutic peptide, PMI, was grafted into loop 6 of MCoTI-I to block a p53 oncoprotein interaction (Ji et al., 2013). The grafted cyclotide suppressed tumour cell growth in a mouse xenograft model and was considerably more stable than the ungrafted peptide sequence.

Another favoured grafting scaffold is SFTI-1, which has only 14 amino acids and one disulfide bond. SFTI-1 is advantageous as its small size reduces the complexity of chemical synthesis. Additionally, as SFTI-1 is a natural protease inhibitor, only small changes are required to engineer potent and selective protease inhibitors. For example, the grafted analogue SFTI-FCQR potently and selectively inhibits human kallikrein-related peptidase 4 (KLK4) (Swedberg et al., 2009). A series of inhibitors based on SFTI-1, including a matriptase inhibitor, a proteasome inhibitor and a thrombospondin-1 inhibitor, all showed improved selectivity and inhibitory potency relative to the native peptide (Quimbar et al., 2013, Dębowski et al., 2014, Chan et al., 2015).

A summary of published therapeutic applications of cyclic peptide frameworks, including kB1, MCoTI and SFTI-1, is shown in Table 1.1. This extensive list demonstrates the plasticity of cyclic peptide loops to substitutions and the suitability of these frameworks for a wide range of targets.

Table 1.1. Summary of published therapeutic applications of cyclic peptide frameworks

| Framework | Activity | Application | References |
|---|---|---|---|
| kB1 | VEGF-A antagonist [A] | Cancer and rheumatoid arthritis | (Gunasekera et al., 2008) |
| | Thrombin inhibitor | Cardiovascular disease | (Getz et al., 2011) |
| | MC4R agonist [B] | Obesity | (Eliasen et al., 2012) |
| | Bradykinin B1 antagonist | Chronic pain and inflammatory pain | (Wong et al., 2012) |
| | Neuropilin-1 and -2 antagonist | Angiogenesis and lymphangiogenesis | (Getz et al., 2013) |
| | Myelin oligodendrocyte | Multiple sclerosis | (Wang et al., 2014) |
| MCoTI | FMDV 3C protease inhibitor [C] | Foot-and-mouth disease | (Thongyoo et al., 2008) |
| | β-tryptase and human leukocyte inhibitor | Inflammation disorders | (Thongyoo et al., 2009) |
| | β-tryptase inhibitor | Allergic asthma and inflammation disorders | (Sommerhoff et al., 2010) |
| | VEGF-receptor agonist [A] | Therapeutic angiogenesis | (Chan et al., 2011) |
| | CXCR4 receptor [D] | HIV | (Aboye et al., 2012) |
| | p53 tumor suppressor | Cancer | (Ji et al., 2013) |
| | Matriptase-1 inhibitor | Cancer and arthritic therapy | (Glotzbach et al., 2013) |
| | CTLA-4 inhibitory receptor [E] | Metastatic melanoma | (Maass et al., 2015) |
| | BCR-ABL inhibitor [F] | Chronic myeloid leukemia | (Huang et al., 2015) |
| | MAS1 receptor [G] | Cancer and myocardial infarction | (Aboye et al., 2016) |
| | Dual graft angiogenesis inhibitor | Cancer | (Chan et al., 2016) |
| SFTI-1 | Kallikrein-related peptidase 4 inhibitor | Anti-prostate cancer | (Swedberg et al., 2009) |
| | VEGF-receptor agonist | Therapeutic angiogenesis | (Chan et al., 2011) |
| | Matriptase inhibitor | Anti-cancer | (Quimbar et al., 2013) |
| | Human matriptase-1 inhibitor | Anti-cancer | (Fittler et al., 2014) |
| | Human and 20S proteasome inhibitor | Protein degradation | (Dębowski et al., 2014) |
| | Thrombospondin-1 inhibitor | Anti-cancer | (Chan et al., 2015) |
| | Kallikrein-related peptidase 5 and 7 inhibitor | Anti-cancer and skin diseases | (Jendrny & Beck-Sickinger, 2016) |
| | MCR agonist [H] | Obesity and inflammatory diseases | (Durek et al., 2018) |

[A] VEGF-A, vascular endothelial growth factor-A. [B] MC4R, melanocortin receptor 4. [C] FMDV 3C, foot-and-mouth-disease virus 3C. [D] CXCR4, C-X-C chemokine receptor type 4. [E] CTLA-4, cytotoxic T lymphocyte-associated antigen 4. [F] BCR-ABL, breakpoint cluster-Abelson. [G] MAS1, mas-related G protein-coupled receptor A. [H] MCR, melanocortin receptor.

1.1.4. Precursors of cysteine-rich cyclic peptides

As more cysteine-rich cyclic peptides are discovered in plants, the nature of their biosynthesis in planta is being progressively uncovered. Cyclotides and PDPs are gene-encoded and ribosomally synthesized via precursors that contain various targeting domains (endoplasmic reticulum and vacuole) as well as flanking sequences for post-translational processing to generate the mature cyclic backbone (Jennings et al., 2001, Gillon et al., 2008). Precursors are first expressed as proproteins and subsequently undergo post-translational processing to generate the cyclized backbone and embedded disulfide bonds.

For the prototypical cyclotide kB1 from the Rubiaceae, the precursor consists of an endoplasmic reticulum signal peptide (SP), an N-terminal propeptide (NTPP), an N-terminal repeat (NTR), a cyclotide domain and a short C-terminal propeptide (CTPP), as illustrated in Figure 1.5A (Jennings et al., 2001). Some precursors found in the Violaceae and Rubiaceae possess up to three tandem repeats of the NTR and cyclotide domain (Dutton et al., 2004). Cyclotide sequence analysis has identified that Gly is highly conserved at the beginning of the cyclotide domain and that Asn/Asp (~10:1) is conserved as the last residue of the cyclotide domain (Gillon et al., 2008). The conserved Asn/Asp (Asx) has proven to be essential for cyclization, which involves an asparaginyl endopeptidase (AEP) that mediates a transpeptidation reaction between the N- and C-termini of the linear cyclotide precursor (Saska et al., 2007, Harris et al., 2015). The details of the mechanism are discussed further in Section 1.2.2. For efficient peptide cyclization, processing at the N-terminus is thought to be an important prerequisite, yet this process was not well understood until recently (Rehm et al., 2019). In the case of Oak1, the precursor of kB1, the N-terminal processing enzymes were shown to be papain-like proteases, which cleave precursors at the N-terminal Gly. Unlike AEPs, papain-like proteases are not involved in backbone ligation.
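The conserved terminal residues described above lend themselves to a simple sequence screen. The following Python sketch is illustrative only: it checks whether a candidate domain begins with Gly, ends with Asn or Asp, and contains six cysteines. The function name is hypothetical, and the kB1 sequence is written here in precursor order as commonly reported; neither is taken from this thesis, and passing the screen does not demonstrate that a sequence is cyclized in planta.

```python
def looks_like_cyclotide_domain(seq: str) -> dict:
    """Check three hallmarks of a cyclotide domain described in the text.

    Hypothetical helper for illustration; it tests sequence features only
    and says nothing about folding or enzymatic processing.
    """
    seq = seq.upper()
    return {
        "starts_with_gly": seq.startswith("G"),       # conserved N-terminal Gly
        "ends_with_asx": seq.endswith(("N", "D")),    # conserved C-terminal Asn/Asp
        "six_cysteines": seq.count("C") == 6,         # cyclic cystine knot cysteines
    }


# kB1 in precursor order (assumed here from the commonly reported sequence).
KB1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"

report = looks_like_cyclotide_domain(KB1)
print(report)
```

A sequence that fails the `ends_with_asx` check would correspond to the acyclotide case discussed in Section 1.1.1, in which the residues required for enzyme-mediated cyclization are missing.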

Precursors within the Solanaceae have a similar architecture to those from the Violaceae and Rubiaceae, but have a short NTPP region upstream of the cyclotide domain without an NTR, as illustrated in Figures 1.5A and B (Poth et al., 2012). In the Cucurbitaceae, the precursors discovered thus far encode tandem cyclotide repeats, often with an acyclotide domain at the C-terminus, as illustrated in Figure 1.5C (Mylne et al., 2012). Unlike other cyclotide precursors, those found in the Fabaceae family contain the cyclotide domain embedded within a pre-existing albumin gene, replacing the normally present albumin b-chain, as shown in Figure 1.5D (Poth et al., 2011a). This embedding of a pre-cyclic domain in another gene is also seen for PDPs, which are encoded in the PawS gene, as shown in Figure 1.5E (Mylne et al., 2011). This variety of precursors supports the hypothesis that cyclic peptides might be more widely distributed than currently observed.


Figure 1.5 The precursor gene structure of cysteine-rich cyclic peptides. A. Precursors in the Rubiaceae and Violaceae. Precursors contain a signal peptide sequence (SP, blue), an N-terminal propeptide (NTPP, orange), an N-terminal repeat (NTR, yellow), and a cyclotide domain (green). This region can be repeated up to three times, followed by a C-terminal propeptide (CTPP, red). Amino acids at N- and C-terminal processing regions (green letters) are conserved. The height of each one-letter amino acid code is proportional to the relative frequency in cyclotides. B. Precursors in the Solanaceae. Precursors have a SP, NTPP and CTPP, but no NTR and only a single cyclotide domain in the Solanaceae. C. Precursors in the Cucurbitaceae. Like precursors in the Rubiaceae and Violaceae, precursors in the Cucurbitaceae always contain multiple cyclotide domain repeats and end with an acyclotide domain (light green). D. Precursors in the Fabaceae. The cyclotide domain is embedded in an albumin gene (grey) with a SP and CTPP at the N- and C-terminus, respectively. E. Precursors of PawS-derived cyclic peptides. Like precursors in the Fabaceae, the PDP domain (purple) is embedded in an albumin gene (grey) with two different albumin domains.

1.1.5. Cyclization involved enzymes and in planta processing pathway

Initially, the high sequence conservation of Asx at the C-termini of cyclotides led to the suggestion that AEPs may play a role in cyclotide maturation, because AEPs cleave exclusively on the C-terminal side of Asx residues (Jennings et al., 2001). AEPs, also known as vacuolar processing enzymes, are cysteine proteases involved in a variety of vacuolar processes, including protein maturation (Shimada et al., 1994) and programmed cell death in response to pathogen attack (Hatsugai et al., 2004, Qiang et al., 2012). In plants, AEPs constitute a multi-gene family, with four known isoforms present in the model plant Arabidopsis. The capacity of AEPs to function as peptide ligases was first noted in a study of the post-translational processing of concanavalin A, in which the enzyme catalysed both proteolysis after Asn residues and peptide ligation (Sheldon & Bowles, 1996). This capacity was also verified using transgenic approaches, where the cyclization of kB1 was severely reduced when AEP genes were suppressed through virus-induced gene silencing in tobacco (Saska et al., 2007). More recently, the converse was also demonstrated: the in planta yield of cyclotides could be enhanced by co-expressing ligase-type AEPs (Poon et al., 2017).

An AEP from C. ternatea termed butelase1 was reported to function as an Asx-specific ligase in vitro (Nguyen et al., 2014). Rather than functioning as a protease, butelase1 shows high catalytic efficiency as a ligase (kcat/Km up to 542,000 M⁻¹ s⁻¹, with kcat values up to 17 s⁻¹). Modelling of the butelase1 structure showed that it comprises three parts: an AEP active domain, an activation peptide region, and the legumain stabilization and activity modulation domain. The last two parts are cleaved when the enzyme is activated. Similar to other legumain proteases, the activated butelase1 has a Cys-His-Asn catalytic triad.
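As a worked illustration of how the two kinetic values quoted above relate, the Michaelis constant implied by a given kcat and kcat/Km can be back-calculated as Km = kcat / (kcat/Km). The short Python sketch below does this arithmetic. It assumes that the two "up to" values refer to the same substrate, which the text above does not state, so the result should be read as indicative only; the function name is hypothetical.

```python
def implied_km(kcat_per_s: float, efficiency_per_molar_per_s: float) -> float:
    """Return Km in mol/L, given kcat (s^-1) and kcat/Km (M^-1 s^-1)."""
    return kcat_per_s / efficiency_per_molar_per_s


# Values quoted in the text; pairing them for one substrate is an assumption.
km_molar = implied_km(17.0, 542_000.0)
km_micromolar = km_molar * 1e6

print(f"Implied Km is about {km_micromolar:.0f} µM")
```

Under that assumption the implied Km is in the low tens of micromolar, that is, the quoted efficiency reflects both a fast turnover and a reasonably tight substrate affinity.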

Following the discovery of butelase1, another cyclization-capable AEP from O. affinis, OaAEP1b, was shown to cyclise kB1 substrate precursors with almost 100% cyclization efficiency (Harris et al., 2015). Its crystal structure suggested that the Asx ligase function arises from a wide and open surface that accommodates the amine group (Yang et al., 2017). Based on this structural analysis, a mutant, [C247A]OaAEP1b, was designed as an improved ligase with 100-fold faster catalytic kinetics. Most recently, PxAEP3b from petunia (Petunia x hybrida E.Vilm) was reported to function as a ligase in planta (Jackson et al., 2018). Through bioinformatics and functional testing, a region termed the “marker of ligase activity” (MLA) was shown to be an important structural feature of AEP ligases (Figure 1.6). By using the MLA region as a marker for the discovery of other ligase-type AEPs, HeAEP3 from Hybanthus enneaspermus F.Muell (Violaceae) was identified and validated as a ligase amongst three AEP candidates. Following this, ligase-activity determinants (LAD1 and LAD2) were discovered in AEPs; these can be used to predict the activity of uncharacterised AEPs, as illustrated in Figure 1.6 (Hemu et al., 2019). By using the LAD regions, a number of ligase-type AEPs were identified from Viola yedoensis (Violaceae). Among these AEPs, VyPAL2 exhibited the best ligase activity at pH 6.5. For an efficient ligase, the first position of LAD1 is preferably bulky and aromatic while the second position is hydrophobic, and the dipeptides GA, AA or AP are preferred for LAD2. Although these dipeptides in LAD2 are necessary, they are not sufficient for ligase activity.

Figure 1.6 Alignment of the MLA and LAD regions in ligase-type AEPs. Ligase-type AEPs, including butelase1, OaAEP1b, PxAEP3b, HeAEP3 and VyPAL2, share either a deletion or hydrophobic residues (red frame) in the MLA region. They also share similar variants (red letters) in LAD1 and preferred dipeptides, including GA/AA/AP (red), in LAD2. GP (navy) from PxAEP3b is not preferred in LAD2 for ligase activity.
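The LAD preferences summarised above can be expressed as a simple rule-based screen. The Python sketch below is a hedged illustration rather than the published prediction method: the residue sets used for "bulky and aromatic" and "hydrophobic" are assumptions made here, the function names are hypothetical, and the example inputs are invented dipeptides, not sequences of real AEPs. Because the preferred LAD2 dipeptides are described as not sufficient for ligase activity, a positive result only flags a candidate for experimental testing.

```python
# Residue classes are assumptions for illustration; the text does not list them.
AROMATIC_BULKY = set("WYF")
HYDROPHOBIC = set("AVILMFW")
# Preferred LAD2 dipeptides as stated in the text.
PREFERRED_LAD2 = {"GA", "AA", "AP"}


def lad_profile(lad1: str, lad2: str) -> dict:
    """Report which LAD preferences a candidate AEP satisfies.

    lad1: residues at the LAD1 site (only the first two positions are used).
    lad2: the dipeptide at the LAD2 site.
    """
    lad1, lad2 = lad1.upper(), lad2.upper()
    if len(lad1) < 2 or len(lad2) != 2:
        raise ValueError("expected at least two LAD1 residues and a LAD2 dipeptide")
    return {
        "lad1_pos1_bulky_aromatic": lad1[0] in AROMATIC_BULKY,
        "lad1_pos2_hydrophobic": lad1[1] in HYDROPHOBIC,
        "lad2_preferred_dipeptide": lad2 in PREFERRED_LAD2,
    }


def is_ligase_candidate(lad1: str, lad2: str) -> bool:
    """True when all three preferences are met (a candidate, not a proven ligase)."""
    return all(lad_profile(lad1, lad2).values())


# Invented example inputs, not taken from any real enzyme.
print(is_ligase_candidate("WI", "GA"))
print(is_ligase_candidate("WI", "GP"))
```

The second call uses GP, the LAD2 dipeptide that Figure 1.6 marks as not preferred, so the screen rejects it; this also shows the limit of such a rule, since PxAEP3b carries GP and was nevertheless reported to act as a ligase in planta.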

Compared with other peptide ligases characterised from fungi and bacteria (Haase & Lanka, 1997, Lee et al., 2009, Barber et al., 2013), plant-derived butelase1 and OaAEP1b exhibit faster kinetics, reducing the required reaction times from days or hours to seconds. For butelase1 and OaAEP1b, it was surprising to observe that disulfide bonds in substrate peptides are not needed for AEP-mediated cyclization; for example, cyclotide precursors in which all cysteines were S-alkylated or substituted with serine were efficiently cyclized in vitro (Nguyen et al., 2014, Harris et al., 2015). The residues that flank the peptide domain have been shown to be important for efficient cyclization. For example, a C-terminal Asx-His-Val motif is preferred by butelase1, while an Asx-Gly-Leu motif is preferred by OaAEP1b (Nguyen et al., 2014, Poon et al., 2017). In both cases, the dipeptide (His-Val or Gly-Leu) is not incorporated into the resulting cyclic peptide. Overall, these two AEPs have been shown to be capable of cyclising a diverse range of substrates, which suggests they are broadly applicable in peptide engineering for the production of cyclic peptides.

The biosynthetic pathway of SFTI-1 has been well described, and both cleavage and cyclization involve an AEP (Mylne et al., 2011). The maturation of the SFTI-1 peptide occurs along with the maturation of the albumin. The signal peptide directs the PawS proalbumin to the endoplasmic reticulum, where folding and disulfide bond formation occur. The folded PawS proalbumin is then thought to be processed in the plant cell vacuole, firstly by an AEP that cleaves at the peptide's N-terminus, followed by an AEP-mediated backbone cyclization reaction between the liberated N-terminus and the C-terminal Asp residue of SFTI-1.

The biosynthetic pathway of cyclotides is similar to that of SFTI-1, except that an AEP is not involved in the N-terminal cleavage. Recently, papain-like cysteine proteases were shown to be involved in cyclotide N-terminal cleavage (Rehm et al., 2019). OaRD21A from O. affinis showed high catalytic efficiency and specificity in N-terminal cleavage. When OaRD21A was incubated with a peptide fragment consisting of kB1 and its flanking processing sites, LQLK-kB1-GI, the product was kB1-GI. This observation showed that OaRD21A is able to process the N-terminus of kB1 within the context of the Oak1 precursor sequence. Similarly, when the peptide fragment was incubated with OaAEP1b, the product was cyclic LQLK-kB1. Mature cyclic kB1 was obtained only when the peptide was incubated with both OaRD21A and OaAEP1b. With this discovery, the description of N- and C-terminal processing and cyclization of cyclotides is complete.
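The three incubation outcomes described above can be captured in a small toy model. The Python sketch below treats the substrate as an N-terminal flank, a cyclotide domain and a C-terminal flank, and represents each enzyme as a function acting on that record. It is a bookkeeping illustration only: the class and function names are hypothetical, the kB1 sequence is assumed from the commonly reported sequence, and the rules that OaRD21A acts before OaAEP1b and leaves cyclic peptides untouched are simplifying assumptions rather than findings reported in the text.

```python
from dataclasses import dataclass, replace

# kB1 domain in precursor order (assumed from the commonly reported sequence).
KB1 = "GLPVCGETCVGGTCNTPGCTCSWPVCTRN"


@dataclass(frozen=True)
class Peptide:
    n_flank: str           # residues upstream of the cyclotide domain
    domain: str            # the cyclotide domain itself
    c_flank: str           # residues downstream of the C-terminal Asx
    cyclic: bool = False   # True once the backbone has been joined head to tail

    def label(self, name: str = "kB1") -> str:
        parts = [part for part in (self.n_flank, name, self.c_flank) if part]
        text = "-".join(parts)
        return f"cyclic {text}" if self.cyclic else text


def oard21a(p: Peptide) -> Peptide:
    """Papain-like protease: remove the N-terminal flank of a linear substrate."""
    if p.cyclic or not p.n_flank:
        return p
    return replace(p, n_flank="")


def oaaep1b(p: Peptide) -> Peptide:
    """Ligase-type AEP: release the C-terminal flank after Asx and join the
    Asx to whatever residue is currently at the N-terminus."""
    if p.cyclic or not p.c_flank or p.domain[-1] not in "ND":
        return p
    return replace(p, c_flank="", cyclic=True)


substrate = Peptide(n_flank="LQLK", domain=KB1, c_flank="GI")

print(oard21a(substrate).label())           # protease alone
print(oaaep1b(substrate).label())           # ligase alone
print(oaaep1b(oard21a(substrate)).label())  # protease, then ligase
```

In this model the ligase alone produces a cyclic product that still carries the LQLK flank, which mirrors why N-terminal processing is regarded as a prerequisite for obtaining the mature cyclotide.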

1.2. Synthesis approaches of cyclic peptides

To expand our understanding of the plasticity of cyclic peptides with regard to epitope grafting and to explore their suitability as drugs, it is essential to develop rapid and cost-effective methods for their synthesis. Currently, total chemical synthesis and chemo-enzymatic (semi-) synthesis are the main approaches used in our laboratory.

Part of this section has been published in ‘Synthesis and protein engineering applications of cyclotides. Australian Journal of Chemistry, 2017, 70(2):152-161.’

1.2.1. Chemical synthesis

The total chemical synthesis of cyclotides has enabled numerous studies into their structure and function over the past 20 years. Solid phase peptide synthesis (SPPS) has been a mainstay technology in peptide science since its inception by Merrifield (Merrifield, 1997), but requires specialised adaptations for constructing a cyclic peptide backbone. Several tert-butyloxycarbonyl (Boc) and 9-fluorenylmethoxycarbonyl (Fmoc) strategies have been developed, which involve solid phase assembly of linear precursor peptides, cleavage from the solid support, cyclization in solution and formation of the disulfide bonds (Daly et al., 1999, Cheneval et al., 2014). The last two steps are generally the most challenging, but with the breadth of alternative strategies available today, the chemical synthesis of complex ‘designer’ cyclic peptides carrying non-native modifications (as well as non-proteinogenic amino acids when desired) is routinely achieved. With an upper size limit of ~40 amino acids, cyclic peptides are in an easily manageable size range for SPPS.

Once assembled, the linear cyclotide backbone is cleaved from the resin before the N- and C-terminal residues are joined to form the cyclic backbone in solution. Traditionally, this has been achieved using native chemical ligation (Tam et al., 1999, Thongyoo et al., 2008) of fully unprotected peptides or, more recently, by standard amide bond-forming chemistry using fully side-chain–protected peptides (Cheneval et al., 2014). The ability to tailor the synthesis by combining different protecting groups, linkers and cyclization methods allows for production of most cyclotides. Finally, the subsequent oxidation and folding of the synthetic cyclotide backbone has also been optimised for many cyclotides from the Möbius and trypsin inhibitor subfamilies. Through these strategies, native and modified cyclic peptide analogues have been synthesized, including kalata B1 (Jia et al., 2014) and MCoTI (Thongyoo et al., 2008). However, the final step remains problematic for the bracelet subfamily of cyclotides, with the efficient in vitro folding of bracelet cyclotides remaining one of the significant challenges in this field. For this reason, bracelet cyclotides are under-represented in grafting studies, excluding many bioactive molecules and grafting scaffolds from research. Just two bracelet cyclotides have been studied extensively in terms of chemical synthesis and oxidative folding: cycloviolacin O1 (Gunasekera et al., 2009) and cycloviolacin O2 (Aboye et al., 2008).

It is striking that our inability to synthesise, and therefore study, bracelet cyclotides is in such contrast to their relative abundance in nature (outnumbering Möbius cyclotides 2:1). This might not be so surprising when considering the differences between biosynthesis in nature and synthesis in the laboratory. For example, a higher yield of correctly folded protein is achieved in vitro if the peptide is cyclized before oxidation (Daly et al., 1999). In vivo, by contrast, the linear cyclotide backbone is believed to be oxidised first to form disulfide bonds prior to head-to-tail cyclization (Nguyen et al., 2014). One approach to overcoming this limitation could be to synthesise the linear peptide chemically and then cyclise it using cyclization enzymes. This semi-synthesis approach is further discussed in Section 1.2.2.

In recent years, an increasing concern about chemical synthesis has been its need for large amounts of highly hazardous reagents and solvents (Isidro-Llobet et al., 2019). The minimization and substitution of hazardous waste, such as diethyl ether and TFA, and the safe use of hydrogenation reactions should be considered in industrial processing applications (Andersson et al., 2000). These shortcomings of chemical synthesis provide an impetus to develop more efficient, environmentally friendly approaches to peptide production, such as bacteria- or plant-based production.

1.2.2. Chemo-enzymatic (semi-) synthesis

1.2.2.1. AEP-mediated cyclization

Ligase-type AEPs have been demonstrated to be the key enzymes involved in cyclic peptide maturation in planta. Moreover, they have been developed into a versatile peptide engineering tool for in vitro applications. Butelase1 has been broadly used to cyclize peptides and to ligate chains onto peptides or proteins (Deng et al., 2019). Reactions include the in vitro cyclization of kB1 and SFTI-1 from plants, the animal conotoxin MrIA (Nguyen et al., 2016a), thanatin and histatin-3 (Nguyen et al., 2014), other macrocyclic peptides (Nguyen et al., 2015, Nguyen et al., 2016a) and large circular bacteriocins (Hemu et al., 2016). Among these cyclic peptides, SFTI-1, MrIA and θ-defensin were also produced with D-amino acids. The yield of D-amino-acid cyclic peptides is comparable to that of L-amino-acid peptides (> 95%), with only slightly slower rates, from 1/15 min to 15/60 min (Nguyen et al., 2016a). Substrates of up to 70 amino acid residues can be cyclized by butelase1. For the larger substrates, only a refolding step needs to be added before cyclization to ensure that the N- and C-termini are in close proximity. However, peptides shorter than nine residues cannot be cyclized (Nguyen et al., 2016b). To facilitate its broad utilisation, butelase1 has been produced recombinantly in E. coli (James et al., 2019) and yeast (Pi et al., 2019).

Like butelase1, active OaAEP1b has also been produced recombinantly in E. coli (Harris et al., 2015). A recent study reported a C247A mutant of OaAEP1b that showed 160 times faster catalytic kinetics (Yang et al., 2017). In addition, a recent structural study revealed the ligase-activity determinants in peptide asparaginyl ligases (Hemu et al., 2019). These studies show promise for the development of efficient ligases for peptide or protein engineering. So far, OaAEP1b has mainly been used to assist cyclization in planta when cyclotides are expressed in plants that do not natively produce them (Poon et al., 2017, Jackson et al., 2019).

1.2.2.2. Trypsin-mediated cyclization

Trypsin inhibitor cyclotides have a well-defined trypsin binding site. Although not implicated in the biosynthetic pathway, in principle, this site could be employed as a protease recognition site for protease-mediated cyclization of an appropriate substrate. The first clue to the practicality of this mechanism came from a study of the cyclization mechanism of the trypsin inhibitor, SFTI-1, when it was reported that the open backbone of a synthetic linearised SFTI-1 peptide can be cyclized between Lys and Ile by trypsin (Marx et al., 2003). Thongyoo et al. subsequently optimised the trypsin cyclization system for MCoTI and analogues (Thongyoo et al., 2007). Using trypsin immobilised on sepharose beads, linear peptides synthesized by Fmoc-SPPS are readily cyclized in 100 mM phosphate buffer at pH 7.4 and 37 °C for 15 min. The yields of wild-type trypsin inhibitors, MCoTI-I and MCoTI-II, are as high as 90%. Additionally, an MCoTI-II analogue containing a K10F substitution was cyclized by chymotrypsin in a 90% yield.

To further illustrate the chemoenzymatic production route for a broader range of trypsin inhibitor cyclotide scaffolds, the same team reported the successful chemical synthesis and trypsin-mediated cyclization of various engineered peptides based on MCoTI-II (Thongyoo et al., 2008), including one with specific inhibitory activity against a protease from the foot-and-mouth disease virus. A more recent study on the cell penetrating properties of cyclic peptides also adopted this trypsin-mediated cyclization approach to synthesise analogues of MCoTI-II (Cascales et al., 2011).

Overall, trypsin-mediated cyclization provides an efficient strategy for synthesising cyclotides from the trypsin inhibitor subfamily, including analogues with a range of engineered bioactivities. By utilising the inherent trypsin binding site for ligation, this approach is highly specific for generating cyclotides that can bind to trypsin. More generally, it shows that even ‘classic’ proteases can exhibit potent ability for cyclization. However, many analogues of MCoTI were found to be inactive or partially digested under the same processing conditions, showing that trypsin lacks the substrate tolerance of AEPs. These drawbacks may limit the broad application of trypsin in the engineering of cyclic peptides.

1.2.2.3. Sortase-mediated cyclization

Recently, there has been increased interest in adopting other enzymes to cyclise engineered peptides in a site-specific manner, with particular interest in the transpeptidation capabilities of sortase A (SrtA). Sortases are a group of transpeptidases present in many Gram-positive bacteria, which were first identified by their function in anchoring surface proteins to bacterial cell walls (Mazmanian et al., 1999). SrtA from Staphylococcus aureus specifically recognises a signal motif, Leu-Pro-Xaa-Thr-Gly (LPXTG, where X can be any amino acid except Pro), and then cleaves between the Thr and Gly residues, forming an enzyme-linked LPXT-thioester intermediate. This reaction is followed by an intra- or intermolecular acyl group transfer to an amino group of an oligoglycine polypeptide. In addition to its natural function in bacteria, this site-specific transpeptidation reaction can be exploited to ligate proteins for bioengineering purposes (Tsukiji & Nagamune, 2009, Popp & Ploegh, 2011, Bolscher et al., 2011, Matsumoto et al., 2011).
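The recognition rule described above can be illustrated with a minimal Python sketch that scans a peptide sequence for candidate SrtA sites. It assumes the motif exactly as stated in the text (Leu-Pro-Xaa-Thr-Gly, with Xaa being any of the 20 standard amino acids except Pro). The function name and the example sequences are hypothetical and are not taken from any of the cited studies.

```python
import re

# Motif as described in the text: L-P-X-T-G, where X is any standard
# amino acid except Pro (assumption carried over from the paragraph above).
SORTASE_MOTIF = re.compile(r"LP[ACDEFGHIKLMNQRSTVWY]TG")


def find_sortase_sites(sequence):
    """Return (start_index, motif) for each candidate SrtA motif.

    Indices are 0-based positions of the Leu residue in the sequence.
    """
    return [(m.start(), m.group()) for m in SORTASE_MOTIF.finditer(sequence.upper())]


# Hypothetical linear substrate with an N-terminal oligoglycine and a
# C-terminal LPETG motif; the sequence is illustrative only.
print(find_sortase_sites("GGGAAALPETGG"))
```

A match only identifies a candidate site in the primary sequence. Whether a given substrate is actually processed still depends on accessibility and folding, as discussed for the individual peptides in this section.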

Recently, SrtA was used to cyclise the backbone of kB1, using GGG and TGG recognition sequences at the N and C termini (Jia et al., 2014). To test the compatibility of substrates, two other cyclic peptides, cVc1.1, a cyclic conotoxin drug lead for pain relief (Clark et al., 2010) and SFTI-1, were also synthesized with two and one disulfide bonds, respectively. The order of oxidation and cyclization affected the yield of correctly folded peptides differently for these cyclic peptides. For kB1, prior oxidation reduced the yield due to the presence of multiple misfolded isomers, whereas for cVc1.1 it was prior cyclization that lowered the yield. Cyclization and oxidation were attempted simultaneously for SFTI-1; the product contained both cis and trans Pro peptide bonds, whereas only the cis conformation exists in the wild-type. To improve the efficiency of catalysis, a screen of sortase variants was used to identify m4SrtA, which exhibited a 140-fold increase in catalytic activity compared to the native form (Chen et al., 2011a).

These successful trials demonstrate the applicability of sortase-mediated cyclization to disulfide-rich peptides. Combining in vivo production in E. coli with sortase-mediated cyclization also showed the potential to produce cyclic peptides on a large scale. The SrtA recognition motif can be inserted into any given region of the peptide to further optimise cyclization efficiency. However, the approach has limitations that must be considered. The additional motif might affect the bioactivity and structure of cyclized peptides, and this needs to be assessed on a case-by-case basis. In some instances, the minor sequence modifications required for sortase recognition can destabilise the whole structure, resulting in low yields of the desired cyclic peptides (Gunasekera et al., 2008, Wong et al., 2012, Nguyen et al., 2014). In addition, SrtA is around 20,000 times slower than butelase1.

1.2.3. Bacteria-based cyclic peptide expression system

Following the development of recombinant expression technologies, cyclotides have been synthesized in E. coli and yeast. A common strategy to produce and cyclise peptides in vivo in bacteria is through the use of gyrase intein sequences that function in trans-splicing. Using this method, recombinant peptides have been expressed and cyclized in vivo in both bacteria and yeast (Camarero et al., 2007, Austin et al., 2009). A mutagenesis library of MCoTI-I was built and only two mutants, G25P and I20G, could not be cyclized, indicating the plasticity of the sequence. Importantly, it was recently demonstrated that a split-intein approach coupled with a high-throughput intracellular screen could select highly efficient cyclotide variants from a large expression library (Jagadish et al., 2013). As is the case for SrtA, the introduction of an intein may affect the folding of cyclic peptides and make the peptide insoluble.

A number of synthesis techniques have also been investigated in which linear peptides are first produced recombinantly and then cyclized in vitro. These include an efficient cyclization approach for a bacterially produced linear kB1 that contained a thiol-labile Xaa-Cys (Cowper et al., 2013). Following a 48 h incubation at 45 °C with 10% (w/v) sodium 2-mercaptoethane sulfonate, kB1 was correctly formed. In a similar fashion to semi-chemical synthesis, folded linear cyclotides can first be produced in bacteria and then cyclized enzymatically in vitro. For example, 100 µM linear MCoTI-II produced in E. coli was incubated with 40 µM m4SrtA, a mutant sortase A, for 96 h at 37 °C to obtain a >95% yield of cyclized product (Stanger et al., 2014). These studies not only provide viable approaches to synthesising cyclotides at high levels but also open the door to large-scale screening of cyclotide libraries for drug design.

1.3. Plant-based production systems

Plant-based production systems, like the other production systems mentioned above, are an attractive alternative for producing biologics with medical, industrial and research applications. Their advantages include low cost, high output, scalability, and safety for humans and the environment (Holtz et al., 2015). These advantages are facilitated by the technological and conceptual developments made by plant molecular biologists, including plant viral vectors, transgenic technologies and post-processing (e.g. purification and separation). In recent years, plant molecular pharming, or alternatively plant molecular farming, based on a wide range of in planta expression systems, has attracted great interest and has led to some therapeutics being produced for the market (He et al., 2011, Vamvaka et al., 2016, Buyel, 2018). Recently, plant-based production systems have also been developed to produce cyclic peptides. In particular, the discovery of cyclization-efficient enzymes in planta and an improved understanding of biosynthesis have contributed to the production of cyclic peptides. To achieve efficient expression of cyclic peptides, cell (Seydel et al., 2007, Seydel et al., 2009) and leaf (Gillon et al., 2008, Poon et al., 2017, Jackson et al., 2019) expression systems have been developed.

1.3.1. Expression systems

Plant-based expression systems are categorised into three groups: transgenic whole plants, organ cultures and cell suspension cultures. Desired proteins can be tailored for production in specific plants and tissue types to maximise expression, whilst reducing any potential toxicity to the plant itself. Several plant systems were used in the current work, including a transient leaf expression system in tobacco, a seed expression system in rice and Arabidopsis, and rice suspension cell systems. These expression systems are described below within the context of current plant-based production systems.

1.3.1.1. Transient leaf expression systems

Nicotiana benthamiana is a species of choice for transient expression by leaf infiltration with Agrobacteria. This expression system is ideal for rapid recombinant production, allowing infiltrations all year round. The plant grows rapidly and provides leaf tissue amenable to leaf infiltration after only five to six weeks of growth. Following infiltration, only five to eight days are needed before harvesting of product. Viral vectors such as pEAQ (Sainsbury et al., 2009), magnICON (Klimyuk et al., 2012) and geminiviral-based vectors (Chen et al., 2011b) have been established that ensure reliable, high-level transgene expression without the need for transgene integration into the plant's genome. These approaches are considered non-GMO, having a negligible risk of environmental contamination and no risk of spreading transgenic seeds. Additionally, N. benthamiana is not a food crop, which eliminates possible contamination of the food supply.

With these advantages, N. benthamiana leaf expression has been used to produce antibodies for infectious diseases, like influenza (Shoji et al., 2015, Le Mauff et al., 2017), Ebola virus (Phoolcharoen et al., 2011), dengue (Martínez et al., 2010, Kim et al., 2015), norovirus (Mathew et al., 2014, Diamos & Mason, 2018), HIV (Rosenberg et al., 2015, Loos et al., 2015), malaria (Kapelski et al., 2015, Chichester et al., 2018), West Nile virus (Lai et al., 2014, Yang et al., 2018) and human growth factors (Feng et al., 2014). This fast expression system can be deployed quickly in response to a sudden outbreak of a global disease. During the Ebola virus outbreak in 2014, N. benthamiana was used to produce ZMapp, a mixture of three monoclonal antibodies. This preparation was permitted by the U.S. Food and Drug Administration (FDA) for compassionate use and was used to treat infected patients (Lyon et al., 2014). To avoid undesired immune responses caused by plant glycans in humans, an engineered N. benthamiana line called ΔXF, which does not produce plant-specific N-glycans, was used to produce ZMapp (Qiu et al., 2014). To further refine this system for medical applications, a stable transgenic line that co-expresses six mammalian glycoenzymes in ΔXF was developed and successfully used to produce defined sialylated N-glycan proteins (Kallolimath et al., 2016). A rapid expression system based on the N. benthamiana leaf transient expression system was recently developed to produce a potent plasmin inhibitor based on SFTI-1 (Jackson et al., 2019).

In summary, the N. benthamiana leaf transient expression system has the advantage over other modalities of expressing target products in a short time without complex facilities. The yield is generally high and can be further improved by using engineered transgenic lines or by scaling up the biomass for infiltration. Overall, it has proven its usefulness for the commercial production of pharmaceuticals, especially vaccines. These applications are further discussed in Section 1.3.2.

1.3.1.2. Seed expression systems

A highlight of a seed-based expression system is the stable nature of this organ, which allows the accumulation and long term storage of the recombinant products. Seed expression systems have been developed in various plants, including Arabidopsis (De Jaeger et al., 2002, Dong et al., 2017), rice (He et al., 2011, Ou et al., 2014, Vamvaka et al., 2016), maize (Hood et al., 1997, Hood et al., 2003) and tobacco (Hernández-Velázquez et al., 2015, Ceballo et al., 2017).

Rice seed as a production tissue for peptide therapeutics carries many advantages. First, the rice genome sequence is available, which provides researchers with rice-specific gene regulatory sequences for construct design (Goff et al., 2002). Furthermore, methodologies for transformation, tissue culture, cultivation, harvesting and storage have been well developed in rice along with improved agronomic performance. In addition, rice is a crop that nearly exclusively self-pollinates. The reported frequencies of transgene flow are very low (0.05-0.79%), which lowers the risk of transgene escape to the natural environment (Stoger et al., 2005, Rong et al., 2005). Considering biomass, the yield of rice grains per hectare is higher than that of many other cereals and lower only than that of maize (Takaiwa et al., 2015). As a generally safe food, rice grains are harmless to most humans as rice lacks gluten and antigenic components such as those known from wheat. In combination with engineered lines with reduced or no allergenicity, rice seeds have been developed into oral vaccines accessible to most of the population. Transgenic rice seeds have been tested for oral administration of bioactive molecules to treat human diseases and symptoms including diarrhea (Zavaleta et al., 2007, Yuki et al., 2013, Soh et al., 2015), leucopenia (Ning et al., 2008), parasite infections (e.g. roundworm Ascaris suum) (Matsumoto et al., 2009), Alzheimer’s disease (Yoshida et al., 2011), HIV infection (Vamvaka et al., 2018), allergies, e.g. mite allergy (Yang et al., 2008, Suzuki et al., 2011) and cedar pollinosis (Wakasa et al., 2013a), and animal diseases, e.g. infectious bursal disease in chickens (Wu et al., 2007). Among the rice seed-based vaccines developed thus far, the cholera vaccine MucoRice-CTB can be produced under the current Good Manufacturing Practice (cGMP) standard (Kashima et al., 2016).

In addition to the advantages noted above, rice endosperm has been targeted to accumulate and fold complex recombinant proteins. For example, human serum albumin, which incorporates 17 disulfide bonds, was produced efficiently in rice seeds, attaining a yield up to 2.75 g per kg of grain (He et al., 2011). Similarly, various human pharmaceutical proteins were produced in rice seed, including human insulin, growth factors (Xie et al., 2008), antitrypsin (Zhang et al., 2013), fibroblast growth factor (An et al., 2013), human lysozyme (Huang et al., 2002a), lactoferrin (Nandi et al., 2005), interleukins (Fujiwara et al., 2010, Yang et al., 2012, Kudo et al., 2013, Fujiwara et al., 2016) and transforming growth factor-β (Takaiwa et al., 2016).

Arabidopsis thaliana is a model plant with a short life cycle, which is ideal for testing peptide production strategies directed to the seed. The large research portfolio associated with this plant has resulted in many established techniques to produce heterologous proteins/peptides as well as methodologies for identifying gene expression patterns and functions (Weigel & Glazebrook, 2002). For example, an anti-CD20 antibody fragment was expressed in Arabidopsis seeds to understand the effect of subcellular accumulation of recombinant proteins on the ER (Wang et al., 2015). The fragment fused to the albumin signal peptide, which is designed to secrete the mature protein to the apoplast, showed the highest yield, up to 6.12% of total soluble protein. A recent study showed a similar result in attempts to express antibody fragments in Arabidopsis seeds, which suggests the seed is a stable bioreactor for the production of antibodies or their fragments (Dong et al., 2017).

Various strategies have been used to improve gene expression in seeds, including the choice of promoter, stability of transcript and efficiency of translation. As constitutive promoters can cause poor plant growth and eventually affect the yield, seed-specific promoters have the advantage of avoiding this drawback. They can not only raise the yield of heterologous expression but also benefit biological containment. For example, endosperm-specific promoters showed improved yields of recombinant proteins in the endosperm, the main storage tissue of rice (Qu & Takaiwa, 2004, Furtado et al., 2008). In Arabidopsis, seed storage protein promoters, e.g. arcelin-5 and β-phaseolin, show high expression (De Jaeger et al., 2002). Similarly, oleosins are structural proteins involved in oil storage in oilseeds. Their promoters can drive high expression in oil bodies of oilseeds (e.g. Arabidopsis seeds), and this specific expression and accumulation can facilitate separation from seed proteins (Parmenter et al., 1995, Chung et al., 2008). Transgene designs using the most optimised codons for a particular plant species can help improve translation efficiency and ultimately raise the yield (Gustafsson et al., 2004, Wakasa et al., 2013, Ogo et al., 2014). Furthermore, flanking regulatory sequences, like 3’ and 5’ untranslated regions (UTRs) and terminators, can be optimized to ensure adequate transgene transcription (Pooggin & Skryabin, 1992, Shivprasad et al., 1999, Richter et al., 2000, Hirai et al., 2011).
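The codon optimisation step mentioned above can be sketched in its simplest form: back-translating a peptide by choosing one "preferred" codon per amino acid. The sketch below is illustrative only. Every codon in the table is a valid synonym in the standard genetic code, but which synonym is treated as preferred is a placeholder assumption and does not represent measured codon usage for rice, Arabidopsis or any other species. The function name and the example peptide are also hypothetical.

```python
# Placeholder "preferred codon" table (assumption): valid standard-code codons,
# but NOT real codon-usage data for any particular plant species.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTC",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGC",
    "S": "TCC", "T": "ACC", "V": "GTC", "W": "TGG", "Y": "TAC",
}


def back_translate(peptide, codon_table=PREFERRED_CODON):
    """Back-translate a peptide into DNA using one codon per amino acid."""
    try:
        return "".join(codon_table[residue] for residue in peptide.upper())
    except KeyError as err:
        raise ValueError(f"Unsupported residue: {err.args[0]}") from None


# Hypothetical five-residue example peptide.
print(back_translate("LPETG"))  # CTCCCGGAGACCGGC
```

In practice, a species-specific usage table would replace the placeholder, and factors such as transcript stability would also be weighed rather than always taking a single codon per residue.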

1.3.1.3. Suspension cell culture expression systems

Plant cell expression systems offer an alternative approach for recombinant protein production. The advantage of a closed and controlled environment is the capability of meeting the cGMP requirements for large scale production (Wirz et al., 2012). Unlike mammalian and microbial cell expression systems, plant cell expression systems do not produce endotoxins and cannot be contaminated by human pathogens. Plant cells, like microbes, require relatively low input to maintain and scale up, and like mammalian cells they can express complex proteins with post-translational modifications. Compared to extraction from whole plants or organs, plant cell systems increase costs when fermentation equipment and sterile handling are required. However, the production time for cell systems is generally shorter than culturing a whole plant. For rare species or low biomass species, the alternative of cell culture is a practical option for consistent production. Biosafety concerns associated with cross-fertilization and gene flow can also be eliminated in plant cell expression systems.

With the above advantages, plant cell expression systems have been developed to produce pharmaceuticals in rice (Huang et al., 2005, Jung et al., 2016, Corbin et al., 2016), tobacco (Xu et al., 2010, Holland et al., 2010), soybean (Smith et al., 2002), (Kwon et al., 2003) and carrot (Shaaltiel et al., 2007). As one of the most frequently used suspension cell types, rice cells can achieve high yields, are capable of synthesising complex proteins, and have well-established transformation platforms and downstream processing pipelines. For some products, the yields from rice suspension cells are competitive with, or even higher than, those from microbial and mammalian cells. For example, the yield of human growth hormone is up to 57 mg/L in rice suspension cells, with bioactivity equivalent to that of the product produced in E. coli (Kim et al., 2008). For acid α-glucosidase, the yield is up to 37 mg/L in 11 days and its bioactivity is equivalent to that of the product from mammalian cells (Jung et al., 2016). The highest-yielding product derived from rice suspension cells is α-1-antitrypsin, for which production in a membrane bioreactor reached 247 mg/L in six days. This high expression level was partly attributed to the use of an inducible promoter, RAmy3D, which is triggered under conditions of sugar starvation (Morita et al., 1998). Using this two-stage expression system, butyrylcholinesterase, a human enzyme (Corbin et al., 2016), and human serum albumin (Liu et al., 2018) have been successfully produced, with yields up to 1.6 mg/L and 49.6 mg/L, respectively. In these examples, the yields vary from a few to hundreds of mg per L, which makes it hard to compare systems and to predict the best expression system or yield. This case-by-case situation is caused by many factors, including the choice of promoter, stability of transcript and efficiency of translation, as mentioned above in Section 1.3.1.2, and also different bioreactor and culture types.
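One simple way to place such yields on a common footing is to divide the final titre by the culture time, giving an average volumetric productivity. The short Python sketch below does this for the two examples above for which both a titre and a duration are stated. The function name is hypothetical, and the calculation is a back-of-the-envelope normalisation rather than a method used in the cited studies.

```python
def volumetric_productivity(titre_mg_per_l, days):
    """Average productivity in mg per litre per day over the culture period."""
    if days <= 0:
        raise ValueError("days must be positive")
    return titre_mg_per_l / days


# Titres and culture times as quoted in the text above.
reported = {
    "acid alpha-glucosidase": (37.0, 11),
    "alpha-1-antitrypsin": (247.0, 6),
}

for product, (titre, days) in reported.items():
    rate = volumetric_productivity(titre, days)
    print(f"{product}: {rate:.1f} mg/L/day")
# acid alpha-glucosidase: 3.4 mg/L/day
# alpha-1-antitrypsin: 41.2 mg/L/day
```

Even after this normalisation, the two products differ by more than tenfold, which is consistent with the case-by-case nature of expression yields noted above.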

In order of generally increasing working volume, bioreactor systems include wave (Eibl & Eibl, 2008), wave and undertow (Terrier et al., 2007), slug bubble (Terrier et al., 2007), plastic-lined (Curtis, 2004), stirred bag (Eibl et al., 2009) and Osmotek bag (Weathers et al., 2010). These bioreactors are chosen to suit different production requirements. The lab scale is normally under 20 L, where the shake flask is the most commonly used system. Different culture modes, like batch (Huang et al., 2009a), fed-batch (Park et al., 2010), perfusion (Lee et al., 2004), semi-continuous (Corbin et al., 2016, Liu et al., 2018) and continuous culture (Des Molles et al., 1999), can influence the production yield. Considering all these factors, it is necessary to optimise gene construct design, cell type, culture strategy, and production conditions to maximise recombinant product yield.

1.3.2. Plant pharmaceutical industry

The approach of using plants as natural biofactories for recombinant products has attracted continued interest, especially for complex proteins requiring precise post-translational modifications, including glycosylation and disulfide bond formation. As noted earlier, a number of pharmaceuticals have been produced in plants, some of which have entered clinical trials and some of which have been approved for commercialization. In particular, there are two milestones in using plant-based pharmaceuticals to treat human diseases. In 2012, Elelyso® became the first plant-based drug approved by the FDA (Fox, 2012). It is a taliglucerase alfa enzyme, produced in carrot suspension cells, for treating type 1 Gaucher’s disease. In 2014, ZMapp, an antibody preparation produced in N. benthamiana leaves, was used to treat patients infected with Ebola virus (Lyon et al., 2014). In addition, a series of recombinant human therapeutics, like OsrHSA (human serum albumin) (He et al., 2011), OsrAAT (human α-1 antitrypsin) (Zhang et al., 2013), OsrbFGF (human basic fibroblast growth factor) (An et al., 2013), OsrEGF (human epidermal growth factor), lysozyme (Huang et al., 2002a, Huang et al., 2002b), OsrLF (human lactoferrin) (Zavaleta et al., 2007) and transferrin (Huang et al., 2002b) have been produced in rice seed expression systems. Among these human therapeutics, OsrHSA is in a phase I clinical trial and has been developed as a biocosmetic along with OsrbFGF and OsrLF. To achieve rapid and robust production, the agroinfiltration-based transient expression platform in N. benthamiana has been developed for producing vaccines and antibodies (Shoji et al., 2015, Chichester et al., 2018, Diamos & Mason, 2018, Yang et al., 2018). Recently, a growing number of bio-pharmaceutical companies have been established, such as Medicago Inc (Canada), PlantForm (Canada), Kentucky BioProcessing Inc (USA), iBio Inc (USA), Icon Genetics (Germany) and Leaf Expression System (UK). Taking Medicago as an example, various plant-based vaccine antigens have progressed to clinical trials, e.g. seasonal influenza virus (Phase I, II, III), pandemic influenza (Phase I and II), rotavirus (Phase I) and norovirus (preclinical) (https://www.medicago.com/en/).

In the last few decades, significant improvements in transformation, yield and biosafety have been achieved in the plant pharmaceutical industry. At the same time, cGMP-compliant manufacturing facilities have been designed and constructed (Holtz et al., 2015). Medicago Inc, Kentucky BioProcessing Inc and iBio Inc are sponsored by the Defense Advanced Research Projects Agency (DARPA) to demonstrate and validate the feasibility of plant-based pharmaceutical production using a cost-efficient and robust large-scale manufacturing platform. Together with the new cGMP standards for the manufacturing of plant-produced pharmaceuticals, these improvements will advance the further commercialisation of plant-based therapeutics.

1.3.3. Plant-based system for cyclic peptides

Since the first isolation of cyclotides from the tropical plant O. affinis (Craik et al., 1999), many more cyclotides have been discovered, with numerous bioactivities. The extraction of cyclotides from native cyclotide-producing plants was the initial approach used to obtain cyclic peptides. Later, cell culture systems emerged to produce native cyclotides. Specifically, O. affinis cells, the natural producer of kB1, were used to produce cyclotides (Seydel et al., 2007). That research demonstrated that expression of cyclotides was highly influenced by environmental triggers. An optimised 25 L bioreactor could produce 21 mg kB1 per day which is 20-40% of the amount derived from O. affinis plants (Seydel et al., 2009). However, culturing native cyclotide-producing plant cells cannot meet the requirements to produce engineered cyclic peptides.

To meet the requirement for producing engineered cyclic peptides, two model plants, Arabidopsis and Nicotiana tabacum, were tested for cyclic peptide expression. These model plants are well developed for transformation and heterologous expression, and as they do not natively produce cyclotides, purification is simplified. In contrast to expression of kB1 in O. affinis plants and cells, cyclic, linear and misprocessed kB1 peptides were all detected when the Oak1 gene was introduced into Arabidopsis and N. tabacum (Gillon et al., 2008). This suggests that the unique biosynthesis pathway in native cyclic peptide-producing plants is absent in Arabidopsis and N. tabacum. Similar poor production was also seen in N. benthamiana (Conlan et al., 2012). By co-expression of cyclization-efficient enzymes, heterologous expression of the Oak1 gene in N. benthamiana has been improved (Poon et al., 2017). The yield of kB1 was at least 8-fold higher than when Oak1 was expressed without OaAEP1b. Following this, SFTI-1 and an SFTI-1-based plasmin inhibitor, [T4Y,I7R]SFTI-1, were produced in N. benthamiana (Jackson et al., 2019). That study showed a tremendous increase in the expression of cyclic SFTI-1 when co-expressed with OaAEP1b. To further boost the yield of [T4Y,I7R]SFTI-1, a gene carrying three tandem repeats of [T4Y,I7R]SFTI-1 was used, which showed around a five-fold greater yield compared to the single repeat with co-expression of OaAEP1b. These results indicate that cyclization-efficient enzymes play a key role in improving the expression of cyclic peptides in planta, while the substrates and other expression strategies also contribute to their yields.

1.4. Aims and scope of this thesis

This thesis covers the topics of plant-derived cyclic peptides from discovery to biotechnological applications. In Chapter 1, the characteristics and applications of cyclic peptides, current synthesis approaches and plant-based production systems are described. In the last 20 years, cysteine rich cyclic peptides have been widely explored and developed as promising drug scaffolds and potential crop protection agents. Chemical synthesis and semi-synthetic approaches have been developed to achieve the production of cyclic peptides at laboratory scale. At the same time, the plant expression systems are under development to enable recombinant protein/peptide production in various systems, including leaf, seed and cell expression systems.

Three aims were investigated in this thesis to illustrate the potential of plants as biofactories for cyclic peptides. They are (i) to develop rice as a production system for cyclic peptides; (ii) to investigate the diversity and cyclization capability of cyclotide-like peptides from monocot species; (iii) to use the model plant Arabidopsis to explore the plasticity of seeds for the production of diverse cyclic peptides. These aims and results are outlined respectively in Chapters 2 to 4 of this thesis.

Chapter 2 describes my approach for developing rice as a biofactory for cyclic peptide production. A stable transformation platform was established in rice callus-derived suspension cells and seeds. Various precursor genes of cyclic peptides were expressed to demonstrate the feasibility of producing cyclic peptides in rice. Highlights include the successful production of correctly cyclized and folded kB1 in rice suspension cells, which is the first example of cyclotide production in monocots.

Chapter 3 describes the investigation of the diversity and cyclization capability of cyclotide-like peptides in monocots. Cyclotide-like genes were screened in a diverse set of monocot species at the transcriptome level. Using a rapid transient leaf expression system, cyclotide-like peptides were obtained and their cyclization capabilities were tested. Highlights include (i) monocot cyclotide-like genes are restricted to the Poaceae family; (ii) an N-terminal pyroGlu modification might provide stabilization to linear cyclotide-like peptides; (iii) monocot cyclotide-like genes can be re-engineered with minimal residue changes to allow backbone cyclization.

Chapter 4 describes using the model plant Arabidopsis to define the plasticity of seeds for the production of diverse cyclic peptides. Several peptides could be produced in Arabidopsis seeds with its endogenous AEPs. To achieve the co-expression of cyclic peptides and cyclization efficient AEPs, a pre-requisite and significant contribution was to develop a homozygous AEP expressing transgenic Arabidopsis line. This stable AEP expression line will enable future gene crossing or gene stacking experiments.

Finally, Chapter 5 outlines my overall conclusions and discusses future research directions for translating cyclotide research from discovery through to biotechnological applications. The broader advantages and technical challenges of developing plants for cyclic peptide production are covered.

1.5. References

Aboye T, Meeks C, Majumder S, et al., 2016. Design of a MCoTI-based cyclotide with angiotensin (1-7)-like activity. Molecules 21, 152. Aboye TL, Clark RJ, Craik DJ, et al., 2008. Ultra stable peptide scaffolds for protein engineering-synthesis and folding of the circular cystine knotted cyclotide cycloviolacin O2. ChemBioChem 9, 103-113. Aboye TL, Ha H, Majumder S, et al., 2012. Design of a novel cyclotide-based CXCR4 antagonist with anti-human immunodeficiency virus (HIV)-1 activity. Journal of medicinal chemistry 55, 10729-10734. An N, Ou J, Jiang D, et al., 2013. Expression of a functional recombinant human basic fibroblast growth factor from transgenic rice seeds. International Journal of Molecular Sciences 14, 3556-3567. Andersson L, Blomberg L, Flegel M, et al., 2000. Large-scale synthesis of peptides. Peptide Science 55, 227-250. Austin J, Wang W, Puttamadappa S, et al., 2009. Biosynthesis and biological screening of a genetically encoded library based on the cyclotide MCoTI-I. ChemBioChem 10, 2663-2670. Barber CJ, Pujara PT, Reed DW, et al., 2013. The two-step biosynthesis of cyclic peptides from linear precursors in a member of the plant family Caryophyllaceae involves cyclization by a serine protease-like enzyme. Journal of Biological Chemistry 288, 12500-12510. Bolscher JG, Oudhoff MJ, Nazmi K, et al., 2011. Sortase A as a tool for high-yield histatin cyclization. The FASEB Journal 25, 2650-2658. Burman R, Gruber CW, Rizzardi K, et al., 2010. Cyclotide proteins and precursors from the genus Gloeospermum: Filling a blank spot in the cyclotide map of Violaceae. Phytochemistry 71, 13-20. Burman R, Yeshak MY, Larsson S, et al., 2015. Distribution of circular proteins in plants: large-scale mapping of cyclotides in the Violaceae. Frontiers in Plant Science 6, 885. Buyel J, 2018. Plants as sources of natural and recombinant anti-cancer agents. Biotechnology Advances 36, 506-520. Camarero JA, Kimura RH, Woo YH, et al., 2007. 
A cell-based approach for the biosynthesis/screening of cyclic peptide libraries against bacterial toxins. Chemistry Today 25, 20-23. Cascales L, Henriques ST, Kerr MC, et al., 2011. Identification and characterization of a new family of cell-penetrating peptides: cyclic cell-penetrating peptides. Journal of Biological Chemistry 286, 36932-36943. Ceballo Y, Tiel K, López A, et al., 2017. High accumulation in tobacco seeds of hemagglutinin antigen from avian (H5N1) influenza. Transgenic Research 26, 775-789. Chan LY, Craik DJ, Daly NL, 2015. Cyclic thrombospondin-1 mimetics: grafting of a thrombospondin sequence into circular disulfide-rich frameworks to inhibit endothelial cell migration. Bioscience Reports 35, e00270.

Chan LY, Craik DJ, Daly NL, 2016. Dual targeting anti-angiogenic cyclic peptides as potential drug leads for cancer. Scientific Reports 6, 35347. Chan LY, Gunasekera S, Henriques ST, et al., 2011. Engineering pro-angiogenic peptides using stable, disulfide-rich cyclic scaffolds. Blood 118, 6709-6717. Chan LY, He W, Tan N, et al., 2013. A new family of cystine knot peptides from the seeds of Momordica cochinchinensis. Peptides 39, 29-35. Chen B, Colgrave ML, Daly NL, et al., 2005. Isolation and characterization of novel cyclotides from Viola hederaceae: solution structure and anti-HIV activity of vhl-1, a leaf-specific expressed cyclotide. Journal of Biological Chemistry 280, 22395-22405. Chen I, Dorr BM, Liu DR, 2011a. A general strategy for the evolution of bond-forming enzymes using yeast display. Proceedings of the National Academy of Sciences 108, 11399-11404. Chen Q, He J, Phoolcharoen W, et al., 2011b. Geminiviral vectors based on bean yellow dwarf virus for production of vaccine antigens and monoclonal antibodies in plants. Human Vaccines 7, 331-338. Cheneval O, Schroeder CI, Durek T, et al., 2014. Fmoc-based synthesis of disulfide-rich cyclic peptides. The Journal of Organic Chemistry 79, 5538-5544. Chiche L, Heitz A, Gelly JC, et al., 2004. Squash inhibitors: from structural motifs to macrocyclic knottins. Current Protein and Peptide Science 5, 341-349. Chichester JA, Green BJ, Jones RM, et al., 2018. Safety and immunogenicity of a plant-produced Pfs25 virus-like particle as a transmission blocking vaccine against malaria: A Phase 1 dose-escalation study in healthy adults. Vaccine 36, 5865-5871. Chung K-J, Hwang S-K, Hahn B-S, et al., 2008. Authentic seed-specific activity of the Perilla oleosin 19 gene promoter in transgenic Arabidopsis. Plant Cell Reports 27, 29-37. Clark RJ, Daly NL, Craik DJ, 2006. Structural plasticity of the cyclic-cystine-knot framework: implications for biological activity and drug design. Biochemical Journal 394, 85-93. 
Clark RJ, Jensen J, Nevin ST, et al., 2010. The engineering of an orally active conotoxin for the treatment of neuropathic pain. Angewandte Chemie International Edition 49, 6545-6548. Colgrave ML, Craik DJ, 2004. Thermal, chemical, and enzymatic stability of the cyclotide kalata B1: the importance of the cyclic cystine knot. Biochemistry 43, 5965-5975. Colgrave ML, Kotze AC, Huang Y-H, et al., 2008a. Cyclotides: natural, circular plant peptides that possess significant activity against gastrointestinal nematode parasites of sheep. Biochemistry 47, 5581-5589. Colgrave ML, Kotze AC, Ireland DC, et al., 2008b. The anthelmintic activity of the cyclotides: Natural variants with enhanced activity. ChemBioChem 9, 1939-1945. Conlan BF, Colgrave ML, Gillon AD, et al., 2012. Insights into processing and cyclization events associated with biosynthesis of the cyclic peptide kalata B1. Journal of Biological Chemistry 287, 28037-28046.

Contreras J, Elnagar AY, Hamm-Alvarez SF, et al., 2011. Cellular uptake of cyclotide MCoTI-I follows multiple endocytic pathways. Journal of Controlled Release 155, 134-143. Corbin JM, Hashimoto BI, Karuppanan K, et al., 2016. Semicontinuous bioreactor production of recombinant butyrylcholinesterase in transgenic rice cell suspension cultures. Frontiers in Plant Science 7, 412. Cowper B, Craik DJ, Macmillan D, 2013. Making ends meet: Chemically mediated circularization of recombinant proteins. ChemBioChem 14, 809-812. Craik DJ, Čemažar M, Wang CK, et al., 2006. The cyclotide family of circular miniproteins: nature's combinatorial peptide template. Biopolymers: Peptide Science 84, 250-266. Craik DJ, Daly NL, Bond T, et al., 1999. Plant cyclotides: A unique family of cyclic and knotted proteins that defines the cyclic cystine knot structural motif. Journal of Molecular Biology 294, 1327-1336. Curtis WR, 2004. Growing cells in a reservoir formed of a flexible sterile plastic liner. United States Patent, No. 6709862. D’Souza C, Henriques ST, Wang CK, et al., 2014. Structural parameters modulating the cellular uptake of disulfide-rich cyclic cell-penetrating peptides: MCoTI-II and SFTI-1. European Journal of Medicinal Chemistry 88, 10-18. Daly NL, Craik DJ, 2000. Acyclic permutants of naturally occurring cyclic proteins: characterization of cystine knot and β-sheet formation in the macrocyclic polypeptide kalata B1. Journal of Biological Chemistry 275, 19068-19075. Daly NL, Gustafson KR, Craik DJ, 2004. The role of the cyclic peptide backbone in the anti-HIV activity of the cyclotide kalata B1. FEBS Letters 574, 69-72. Daly NL, Love S, Alewood PF, et al., 1999. Chemical synthesis and folding pathways of large cyclic polypeptides: studies of the cystine knot polypeptide kalata B1. Biochemistry 38, 10606-10614. De Jaeger G, Scheffer S, Jacobs A, et al., 2002. 
Boosting heterologous protein production in transgenic dicotyledonous seeds using Phaseolus vulgaris regulatory sequences. Nature Biotechnology 20, 1265-1268. Dębowski D, Pikuła M, Lubos M, et al., 2014. Inhibition of human and yeast 20S proteasome by analogues of trypsin inhibitor SFTI-1. PloS one 9, e89465. Deng Y, Wu T, Wang M, et al., 2019. Enzymatic biosynthesis and immobilization of polyprotein verified at the single-molecule level. Nature Communications 10, 2775. Des Molles DV, Gomord V, Bastin M, et al., 1999. Expression of a carrot invertase gene in tobacco suspension cells cultivated in batch and continuous culture conditions. Journal of Bioscience and Bioengineering 87, 302-306. Diamos AG, Mason HS, 2018. High-level expression and enrichment of norovirus virus-like particles in plants using modified geminiviral vectors. Protein Expression and Purification 151, 86-92.

Ding XM, Bai DS, Qian JJ, 2014. Novel cyclotides from Hedyotis biflora inhibit proliferation and migration of pancreatic cancer cell in vitro and in vivo. Medicinal Chemistry Research 23, 1406-1413. Dong Y, Li J, Yao N, et al., 2017. Seed-specific expression and analysis of recombinant anti-HER2 single-chain variable fragment (scFv-Fc) in Arabidopsis thaliana. Protein Expression and Purification 133, 187-192. Du J, Chan LY, Poth AG, et al., 2019. Discovery and characterization of cyclic and acyclic trypsin inhibitors from Momordica dioica. Journal of Natural Products 82, 293-300. Durek T, Cromm PM, White AM, et al., 2018. Development of novel melanocortin receptor agonists based on the cyclic peptide framework of sunflower trypsin inhibitor-1. Journal of Medicinal Chemistry 61, 3674-3684. Dutton JL, Renda RF, Waine C, et al., 2004. Conserved structural and sequence elements implicated in the processing of gene-encoded circular proteins. Journal of Biological Chemistry 279, 46858-46867. Eibl R, Eibl D, 2008. Design and use of the wave bioreactor for plant cell culture. In. Gupta S.D., Ibaraki Y. (ed.) Plant tissue culture engineering. Springer Dordrecht, pp. 203-227. Eibl R, Werner S, Eibl D, 2009. Disposable bioreactors for plant liquid cultures at Litre-scale. Engineering in Life Sciences 9, 156-164. Eliasen R, Daly NL, Wulff BS, et al., 2012. Design, synthesis, structural and functional characterization of novel melanocortin agonists based on the cyclotide kalata B1. Journal of Biological Chemistry 287, 40493-40501. Elliott AG, Delay C, Liu H, et al., 2014. Evolutionary origins of a bioactive peptide buried within preproalbumin. The Plant Cell 26, 981-995. Fahradpour M, Keov P, Tognola C, et al., 2017. Cyclotides isolated from an ipecac root extract antagonize the corticotropin releasing factor type 1 receptor. Frontiers in Pharmacology 8, 616. Feng Z-G, Pang S-F, Guo D-J, et al., 2014. 
Recombinant keratinocyte growth factor 1 in tobacco potentially promotes wound healing in diabetic rats. BioMed Research International 2014, 579632-579641. Fensterseifer ICM, Silva ON, Malik U, et al., 2015. Effects of cyclotides against cutaneous infections caused by Staphylococcus aureus. Peptides 63, 38-42. Fittler H, Avrutina O, Empting M, et al., 2014. Potent inhibitors of human matriptase-1 based on the scaffold of sunflower trypsin inhibitor. Journal of Peptide Science 20, 415-420. Fox JL, 2012. First plant-made biologic approved. Nature Biotechnology 30, 472. Fujiwara Y, Aiki Y, Yang L, et al., 2010. Extraction and purification of human interleukin-10 from transgenic rice seeds. Protein Expression and Purification 72, 125-130. Fujiwara Y, Yang L, Takaiwa F, et al., 2016. Expression and purification of recombinant mouse interleukin-4 and-6 from transgenic rice seeds. Molecular Biotechnology 58, 223-231. Furtado A, Henry RJ, Takaiwa F, 2008. Comparison of promoters in transgenic rice. Plant Biotechnology Journal 6, 679-693.

Gerlach SL, Burman R, Bohlin L, et al., 2010. Isolation, characterization, and bioactivity of cyclotides from the Micronesian plant Psychotria leptothyrsa. Journal of Natural Products 73, 1207-1213. Getz JA, Cheneval O, Craik DJ, et al., 2013. Design of a cyclotide antagonist of neuropilin-1 and -2 that potently inhibits endothelial cell migration. ACS Chemical Biology 8, 1147-1154. Getz JA, Rice JJ, Daugherty PS, 2011. Protease-resistant peptide ligands from a knottin scaffold library. ACS Chemical Biology 6, 837-844. Gillon AD, Saska I, Jennings CV, et al., 2008. Biosynthesis of circular proteins in plants. The Plant Journal 53, 505-515. Glotzbach B, Reinwarth M, Weber N, et al., 2013. Combinatorial optimization of cystine-knot peptides towards high-affinity inhibitors of human matriptase-1. PloS one 8, e76956. Goff SA, Ricke D, Lan TH, et al., 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296, 92-100. Gran L, 1970. An oxytocic principle found in Oldenlandia affinis DC. Meddelelser Fra Norsk Farmaceutisk Selskap 12, 173-180. Greenwood KP, Daly NL, Brown DL, et al., 2007. The cyclic cystine knot miniprotein MCoTI-II is internalized into cells by macropinocytosis. The International Journal of Biochemistry & Cell Biology 39, 2252-2264. Gruber CW, Elliott AG, Ireland DC, et al., 2008. Distribution and evolution of circular miniproteins in flowering plants. Plant Cell 20, 2471-2483. Grundemann C, Koehbach J, Huber R, et al., 2012. Do plant cyclotides have potential as immunosuppressant peptides? Journal of Natural Products 75, 167-174. Gunasekera S, Daly NL, Clark RJ, et al., 2009. Dissecting the oxidative folding of circular cystine knot miniproteins. Antioxidants & Redox Signaling 11, 971-980. Gunasekera S, Foley FM, Clark RJ, et al., 2008. Engineering stabilized vascular endothelial growth factor-A antagonists: synthesis, structural characterization, and bioactivity of grafted analogues of cyclotides. 
Journal of Medicinal Chemistry 51, 7697-7704. Gustafsson C, Govindarajan S, Minshull J, 2004. Codon bias and heterologous protein expression. Trends in Biotechnology 22, 346-353. Haase J, Lanka E, 1997. A specific protease encoded by the conjugative DNA transfer systems of IncP and Ti plasmids is essential for pilus synthesis. Journal of Bacteriology 179, 5728-5735. Harris KS, Durek T, Kaas Q, et al., 2015. Efficient backbone cyclization of linear peptides by a recombinant asparaginyl endopeptidase. Nature Communications 6, 10199. Hatsugai N, Kuroyanagi M, Yamada K, et al., 2004. A plant vacuolar protease, VPE, mediates virus-induced hypersensitive cell death. Science 305, 855-858. He Y, Ning T, Xie T, et al., 2011. Large-scale production of functional human serum albumin from transgenic rice seeds. Proceedings of the National Academy of Sciences 108, 19078-19083.

Hellinger R, Koehbach J, Fedchuk H, et al., 2014. Immunosuppressive activity of an aqueous Viola tricolor herbal extract. Journal of Ethnopharmacology 151, 299-306. Hemu X, El Sahili A, Hu S, et al., 2019. Structural determinants for peptide-bond formation by asparaginyl ligases. Proceedings of the National Academy of Sciences 116, 11737-11746. Hemu X, Qiu Y, Nguyen GKT, et al., 2016. Total synthesis of circular bacteriocins by butelase 1. Journal of the American Chemical Society 138, 6968-6971. Henriques ST, Craik DJ, 2010. Cyclotides as templates in drug design. Drug Discovery Today 15, 57-64. Henriques ST, Craik DJ, 2012. Importance of the cell membrane on the mechanism of action of cyclotides. ACS Chemical Biology 7, 626-636. Henriques ST, Huang YH, Chaousis S, et al., 2015. The prototypic cyclotide kalata B1 has a unique mechanism of entering cells. Chemistry & Biology 22, 1087-1097. Henriques ST, Huang YH, Rosengren KJ, et al., 2011. Decoding the membrane activity of the cyclotide kalata B1: The importance of phosphatidylethanolamine phospholipids and lipid organization on hemolytic and anti-HIV activities. Journal of Biological Chemistry 286, 24231-24241. Hernandez JF, Gagnon J, Chiche L, et al., 2000. Squash trypsin inhibitors from Momordica cochinchinensis exhibit an atypical macrocyclic structure. Biochemistry 39, 5722-5730. Hernández VA, López QA, Ceballo CY, et al., 2015. Tobacco seeds as efficient production platform for a biologically active anti-HBsAg monoclonal antibody. Transgenic Research 24, 897-909. Herrmann A, Burman R, Mylne JS, et al., 2008. The alpine violet, Viola biflora, is a rich source of cyclotides with potent cytotoxicity. Phytochemistry 69, 939-952. Hirai T, Kurokawa N, Duhita N, et al., 2011. The HSP terminator of Arabidopsis thaliana induces a high level of miraculin accumulation in transgenic tomatoes. Journal of Agricultural and Food Chemistry 59, 9942-9949. Holland T, Sack M, Rademacher T, et al., 2010. 
Optimal nitrogen supply as a key to increased and sustained production of a monoclonal full-size antibody in BY-2 suspension culture. Biotechnology and Bioengineering 107, 278-289. Holtz BR, Berquist BR, Bennett LD, et al., 2015. Commercial-scale biotherapeutics manufacturing facility for plant-made pharmaceuticals. Plant Biotechnology Journal 13, 1180-1190. Hood EE, Bailey MR, Beifuss K, et al., 2003. Criteria for high-level expression of a fungal laccase gene in transgenic maize. Plant Biotechnology Journal 1, 129-140. Hood EE, Witcher DR, Maddock S, et al., 1997. Commercial production of avidin from transgenic maize: characterization of transformant, production, processing, extraction and purification. Molecular Breeding 3, 291-306. Huang J, Nandi S, Wu L, et al., 2002a. Expression of natural antimicrobial human lysozyme in rice grains. Molecular Breeding 10, 83-94.

Huang J, Wu L, Yalda D, et al., 2002b. Expression of functional recombinant human lysozyme in transgenic rice cell culture. Transgenic Research 11, 229-239. Huang LF, Liu YK, Lu CA, et al., 2005. Production of human serum albumin by sugar starvation induced promoter and rice cell culture. Transgenic Research 14, 569-581. Huang TK, Plesha MA, Falk BW, et al., 2009a. Bioreactor strategies for improving production yield and functionality of a recombinant human protein in transgenic tobacco cell cultures. Biotechnology and Bioengineering 102, 508-520. Huang YH, Colgrave ML, Daly NL, et al., 2009b. The biological activity of the prototypic cyclotide kalata B1 is modulated by the formation of multimeric pores. Journal of Biological Chemistry 284, 20699-20707. Huang YH, Chaousis S, Cheneval O, et al., 2015. Optimization of the cyclotide framework to improve cell penetration properties. Frontiers in Pharmacology 6, 17. Ireland DC, Colgrave ML, Nguyencong P, et al., 2006. Discovery and characterization of a linear cyclotide from Viola odorata: implications for the processing of circular proteins. Journal of Molecular Biology 357, 1522-1535. Ireland DC, Wang CK, Wilson JA, et al., 2008. Cyclotides as natural anti-HIV agents. Peptide Science 90, 51-60. Isidro-Llobet A, Kenworthy MN, Mukherjee S, et al., 2019. Sustainability challenges in peptide synthesis and purification: from R&D to production. The Journal of Organic Chemistry 84, 4615-4628. Jackson MA, Gilding EK, Shafee T, et al., 2018. Molecular basis for the production of cyclic peptides by plant asparaginyl endopeptidases. Nature Communications 9, 2411. Jackson MA, Yap K, Poth A, et al., 2019. Rapid and scalable plant based production of a potent plasmin inhibitor peptide. Frontiers in Plant Science 10, 602. Jagadish K, Borra R, Lacey V, et al., 2013. Expression of fluorescent cyclotides using protein trans-splicing for easy monitoring of cyclotide-protein interactions. 
Angewandte Chemie International Edition 52, 3126-3131. James AM, Haywood J, Leroux J, et al., 2019. The macrocyclizing protease butelase 1 remains auto-catalytic and reveals the structural basis for ligase activity. The Plant Journal 98, 955-1158. Jendrny C, Beck-Sickinger AG, 2016. Inhibition of kallikrein-related peptidases 7 and 5 by grafting serpin reactive-center loop sequences onto sunflower trypsin inhibitor-1 (SFTI-1). ChemBioChem 17, 719-726. Jennings C, West J, Waine C, et al., 2001. Biosynthesis and insecticidal properties of plant cyclotides: the cyclic knotted proteins from Oldenlandia affinis. Proceedings of the National Academy of Sciences 98, 10614-10619.

Jennings CV, Rosengren KJ, Daly NL, et al., 2005. Isolation, solution structure, and insecticidal activity of kalata B2, a circular protein with a twist: do Möbius strips exist in nature? Biochemistry 44, 851-860. Ji Y, Majumder S, Millard M, et al., 2013. In vivo activation of the p53 tumor suppressor pathway by an engineered cyclotide. Journal of the American Chemical Society 135, 11623-11633. Jia X, Kwon S, Wang CI, et al., 2014. Semienzymatic cyclization of disulfide-rich peptides using sortase A. Journal of Biological Chemistry 289, 6627-6638. Jung JW, Kim NS, Jang SH, et al., 2016. Production and characterization of recombinant human acid α-glucosidase in transgenic rice cell suspension culture. Journal of Biotechnology 226, 44-53. Kaas Q, Craik DJ, 2010. Analysis and classification of circular proteins in CyBase. Biopolymers: Peptide Science 94, 584-591. Kallolimath S, Castilho A, Strasser R, et al., 2016. Engineering of complex protein sialylation in plants. Proceedings of the National Academy of Sciences 113, 9498-9503. Kapelski S, Boes A, Spiegel H, et al., 2015. Fast track antibody V-gene rescue, recombinant expression in plants and characterization of a PfMSP4-specific antibody. Malaria Journal 14, 50. Kashima K, Yuki Y, Mejima M, et al., 2016. Good manufacturing practices production of a purification-free oral cholera vaccine expressed in transgenic rice plants. Plant Cell Reports 35, 667-679. Kim MY, Reljic R, Kilbourne J, et al., 2015. Novel vaccination approach for dengue infection based on recombinant immune complex universal platform. Vaccine 33, 1830-1838. Kim TG, Baek MY, Lee EK, et al., 2008. Expression of human growth hormone in transgenic rice cell suspension culture. Plant Cell Reports 27, 885-891. Klimyuk V, Pogue G, Herz S, et al., 2012. Production of recombinant antigens and antibodies in Nicotiana benthamiana using ‘magnifection’ technology: GMP-compliant facilities for small- and large-scale manufacturing. 
Plant Viral Vectors 375, 127-154. Koehbach J, Gruber CW, 2015. Cyclotides in the Rubiaceae. In. Craik DJ (ed.) Advances in Botanical Research Plant Cyclotides. Elsevier, Chapter 3 pp. 51-78. Koehbach J, O’brien M, Muttenthaler M, et al., 2013. Oxytocic plant cyclotides as templates for peptide G protein-coupled receptor ligand design. Proceedings of the National Academy of Sciences 110, 21183-21188. Kudo K, Ohta M, Yang L, et al., 2013. ER stress response induced by the production of human IL-7 in rice endosperm cells. Plant Molecular Biology 81, 461-475. Kwon TH, Kim YS, Lee JH, et al., 2003. Production and secretion of biologically active human granulocyte-macrophage colony stimulating factor in transgenic tomato suspension cultures. Biotechnology Letters 25, 1571-1574.

Lai H, He J, Hurtado J, et al., 2014. Structural and functional characterization of an anti-West Nile virus monoclonal antibody and its single-chain variant produced in glycoengineered plants. Plant Biotechnology Journal 12, 1098-1107. Le Mauff F, Loutelier-Bourhis C, Bardor M, et al., 2017. Cell wall biochemical alterations during Agrobacterium-mediated expression of haemagglutinin-based influenza virus-like vaccine particles in tobacco. Plant Biotechnology Journal 15, 285-296. Lee J, McIntosh J, Hathaway BJ, et al., 2009. Using marine natural products to discover a protease that catalyzes peptide macrocyclization of diverse substrates. Journal of the American Chemical Society 131, 2122-2124. Lee SY, Kim YH, Roh YS, et al., 2004. Bioreactor operation for transgenic Nicotiana tabacum cell cultures and continuous production of recombinant human granulocyte-macrophage colony-stimulating factor by perfusion culture. Enzyme and Microbial Technology 35, 663-671. Liu YK, Lu CW, Chang JY, et al., 2018. Optimization of the culture medium for recombinant protein production under the control of the αAmy3 promoter in a rice suspension-cultured cell expression system. Plant Cell, Tissue and Organ Culture 132, 383-391. Long YQ, Lee SL, Lin CY, et al., 2001. Synthesis and evaluation of the sunflower derived trypsin inhibitor as a potent inhibitor of the type II transmembrane serine protease, matriptase. Bioorganic & Medicinal Chemistry Letters 11, 2515-2519. Loos A, Gach JS, Hackl T, et al., 2015. Glycan modulation and sulfoengineering of anti-HIV-1 monoclonal antibody PG9 in plants. Proceedings of the National Academy of Sciences 112, 12675-12680. Luckett S, Garcia RS, Barker J, et al., 1999. High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds. Journal of Molecular Biology 290, 525-533. Lyon GM, Mehta AK, Varkey JB, et al., 2014. Clinical care of two patients with Ebola virus disease in the United States. 
New England Journal of Medicine 371, 2402-2409. Maass F, Wustehube-Lausch J, Dickgiesser S, et al., 2015. Cystine-knot peptides targeting cancer-relevant human cytotoxic T lymphocyte-associated antigen 4 (CTLA-4). Journal of Peptide Science 21, 651-660. Martínez CA, Topal E, Giulietti AM, et al., 2010. Exploring different strategies to express Dengue virus envelope protein in a plant system. Biotechnology Letters 32, 867-875. Marx UC, Korsinczky ML, Schirra HJ, et al., 2003. Enzymatic cyclization of a potent Bowman-Birk protease inhibitor, sunflower trypsin inhibitor-1, and solution structure of an acyclic precursor peptide. Journal of Biological Chemistry 278, 21782-21789. Mathew LG, Herbst-Kralovetz MM, Mason HS, 2014. Norovirus Narita 104 virus-like particles expressed in Nicotiana benthamiana induce serum and mucosal immune responses. BioMed Research International 2014, 807539-807548.

Matsumoto T, Sawamoto S, Sakamoto T, et al., 2011. Site-specific tetrameric streptavidin-protein conjugation using sortase A. Journal of Biotechnology 152, 37-42. Matsumoto Y, Suzuki S, Nozoye T, et al., 2009. Oral immunogenicity and protective efficacy in mice of transgenic rice plants producing a vaccine candidate antigen (As16) of Ascaris suum fused with cholera toxin B subunit. Transgenic Research 18, 185. Mazmanian SK, Liu G, Ton-That H, et al., 1999. Staphylococcus aureus sortase, an enzyme that anchors surface proteins to the cell wall. Science 285, 760-763. Merrifield B, 1997. Concept and early development of solid-phase peptide synthesis. Methods in Enzymology 289, 3-13. Morita A, Umemura TA, Kuroyanagi M, et al., 1998. Functional dissection of a sugar-repressed α-amylase gene (RAmy1A) promoter in rice embryos. FEBS Letters 423, 81-85. Mulvenna JP, Mylne JS, Bharathi R, et al., 2006. Discovery of cyclotide-like protein sequences in graminaceous crop plants: ancestral precursors of circular proteins? The Plant Cell 18, 2134-2144. Mylne JS, Chan LY, Chanson AH, et al., 2012. Cyclic peptides arising by evolutionary parallelism via asparaginyl-endopeptidase-mediated biosynthesis. The Plant Cell 24, 2765-2778. Mylne JS, Colgrave ML, Daly NL, et al., 2011. Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nature Chemical Biology 7, 257-259. Mylne JS, Wang CK, Weerden NL, et al., 2010. Cyclotides are a component of the innate defense of Oldenlandia affinis. Peptide Science 94, 635-646. Nandi S, Yalda D, Lu S, et al., 2005. Process development and economic evaluation of recombinant human lactoferrin expressed in rice grain. Transgenic Research 14, 237-249. Nguyen GK, Hemu X, Quek JP, et al., 2016a. Butelase-Mediated Macrocyclization of d-Amino-Acid-Containing Peptides. Angewandte Chemie International Edition 55, 12802-12806. Nguyen GK, Kam A, Loo S, et al., 2015. 
Butelase 1: a versatile ligase for peptide and protein macrocyclization. Journal of the American Chemical Society 137, 15398-15401. Nguyen GK, Lian YL, Pang EWH, et al., 2013a. Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. Journal of Biological Chemistry 288, 3370-3380. Nguyen GK, Qiu Y, Cao Y, et al., 2016b. Butelase-mediated cyclization and ligation of peptides and proteins. Nature Protocols 11, 1977-1988. Nguyen GK, Wang S, Qiu Y, et al., 2014. Butelase 1 is an Asx-specific ligase enabling peptide macrocyclization and synthesis. Nature Chemical Biology 10, 732-738. Nguyen GK, Lian Y, Pang EWH, et al., 2013b. Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. Journal of Biological Chemistry 288, 3370-3380.

Nguyen GK, Lim WH, Nguyen PQT, et al., 2012. Novel cyclotides and uncyclotides with highly shortened precursors from Chassalia chartacea and effects of oxidation on bioactivities. Journal of Biological Chemistry 287, 17598-17607. Nguyen GK, Zhang S, Ngan TKN, et al., 2011a. Discovery and characterization of novel cyclotides originated from chimeric precursors consisting of albumin-1 chain a and cyclotide domains in the Fabaceae family. Journal of Biological Chemistry 286, 24275-24287. Nguyen GK, Zhang S, Wang W, et al., 2011b. Discovery of a linear cyclotide from the bracelet subfamily and its disulfide mapping by top-down mass spectrometry. Journal of Biological Chemistry 286, 44833-44844. Ning T, Xie T, Qiu Q, et al., 2008. Oral administration of recombinant human granulocyte-macrophage colony stimulating factor expressed in rice endosperm can increase leukocytes in mice. Biotechnology Letters 30, 1679-1686. Ogo Y, Takahashi H, Wang S, et al., 2014. Generation mechanism of novel, huge protein bodies containing wild type or hypoallergenic derivatives of birch pollen allergen Bet v 1 in rice endosperm. Plant Molecular Biology 86, 111-123. Ou J, Guo Z, Shi J, et al., 2014. Transgenic rice endosperm as a bioreactor for molecular pharming. Plant Cell Reports 33, 585-594. Panero J, 2007. Key to the tribes of the Heliantheae alliance. Families and Genera of Vascular Plants 8, 391-395. Park CI, Lee SJ, Kang SH, et al., 2010. Fed-batch cultivation of transgenic rice cells for the production of hCTLA4Ig using concentrated amino acids. Process Biochemistry 45, 67-74. Parmenter D, Boothe JV, Van Rooijen G, et al., 1995. Production of biologically active hirudin in plant seeds using oleosin partitioning. Plant Molecular Biology 29, 1167-1180. Phoolcharoen W, Bhoo SH, Lai H, et al., 2011. Expression of an immunogenic Ebola immune complex in Nicotiana benthamiana. Plant Biotechnology Journal 9, 807-816. Pi N, Gao M, Cheng X, et al., 2019. 
Recombinant butelase-mediated cyclization of the p53-binding domain of the oncoprotein MdmX stabilized protein conformation as a promising model for structural investigation. Biochemistry 58, 3005-3015. Plan MRR, Göransson U, Clark RJ, et al., 2007. The cyclotide fingerprint in Oldenlandia affinis: elucidation of chemically modified, linear and novel macrocyclic peptides. ChemBioChem 8, 1001-1011. Plan MRR, Saska I, Cagauan AG, et al., 2008. Backbone cyclized peptides from plants show molluscicidal activity against the rice pest Pomacea canaliculata (golden apple snail). Journal of agricultural and food chemistry 56, 5237-5241. Pooggin MM, Skryabin KG, 1992. The 5'-untranslated leader sequence of potato virus X RNA enhances the expression of a heterologous gene in vivo. Molecular and General Genetics 234, 329-331.

Poon S, Harris KS, Jackson MA, et al., 2017. Co-expression of a cyclizing asparaginyl endopeptidase enables efficient production of cyclic peptides in planta. Journal of Experimental Botany 69, 633-641. Popp MWL, Ploegh HL, 2011. Making and breaking peptide bonds: protein engineering using sortase. Angewandte Chemie International Edition 50, 5024-5032. Porto WF, Miranda VJ, Pinto MF, et al., 2016. High-performance computational analysis and peptide screening from databases of cyclotides from Poaceae. Peptide Science 106, 109-118. Poth AG, Colgrave ML, Lyons RE, et al., 2011a. Discovery of an unusual biosynthetic origin for circular proteins in legumes. Proceedings of the National Academy of Sciences 108, 10127-10132. Poth AG, Colgrave ML, Philip R, et al., 2011b. Discovery of cyclotides in the Fabaceae plant family provides new insights into the cyclization, evolution, and distribution of circular proteins. ACS Chemical Biology 6, 345-355. Poth AG, Mylne JS, Grassl J, et al., 2012. Cyclotides associate with leaf vasculature and are the products of a novel precursor in Petunia (Solanaceae). Journal of Biological Chemistry 287, 27033-27046. Pränting M, Loov C, Burman R, et al., 2010. The cyclotide cycloviolacin O2 from Viola odorata has potent bactericidal activity against Gram-negative bacteria. Journal of Antimicrobial Chemotherapy 65, 1964-1971. Qiang X, Zechmann B, Reitz MU, et al., 2012. The mutualistic fungus Piriformospora indica colonizes Arabidopsis roots by inducing an endoplasmic reticulum stress-triggered caspase-dependent cell death. The Plant Cell 24, 794-809. Qiu X, Wong G, Audet J, et al., 2014. Reversion of advanced Ebola virus disease in nonhuman primates with ZMapp. Nature 514, 47-53. Qu LQ, Takaiwa F, 2004. Evaluation of tissue specificity and expression strength of rice seed component gene promoters in transgenic rice. Plant Biotechnology Journal 2, 113-125. Quimbar P, Malik U, Sommerhoff CP, et al., 2013. 
High-affinity cyclic peptide matriptase inhibitors. Journal of Biological Chemistry 288, 13885-13896. Rehm FB, Jackson MA, De Geyter E, et al., 2019. Papain-like cysteine proteases prepare plant cyclic peptide precursors for cyclization. Proceedings of the National Academy of Sciences 116, 7831-7836. Richter LJ, Thanavala Y, Arntzen CJ, et al., 2000. Production of hepatitis B surface antigen in transgenic plants for oral immunization. Nature Biotechnology 18, 1167-1171. Rong J, Song Z, Su J, et al., 2005. Low frequency of transgene flow from Bt/CpTI rice to its nontransgenic counterparts planted at close spacing. New Phytologist 168, 559-566. Rosenberg Y, Sack M, Montefiori D, et al., 2015. Pharmacokinetics and immunogenicity of broadly neutralizing HIV monoclonal antibodies in macaques. PloS one 10, e0120451.

Sainsbury F, Thuenemann EC, Lomonossoff GP, 2009. pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnology Journal 7, 682-693. Salehi H, Bahramnejad B, Majdi M, 2017. Induction of two cyclotide-like genes Zmcyc1 and Zmcyc5 by abiotic and biotic stresses in Zea mays. Acta Physiologiae Plantarum 39, 131. Saska I, Gillon AD, Hatsugai N, et al., 2007. An asparaginyl endopeptidase mediates in vivo protein backbone cyclization. Journal of Biological Chemistry 282, 29721-29728. Seydel P, Gruber CW, Craik DJ, et al., 2007. Formation of cyclotides and variations in cyclotide expression in Oldenlandia affinis suspension cultures. Applied Microbiology and Biotechnology 77, 275-284. Seydel P, Walter C, Dörnenburg H, 2009. Scale-up of Oldenlandia affinis suspension cultures in photobioreactors for cyclotide production. Engineering in Life Sciences 9, 219-226. Shaaltiel Y, Bartfeld D, Hashmueli S, et al., 2007. Production of glucocerebrosidase with terminal glycans for enzyme replacement therapy of Gaucher's disease using a plant cell system. Plant Biotechnology Journal 5, 579-590. Sheldon PS, Bowles DJ, 1996. Post-translational peptide bond formation during concanavalin A processing in vitro. Biochemical Journal 320, 865-870. Shimada T, Hiraiwa N, Nishimura M, et al., 1994. Vacuolar processing enzyme of soybean that converts proproteins to the corresponding mature forms. Plant and Cell Physiology 35, 713-718. Shivprasad S, Pogue GP, Lewandowski DJ, et al., 1999. Heterologous sequences greatly affect foreign gene expression in tobacco mosaic virus-based vectors. Virology 255, 312-323. Shoji Y, Prokhnevsky A, Leffet B, et al., 2015. Immunogenicity of H1N1 influenza virus-like particles produced in Nicotiana benthamiana. Human Vaccines & Immunotherapeutics 11, 118-123. Smith ML, Mason HS, Shuler ML, 2002. 
Hepatitis B surface antigen (HBsAg) expression in plant cell culture: kinetics of antigen accumulation in batch culture and its intracellular form. Biotechnology and Bioengineering 80, 812-822. Soh HS, Chung HY, Lee HH, et al., 2015. Expression and functional validation of heat-labile enterotoxin B (LTB) and cholera toxin B (CTB) subunits in transgenic rice (Oryza sativa). Springerplus 4, 148. Sommerhoff CP, Avrutina O, Schmoldt HU, et al., 2010. Engineered cystine knot miniproteins as potent inhibitors of human mast cell tryptase β. Journal of Molecular Biology 395, 167-175. Stanger K, Maurer T, Kaluarachchi H, et al., 2014. Backbone cyclization of a recombinant cystine-knot peptide by engineered Sortase A. FEBS Letters 588, 4487-4496. Stoger E, Ma JK, Fischer R, et al., 2005. Sowing the seeds of success: pharmaceutical proteins from plants. Current Opinion in Biotechnology 16, 167-173. Suzuki K, Kaminuma O, Yang L, et al., 2011. Prevention of allergic asthma by vaccination with transgenic rice seed expressing mite allergen: induction of allergen-specific oral tolerance without bystander suppression. Plant Biotechnology Journal 9, 982-990.

Swedberg JE, Nigon LV, Reid JC, et al., 2009. Substrate-guided design of a potent and selective kallikrein-related peptidase inhibitor for kallikrein 4. Chemistry & Biology 16, 633-643. Takaiwa F, Wakasa Y, Takagi H, et al., 2015. Rice seed for delivery of vaccines to gut mucosal immune tissues. Plant Biotechnology Journal 13, 1041-1055. Takaiwa F, Yang L, Maruyama N, et al., 2016. Deposition mode of transforming growth factor-β expressed in transgenic rice seed. Plant Cell Reports 35, 2461-2473. Tam JP, Lu YA, Yu Q, 1999. Thia zip reaction for synthesis of large cyclic peptides: Mechanisms and applications. Journal of the American Chemical Society 121, 4316-4324. Tang J, Wang CK, Pan X, et al., 2010. Isolation and characterization of cytotoxic cyclotides from Viola tricolor. Peptides 31, 1434-1440. Terrier B, Courtois D, Hénault N, et al., 2007. Two new disposable bioreactors for plant cell culture: the wave and undertow bioreactor and the slug bubble bioreactor. Biotechnology and Bioengineering 96, 914-923. Thongyoo P, Bonomelli C, Leatherbarrow RJ, et al., 2009. Potent inhibitors of β-tryptase and human leukocyte elastase based on the MCoTI-II scaffold. Journal of Medicinal Chemistry 52, 6197-6200. Thongyoo P, Jaulent AM, Tate EW, et al., 2007. Immobilized protease-assisted synthesis of engineered cysteine-knot microproteins. ChemBioChem 8, 1107-1109. Thongyoo P, Roque-Rosell N, Leatherbarrow RJ, et al., 2008. Chemical and biomimetic total syntheses of natural and engineered MCoTI cyclotides. Organic and Biomolecular Chemistry 6, 1462-1470. Tsukiji S, Nagamune T, 2009. Sortase-Mediated Ligation: A gift from gram-positive bacteria to protein engineering. ChemBioChem 10, 787-798. Vamvaka E, Arcalis E, Ramessar K, et al., 2016. Rice endosperm is cost-effective for the production of recombinant griffithsin with potent activity against HIV. Plant Biotechnology Journal 14, 1427-1437. Vamvaka E, Farré G, Molinos-Albert LM, et al., 2018. 
Unexpected synergistic HIV neutralization by a triple microbicide produced in rice endosperm. Proceedings of the National Academy of Sciences 115, E7854-E7862. Wakasa Y, Takagi H, Hirose S, et al., 2013. Oral immunotherapy with transgenic rice seed containing destructed Japanese cedar pollen allergens, Cry j 1 and Cry j 2, against Japanese cedar pollinosis. Plant Biotechnology Journal 11, 66-76. Wang CK, Colgrave ML, Gustafson KR, Ireland DC, Göransson U, Craik DJ, 2008. Anti-HIV cyclotides from the Chinese medicinal herb Viola yedoensis. Journal of Natural Products 71, 47-52. Wang CK, Gruber CW, Čemažar M, et al., 2014. Molecular grafting onto a stable framework yields novel cyclic peptides for the treatment of multiple sclerosis. ACS Chemical Biology 9, 156-163.

Wang D, Ma J, Sun D, Li H, Jiang C, Li X, 2015. Expression of bioactive anti-CD20 antibody fragments and induction of ER stress response in Arabidopsis seeds. Applied Microbiology and Biotechnology 99, 6753-6764. Weathers PJ, Towler MJ, Xu J, 2010. Bench to batch: advances in plant cell culture for producing useful products. Applied Microbiology and Biotechnology 85, 1339-1351. Weigel D, Glazebrook J, 2002. Arabidopsis: a laboratory manual. Cold Spring Harbor Laboratory. Wirz H, Sauer-Budge AF, Briggs J, et al., 2012. Automated production of plant-based vaccines and pharmaceuticals. Journal of Laboratory Automation 17, 449-457. Wong CTT, Rowlands DK, Wong CH, et al., 2012. Orally active peptidic bradykinin B-1 receptor antagonists engineered from a cyclotide scaffold for inflammatory pain treatment. Angewandte Chemie-International Edition 51, 5620-5624. Wu J, Yu L, Li L, et al., 2007. Oral immunization with transgenic rice seeds expressing VP2 protein of infectious bursal disease virus induces protective immune responses in chickens. Plant Biotechnology Journal 5, 570-578. Xie T, Qiu Q, Zhang W, et al., 2008. A biologically active rhIGF-1 fusion accumulated in transgenic rice seeds can reduce blood glucose in diabetic mice via oral delivery. Peptides 29, 1862-1870. Xu J, Okada S, Tan L, et al., 2010. Human growth hormone expressed in tobacco cells as an arabinogalactan-protein fusion glycoprotein has a prolonged serum life. Transgenic Research 19, 849-867. Yang L, Hirose S, Takahashi H, et al., 2012. Recombinant protein yield in rice seed is enhanced by specific suppression of endogenous seed proteins at the same deposit site. Plant Biotechnology Journal 10, 1035-1045. Yang L, Kajiura H, Suzuki K, et al., 2008. Generation of a transgenic rice seed-based edible vaccine against house dust mite allergy. Biochemical and Biophysical Research Communications 365, 334-339. Yang M, Sun H, Lai H, et al., 2018. 
Plant-produced Zika virus envelope protein elicits neutralizing immune responses that correlate with protective immunity against Zika virus in mice. Plant Biotechnology Journal 16, 572-580. Yang R, Wong YH, Nguyen GK, et al., 2017. Engineering a catalytically efficient recombinant protein ligase. Journal of the American Chemical Society 139, 5351-5358. Yoshida T, Kimura E, Koike S, et al., 2011. Transgenic rice expressing amyloid β-peptide for oral immunization. International Journal of Biological Sciences 7, 301-307. Yuki Y, Mejima M, Kurokawa S, et al., 2013. Induction of toxin-specific neutralizing immunity by molecularly uniform rice-based oral cholera toxin B subunit vaccine without plant-associated sugar modification. Plant Biotechnology Journal 11, 799-808.

Zavaleta N, Figueroa D, Rivera J, et al., 2007. Efficacy of rice-based oral rehydration solution containing recombinant human lactoferrin and lysozyme in Peruvian children with acute diarrhea. Journal of Pediatric Gastroenterology and Nutrition 44, 258-264. Zhang L, Shi J, Jiang D, et al., 2013. Expression and characterization of recombinant human alpha-antitrypsin in transgenic rice seed. Journal of Biotechnology 164, 300-308.

Note: credits of images used on page 1 and Figure 1.3
https://themeditativegardener.blogspot.com/2016/04/dear-sweet-violets.html
https://www.walmart.com/ip/Peel-n-Stick-Poster-of-Blue-Asia-Butterfly-Pea-Flower-Clitoria-TernateaPoster-24x16-Adhesive-Sticker-Poster-Print/894069957
https://pixnio.com/media/flora-summer-petunia-flower-garden
https://www.frozenseeds.com/products/gac-melon-seeds-momordica-cochinchinensis
https://www.123rf.com/photo_91797105_maize-crop-in-growth-at-farm.html
https://pixabay.com/photos/sunflower-sunflower-field-yellow-1627193/
http://publish.plantnet-project.org/project/riceweeds_en/collection/collection/information/taxo_view_gallery/Poaceae%20-%20Panicum%20laxum%20Sw.
https://commons.wikimedia.org/wiki/File:Kolbenhirse.jpg
Courtesy of Marilyn Anderson, Department of Biochemistry, La Trobe University, Melbourne, Victoria, Australia


Chapter 2

Developing rice as a biofactory for cyclic peptide production

2.1. Overview

Cyclotides, together with the cyclic sunflower trypsin inhibitor peptide (SFTI-1), have proven to be effective scaffolds for the display of bioactive epitopes. This approach, often referred to as peptide grafting, has been used to design therapeutic peptide candidates active against a wide range of diseases and symptoms, including pain (Wong et al., 2012), cancer (Swedberg et al., 2009, Quimbar et al., 2013, Fittler et al., 2014, Jendrny & Beck-Sickinger, 2016), metabolic disease (Eliasen et al., 2012) and cardiovascular disease (Getz et al., 2011). There is a great need for efficient production systems to manufacture these therapeutic peptides.

A plant-based recombinant production system is an appealing way to produce cyclic peptides cost-efficiently and in an environmentally friendly manner. Rice production systems for cyclic peptides are described in this chapter. Rice carries several advantages for this purpose, including an efficient transformation method, a complete genome sequence and the ability to produce and store recombinant products, such as disulfide-rich proteins, in seeds (He et al., 2011). Furthermore, rice does not naturally produce cyclic peptides, which eliminates possible interference from native cyclic peptides during purification. Recently, some cyclotide-like peptides and sequences were discovered in monocot plants (Nguyen et al., 2013, Porto et al., 2016, Salehi et al., 2017), which raises the hypothesis that rice, as a monocot plant, is likely to be able to express and cyclise cyclotides provided that appropriate cyclization enzymes are present.

A specialised subgroup of the proteases termed asparaginyl endopeptidases (AEPs) has been characterised that acts as peptide ligases, including butelase 1 from C. ternatea (Nguyen et al., 2014) and OaAEP1b from O. affinis (Harris et al., 2015). These peptide ligases are known to process the C-terminal Asx residue of the cyclotide precursor, resulting in cyclization of the peptide backbone in the vacuole (Mylne et al., 2011, Poon et al., 2017). Recently, co-expression of these ligase-type AEPs together with cyclotide precursor genes in N. benthamiana was shown to significantly increase the level of cyclic peptide production in planta (Poon et al., 2017, Jackson et al., 2019).

This chapter describes the establishment of a stable transformation platform to produce cyclic peptides in rice callus-derived suspension cells and seeds. Transgenes encoding the prototypical cyclic peptides, kB1 and SFTI-1, and a series of engineered SFTI-1 analogues were co-expressed with OaAEP1b in these two production systems. To achieve efficient production in rice, strong promoters were chosen to drive gene expression, endosperm-specific promoters were used for accumulation in seeds, and the coding sequences of some analogues were codon optimised for expression in rice. The yields and structures of the rice-derived cyclic peptides were characterized and the transcript expression levels of AEPs were analysed.

2.2. Materials and methods

2.2.1. Expression vector design and cloning

Choice of regulatory sequences

Gene promoters previously demonstrated to drive high-level gene expression in rice (Qu & Takaiwa, 2004, Park et al., 2012) were selected. For expression in rice suspension cells, the well-characterised maize ubiquitin (Ubi) (DQ141598) (Christensen & Quail, 1996) and phosphogluconate dehydrogenase gene (PGD1) (AK065920) promoters were used, while for specific expression in rice seed, the Glutelin B1 (AY427569) and Glutelin B4 (AY427571) promoters were used. For transcriptional termination, the Arabidopsis heat shock protein terminator (hspT) was used in all expression cassettes (Nagaya et al., 2009). Native cyclic peptide precursor genes chosen for expression in rice included Oak1, encoding kB1, and PawS1, encoding SFTI-1. Additionally, the SFTI-1 peptide variants [D14N]SFTI-1 and [T4Y,I7R]SFTI-1, together with native SFTI-1, were engineered to be processed from the Oak1 precursor protein, replacing the kB1 peptide domain (Oak[D14N]SFTI-1_GLDN, OakSFTI-1_GLDN and Oak[T4Y,I7R]SFTI-1). Two codon-optimized therapeutic SFTI-1 grafts of a melanocortin receptor 1 (MC1R) epitope (HFRW, His-Phe-Arg-Trp), mcrB (single epitope) and mcrF (double epitopes), were also engineered into the Oak1 precursor (Os_OakSFTImcrB and Os_OakSFTImcrF). All precursor sequences are listed in Section 2.6.

Vector construction

Promoters, cyclic peptide precursor genes and terminator sequences were PCR amplified from either gene block DNA fragments or previously prepared plasmids. Primers were designed to add flanking restriction sites at the 5’ and 3’ ends respectively: NotI and SbfI for the gene of interest, AscI and NotI for the promoter sequences, and SbfI and PacI for the terminator sequences. Promoters, coding sequences and terminator sequences were initially assembled in an intermediate cloning vector, PM10, as shown in Figure 2.1A (in-house design, supplementary sequences in Section 2.6). Once assembled, the expression cassette was digested with either AscI and PacI or AscI and KpnI for ligation into the similarly digested plant expression vector pMDC99, as indicated in Figure 2.1B (Curtis & Grossniklaus, 2003). The AscI/PacI and AscI/KpnI insertion sites allowed the assembly of dual expression cassettes to express both the precursor peptide genes and the cyclization-assisting AEP (OaAEP1b) gene, as illustrated in Figure 2.1C. For dual expression, different promoters were used to minimise the possibility of homology-dependent gene silencing (Meyer & Saedler, 1996). The assembled expression cassettes used for suspension cells and seeds are listed in Table 2.1. Assembled vectors were then transferred into Agrobacterium tumefaciens strain LBA4404 by electroporation (Weigel & Glazebrook, 2006).


Figure 2.1 Map of vectors and constructs designed for rice expression systems. A. PM10 vector. This vector is used as an intermediate cloning vector to assemble promoters and coding genes. B. pMDC99 vector. This vector is designed for high-throughput expression in planta (Curtis & Grossniklaus, 2003). C. Overview of the gene construct design approach. Promoter elements were inserted into the intermediate PM10 vector using the restriction sites AscI and NotI, while genes and terminators were inserted at the NotI/SbfI and SbfI/PacI or SbfI/KpnI sites respectively. Once within PM10, the whole transgene expression cassette can be digested out and ligated into the binary vector pMDC99, which enables Agrobacterium transformation of rice callus. Double expression cassettes were constructed using the AscI/PacI and AscI/KpnI sites, which place the two gene cassettes in opposite orientations.

Table 2.1 Expression cassettes used in the suspension cell and seed expression systems

Suspension cell system
UO: Ubi::Oak1::hspT
UP: Ubi::PawS1::hspT
UOPOa: Ubi::Oak1::hspT+PGD1::OaAEP1b::hspT
UOS_NPOa: Ubi::Oak[D14N]SFTI-1_GLDN::hspT+PGD1::OaAEP1b::hspT
UOS_DPOa: Ubi::OakSFTI-1_GLDN::hspT+PGD1::OaAEP1b::hspT
UOSpiPOa: Ubi::Oak[T4Y,I7R]SFTI-1::hspT+PGD1::OaAEP1b::hspT
UOSmcrBPOa: Ubi::OakSFTImcrB::hspT+PGD1::OaAEP1b::hspT
UOSmcrFPOa: Ubi::OakSFTImcrF::hspT+PGD1::OaAEP1b::hspT

Seed system
BOBOa: GluB1_prom::Oak1::hspT+GluB4_prom::OaAEP1b::hspT
BPBOa: GluB1_prom::PawS1::hspT+GluB4_prom::OaAEP1b::hspT

2.2.2. Agrobacterium-mediated stable transformation in rice

Rice (Oryza sativa L. subsp. japonica) cultivar Nipponbare was used for transformation. To regenerate transgenic rice plants, a stepwise progression through the rice life cycle is required (Figure 2.2). This includes callus induction from mature seed, Agrobacterium infection, co-cultivation, selection, regeneration, cultivation and finally harvesting of transgenic seed. For suspension cells, cultures were induced from resistant callus in liquid initiation medium by subculturing the fine cells in the supernatant.

Figure 2.2 Agrobacterium-mediated transformation of rice. A. Flowchart of Agrobacterium-mediated transformation in rice. Rice seed sterilization and callus initiation normally take 6-8 weeks before callus are ready for Agrobacterium infection. Following a 3-day co-cultivation in the dark, selection takes another 7-9 weeks. To obtain transgenic seeds, regeneration takes 4-5 weeks and the transgenic plants need a further 5 months to grow to maturity. Inducing fine suspension cells normally takes 8-10 weeks. B. Diagram of rice transformation and the generation of transgenic seeds and suspension cells. Expression vectors were first transformed into Agrobacterium. Mediated by Agrobacterium, T-DNA was integrated into plant genomic DNA. The transgenic cells were selected and then either regenerated into transgenic plants to obtain transgenic seeds or induced to form transgenic suspension cells.

Solutions and Media

a. Chu N6 vitamin solution: 1000x (1 mL/L), purchased from PhytoTech Laboratories, LLC™
b. MS vitamin stock: 1000x (1 mL/L), purchased from PhytoTech Laboratories, LLC™
c. 2,4-Dichlorophenoxyacetic acid (2,4-D) stock: 1 mg/mL, powder purchased from Sigma-Aldrich, dissolved in 1 M KOH with H2O added to volume, filter sterilized and stored at -4 °C
d. Kinetin stock: 2 mg/mL, powder purchased from Sigma-Aldrich, dissolved in 1 M KOH with H2O added to volume, filter sterilized and stored at -20 °C
e. Naphthalene acetic acid (NAA) stock: 2 mg/mL, powder purchased from Sigma-Aldrich, dissolved in 1 M KOH with H2O added to volume, filter sterilized and stored at -20 °C
f. Rifampicin solution: 50 mg/mL, powder purchased from Austratec Pty. Ltd, dissolved in methanol, filter sterilized and stored at -20 °C
g. Kanamycin solution: 50 mg/mL, powder purchased from Austratec Pty. Ltd, dissolved in H2O, filter sterilized and stored at -20 °C
h. Hygromycin B solution: 100 mg/mL, purchased from PhytoTech Laboratories, LLC™
i. Timentin: 200 mg/mL, powder purchased from Austratec Pty. Ltd, dissolved in H2O, filter sterilized and stored at -20 °C
j. Acetosyringone (ACS) stock: 0.2 M, powder purchased from Sigma-Aldrich, dissolved in DMSO and H2O (1:1), filter sterilized and stored at -20 °C

The media used in rice transformation were based on the protocol of Main et al. (2015), with slight modifications taken from a review by Hiei & Komari (2008). The compositions of the various media are listed in Tables 2.2-2.7.

Callus initiation

Glumes were removed from seeds individually by hand, then the dehusked seeds were soaked in 70% ethanol for 1 min. After rinsing three times with sterile water, seeds were submerged in 50% bleach solution (sodium hypochlorite) for 15 min, with gentle mixing using a rotary suspension mixer. Following a further three washes in sterile water, seeds were left to dry on pre-sterilized filter paper in a laminar flow hood. To initiate callus growth, seeds were placed embryo side up on callus initiation medium at a density of 12 seeds per petri dish. Each plate was sealed with Micropore tape and embryogenic callus was induced at 26 °C under an 18/6 h photoperiod. Callus were subcultured every other week and grown until 4-5 mm in size for Agrobacterium infection. In this thesis, callus is used in both singular and plural form to describe the process and object (Teixeira da Silva, 2012).

Agrobacterium infection

Single colonies of transformed Agrobacterium were selected on LB agar medium supplemented with 50 mg/mL rifampicin and kanamycin. A single colony was picked and cultured in 5 mL LB broth with 50 mg/mL rifampicin and kanamycin. This starter culture was then scaled up to 50 mL and grown for a further 2 days at 30 °C, with the addition of 100 µM ACS and 2 mM MgSO4 to induce virulence genes and prevent Agrobacterium aggregation, respectively. Agrobacterium cells were then collected by centrifugation and resuspended to an optical density of OD600 = 0.6 in liquid infection medium supplemented with 100 µM ACS. Callus were subcultured to fresh initiation medium 3 days before infection with Agrobacterium. For inoculation, around 300 callus were first transferred to a 50 mL Falcon tube containing 5 mL liquid infection medium, then the Agrobacterium suspension (OD600 = 0.6) was added and the tube was shaken gently for 15 min on the rotary suspension mixer. Following this, callus were air dried in the laminar flow hood until no liquid droplets remained on the surface. Callus were then co-cultivated on two pieces of sterile filter paper placed on top of initiation medium containing 100 µM ACS in the dark at 28 °C for 3 days.

Table 2.2 Initiation medium
Compound                 Amount per L
N6 salts                 3.98 g
Sucrose                  30 g
L-Proline                2.8 g
Casein                   0.3 g
Myo-inositol             0.1 g
2,4-D                    2 mg
pH                       5.8
Phytagel                 3 g
Autoclave sterilization
N6 vitamin               1 mL (1000x stock)

Table 2.3 Selection medium
Compound                 Amount per L
N6 salts                 3.98 g
Sucrose                  30 g
L-Proline                2.8 g
Casein                   0.3 g
Myo-inositol             0.1 g
2,4-D                    2 mg
pH                       5.8
Phytagel                 3 g
Autoclave sterilization
N6 vitamin               1 mL (1000x stock)
Hygromycin               25 mg
Timentin                 200 mg

Table 2.4 Regeneration II medium
Compound                 Amount per L
MS salts                 4.43 g
Sucrose                  30 g
pH                       5.8
Phytagel                 3 g
Autoclave sterilization
MS vitamin               1 mL (1000x stock)

Table 2.5 Liquid infection medium
Compound                 Amount per L
N6 salts                 3.98 g
N6 vitamin               1 mL (1000x stock)
Sucrose                  68.4 g
2,4-D                    1.5 mg
Glucose                  36.0 g
L-Proline                0.7 g
pH                       5.2
Filter sterilization, 4 °C storage
ACS                      100 µM

Table 2.6 Regeneration I medium
Compound                 Amount per L
MS salts                 4.43 g
Sucrose                  30 g
Sorbitol                 30 g
Casein                   2 g
Kinetin                  2 mg
pH                       5.8
Phytoagar                8 g
Autoclave sterilization
MS vitamin               1 mL (1000x stock)
NAA                      0.02 mg
Timentin                 200 mg
Hygromycin               25 mg

Table 2.7 Liquid suspension medium
Compound                 Amount per L
N6 salts                 3.98 g
Sucrose                  30 g
Kinetin                  0.02 mg
2,4-D                    2 mg
pH                       5.8
Autoclave sterilization
N6 vitamin               1 mL (1000x stock)
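The additive amounts in the media tables can be converted to pipetting volumes from the stock concentrations given under Solutions and Media. The following is a minimal sketch of that C1·V1 = C2·V2 arithmetic; the function name and the dictionary layout are illustrative assumptions and are not part of the original protocol.

```python
# Volume of stock solution required per litre of medium (C1 * V1 = C2 * V2).
# Target amounts come from the media tables and stock concentrations from the
# solutions list; the helper name and data layout are hypothetical.


def stock_volume_ml(target_per_l: float, stock_per_ml: float) -> float:
    """Return mL of stock per litre of medium.

    Both arguments must use the same amount unit (e.g. mg, or µmol).
    """
    if stock_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return target_per_l / stock_per_ml


ADDITIVES = {
    # name: (target amount per L of medium, stock amount per mL)
    "2,4-D (mg)": (2.0, 1.0),
    "Kinetin, regeneration I (mg)": (2.0, 2.0),
    "NAA (mg)": (0.02, 2.0),
    "Hygromycin B (mg)": (25.0, 100.0),
    "Timentin (mg)": (200.0, 200.0),
    # 100 µM target = 100 µmol per L; 0.2 M stock = 200 µmol per mL
    "Acetosyringone (µmol)": (100.0, 200.0),
}

if __name__ == "__main__":
    for name, (target, stock) in ADDITIVES.items():
        print(f"{name}: {stock_volume_ml(target, stock):.3f} mL per L")
```

Heat-labile additives listed after the autoclave step in the tables would be added to the cooled medium; the sketch only covers the volume calculation.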

Selection of transformed embryogenic callus

After three days of co-cultivation, callus were transferred to selection medium (40-50 callus per petri dish). Each piece was kept separate to ensure that individual transformation events were selected. Selection of hygromycin-resistant callus was performed in the dark at 28 °C, with subculturing performed after the first week and every other week afterwards. Hygromycin-resistant callus became evident after the third subculture, upon which they were transferred to fresh selection medium and given an identification number. No more than eight independently transformed lines were cultured in one petri dish.

Regeneration of transformed seedlings

Callus transformed with seed expression vectors were advanced to the regeneration step. Rapidly growing callus were transferred to petri dishes of regeneration medium I, sealed with Micropore tape and cultivated under light (16:8 photoperiod) at 26 °C. Callus started turning green after ten days, followed by root and shoot production. Small seedlings were then transferred to regeneration medium II for further development. After seedlings grew to 10 cm in height, they were moved to soil.

Seedling cultivation conditions

Transgenic rice lines were grown in a temperature-controlled PC2 glasshouse. During the summer, air conditioning and shade protectors kept the temperature cool. During the winter, extra lights were added to extend the day length, as this affects rice flowering time.

Suspension cells

Callus transformed with suspension cell expression vectors were advanced to the cell culture step. After selection, hygromycin-resistant and rapidly growing callus were transferred to liquid initiation medium, starting with 5 mL medium in a 15 mL flask. Suspension cells were placed on a reciprocating shaker at 100 rpm in light (16:8 photoperiod) and subcultured every week. When the suspension cells covered the bottom of the flask, they were transferred into larger flasks, gradually progressing through 15 mL, 35 mL, 85 mL, 125 mL, 250 mL, 500 mL, 1 L and 2 L.

2.2.3. Fluorescence microscopy

Callus that carried green fluorescent protein (GFP) were observed under a dissecting microscope (Nikon SMZ18) with a digital SLR camera (Nikon DS-Qi2). The microscope comes with 0.5x and 1.0x magnification objective lenses. A 130 W intense light mercury lamp provided illumination (Nikon HG lamp C-LHGFI). Images were analysed by NIS-Elements Advanced Research (v5.01).

2.2.4. DNA and RNA extraction and PCR

DNA extraction

A small piece of young leaf (about 100 mg) or a small amount of fresh cells (about 0.5 mL) was ground in 600 µL 2% CTAB buffer using a Geno/Grinder (SPEX SamplePrep). The CTAB buffer and extraction followed the protocol of Porebski et al. (1997). DNA samples were stored at -20 °C.

RNA extraction

A small piece of young leaf (about 100 mg) or a small amount of fresh cells (about 0.5 mL) was harvested and ground in 1 mL TRIzol™ reagent (Invitrogen™) using a Geno/Grinder (SPEX SamplePrep) with two to three grinding balls per 2 mL Eppendorf tube®. Extraction followed the TRIzol™ reagent manufacturer’s protocol. RNA samples were stored at -80 °C for the long term and at -20 °C for the short term.

cDNA synthesis

The RNA concentration was measured with a NanoDrop™ 2000 (ThermoFisher Scientific™) and a set amount of RNA was then used to synthesize cDNA. Contaminating DNA was removed with DNase using a DNA Removal kit (Invitrogen™) according to the manufacturer’s protocol. The DNA-free RNA was reverse transcribed into cDNA using the SuperScript™ III Reverse Transcriptase kit (Invitrogen™).

PCR

Taq DNA Polymerase (Invitrogen™) was used to amplify DNA fragments from plant DNA. Phusion High-Fidelity DNA Polymerase (ThermoFisher Scientific™) was used to amplify target sequences for cloning. Both types of amplification were performed using the manufacturer’s recommended parameters. Touchdown PCR and gradient PCR were used to optimise the amplification of specific target sequences, including amplification of the rL1 gene from rice genomic DNA and optimisation of real-time PCR (RT-PCR) primers. All PCR reactions were carried out in a Kytatec SuperCycler 300T.

RT-PCR was performed using an Applied Biosystems ViiA 7 real-time PCR instrument (ThermoFisher Scientific™). The housekeeping genes OsGAPDH and OsUBQ5 were used to normalise the threshold cycle of cyclotide-like genes. Three biological replicates and three technical replicates of each treatment sample were used in RT-PCR. All RT-PCR reactions contained 5 µL cDNA, 6 µL SYBR Green master mix (Applied Biosystems™) and 1 µL primers. The relative changes in expression level were analysed using the 2^(−ΔCT) method (Livak & Schmittgen, 2001). Primers were synthesized by Integrated DNA Technologies and are shown in Table 2.8.
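As an illustration of the normalisation step, the relative expression calculation can be sketched as below. The sketch assumes that the two housekeeping genes are combined by averaging their Ct values and that amplification efficiency is 100% (a doubling per cycle); neither detail is stated in the text, and the Ct values shown are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target: float, ct_references: list[float]) -> float:
    """Return 2^(-dCT), where dCT = Ct(target) - mean Ct of the reference genes.

    Assumes 100% amplification efficiency (template doubles every cycle).
    """
    if not ct_references:
        raise ValueError("at least one reference Ct value is required")
    delta_ct = ct_target - mean(ct_references)
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene, then the two reference genes
    # (standing in for OsGAPDH and OsUBQ5).
    value = relative_expression(ct_target=26.0, ct_references=[20.0, 22.0])
    print(f"relative expression = {value}")
```

Averaging the reference Ct values is equivalent to taking the geometric mean of the reference quantities, which is one common way to combine two housekeeping genes.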

2.2.5. Peptide extraction and structure confirmation

Peptide extraction

Selected callus were lyophilized using an Alpha 2-4 LD Christ freeze-dryer and ground with a Geno/Grinder (SPEX SamplePrep). The dry powder was soaked in extraction buffer (50% acetonitrile (ACN) + 1% formic acid (FA)) for 2 hours on a vertical shaker. Small amounts of crude extract were desalted and concentrated using a C18 ZipTip (Millipore). Large amounts of crude extract were filtered through Whatman™ filter paper, passed through a solid phase extraction (SPE) C18 column, and further separated and purified by reverse-phase high performance liquid chromatography (RP-HPLC) (Solvent A: 0.1% trifluoroacetic acid (TFA) in H2O; Solvent B: 90% ACN with 0.1% TFA) through preparative columns (8 mL/min), semi-preparative columns (3 mL/min) and analytical columns (1 mL/min) on a Prominence UFLC system (Shimadzu).

Absolute and relative quantification

For absolute quantification, three technical replicates of each transgenic line were used. Around 20 mg dry powder of suspension cells (exact amounts were recorded) was soaked in 300 µL extraction buffer (50% ACN + 1% FA) containing 0.05 µM codeine as the internal standard. A 30 µL aliquot of the supernatant was measured using targeted multiple reaction monitoring (MRM) on a SCIEX QTRAP 6500 mass spectrometer. SCIEX MultiQuant (v3.0.2) software was used to plot response curves against concentration. A standard curve was plotted using chemically synthesized standard peptides over a range of concentrations (1000, 500, 250, 125, 62.5 and 31.25 ng/mL).
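The back-calculation from a standard curve to a yield per gram of dry weight can be sketched as follows. This is a simplified illustration and not the MultiQuant workflow: it assumes an unweighted linear fit of the analyte-to-internal-standard response ratio against concentration, and the response values are invented for the example. Only the standard concentrations, the 300 µL extraction volume and the nominal 20 mg sample mass are taken from the text.

```python
# Linear standard curve and back-calculation of peptide yield.
# The fitting approach and the response values are assumptions for illustration.

STANDARDS_NG_PER_ML = [1000.0, 500.0, 250.0, 125.0, 62.5, 31.25]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_ng_per_ml(response: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get the extract concentration."""
    return (response - intercept) / slope


def yield_ug_per_g(conc_ng_per_ml: float,
                   extract_volume_ml: float = 0.3,
                   dry_mass_mg: float = 20.0) -> float:
    """Convert extract concentration to µg peptide per g dry weight (ng/mg = µg/g)."""
    return conc_ng_per_ml * extract_volume_ml / dry_mass_mg


if __name__ == "__main__":
    # Hypothetical, perfectly linear responses (analyte / internal standard ratio).
    responses = [0.002 * c + 0.01 for c in STANDARDS_NG_PER_ML]
    slope, intercept = fit_line(STANDARDS_NG_PER_ML, responses)
    conc = concentration_ng_per_ml(1.01, slope, intercept)
    print(f"extract: {conc:.1f} ng/mL, yield: {yield_ug_per_g(conc):.2f} µg/g dry weight")
```

In practice the recorded sample mass would replace the nominal 20 mg, and unknowns falling outside the 31.25-1000 ng/mL range would need dilution or re-measurement.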

For relative quantification, 10 mg of dry powder was soaked in 100 µL extraction buffer (50% ACN + 1% FA) with gentle mixing for 2 hours. After centrifugation, the supernatant was transferred to a new tube and diluted to 10% ACN + 1% FA. Samples were then desalted and concentrated using a C18 ZipTip (Millipore). Cleaned samples were mixed 1:1 with α-cyano-4-hydroxycinnamic acid (CHCA, 5 mg mL⁻¹ in 50% ACN + 1% FA, 5 mM NH4H2PO4) (Smirnov et al., 2004) and spotted on a matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF-MS) sample plate. They were analysed on a TOF/TOF Proteomics Analyzer 4700 (Applied Biosystems). The relative expression yields were calculated from the ratio of the relative peak intensity to that of an internally spiked peptide control.

Co-elution

For chromatographic co-elution, kB1 from suspension cells and native standard kB1 were run separately and together by LC-MS (Shimadzu LC-MS 2020) with a solvent B gradient of 2% min⁻¹ (solvent B: 0.05% FA in 90% ACN) at a flow rate of 0.3 mL/min.

Reduction, alkylation and enzymatic digestion

kB1 from suspension cells was dissolved in 100 mM NH4HCO3 buffer (pH 8), then 100 mM dithiothreitol (DTT) (1:9 v/v) was added to reduce disulfide bonds. The reaction was incubated at 60 °C for 30 min under nitrogen. After the reaction had cooled to room temperature, 250 mM iodoacetamide (1:9 v/v) was added and the mixture was incubated for 30 min under nitrogen. A 0.7 µL aliquot of the reduced and alkylated peptides was used for MALDI-TOF-MS analysis, and the remainder was digested with endoproteinase Glu-C (1:3 v/v) at 37 °C overnight and quenched by adding 5% formic acid (1:4 v/v). Samples were then analysed by MALDI-TOF-MS as before.
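The mass shifts expected from this treatment can be worked out in a few lines. The sketch below assumes that each of the six cysteines of kB1 is reduced (+1 H) and carbamidomethylated by iodoacetamide (+C2H3NO), and that Glu-C opens the cyclic backbone at a single site (+H2O); the single-cleavage assumption and the function names are illustrative rather than taken from the text.

```python
# Expected monoisotopic mass shifts (Da) relative to the native, oxidised
# cyclic peptide after reduction/alkylation and after ring opening.

HYDROGEN = 1.007825          # one H gained per cysteine on disulfide reduction
CARBAMIDOMETHYL = 57.021464  # net addition per cysteine from iodoacetamide (C2H3NO)
WATER = 18.010565            # H2O added per hydrolysed peptide bond


def reduced_alkylated_shift(n_cys: int) -> float:
    """Mass gained when n_cys disulfide-bonded cysteines are reduced and alkylated."""
    return n_cys * (HYDROGEN + CARBAMIDOMETHYL)


def linearised_shift(n_cys: int, n_cleavages: int = 1) -> float:
    """Mass gained after reduction, alkylation and backbone cleavage(s)."""
    return reduced_alkylated_shift(n_cys) + n_cleavages * WATER


if __name__ == "__main__":
    print(f"reduced + alkylated: +{reduced_alkylated_shift(6):.3f} Da")
    print(f"after Glu-C ring opening: +{linearised_shift(6):.3f} Da")
```

A further gain of one water mass after digestion, without fragmentation into smaller peptides, is the behaviour expected of a head-to-tail cyclic backbone with a single cleavage site, whereas a linear peptide would be split into two fragments.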

Table 2.8 Primers used in Chapter 2.
Primer name                   Primer sequence
GluB1_prom_Fwd_AscI           AGGCGCGCCACAGATTCTTGCTACCAACAAC
GluB1_prom_Rev_NotI           AGCGGCCGCGGCTATTTGTACTTGCTTATGG
GluB4_prom_Fwd_AscI           AGGCGCGCCTACAGGGTTCCTTGCGTG
GluB4_prom_Rev_NotI           AGCGGCCGCGGCTATTTGAGGATGTTATTGG
PGD1_Fwd_AscI                 AGGCGCGCCTAGATATGCCGAACATGACC
PGD1_Rev_NotI                 AGCGGCCGCGCAGATAGATGCACCAAATG
Oak1SFTI-1_NotI_Fwd           AGCGGCCGCCCAAATGGCTAAGTTCACCGTCTG
Oak1SFTI PD_GLDN_SbfI_Rev     ACCTGCAGGTTAATTATCAAGGCCATCAGGG
Oak1SFTI PN_GLDN_SbfI_Rev     ACCTGCAGGTTAATTATCAAGGCCATTAGGG
Oak1_Fwd                      GGAAGCTTGTGAGTCGGGTAGCATG
Oak1_Rev                      CATTAGCAGGGTTAGAACCCATATAG
PawS1_Fwd                     ATGGCTAAGCTCATCATCCTCG
PawS1_Rev                     TCAGATAGAGCACTGAAGGTTAC
qOsGAPDH_Fwd                  AAGCCAGCATCCTATGATCAGATT
qOsGAPDH_Rev                  CGTAACCCAGAATACCCTTGAGTTT
qOsUBQ5_Fwd                   ACCACTTCGACCGCCACTACT
qOsUBQ5_Rev                   ACGCCTAAGCCTGCTGGTT
qOsVPE1_Fwd_2                 GACTGAAGGAGGACACAGTG
qOsVPE1_Rev_2                 GATTAGCCATCACACAAGGG
qOsVPE2_Fwd_3                 GCTCCTGCAATATACCCAAG
qOsVPE2_Rev_3                 GTACAGCAATGATGGCAGGT
qOsVPE3_Fwd_3                 CAGTGCTTGATCTGCCGCAA
qOsVPE3_Rev_3                 TTCCCGGGCATGTAAAAGCA
qOsVPE4_Fwd_1                 ATGGGGCGCGGTCTCCTC
qOsVPE4_Rev_1                 CGTCGTCTCATCCGACGA
qOsVPE4_Fwd_2                 AGCTCGTCGGCCTCGTCGTC
qOsVPE4_Rev_2                 GCCTGGTGCCGGTAGTT
qOsVPE4_Fwd_3                 CAAAGGCAGCCACTCCTACAC
qOsVPE4_Rev_3                 GCACTCCCAGTCCTCAACCAG
Fwd: forward primer; Rev: reverse primer. Restriction enzyme sites were shown in italic and bold. Primers starting with q were designed for RT-PCR.
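Because the restriction sites are no longer visually marked in the primer table, they can be located by searching for the enzyme recognition sequences. The sketch below checks a handful of the cloning primers from Table 2.8 against the standard recognition sequences of AscI, NotI and SbfI; the helper name and the choice of primers are illustrative.

```python
# Locate restriction enzyme recognition sites in cloning primers.
# Recognition sequences are the standard ones for these enzymes; all three
# are palindromic, so a single-strand search is sufficient.

SITES = {
    "AscI": "GGCGCGCC",
    "NotI": "GCGGCCGC",
    "SbfI": "CCTGCAGG",
}

# A subset of the primers listed in Table 2.8.
PRIMERS = {
    "GluB1_prom_Fwd_AscI": "AGGCGCGCCACAGATTCTTGCTACCAACAAC",
    "GluB1_prom_Rev_NotI": "AGCGGCCGCGGCTATTTGTACTTGCTTATGG",
    "PGD1_Fwd_AscI": "AGGCGCGCCTAGATATGCCGAACATGACC",
    "Oak1SFTI-1_NotI_Fwd": "AGCGGCCGCCCAAATGGCTAAGTTCACCGTCTG",
    "Oak1SFTI PD_GLDN_SbfI_Rev": "ACCTGCAGGTTAATTATCAAGGCCATCAGGG",
}


def sites_in(primer: str) -> list[str]:
    """Return the names of the enzymes whose recognition site occurs in the primer."""
    sequence = primer.upper()
    return [enzyme for enzyme, site in SITES.items() if site in sequence]


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        found = sites_in(sequence)
        print(f"{name}: {', '.join(found) if found else 'no site found'}")
```

For each of these primers the site found matches the enzyme named in the primer label, and it sits one base in from the 5' end.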

NMR analysis

Suspension cell-derived and chemically synthesized kB1 (~0.5 mg) were dissolved in 500 µL H2O and 50 µL deuterium oxide with the pH adjusted to 4.6. One-dimensional analysis was undertaken using a Bruker Avance 600 MHz NMR spectrometer at 25 °C. Data were processed with TOPSPIN (Bruker) and assigned with CCPNMR (Vranken et al., 2005).

2.2.6. Data analysis

Statistical analysis

Significant differences between transgenic lines, or between transgenic lines and wild type, were determined using an unpaired t test with a two-tailed P value in Prism. Statistical significance was expressed as follows: P ≤ 0.05 (*), P ≤ 0.01 (**), P ≤ 0.001 (***), P ≤ 0.0001 (****).
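The star notation can be expressed as a small lookup. The sketch below assumes inclusive thresholds (P ≤ threshold) and uses "ns" for non-significant results; both are assumptions about the convention, since the comparison symbol in the original text was lost and no label for non-significant results is given.

```python
# Map a two-tailed P value to the star notation used in the figures.
# Inclusive thresholds and the "ns" label are assumptions for illustration.

THRESHOLDS = (
    (0.0001, "****"),
    (0.001, "***"),
    (0.01, "**"),
    (0.05, "*"),
)


def significance_stars(p_value: float) -> str:
    """Return the star label for a P value, checking the strictest threshold first."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    for threshold, stars in THRESHOLDS:
        if p_value <= threshold:
            return stars
    return "ns"


if __name__ == "__main__":
    for p in (0.2, 0.03, 0.004, 0.0005, 0.00001):
        print(f"P = {p}: {significance_stars(p)}")
```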

Image analysis

Seed length was measured using ImageJ with an internal scale bar. Twenty seeds from each line were measured.
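The calibration that an internal scale bar provides can be sketched as follows. The numbers are hypothetical and only illustrate the conversion from pixels to millimetres; they are not measurements from this study.

```python
def pixels_to_mm(length_px: float, bar_px: float, bar_mm: float) -> float:
    """Convert a pixel length to millimetres using a scale bar of known size."""
    return length_px * bar_mm / bar_px


if __name__ == "__main__":
    # Hypothetical example: a 10 mm scale bar spans 500 px in the image,
    # and one seed measures 350 px along its long axis.
    print(pixels_to_mm(350, bar_px=500, bar_mm=10))
```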

2.3. Results

2.3.1. Promoter analysis

To aid in the establishment of a rice transformation protocol and to test the activity of the chosen promoters, a series of promoter-GFP fusion constructs was prepared (Figure 2.3). The promoters included two constitutive promoters, the maize ubiquitin promoter (Ubi) and the rice phosphogluconate dehydrogenase gene promoter (PGD1), and two endosperm specific promoters from rice, the glutelin B1 (GluB1_prom) and B4 (GluB4_prom) promoters. Hygromycin resistant callus was observed under a fluorescence microscope, and GFP fluorescence was observed from callus harbouring any one of the selected promoters. The fluorescence observed from callus harbouring endosperm specific promoters suggested that rice callus contains active transcription factors for the cis regulatory elements of the chosen endosperm specific promoters. However, the transformation efficiency was low at this early stage and only a limited number of hygromycin resistant callus lines were obtained.

Figure 2.3 Expression of GFP using constitutive and endosperm specific promoters. A. Callus under bright and UV light. Callus transformed with GFP fluoresced under UV light. Untransformed callus was visible under bright light but not under UV light. B. Expression of GFP with the promoters used in the current study. GFP was driven by the constitutive promoters Ubi and PGD1 and by the endosperm specific promoters GluB1_prom and GluB4_prom. Callus transformed with GFP was detected under UV light.

2.3.2. kB1 expression in rice suspension cells

To test the feasibility of producing cyclic peptides in rice callus, the Oak1 gene from O. affinis encoding the cyclotide kB1 was first introduced into rice. To enhance kB1 maturation and backbone cyclization, a ligase-type AEP from O. affinis, OaAEP1b, was co-expressed, as it was previously shown to improve the yield of kB1 in N. benthamiana (Poon et al., 2017). To test whether the same applies in rice, two expression constructs were made containing either a single or a double cassette, as illustrated in Figure 2.4A. The single expression cassette (UO) consisted of the Ubi promoter driving the cyclotide-encoding Oak1 gene, followed by the Arabidopsis heat shock protein terminator (HspT) (Nagaya et al., 2009). For the double cassette expression construct (UOPOa), a second cassette was added incorporating the gene for the asparaginyl endopeptidase ligase OaAEP1b. OaAEP1b was placed under the control of a rice specific constitutive promoter, PGD1, and the HspT terminator.

A total of 13 and nine transgenic rice callus lines were generated from the single (UO) and double (UOPOa) kB1 expression constructs, respectively. Quantification of cyclic kB1 yield using an LC-MS/MS method revealed that all transgenic lines produced cyclic kB1. The average yields of cyclic kB1 in the UO and UOPOa groups were not significantly different (Figure 2.4B). The top three yielding lines were then selected for continuous culturing, and after three months, one stably expressing, fine cell line from each construct (UO #4, UOPOa #9) was selected for further analyses. To investigate the accumulation of kB1 in suspension cells, kB1 was measured from UO #4 and UOPOa #9 separately over nine days. Both suspension cell lines showed a similar trend in cyclic kB1 yield over time, slowly rising from day one to reach a peak at day seven, followed by a slight decline. The highest levels, at day seven, were 64.21 µg/g (DW) and 57.14 µg/g (DW) for UOPOa #9 and UO #4, respectively (Figure 2.4C).

When MALDI-TOF-MS was used to further investigate kB1 expression, three prominent peptide mass signals were evident: cyclic kB1 (m/z 2891), linear kB1 (m/z 2909) and truncated linear kB1-G (linear kB1 missing its terminal Gly; m/z 2852) (Figure 2.5A). In line UOPOa #9, the proportion of cyclic kB1 relative to linear peptides was clearly higher, suggesting cleaner processing of Oak1 when OaAEP1b is co-expressed. Because MALDI signal intensities are not strictly quantitative, absolute yields were also determined (Figure 2.5B). The cyclic kB1 yield was similar between UO #4 and UOPOa #9 at around 60 µg/g (DW), but the linear kB1 yield of UO #4 was 50.36 µg/g (DW) compared to 29.85 µg/g (DW) for UOPOa #9. An additional 5.08 µg/g (DW) of linear kB1-G was measured for UO #4, but none for UOPOa #9. The lower yields of linear kB1 and kB1-G in UOPOa #9 suggest that OaAEP1b did assist the cyclization of kB1, although the final yield of cyclic kB1 did not increase significantly.
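The shift in the cyclic to linear ratio can be made explicit with a short calculation. The sketch below is illustrative only: it assumes that the day seven cyclic kB1 yields reported for Figure 2.4C (57.14 and 64.21 µg/g DW) are the values behind the "around 60 µg/g" quoted above, and the dictionary keys are hypothetical labels.

```python
# Yields in µg/g (DW). Cyclic values are assumed to be the day seven
# values from Figure 2.4C; linear values are those quoted in the text.
YIELDS = {
    "UO #4": {"cyclic": 57.14, "linear": 50.36, "linear_minus_G": 5.08},
    "UOPOa #9": {"cyclic": 64.21, "linear": 29.85, "linear_minus_G": 0.0},
}


def cyclic_fraction(yields: dict) -> float:
    """Fraction of the total kB1-related peptide that is cyclic."""
    return yields["cyclic"] / sum(yields.values())


if __name__ == "__main__":
    for line, values in YIELDS.items():
        print(f"{line}: {cyclic_fraction(values):.1%} cyclic")
```

Under these assumptions, roughly half of the kB1-related peptide in UO #4 is cyclic, compared with about two thirds in UOPOa #9.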


Figure 2.4 Expression of cyclic kB1 in rice suspension cells. A. Expression cassettes of kB1. The single cassette (UO) contained the Oak1 gene (red) driven by the maize ubiquitin promoter (Ubi, blue) and terminated by the heat shock protein terminator (HspT, grey). Double cassette vectors (UOPOa) contained an additional cassette incorporating OaAEP1b for cyclization. OaAEP1b (red) was driven by a rice specific constitutive promoter (PGD1, blue) and was terminated by the HspT terminator (grey). B. kB1 yield in transgenic lines. Box and whisker plots show the yield of cyclic kB1 from 13 transgenic lines harbouring vector UO and nine lines harbouring vector UOPOa. The average level of the UOPOa group (~40 µg/g) is higher than that of the UO group (~28 µg/g). C. kB1 expression trend over nine days. Cyclic kB1 from three replicates each of UOPOa #9 and UO #4 was measured every other day after subculture over nine days. The yield rose from day one and reached a maximum at day seven, followed by a slight decline.

Figure 2.5 Oak1 expression in rice suspension cells of lines UO #4 and UOPOa #9. A. MALDI-TOF MS spectra of kB1 and its variants. Cyclic kB1 (m/z 2891), linear kB1 (m/z 2909) and truncated linear kB1-G (m/z 2852) were detected in both UO #4 and UOPOa #9. B. Yields of cyclic kB1, linear kB1 and truncated linear kB1-G in UO #4 and UOPOa #9. Cyclic kB1 (black), linear kB1 (light brown) and linear kB1-G (grey) from day seven of UO #4 and UOPOa #9 were quantified. The yield of cyclic kB1 was the highest in both lines, followed by linear kB1 and then linear kB1-G. Linear kB1-G from UOPOa #9 could not be detected in the MS data.

2.3.3. Endogenous OsVPEs expression in callus

Although the cyclic to linear kB1 ratio improved with co-expression of the OaAEP1b gene, the absolute amount of cyclic kB1 quantified was similar with or without AEP co-expression. This finding suggests that rice suspension cells harbour one or more AEPs that are highly active on the kB1 substrate. To understand the role of OsVPEs in cyclic peptide maturation and their expression in rice callus, sequence alignment analysis and real-time PCR were used.

In rice, four endogenous vacuolar processing enzymes (VPEs), also known as asparaginyl endopeptidases (AEPs), have been identified based on sequence similarity to the VPEs in Arabidopsis and Nicotiana: OsVPE1 (Os04g45470), OsVPE2 (Os01g37910), OsVPE3 (Os02g43010) and OsVPE4 (Os05g51570) (Deng et al., 2011). Recently characterised structural features of ligase-type AEPs include the ligase-activity determinants (LAD1 and LAD2) and the marker of ligase activity (MLA) (Jackson et al., 2018, Hemu et al., 2019). When the rice OsVPEs were aligned with the ligase-type AEPs C. ternatea butelase1, O. affinis OaAEP1b, V. yedoensis VyPAL2 and Petunia PxAEP3b, no ligase-type MLA region was observed in any rice OsVPE (Figure 2.6). Moreover, no rice OsVPE was found to contain ligase-like LAD1 and LAD2 features; instead, these regions were more homologous to those of the protease-type AEP PxAEP3a. Comparison of the overall similarity and identity scores indicated that, among the four rice OsVPEs, OsVPE1 and OsVPE3 are the most closely related pair, with 82% similarity and 72.7% identity (Table 2.9). In addition, OsVPE2 was the closest to the ligase-type AEPs, sharing a similarity of over 70% and an identity of over 55% with each of them.

Figure 2.6 Alignment of LAD and MLA in OsVPEs. The ligase-type AEPs butelase1, OaAEP1b, VyPAL2 and PxAEP3b share similar variants (red letters) in LAD1 and LAD2. The four rice OsVPEs are similar to the protease-type AEP (PxAEP3a) at LAD1 and LAD2 (navy letters). Ligase-type AEPs share either a deletion or hydrophobic residues (red frame) in the MLA region. None of the four rice OsVPEs shares either feature with the ligase-type AEPs.

Table 2.9 Similarity and identity among AEPs

Similarity\Identity scores (%)

AEPs       Butelase1  OaAEP1b  VyPAL2  PxAEP3b  PxAEP3a  OsVPE1  OsVPE2  OsVPE3  OsVPE4
Butelase1  100.0      61.5     64.5    64.9     64.8     49.6    60.8    51.1    51.6
OaAEP1b    74.2       100.0    60.2    63.2     62.2     49.0    58.3    51.0    49.2
VyPAL2     76.9       77.0     100.0   63.4     62.0     45.9    56.1    47.4    46.6
PxAEP3b    78.9       74.5     77.9    100.0    91.3     52.7    63.9    56.4    54.0
PxAEP3a    79.7       74.8     77.3    94.8     100.0    54.5    65.9    58.0    55.3
OsVPE1     65.1       62.9     62.2    66.2     68.4     100.0   55.8    72.7    47.2
OsVPE2     74.3       71.4     70.7    75.0     77.4     69.2    100.0   56.8    56.7
OsVPE3     65.7       65.2     64.6    67.0     69.2     82.0    70.3    100.0   48.7
OsVPE4     66.9       63.9     62.9    68.1     69.8     60.1    69.4    62.0    100.0

Similarity scores are listed in the bottom left and identity scores in the top right between any two AEPs.
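The two claims drawn from Table 2.9 can be checked directly against the matrix. The sketch below simply transcribes the table; the helper names (`similarity`, `identity`, `min_similarity_to_ligases`) are hypothetical, and the grouping of butelase1, OaAEP1b, VyPAL2 and PxAEP3b as ligase-type follows the text.

```python
NAMES = ["Butelase1", "OaAEP1b", "VyPAL2", "PxAEP3b", "PxAEP3a",
         "OsVPE1", "OsVPE2", "OsVPE3", "OsVPE4"]

# Rows and columns follow NAMES. Lower-left triangle: similarity (%);
# upper-right triangle: identity (%).
SCORES = [
    [100.0, 61.5, 64.5, 64.9, 64.8, 49.6, 60.8, 51.1, 51.6],
    [74.2, 100.0, 60.2, 63.2, 62.2, 49.0, 58.3, 51.0, 49.2],
    [76.9, 77.0, 100.0, 63.4, 62.0, 45.9, 56.1, 47.4, 46.6],
    [78.9, 74.5, 77.9, 100.0, 91.3, 52.7, 63.9, 56.4, 54.0],
    [79.7, 74.8, 77.3, 94.8, 100.0, 54.5, 65.9, 58.0, 55.3],
    [65.1, 62.9, 62.2, 66.2, 68.4, 100.0, 55.8, 72.7, 47.2],
    [74.3, 71.4, 70.7, 75.0, 77.4, 69.2, 100.0, 56.8, 56.7],
    [65.7, 65.2, 64.6, 67.0, 69.2, 82.0, 70.3, 100.0, 48.7],
    [66.9, 63.9, 62.9, 68.1, 69.8, 60.1, 69.4, 62.0, 100.0],
]

LIGASE_TYPE = ["Butelase1", "OaAEP1b", "VyPAL2", "PxAEP3b"]
RICE_VPES = ["OsVPE1", "OsVPE2", "OsVPE3", "OsVPE4"]


def similarity(a: str, b: str) -> float:
    """Similarity score, read from the lower-left triangle."""
    i, j = NAMES.index(a), NAMES.index(b)
    return SCORES[max(i, j)][min(i, j)]


def identity(a: str, b: str) -> float:
    """Identity score, read from the upper-right triangle."""
    i, j = NAMES.index(a), NAMES.index(b)
    return SCORES[min(i, j)][max(i, j)]


def min_similarity_to_ligases(vpe: str) -> float:
    """Lowest similarity between one rice VPE and any ligase-type AEP."""
    return min(similarity(vpe, ligase) for ligase in LIGASE_TYPE)


# Rice VPE whose worst-case similarity to the ligase-type AEPs is highest.
closest_to_ligases = max(RICE_VPES, key=min_similarity_to_ligases)
```

Using the minimum rather than the mean mirrors the wording "over 70%" in the text, which is a statement about every pairwise comparison.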

To identify the predominant OsVPEs expressed in rice callus, real-time PCR analysis was used to analyse transcript levels in wild type cells (WT), as well as in transgenic lines UO #4 and UOPOa #9. Two housekeeping genes (OsGAPDH and OsUBQ5) were used as reference genes to assess the relative expression of AEPs. Similar expression patterns were found when normalising against either reference gene, so only the relative expression data obtained with OsGAPDH are presented in Figure 2.7. In all tested suspension cell lines, the OsVPE2 transcript was dominant, followed by OsVPE1 and OsVPE3. Interestingly, the transcript levels of OsVPE1, 2 and 3 were higher in UOPOa #9 than in UO #4. The reason for this up-regulation upon co-expression of OaAEP1b remains unclear, but some AEPs are known to be upregulated during various stresses. Compared to the endogenous rice AEPs, expression of the OaAEP1b transgene in UOPOa #9 was the lowest. This low expression may be influenced by the promoter, the number of gene copies and the location of the gene in the genome.
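The text does not state how relative expression was calculated from the real-time PCR data. One commonly used approach is the 2^-ΔCt calculation sketched below, which assumes 100% amplification efficiency for both target and reference gene; the function name and the Ct values are hypothetical and serve only to illustrate the normalisation.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target gene relative to a reference gene (2^-dCt)."""
    # A higher Ct means fewer starting transcripts; each cycle is a doubling.
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses the threshold two cycles
    # later than the reference gene, i.e. it is four-fold less abundant.
    print(relative_expression(ct_target=22.0, ct_reference=20.0))
```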

Figure 2.7 Transcript analysis of OsVPEs and OaAEP1b in suspension cells. The expression of three endogenous OsVPEs and OaAEP1b was compared between transgenic lines with single (UO #4) and double (UOPOa #9) cassettes and wild type (WT) rice suspension cells. The housekeeping gene OsGAPDH was used as the reference gene to assess relative expression. The significance of differences between transgenic lines and WT was calculated and is labelled with **** (P ≤ 0.0001), ** (P ≤ 0.01) and * (P ≤ 0.05).

Expression of OsVPE4 could not be confirmed, as no amplification was obtained using RT-PCR primers designed from the OsVPE4 genomic sequence. According to the Rice eFP browser (Figure 2.8), OsVPE4 is predominantly expressed in roots and germinating seeds. It is noteworthy that the microarray expression intensity for OsVPE4 is lower than that of the other three OsVPEs, and the eFP browser flagged that some samples exhibit high standard deviations between replicates (Figure 2.8). In contrast, OsVPE1 and OsVPE3 are predominantly expressed in seeds, and OsVPE2 in leaves.


Figure 2.8 Endogenous OsVPEs expression in rice. The GCOS expression intensity of each OsVPE is shown from high (red) to low (light yellow). For OsVPE4, some samples exhibit high standard deviations between replicates (masked grey). These expression data were obtained from the Rice eFP browser (http://bar.utoronto.ca/efprice/cgi-bin/efpWeb.cgi).

2.3.4. Structural characterization of rice-derived kB1

To confirm the structural equivalence of kB1 produced in rice to that of native extracted kB1, co-elution, reduction, alkylation, enzyme digestion and NMR were used. For co-elution studies, a crude extract of kB1 peptide was prepared from suspension cells of line UOPOa #9 using a C18 SPE column prior to analysis by LC-MS. The elution time of native extracted kB1 was equivalent to that of rice-produced kB1, at 10.2 min, as illustrated in Figure 2.9A. When the samples were combined, only one eluted peak was evident, which suggests that rice-derived and native extracted kB1 have the same hydrophobicity.

Reduction, alkylation and backbone linearization experiments were combined to confirm the formation of disulfide bonds and the cyclized backbone of rice-derived kB1. For this analysis, cyclic kB1 was first extracted and HPLC purified from suspension cells of line UOPOa #9. To confirm that the purified cyclic kB1 contained three disulfide bonds, DTT was first added, followed by iodoacetamide treatment. Together, this treatment resulted in the mass shift of +348 Da expected for kB1 containing three disulfide bonds (Figure 2.9B). The alkylated samples were then digested with endoproteinase Glu-C, which was expected to linearize the cyclic backbone of kB1, resulting in a further mass increase of 18 Da. As the spectra in Figure 2.9B show, the peak of cyclic kB1 is at m/z 2891, reduced and S-alkylated kB1 is at m/z 3239 (2891+348), and enzyme digested kB1 is at m/z 3257 (2891+348+18). This result indicates that the suspension cell-derived kB1 formed a cyclized backbone and three disulfide bonds.
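The expected mass shifts follow from simple bookkeeping, sketched below with nominal (integer) masses. The per-cysteine shift of 58 Da is the sum of one hydrogen gained on disulfide reduction (+1 Da) and carbamidomethylation by iodoacetamide (+57 Da); the constant names are illustrative.

```python
CYCLIC_KB1_MZ = 2891        # observed m/z of cyclic, oxidised kB1
N_CYS = 6                   # three disulfide bonds = six cysteines
REDUCTION_PER_CYS = 1       # one H added per Cys when a disulfide is reduced
CARBAMIDOMETHYL = 57        # iodoacetamide adduct per Cys (nominal mass)
WATER = 18                  # hydrolysis of one peptide bond by Glu-C

# Reduction plus S-alkylation of all six cysteines.
alkylation_shift = N_CYS * (REDUCTION_PER_CYS + CARBAMIDOMETHYL)
reduced_alkylated_mz = CYCLIC_KB1_MZ + alkylation_shift

# A single backbone cleavage opens the ring and adds one water.
linearized_mz = reduced_alkylated_mz + WATER

if __name__ == "__main__":
    print(alkylation_shift, reduced_alkylated_mz, linearized_mz)
```

The +18 Da step is only diagnostic of a cyclic backbone: a single cleavage of a linear peptide would instead yield two fragments.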

One-dimensional NMR was used to confirm the secondary structure of cell-derived kB1. As indicated in Figure 2.9C, cell-derived kB1 from line UOPOa #9 shows a similar peak pattern to the native extracted kB1, except for the first peak, which is highlighted with an arrow. This unmatched peak was attributed to a pH difference, as the same shift was also observed when native extracted kB1 was tested at different pH values (Rosengren et al., 2003).

Figure 2.9 Structural characterization of suspension cell-produced kB1. A. LC-MS co-elution of native extracted kB1 and cell expressed kB1 from UOPOa #9. Cell extracted kB1, native kB1 and their combination all eluted as a single peak at 10.2 min. B. Reduction, alkylation and enzyme digestion of cell expressed kB1 from UOPOa #9. After disulfide bond reduction and alkylation, the peak of cyclic kB1 (m/z 2891) shifted by 348 Da to m/z 3239. A further 18 Da shift was observed after treatment with endoproteinase Glu-C. C. NMR spectra of native kB1 and cell expressed kB1 from UOPOa #9. The one-dimensional spectra of native kB1 and cell-derived kB1 indicate similar structures. An unmatched peak attributed to pH differences is highlighted with an arrow.

2.3.5. Expression of grafts based on SFTI-1 in rice callus

To test the expression of SFTI-1 and its analogues in rice suspension cells, the kB1 domain within Oak1 was replaced with the smaller cyclic peptide SFTI-1 (Figure 2.10A). Additionally, a shorter C-terminal propeptide, GLDN, replaced the GLPSLAA tail of the Oak1 precursor. [D14N]SFTI-1, an analogue of SFTI-1 with Asn replacing Asp at the ligation point, was designed to test the cyclization efficiency and stability of cyclic peptides. To assist cyclization, the precursor peptide and OaAEP1b gene cassettes were combined into one expression vector in the form of a double cassette. After hygromycin selection, five positive lines of Oak[D14N]SFTI-1_GLDN and 12 positive lines of OakSFTI-1_GLDN were obtained.

To confirm transgene integration and expression, genomic DNA and RNA were extracted from hygromycin resistant callus carrying Oak[D14N]SFTI-1_GLDN or OakSFTI-1_GLDN. PCR on genomic DNA generated specific amplicons representing the peptide precursor gene, suggesting that these are positive transgenic lines harbouring the peptide cassette (Figure 2.10B). However, no PCR was undertaken to test AEP transgene integration.

For transcript expression of Oak[D14N]SFTI-1_GLDN and OakSFTI-1_GLDN, PCR using primers specific for the peptide precursor gene revealed only a faint amplicon in line #4 of OakSFTI-1_GLDN, and no amplicons were observed in the other lines. This faint band suggests low transgene expression of OakSFTI-1_GLDN, while the absence of Oak[D14N]SFTI-1_GLDN expression is consistent with the inability to detect the peptide. Using OaAEP1b specific primers, two specific amplified bands were observed, in line #1 of Oak[D14N]SFTI-1_GLDN and line #4 of OakSFTI-1_GLDN, and two non-specific amplified bands were observed in lines #5 and #12 of OakSFTI-1_GLDN. This result suggests that not all lines were expressing OaAEP1b.

In transgenic lines carrying Oak[D14N]SFTI-1_GLDN, no peptide masses were detected at 1512 Da (cyclic [D14N]SFTI-1) or 1530 Da (linear [D14N]SFTI-1), as illustrated in Figure 2.10C. Similarly, no signals were detected at 1513 Da (cyclic SFTI-1) or 1531 Da (linear SFTI-1) in transgenic lines carrying OakSFTI-1_GLDN. However, three unexpected peaks were detected at 1496, 1546 and 1568 Da in the OakSFTI-1_GLDN lines. These peaks might represent modified SFTI-1 peptides, either with extra subunits on the peptide chain or with a truncated peptide chain. However, no mass values matched any of the modifications observed previously, such as PyroGln (+18 Da), truncated G (57 Da) or truncated N (114 Da).
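The mass comparison described above can be written as a small search. The sketch below is illustrative only: it takes the expected masses and the modification shifts as quoted in the text, treats the two truncations as mass losses (a sign convention assumed here), and uses a hypothetical tolerance of 1 Da.

```python
EXPECTED = {"cyclic SFTI-1": 1513, "linear SFTI-1": 1531}

# Shifts as quoted in the text; truncations are assumed to be losses.
MODIFICATIONS = {"PyroGln": +18, "truncated G": -57, "truncated N": -114}

OBSERVED = [1496, 1546, 1568]
TOLERANCE_DA = 1  # hypothetical matching tolerance


def matching_assignments(observed_mass: int) -> list:
    """List every expected or modified species within tolerance of a peak."""
    hits = []
    for species, base in EXPECTED.items():
        if abs(observed_mass - base) <= TOLERANCE_DA:
            hits.append(species)
        for modification, shift in MODIFICATIONS.items():
            if abs(observed_mass - (base + shift)) <= TOLERANCE_DA:
                hits.append(f"{species} + {modification}")
    return hits


assignments = {mass: matching_assignments(mass) for mass in OBSERVED}
```

With these inputs every list in `assignments` is empty, in line with the statement that none of the three peaks matches a previously observed modification.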


Figure 2.10 Expression of Oak[D14N]SFTI-1_GLDN and OakSFTI-1_GLDN in rice callus. A. Precursors of Oak[D14N]SFTI-1_GLDN and OakSFTI-1_GLDN. SFTI-1 (purple) ends with Asp (D); in [D14N]SFTI-1 this last residue is replaced with Asn (N). Both SFTI-1 and [D14N]SFTI-1 replaced kB1 in the Oak1 precursor, with a shortened CTPP, GLDN (red). SP: signal peptide sequence (blue), NTPP: N-terminal propeptide (orange), NTR: N-terminal repeat (yellow). B. Genomic and transcript analysis of transgenic lines carrying Oak[D14N]SFTI-1_GLDN and OakSFTI-1_GLDN. The template used for PCR is labelled at the bottom of each picture, with the specific primers in brackets. N and D are short for Oak[D14N]SFTI-1_GLDN and OakSFTI-1_GLDN respectively. M is short for DNA Marker (1 kb ladder). Sizes of amplified products: SFTI-1/[D14N]SFTI-1, 321 bp; AEP, 341 bp. C. MALDI spectra of SFTI-1 and [D14N]SFTI-1. No peaks were detected in transgenic lines carrying Oak[D14N]SFTI-1_GLDN; three peaks were observed in transgenic lines carrying OakSFTI-1_GLDN, at m/z 1496, 1546 and 1568.

A potent plasmin inhibitor based on SFTI-1 was also introduced into the rice expression system. [T4Y,I7R]SFTI-1 displays high potency for inhibiting plasmin (Ki = 0.041 nM) and is a promising candidate as an antifibrinolytic drug (Swedberg et al., 2018). As for SFTI-1, this inhibitor peptide domain was inserted into the Oak1 precursor gene, as illustrated in Figure 2.11A (Oak[T4Y,I7R]SFTI-1). It differs from SFTI-1 by only two residues, which introduce a slight change in structure compared to native SFTI-1. Moreover, [T4Y,I7R]SFTI-1 has been successfully produced in N. benthamiana leaf (Jackson et al., 2019) using a precursor gene arrangement identical to that assembled here for rice. In the N. benthamiana study, [T4Y,I7R]SFTI-1 was only detected when OaAEP1b was co-expressed; thus, an expression vector carrying both gene cassettes, for expressing [T4Y,I7R]SFTI-1 and OaAEP1b, was assembled here. This expression vector was termed UOSpiPOa. After hygromycin selection, 11 positive UOSpiPOa lines were obtained.

To confirm transgene integration and expression, genomic DNA and RNA were extracted from hygromycin resistant callus carrying Oak[T4Y,I7R]SFTI-1. PCR generated specific amplicons for the AEP and precursor peptide genes from both genomic DNA and cDNA, which suggests that these are positive transgenic lines (Figure 2.11B). Although these lines were transgene positive, no peptides were detected at the expected mass values of 1618 Da for the cyclic peptide and 1636 Da for the linear peptide (Figure 2.11C). This inability to detect [T4Y,I7R]SFTI-1 could be due to an inefficiency of the folding machinery in rice suspension cells.

Figure 2.11 Expression of Oak[T4Y,I7R]SFTI-1 in rice callus. A. Precursor of Oak[T4Y,I7R]SFTI-1. [T4Y,I7R]SFTI-1 (plasmin inhibitor, PI; SFTI-1 in purple with mutated residues in navy) replaced kB1 in the Oak1 precursor, with a shortened CTPP, GLDN (red). SP: signal peptide sequence (blue), NTPP: N-terminal propeptide (orange), NTR: N-terminal repeat (yellow). B. Genomic and transcript analysis of Oak[T4Y,I7R]SFTI-1 lines. The template used for PCR is labelled at the bottom of each gel picture, with the specific primers in brackets. P is short for positive control. M is short for DNA Marker (100 bp ladder). Sizes of amplified products: [T4Y,I7R]SFTI-1, 363 bp; AEP, 341 bp. C. MALDI spectra of [T4Y,I7R]SFTI-1. No peaks were detected between m/z 1500 and 1650 in Oak[T4Y,I7R]SFTI-1 lines.

Two additional SFTI-1 peptide grafts (SFTImcrB and SFTImcrF) were selected for expression in rice suspension cells. Both peptides have previously been demonstrated to be active as melanocortin receptor 1 (MC1R) agonists (Durek et al., 2018) and have the potential for development into drugs controlling obesity and inflammatory diseases. The grafted pharmacophore, His-Phe-Arg-Trp (HFRW), is the shortest peptide that is active on MCRs. It is inserted as a single unit in SFTImcrB and as a double unit in SFTImcrF, as illustrated in Figure 2.12A. Previous unpublished work on the expression of SFTImcrB and SFTImcrF in Arabidopsis showed that these cyclic peptides can be produced in the plant seed. To achieve efficient expression in rice, two codon optimized grafts were assembled, with each peptide sequence inserted into the Oak1 precursor containing the shortened CTPP, GLDN. These two constructs were termed OS_OakSFTImcrB and OS_OakSFTImcrF. To enable peptide cyclization, expression vectors were assembled carrying both gene cassettes, for expressing SFTImcrB/F and OaAEP1b. These two expression vectors were termed UOSmcrBPOa and UOSmcrFPOa. After hygromycin selection, 18 positive lines carrying UOSmcrBPOa and one UOSmcrFPOa line were obtained.

To confirm transgene integration and expression, genomic DNA and RNA were extracted from hygromycin resistant lines of UOSmcrBPOa and UOSmcrFPOa, and cDNA was prepared. PCR generated specific amplicons representing the peptide and AEP transgenes, suggesting that these are positive transgenic lines harbouring double cassettes (Figure 2.12B). For transcript expression, lines 11 and 12 of UOSmcrBPOa produced correctly sized amplicons for both genes; however, line 18 produced only the peptide specific amplicon and not that of OaAEP1b. Additionally, no PCR amplicons were detected from cDNA of the transgenic positive line of UOSmcrFPOa. For all of these transgenic positive lines, the expected mass values (1896 Da and 1914 Da for cyclic and linear SFTImcrB, and 2097 Da and 2115 Da for cyclic and linear SFTImcrF, respectively) were not detected by MALDI-TOF-MS (Figure 2.12C).

Figure 2.12 Expression of Os_OakSFTImcrB and Os_OakSFTImcrF in rice callus. A. Precursors of Os_OakSFTImcrB and Os_OakSFTImcrF. SFTImcrB and F (SFTI-1 in purple with MCR inserts in navy) replaced kB1 in the Oak1 precursor, with a shortened CTPP, GLDN (red). SP: signal peptide sequence (blue), NTPP: N-terminal propeptide (orange), NTR: N-terminal repeat (yellow). B. Genomic and transcript analysis of Os_OakSFTImcrB and Os_OakSFTImcrF lines. The template used for PCR is labelled at the bottom of each gel picture, with the specific primers in brackets. M is short for DNA Marker (100 bp ladder). Sizes of amplified products: SFTImcrB/F, 363 bp; AEP, 341 bp. C. MALDI spectra of Os_OakSFTImcrB and Os_OakSFTImcrF. Expected peptides were searched for between 1700 and 2200 Da. A few peaks were detected, but none of them matched the expected mass values for SFTImcrB or SFTImcrF.

2.3.6. Prototypical cyclic peptides expression in rice seeds

To test the feasibility of producing cyclic peptides in rice seeds, the Oak1 gene encoding kB1 and the PawS1 gene encoding SFTI-1 were each introduced into rice along with OaAEP1b to assist in planta cyclization; these constructs were termed BOBOa and BPBOa, respectively. To achieve expression and accumulation in seeds, endosperm specific promoters were used in double cassette expression vectors, with GluB1_prom driving the peptide precursor gene and GluB4_prom driving the OaAEP1b gene. Additionally, transgenic lines harbouring the PawS1 gene under the regulatory control of the maize ubiquitin promoter, termed UP, were used to test regeneration in the current study.

To confirm the transgenic events at the initial selection stage, small amounts of hygromycin resistant callus were screened by PCR using peptide gene specific primers. PCR produced specific amplicons for the precursor peptide genes from genomic DNA, suggesting that these are positive transgenic lines (Figure 2.13).

Figure 2.13 PCR test of transgenic callus. Genomic DNA was extracted from hygromycin resistant callus harbouring vectors BPBOa, BOBOa and UP. Peptide specific primers were used for amplification. Sizes of amplified products: kB1, 375 bp; SFTI, 459 bp.

After trial and error, only a limited number of transgenic plants progressed to maturity and seed set: three transgenic lines of BOBOa (#4, #13, #7), and one line each of BPBOa (#2) and UP (#11). These plants, which were transferred directly from tissue culture conditions, showed a stunted growth phenotype and produced less seed than wild-type plants. For transgenic lines BOBOa #4, BPBOa #2 and UP #11, approximately one hundred seeds were collected for analyses of phenotype, seed weight, seed length and peptide expression. Transgenic seeds were generally shrunken and wrinkled, while wild-type seeds were plump and smooth. As shown in Figure 2.14A, the glumes of transgenic seeds were lighter in colour than those of the wild type. Some dehusked transgenic seeds were opaque white, especially seeds from line UP #11. Seed weight (100 seeds) revealed that both seeds with glumes and dehusked seeds of transgenic lines were significantly lighter than wild-type seeds (P value < 0.0001), as illustrated in Figure 2.14B. However, there were no significant differences in seed length between transgenic seeds and wild-type seeds, as illustrated in Figure 2.14C.

For the production of kB1 in BOBOa #4, the MS signal of cyclic kB1 predominated over that of linear kB1, as illustrated in Figure 2.15. For SFTI-1 production, only cyclic SFTI-1 was detected in both BPBOa #2 and UP #11. The yields of cyclic kB1 and SFTI-1 were quantified by MS. The yield of cyclic kB1 from BOBOa #4 was 1.05 µg/g (DW), while the yields of cyclic SFTI-1 from BPBOa #2 and UP #11 were 0.091 µg/g (DW) and 0.165 µg/g (DW) respectively.


Figure 2.14 Phenotype, seed weight and seed length of transgenic seeds. A. Phenotype of rice seeds and dehusked seeds. One hundred seeds with glumes are shown in the photos with the corresponding dehusked seeds. Bar: 1 cm. B. Column graph of seed weight for seeds with glumes and dehusked seeds. Both seeds with glumes and dehusked seeds of transgenic lines were significantly lighter than wild-type seeds (P value < 0.0001). Significance was calculated using an unpaired t-test. C. Box and whisker plots of seed length. No significant differences were observed between transgenic seeds and wild-type seeds. Twenty seeds from each line were measured by image analysis. Significance was calculated using an unpaired t-test.

Figure 2.15 MALDI spectra of kB1 and SFTI-1 in rice seeds. Cyclic kB1 (m/z 2891) was detected in BOBOa #4 with a small amount of linear kB1 (m/z 2910); cyclic SFTI-1 (m/z 1513) was detected in both BPBOa #2 and UP #11.

2.4. Discussion

2.4.1. kB1 and grafted cyclic peptides in rice suspension cells

To test the capability of rice suspension cells to produce cyclic peptides, the Oak1 gene encoding kB1 was transferred into rice. With co-expression of OaAEP1b, cyclic kB1 was produced with a maximum yield of 64.21 µg/g (DW), enabling purification by HPLC and structural analysis by NMR spectroscopy. This is the first example of a recombinantly produced cyclotide in a monocot plant in which the structure of the product, here rice suspension cell-produced kB1, was demonstrated to be equivalent to that of native kB1 extracted from O. affinis. A previous attempt by Lim & Lai to express Oak1 in rice plants provided only preliminary evidence of Oak1 protein by dot blot (Lim & Lai, 2017). However, pure cyclic kB1 was not used as the positive control, and the result is not sufficient to claim that the antigen detected is the Oak1 protein or the mature cyclic kB1. In addition, the affinity of an anti-kB1 antibody for cyclic kB1 is reportedly much lower than for the Oak1 protein (Gillon et al., 2008). This generally poor antibody affinity for cyclotides has limited the application of immunolocalization techniques to cyclotides in plant tissue (Gunasekera et al., 2006).

OaAEP1b was co-expressed to assist the cyclization of kB1 in rice suspension cells. Among cyclized, linear and truncated kB1 peptides, the cyclized kB1 mass signal was dominant. With co-expressed OaAEP1b, less linear and truncated kB1 was produced. This result is consistent with previous studies, where OaAEP1b expression positively influenced cyclic peptide production in N. benthamiana (Jackson et al., 2018, Poon et al., 2017). In the case of production in rice, although the proportion of cyclic kB1 improved upon OaAEP1b expression, the absolute yields of cyclic kB1 were similar whether OaAEP1b was co-expressed or not. This could be due to the relatively low expression of OaAEP1b in transgenic cell suspensions, or to the upregulated expression of endogenous OsVPEs, especially OsVPE2 and OsVPE1. As these endogenous OsVPEs lack the sequence features of ligase-type AEPs, they may favour peptide hydrolysis over ligation, which may ultimately lower the yield of cyclic kB1.

While studying the transcript expression of endogenous OsVPEs in rice suspension cells, expression of OsVPE4 could not be confirmed. It is notable that there are no introns in the genomic sequence of OsVPE4, which is unusual when compared to AEPs from other monocots and more broadly across phyla. This suggests that an annotation error may have occurred when the genomic sequence of OsVPE4 was identified based on its similarity to the VPEs of Arabidopsis and Nicotiana (Deng et al., 2011). Moreover, no amplification was obtained using RT-PCR primers based on the OsVPE4 genomic sequence. The pair of primers used by Deng et al. amplified an unexpected sequence when rice genomic DNA was used as the template. This unexpected sequence was then sequenced and searched by BLAST against the rice genome, which showed that it maps to locations across the whole genome. Combined with the information from the eFP browser, these results led to OsVPE4 being treated as a mis-annotated sequence.

Unlike kB1, a number of engineered SFTI-1 variants could not be produced in rice suspension cells, despite the presence of transcript in many instances. This remains puzzling, as both SFTI-1 and the [T4Y,I7R]SFTI-1 variant have previously been shown to be produced in N. benthamiana leaves (Jackson et al., 2018). Possibly, rice suspension cells harbour proteases that rapidly degrade SFTI-1-like peptides. One group of candidate proteases is the endogenous rice AEPs, which may preferentially hydrolyse Asp-Gly peptide bonds. To test this hypothesis, gene editing techniques such as CRISPR/Cas9 could be used to selectively knock out endogenous AEPs. Moreover, previous attempts to produce small peptides in rice have proven difficult. For example, glucagon-like peptide 1 was not detected at either the transcript or the peptide level, as its gene was silenced by co-suppression with endogenous glutelin (Yasuda et al., 2005). In other cases, peptides have proven undetectable unless fused to partner proteins or expressed as tandem repeats of the encoded peptide. A six amino acid peptide, novokinin, was only detected in rice seeds when 18 tandem repeats were expressed (Wakasa et al., 2011). In addition, an antifungal peptide was not detectable when produced as a single peptide, but was detected as a fusion protein in rice seeds (Bundó et al., 2019). These strategies could be useful for achieving production of engineered SFTI-1 variants in rice in the future.

2.4.2. Expression of cyclic peptides in rice seeds

To obtain adequate numbers of transgenic rice plants, it is important to maximise transformation efficiency. Critical factors affecting this were optimised during trials, including callus differentiation potency, Agrobacterium virulence, the choice and dosage of antibiotics, and the cultivation environment. Although only a few transgenic plants were available for characterising the cyclic peptides produced in seeds, these optimised factors will be useful for future studies on the production of cyclic peptides in rice seeds.

As only a small number of transgenic seeds were analysed, it is still too early to draw conclusions concerning the yield of kB1 or SFTI-1 in rice seeds. The yield of kB1 from the single tested set of transgenic seeds (BOBOa #4) was lower than the yield from suspension cells (UOPOa #9). Although the absolute yield was low, the relative MS signal intensities showed that cyclic kB1 predominantly accumulated in seeds, with a small amount of linear kB1 and no truncated kB1 peptides observed. A similar MS spectrum was observed for seed-produced SFTI-1, showing a single peak of cyclic SFTI-1. These findings suggest that the rice seed is a stable environment in which to produce and accumulate mature cyclic peptides, which also benefits purification because there is minimal interference from related linear peptides.

In the current study, strong promoters, endosperm-specific promoters and codon-optimised sequences were adopted to improve the production of cyclic peptides in rice. There are various reports of the successful production of heterologous proteins/peptides in rice seeds using other promoters and expression strategies. For example, using the strong endosperm-specific promoter G13a and rice-preferred codons in the signal peptide sequences, the yield of human serum albumin in seeds reached 2.75 g/kg (DW), 10.58% of the total soluble protein. This yield was 20-fold higher than the minimum commercial production requirement (He et al., 2011). Using the same strategy, the yield of human alpha-antitrypsin reached 2.24 g/kg (DW) (Zhang et al., 2013). In some cases, proteins were fused to native proteins to achieve high production and stable accumulation. For example, human insulin-like growth factor 1 was fused with the luminal binding protein, giving a yield of up to 6.8% of total seed protein (Xie et al., 2008). An antifungal peptide fused to oleosin accumulated stably, with a yield of up to 20 µg/g (Bundó et al., 2019). In addition, a mutant rice lacking three glutelins (GluA-1, GluA-2 and GluA-4) (Iida et al., 1997) was developed to increase the recombinant protein yield in rice seeds, e.g. for expression of Cry j 1 and Cry j 2 against Japanese cedar pollinosis (Wakasa et al., 2013). Similarly, suppression of endogenous prolamins increased the yield of human IL-10, an anti-inflammatory regulatory cytokine, 3-fold (Yang et al., 2012). Moreover, adding a signal peptide or the KDEL (Lys-Asp-Glu-Leu) ER retention signal can improve the accumulation level, e.g. 7Crp (a hybrid peptide comprising seven predominant human T cell epitopes) with a yield of up to 116 µg/grain (Entesari et al., 2018) and GRFT (the antiviral lectin griffithsin) with a yield of up to 223 µg/g (DW) (Vamvaka et al., 2016). The above strategies could be viable future approaches to improve the production of cyclic peptides in rice.
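The yields cited above are reported in mixed units (g/kg, micrograms per gram, micrograms per grain and percentage of protein), which makes direct comparison awkward. The following minimal Python sketch puts the weight-based figures on a single scale. It assumes that "20-fold higher" means the reported albumin yield divided by 20 gives the commercial threshold; the helper name `g_per_kg_to_ug_per_g` is hypothetical, and the per-grain and percentage figures are left out because they cannot be converted without further data.

```python
# Minimal sketch: put the weight-based yields cited above on one scale (ug/g).
# The helper name and the reading of "20-fold higher" are assumptions.

def g_per_kg_to_ug_per_g(value_g_per_kg: float) -> float:
    """Convert g/kg to ug/g (1 g/kg = 1 mg/g = 1000 ug/g)."""
    return value_g_per_kg * 1000.0

yields_ug_per_g = {
    "human serum albumin (He et al., 2011)": g_per_kg_to_ug_per_g(2.75),
    "alpha-antitrypsin (Zhang et al., 2013)": g_per_kg_to_ug_per_g(2.24),
    "griffithsin (Vamvaka et al., 2016)": 223.0,
    # Weight basis (DW or otherwise) is not stated in the text for this value.
    "antifungal peptide-oleosin fusion (Bundo et al., 2019)": 20.0,
}

# Implied minimum commercial requirement for albumin, if 2.75 g/kg is 20-fold above it.
implied_threshold_ug_per_g = g_per_kg_to_ug_per_g(2.75) / 20

# Print the yields from highest to lowest, then the implied threshold.
for name, value in sorted(yields_ug_per_g.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f} ug/g")
print(f"implied albumin threshold: {implied_threshold_ug_per_g:.1f} ug/g")
```

Under these assumptions, the protein yields in the g/kg range are roughly an order of magnitude above the peptide yields reported in micrograms per gram.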

With regard to phenotypic variation of transgenic seeds, some dehusked transgenic seeds exhibited floury and shrunken features, and their weights were significantly lighter than those of wild-type seeds. These aberrant phenotypes have also been observed in other transgenic seeds producing heterologous proteins (Ogo et al., 2014, Entesari et al., 2018). This finding indicates that expressing cyclic peptides in seeds impaired the accumulation of endogenous seed storage proteins. Seed-produced cyclic therapeutic peptides are promising modalities for development as orally delivered drugs, especially engineered cyclic peptides carrying mucosal immune epitopes. A variety of rice seed-produced bioactive molecules have been developed to treat human diseases and symptoms, including diarrhea (Zavaleta et al., 2007, Yuki et al., 2013, Soh et al., 2015), leucopenia (Ning et al., 2008), parasite infections (e.g. the roundworm Ascaris suum) (Matsumoto et al., 2009), Alzheimer’s disease (Yoshida et al., 2011), HIV infection (Vamvaka et al., 2018) and allergic and autoimmune conditions, e.g. mite allergy (Yang et al., 2008, Suzuki et al., 2011), rheumatoid arthritis (Iizuka et al., 2014) and cedar pollinosis (Wakasa et al., 2013), as well as animal diseases, e.g. infectious bursal disease in chickens (Wu et al., 2007). Among the rice seed-based vaccines developed thus far, the cholera vaccine MucoRice-CTB can be produced under the current Good Manufacturing Practice (cGMP) standard (Kashima et al., 2016). These studies support the practical manufacturability of rice seeds as a platform for therapeutic cyclic peptide production.

Although developing transgenic rice seeds requires more time than developing suspension cells, this drawback can be partially addressed by establishing a stable and efficient transformation and cultivation platform. Furthermore, product yields from seeds could increase and stabilise as successive generations of plants are grown and lines exhibiting stable expression are selected and retained. This ease of improvement and scale-up augurs well for lower costs in the long term. A previous study showed that an antigen stored in seeds remained stable and maintained immunogenicity at ambient temperature for at least 1.5 years (Nochi et al., 2007). This long-term storage capacity is another benefit of transgenic seeds compared to suspension cells.

2.4.3. Rice as a monocot biofactory to produce cyclic peptides

Cyclotide-like peptides and sequences have been discovered in monocot plants (Nguyen et al., 2013, Porto et al., 2016, Salehi et al., 2017), which raises the hypothesis that monocot plants are likely to be able to express and cyclise cyclotides. Rice is an ideal host in which to test this hypothesis, as it is a monocot of the Poaceae family and a well-developed biofactory for the production of recombinant proteins/peptides. In the current study, one prototypical cyclotide, kB1, was efficiently produced in rice suspension cells. Its structure was shown to be equivalent to that of native kB1 extracted from O. affinis. When co-expressing OaAEP1b in rice suspension cells, the kB1 yield reached 64.21 µg/g (DW), which is on par with the yield (70 µg/g DW) of kB1 produced in a shake-flask system of O. affinis suspension cells (Seydel et al., 2009). A recent transient production system based on a dicot plant, N. benthamiana, showed an improved kB1 yield of 199 µg/g (DW) when co-expressing OaAEP1b (Poon et al., 2017). However, less than 10.8 µg/g (DW) of kB1 could be produced without OaAEP1b in this dicot production system. In the current study, when expressing the Oak1 gene alone in rice suspension cells, the yield of cyclic kB1 reached 57.14 µg/g (DW). This finding supports the hypothesis that rice is capable of producing and cyclising cyclic peptides. However, a higher proportion of linear peptides was observed. This result suggests that endogenous OsVPEs do not function as efficiently for peptide ligation as those characterised from cyclotide-producing plants. Moreover, it remains possible that endogenous OsVPEs also compete for the precursor substrates. Thus, knocking out these genes could further increase the yield of heterologous cyclic peptides in rice.
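The comparison between hosts becomes clearer when the effect of OaAEP1b co-expression is expressed as a fold change. The short Python sketch below uses only the yields quoted in this section; the dictionary layout and the function name `fold_change` are illustrative choices. Because the N. benthamiana yield without OaAEP1b is reported as "less than 10.8", the value computed for that host is a lower bound.

```python
# Minimal sketch: fold change in cyclic kB1 yield with OaAEP1b co-expression,
# using only the figures quoted in the text (micrograms per gram, DW).

yields = {
    # host: (yield with OaAEP1b, yield without OaAEP1b)
    "rice suspension cells": (64.21, 57.14),
    # "less than 10.8" is an upper bound, so this fold change is a lower bound.
    "N. benthamiana leaves": (199.0, 10.8),
}

def fold_change(with_aep: float, without_aep: float) -> float:
    """Ratio of the yield with OaAEP1b to the yield without it."""
    return with_aep / without_aep

fold = {host: fold_change(*pair) for host, pair in yields.items()}

for host, value in fold.items():
    print(f"{host}: {value:.2f}-fold")
```

On these figures the co-expression effect is about 1.1-fold in rice but at least about 18-fold in N. benthamiana, which is consistent with the argument that rice already has an endogenous capacity for cyclization.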

Similarly, SFTI-1 could be produced in rice seeds, but with a lower yield than rice seed-produced kB1. This might be due to unstable accumulation of such a small peptide. To mimic the processing by OaAEP1b for cyclization, engineered SFTI-1 analogues were swapped in for the kB1 domain in the Oak1 precursor gene used in the current study. Using the same construct, SFTI-1 and an analogue, [T4Y,I7R]SFTI-1, were produced in N. benthamiana leaves when co-expressing OaAEP1b (Jackson et al., 2019). However, there was no evidence that engineered SFTI-1 analogues could be produced in rice suspension cells. To improve the yield and achieve the accumulation of SFTI-1 analogues in rice, designing tandem repeat peptide sequences and fusing cyclic peptide genes to native proteins are potential directions to explore. To further explore the flexibility of rice for cyclic peptide production, it would be informative to test various cyclic peptide precursors based on kB1 or MCoTI in rice suspension cells. Moreover, discovering native ligase-type AEPs for SFTI-1 in sunflower would presumably improve the yield of SFTI-1 in planta.

For in planta cyclization, N- and C-terminal processing are both critical for maturing the cyclic peptides. Recent studies on the C-terminal processing enzymes, AEPs, have provided more information about their activity and characteristics, which can be used to discover or design efficient ligase-type AEPs to assist peptide cyclization (Yang et al., 2017, Jackson et al., 2018, Hemu et al., 2019). Moreover, the N-terminal processing enzyme of kB1, a papain-like protease, was recently described (Rehm et al., 2019). Co-expressing these processing enzymes would be expected to improve the efficiency of cyclization for heterologous expression of cyclic peptides in planta. Additionally, in the case of the rice production systems, the study of monocot cyclotide-like sequences and their biosynthetic mechanisms is likely to provide paths for improving cyclic peptide production in rice. This direction is pursued in Chapter 3. With these cyclization-improving strategies, rice has clear potential to be an efficient biofactory for producing various cyclic peptides.

2.5. References

Bundó M, Shi X, Vernet M, Marcos JF, López-García B, Coca M, 2019. Rice seeds as biofactories of rationally-designed and cell-penetrating antifungal PAF peptides. Frontiers in Plant Science 10, 731.
Christensen AH, Quail PH, 1996. Ubiquitin promoter-based vectors for high-level expression of selectable and/or screenable marker genes in monocotyledonous plants. Transgenic Research 5, 213-218.
Curtis MD, Grossniklaus U, 2003. A gateway cloning vector set for high-throughput functional analysis of genes in planta. Plant Physiology 133, 462-469.
Deng M, Bian H, Xie Y, et al., 2011. Bcl-2 suppresses hydrogen peroxide-induced programmed cell death via OsVPE2 and OsVPE3, but not via OsVPE1 and OsVPE4, in rice. The FEBS Journal 278, 4797-4810.
Durek T, Cromm PM, White AM, et al., 2018. Development of novel melanocortin receptor agonists based on the cyclic peptide framework of sunflower trypsin inhibitor-1. Journal of Medicinal Chemistry 61, 3674-3684.
Eliasen R, Daly NL, Wulff BS, et al., 2012. Design, synthesis, structural and functional characterization of novel melanocortin agonists based on the cyclotide kalata B1. Journal of Biological Chemistry 287, 40493-40501.
Elliott AG, Delay C, Liu H, et al., 2014. Evolutionary origins of a bioactive peptide buried within preproalbumin. The Plant Cell 26, 981-995.
Entesari M, Wakasa Y, Zanjani BM, et al., 2018. Change in subcellular localization of overexpressed vaccine peptide in rice endosperm cell that is caused by suppression of endogenous seed storage proteins. Plant Cell, Tissue and Organ Culture (PCTOC) 133, 275-287.
Fittler H, Avrutina O, Empting M, et al., 2014. Potent inhibitors of human matriptase-1 based on the scaffold of sunflower trypsin inhibitor. Journal of Peptide Science 20, 415-420.
Getz JA, Rice JJ, Daugherty PS, 2011. Protease-resistant peptide ligands from a knottin scaffold library. ACS Chemical Biology 6, 837-844.
Gillon AD, Saska I, Jennings CV, et al., 2008. Biosynthesis of circular proteins in plants. The Plant Journal 53, 505-515.
Gunasekera S, Daly NL, Anderson MA, et al., 2006. Chemical synthesis and biosynthesis of the cyclotide family of circular proteins. IUBMB Life 58, 515-524.
Harris KS, Durek T, Kaas Q, et al., 2015. Efficient backbone cyclization of linear peptides by a recombinant asparaginyl endopeptidase. Nature Communications 6, 10199.
Haywood J, Schmidberger JW, James AM, et al., 2018. Structural basis of ribosomal peptide macrocyclization in plants. eLife 7, e32955.
He Y, Ning T, Xie T, et al., 2011. Large-scale production of functional human serum albumin from transgenic rice seeds. Proceedings of the National Academy of Sciences 108, 19078-19083.
Hemu X, El Sahili A, Hu S, et al., 2019. Structural determinants for peptide-bond formation by asparaginyl ligases. Proceedings of the National Academy of Sciences 116, 11737-11746.
Hiei Y, Komari T, 2008. Agrobacterium-mediated transformation of rice using immature embryos or calli induced from mature seed. Nature Protocols 3, 824-834.
Iida S, Kusaba M, Nishio T, 1997. Mutants lacking glutelin subunits in rice: mapping and combination of mutated glutelin genes. Theoretical and Applied Genetics 94, 177-183.
Iizuka M, Wakasa Y, Tsuboi H, et al., 2014. Suppression of collagen-induced arthritis by oral administration of transgenic rice seeds expressing altered peptide ligands of type II collagen. Plant Biotechnology Journal 12, 1143-1152.
Jackson M, Gilding E, Shafee T, et al., 2018. Molecular basis for the production of cyclic peptides by plant asparaginyl endopeptidases. Nature Communications 9, 2411.
Jackson MA, Yap K, Poth A, et al., 2019. Rapid and scalable plant based production of a potent plasmin inhibitor peptide. Frontiers in Plant Science 10, 602.
Jendrny C, Beck-Sickinger AG, 2016. Inhibition of kallikrein-related peptidases 7 and 5 by grafting serpin reactive-center loop sequences onto sunflower trypsin inhibitor-1 (SFTI-1). ChemBioChem 17, 719-726.
Kashima K, Yuki Y, Mejima M, et al., 2016. Good manufacturing practices production of a purification-free oral cholera vaccine expressed in transgenic rice plants. Plant Cell Reports 35, 667-679.
Lim Y, Lai K, 2017. Generation of transgenic rice expressing cyclotide precursor Oldenlandia affinis kalata B1 protein. The Journal of Animal & Plant Sciences 27, 667-671.
Livak KJ, Schmittgen TD, 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2-ΔΔCT method. Methods 25, 402-408.
Main M, Frame B, Wang K, 2015. Rice, Japonica (Oryza sativa L.). In: Wang K (ed.) Agrobacterium Protocols. Springer Protocols, pp. 169-180.
Matsumoto Y, Suzuki S, Nozoye T, et al., 2009. Oral immunogenicity and protective efficacy in mice of transgenic rice plants producing a vaccine candidate antigen (As16) of Ascaris suum fused with cholera toxin B subunit. Transgenic Research 18, 185.
Meyer P, Saedler H, 1996. Homology-dependent gene silencing in plants. Annual Review of Plant Biology 47, 23-48.
Mylne JS, Colgrave ML, Daly NL, et al., 2011. Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nature Chemical Biology 7, 257-259.
Nagaya S, Kawamura K, Shinmyo A, et al., 2009. The HSP terminator of Arabidopsis thaliana increases gene expression in plant cells. Plant and Cell Physiology 51, 328-332.
Nguyen GK, Lian Y, Pang EWH, et al., 2013. Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. Journal of Biological Chemistry 288, 3370-3380.
Nguyen GK, Wang S, Qiu Y, et al., 2014. Butelase 1 is an Asx-specific ligase enabling peptide macrocyclization and synthesis. Nature Chemical Biology 10, 732-738.
Ning T, Xie T, Qiu Q, et al., 2008. Oral administration of recombinant human granulocyte-macrophage colony stimulating factor expressed in rice endosperm can increase leukocytes in mice. Biotechnology Letters 30, 1679-1686.
Nochi T, Takagi H, Yuki Y, et al., 2007. Rice-based mucosal vaccine as a global strategy for cold-chain-and needle-free vaccination. Proceedings of the National Academy of Sciences 104, 10986-10991.
Ogo Y, Takahashi H, Wang S, et al., 2014. Generation mechanism of novel, huge protein bodies containing wild type or hypoallergenic derivatives of birch pollen allergen Bet v 1 in rice endosperm. Plant Molecular Biology 86, 111-123.
Park S-H, Bang SW, Jeong JS, et al., 2012. Analysis of the APX, PGD1 and R1G1B constitutive gene promoters in various organs over three homozygous generations of transgenic rice plants. Planta 235, 1397-1408.
Poon S, Harris KS, Jackson MA, et al., 2017. Co-expression of a cyclizing asparaginyl endopeptidase enables efficient production of cyclic peptides in planta. Journal of Experimental Botany 69, 633-641.
Porebski S, Bailey LG, Baum BR, 1997. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Molecular Biology Reporter 15, 8-15.
Porto WF, Miranda VJ, Pinto MF, et al., 2016. High-performance computational analysis and peptide screening from databases of cyclotides from Poaceae. Peptide Science 106, 109-118.
Qu LQ, Takaiwa F, 2004. Evaluation of tissue specificity and expression strength of rice seed component gene promoters in transgenic rice. Plant Biotechnology Journal 2, 113-125.
Quimbar P, Malik U, Sommerhoff CP, et al., 2013. High-affinity cyclic peptide matriptase inhibitors. Journal of Biological Chemistry 288, 13885-13896.
Rehm FB, Jackson MA, De Geyter E, et al., 2019. Papain-like cysteine proteases prepare plant cyclic peptide precursors for cyclization. Proceedings of the National Academy of Sciences 116, 7831-7836.
Rosengren KJ, Daly NL, Plan MR, et al., 2003. Twists, knots, and rings in proteins: structural definition of the cyclotide framework. Journal of Biological Chemistry 278, 8606-8616.
Salehi H, Bahramnejad B, Majdi M, 2017. Induction of two cyclotide-like genes Zmcyc1 and Zmcyc5 by abiotic and biotic stresses in Zea mays. Acta Physiologiae Plantarum 39, 131.
Seydel P, Walter C, Dörnenburg H, 2009. Scale-up of Oldenlandia affinis suspension cultures in photobioreactors for cyclotide production. Engineering in Life Sciences 9, 219-226.
Smirnov I, Zhu X, Taylor T, et al., 2004. Suppression of α-cyano-4-hydroxycinnamic acid matrix clusters and reduction of chemical noise in MALDI-TOF mass spectrometry. Analytical Chemistry 76, 2958-2965.
Soh HS, Chung HY, Lee HH, et al., 2015. Expression and functional validation of heat-labile enterotoxin B (LTB) and cholera toxin B (CTB) subunits in transgenic rice (Oryza sativa). SpringerPlus 4, 148.
Suzuki K, Kaminuma O, Yang L, et al., 2011. Prevention of allergic asthma by vaccination with transgenic rice seed expressing mite allergen: induction of allergen-specific oral tolerance without bystander suppression. Plant Biotechnology Journal 9, 982-990.
Swedberg JE, Nigon LV, Reid JC, et al., 2009. Substrate-guided design of a potent and selective kallikrein-related peptidase inhibitor for kallikrein 4. Chemistry & Biology 16, 633-643.
Swedberg JE, Wu G, Mahatmanto T, et al., 2018. Highly potent and selective plasmin inhibitors based on the sunflower trypsin inhibitor-1 scaffold attenuate fibrinolysis in plasma. Journal of Medicinal Chemistry 62, 552-560.
Teixeira Da Silva J, 2012. Callus, calluses or calli: multiple plurals. The Asian and Australasian Journal of Plant Science and Biotechnology 6, 125-126.
Vamvaka E, Arcalis E, Ramessar K, et al., 2016. Rice endosperm is cost-effective for the production of recombinant griffithsin with potent activity against HIV. Plant Biotechnology Journal 14, 1427-1437.
Vamvaka E, Farré G, Molinos-Albert LM, et al., 2018. Unexpected synergistic HIV neutralization by a triple microbicide produced in rice endosperm. Proceedings of the National Academy of Sciences 115, E7854-E7862.
Vranken WF, Boucher W, Stevens TJ, et al., 2005. The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins: Structure, Function, and Bioinformatics 59, 687-696.
Wakasa Y, Takagi H, Hirose S, et al., 2013. Oral immunotherapy with transgenic rice seed containing destructed Japanese cedar pollen allergens, Cry j 1 and Cry j 2, against Japanese cedar pollinosis. Plant Biotechnology Journal 11, 66-76.
Wakasa Y, Zhao H, Hirose S, et al., 2011. Antihypertensive activity of transgenic rice seed containing an 18-repeat novokinin peptide localized in the nucleolus of endosperm cells. Plant Biotechnology Journal 9, 729-735.
Weigel D, Glazebrook J, 2006. Transformation of agrobacterium using electroporation. Cold Spring Harbor Protocols 2006, 1-13.
Wong CT, Rowlands DK, Wong CH, et al., 2012. Orally active peptidic bradykinin B1 receptor antagonists engineered from a cyclotide scaffold for inflammatory pain treatment. Angewandte Chemie 124, 5718-5722.
Wu J, Yu L, Li L, et al., 2007. Oral immunization with transgenic rice seeds expressing VP2 protein of infectious bursal disease virus induces protective immune responses in chickens. Plant Biotechnology Journal 5, 570-578.
Xie T, Qiu Q, Zhang W, et al., 2008. A biologically active rhIGF-1 fusion accumulated in transgenic rice seeds can reduce blood glucose in diabetic mice via oral delivery. Peptides 29, 1862-1870.
Yang L, Hirose S, Takahashi H, et al., 2012. Recombinant protein yield in rice seed is enhanced by specific suppression of endogenous seed proteins at the same deposit site. Plant Biotechnology Journal 10, 1035-1045.
Yang L, Kajiura H, Suzuki K, et al., 2008. Generation of a transgenic rice seed-based edible vaccine against house dust mite allergy. Biochemical and Biophysical Research Communications 365, 334-339.
Yang R, Wong YH, Nguyen GK, et al., 2017. Engineering a catalytically efficient recombinant protein ligase. Journal of the American Chemical Society 139, 5351-5358.
Yasuda H, Tada Y, Hayashi Y, et al., 2005. Expression of the small peptide GLP-1 in transgenic plants. Transgenic Research 14, 677-684.
Yoshida T, Kimura E, Koike S, et al., 2011. Transgenic rice expressing amyloid β-peptide for oral immunization. International Journal of Biological Sciences 7, 301-307.
Yuki Y, Mejima M, Kurokawa S, et al., 2013. Induction of toxin-specific neutralizing immunity by molecularly uniform rice-based oral cholera toxin B subunit vaccine without plant-associated sugar modification. Plant Biotechnology Journal 11, 799-808.
Zavaleta N, Figueroa D, Rivera J, et al., 2007. Efficacy of rice-based oral rehydration solution containing recombinant human lactoferrin and lysozyme in Peruvian children with acute diarrhea. Journal of Pediatric Gastroenterology and Nutrition 44, 258-264.
Zhang L, Shi J, Jiang D, et al., 2013. Expression and characterization of recombinant human alpha-antitrypsin in transgenic rice seed. Journal of Biotechnology 164, 300-308.

2.6. Supplementary sequences

Oak1

10 20 30 40 50 60 70 80 90 100 110 120 TATTGCGGCCGCATCGATTGGCACCAGCACTTTCTTAAAATTTACTGCTTTTTCTTATTTCTTGTTCTGTGCTTGCTTCTTCCATGGCTAAGTTTACCGTGTGCCTTTTGCTCTGCCTTC ATAACGCCGGCGTAGCTAACCGTGGTCGTGAAAGAATTTTAAATGACGAAAAAGAATAAAGAACAAGACACGAACGAAGAAGGTACCGATTCAAATGGCACACGGAAAACGAGACGGAAG M A K F T V C L L L C L>

130 140 150 160 170 180 190 200 210 220 230 240 TCCTCGCTGCTTTTGTTGGAGCTTTCGGATCTGAGCTTTCTGATTCTCACAAGACCACCCTCGTGAACGAGATCGCTGAGAAGATGCTCCAGAGAAAGATCCTCGATGGTGTTGAGGCTA AGGAGCGACGAAAACAACCTCGAAAGCCTAGACTCGAAAGACTAAGAGTGTTCTGGTGGGAGCACTTGCTCTAGCGACTCTTCTACGAGGTCTCTTTCTAGGAGCTACCACAACTCCGAT L L A A F V G A F G S E L S D S H K T T L V N E I A E K M L Q R K I L D G V E A>

250 260 270 280 290 300 310 320 330 340 350 360 CTCTCGTGACTGATGTGGCAGAGAAGATGTTCCTCAGAAAGATGAAGGCTGAGGCTAAGACCTCTGAGACTGCTGATCAGGTTTTCCTCAAGCAGCTTCAGCTTAAGGGACTCCCTGTTT GAGAGCACTGACTACACCGTCTCTTCTACAAGGAGTCTTTCTACTTCCGACTCCGATTCTGGAGACTCTGACGACTAGTCCAAAAGGAGTTCGTCGAAGTCGAATTCCCTGAGGGACAAA T L V T D V A E K M F L R K M K A E A K T S E T A D Q V F L K Q L Q L K G L P V>

370 380 390 400 410 420 430 440 450 460 470 480 GCGGAGAGACTTGTGTTGGAGGAACTTGCAACACTCCTGGATGCACTTGTTCTTGGCCTGTGTGTACTAGAAACGGACTCCCTTCTCTTGCTGCTTGATTTGCTTGATCAAACTGCAAAA CGCCTCTCTGAACACAACCTCCTTGAACGTTGTGAGGACCTACGTGAACAAGAACCGGACACACATGATCTTTGCCTGAGGGAAGAGAACGACGAACTAAACGAACTAGTTTGACGTTTT C G E T C V G G T C N T P G C T C S W P V C T R N G L P S L A A *>

490 500 510 520 530 540 550 560 570 580 590 600 ATGAATGAGAAGGCCGACACCAATAAAGCTATCAATGTAGTTGGTCCCTGTACTTAATTTGGTTGGCTCCAAACCATGTGTGCTGCTCTTGTTTTTGTTTTTTCTTTTTTCTTCTCTCTT TACTTACTCTTCCGGCTGTGGTTATTTCGATAGTTACATCAACCAGGGACATGAATTAAACCAACCGAGGTTTGGTACACACGACGAGAACAAAAACAAAAAAGAAAAAAGAAGAGAGAA

610 620 630 640 650 660 670 680 690 700 710 720 TCGGGCACTCTTCAGGACATGAAGTGATGATCAGTACTCTTTGCTATCATGTTTTCTGTGCACACCTTCTATTGTAGGTGTTGTTGTGATGTTGATGCCCAATTGGAATAAACTGTTGTC AGCCCGTGAGAAGTCCTGTACTTCACTACTAGTCATGAGAAACGATAGTACAAAAGACACGTGTGGAAGATAACATCCACAACAACACTACAACTACGGGTTAACCTTATTTGACAACAG

730 GCCTGCAGGA CGGACGTCCT

PawS1

10 20 30 40 50 60 70 80 90 100 110 120 CGCGGCCGCATCGATAAACAATGGCTAAGCTCATCATCCTCGTTGTTCTTGCTATCCTCGCTTTCGTTGAGGTTTCAGTTTCTGGATACAAGACCTCTATCTCTACCATCACCATCGAGG GCGCCGGCGTAGCTATTTGTTACCGATTCGAGTAGTAGGAGCAACAAGAACGATAGGAGCGAAAGCAACTCCAAAGTCAAAGACCTATGTTCTGGAGATAGAGATGGTAGTGGTAGCTCC M A K L I I L V V L A I L A F V E V S V S G Y K T S I S T I T I E>

130 140 150 160 170 180 190 200 210 220 230 240 ATAACGGAAGATGTACCAAGTCTATCCCTCCTATCTGTTTCCCTGATGGACTTGATAACCCTAGAGGATGTCAGATCAGAATCCAGCAGCTTAACCATTGCCAGATGCATCTCACCTCAT TATTGCCTTCTACATGGTTCAGATAGGGAGGATAGACAAAGGGACTACCTGAACTATTGGGATCTCCTACAGTCTAGTCTTAGGTCGTCGAATTGGTAACGGTCTACGTAGAGTGGAGTA D N G R C T K S I P P I C F P D G L D N P R G C Q I R I Q Q L N H C Q M H L T S>

250 260 270 280 290 300 310 320 330 340 350 360 TCGATTACAAGCTCAGAATGGCTGTTGAGAACCCTAAGCAACAGCAGCATCTTTCTTTGTGTTGCAACCAGCTTCAAGAGGTTGAGAAGCAATGTCAATGCGAGGCTATCAAGCAAGTTG AGCTAATGTTCGAGTCTTACCGACAACTCTTGGGATTCGTTGTCGTCGTAGAAAGAAACACAACGTTGGTCGAAGTTCTCCAACTCTTCGTTACAGTTACGCTCCGATAGTTCGTTCAAC F D Y K L R M A V E N P K Q Q Q H L S L C C N Q L Q E V E K Q C Q C E A I K Q V>

370 380 390 400 410 420 430 440 450 460 470 480 TTGAGCAAGCTCAAAAGCAACTTCAACAAGGACAAGGTGGACAACAACAAGTTCAGCAGATGGTTAAGAAAGCTCAGATGCTTCCTAACCAGTGTAACCTTCAGTGCTCTATCTGATCAG AACTCGTTCGAGTTTTCGTTGAAGTTGTTCCTGTTCCACCTGTTGTTGTTCAAGTCGTCTACCAATTCTTTCGAGTCTACGAAGGATTGGTCACATTGGAAGTCACGAGATAGACTAGTC V E Q A Q K Q L Q Q G Q G G Q Q Q V Q Q M V K K A Q M L P N Q C N L Q C S I *>

490 500 510 520 530 540 550 560 570 580 590 600 TCACAAGCTTGCACTAGTGTTTGTTTGAGTTTGAATGTATGCATGCATGTAATATATATAATAATGCATGATCGCTCTTTGGCTTGAGATGGGAAGCCGCTTTTCTCTGCATAATAAAAA AGTGTTCGAACGTGATCACAAACAAACTCAAACTTACATACGTACGTACATTATATATATTATTACGTACTAGCGAGAAACCGAACTCTACCCTTCGGCGAAAAGAGACGTATTATTTTT

610 620 630 640 CACACACTCGTGTGAATGTGTATCAACCGCCTGCAGGATA GTGTGTGAGCACACTTACACATAGTTGGCGGACGTCCTAT

Oak[D14N]SFTI-1_GLDN

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCTAAGTTCACCGTCTGTCTCCTCCTGTGCTTGCTTCTTGCAGCATTTGTTGGGGCGTTTGGATCTGAGCTTTCTGACTCCCACAAGACCACCTTGGTCAATGAAATCGCTGAGAAG TACCGATTCAAGTGGCAGACAGAGGAGGACACGAACGAAGAACGTCGTAAACAACCCCGCAAACCTAGACTCGAAAGACTGAGGGTGTTCTGGTGGAACCAGTTACTTTAGCGACTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTACAAAGAAAGATATTGGATGGAGTGGAAGCTACTTTGGTCACTGATGTCGCCGAGAAGATGTTCCTAAGAAAGATGAAGGCTGAAGCGAAAACTTCTGAAACCGCCGATCAGGTG TACGATGTTTCTTTCTATAACCTACCTCACCTTCGATGAAACCAGTGACTACAGCGGCTCTTCTACAAGGATTCTTTCTACTTCCGACTTCGCTTTTGAAGACTTTGGCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 TTCCTGAAACAGTTGCAGCTCAAAGGAAGATGTACCAAGTCTATCCCTCCTATCTGTTTCCCTAATGGCCTTGATAATTAA AAGGACTTTGTCAACGTCGAGTTTCCTTCTACATGGTTCAGATAGGGAGGATAGACAAAGGGATTACCGGAACTATTAATT F L K Q L Q L K G R C T K S I P P I C F P N G L D N *>

OakSFTI-1_GLDN

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCTAAGTTCACCGTCTGTCTCCTCCTGTGCTTGCTTCTTGCAGCATTTGTTGGGGCGTTTGGATCTGAGCTTTCTGACTCCCACAAGACCACCTTGGTCAATGAAATCGCTGAGAAG TACCGATTCAAGTGGCAGACAGAGGAGGACACGAACGAAGAACGTCGTAAACAACCCCGCAAACCTAGACTCGAAAGACTGAGGGTGTTCTGGTGGAACCAGTTACTTTAGCGACTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTACAAAGAAAGATATTGGATGGAGTGGAAGCTACTTTGGTCACTGATGTCGCCGAGAAGATGTTCCTAAGAAAGATGAAGGCTGAAGCGAAAACTTCTGAAACCGCCGATCAGGTG TACGATGTTTCTTTCTATAACCTACCTCACCTTCGATGAAACCAGTGACTACAGCGGCTCTTCTACAAGGATTCTTTCTACTTCCGACTTCGCTTTTGAAGACTTTGGCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 TTCCTGAAACAGTTGCAGCTCAAAGGAAGATGTACCAAGTCTATCCCTCCTATCTGTTTCCCTGATGGCCTTGATAATTAA AAGGACTTTGTCAACGTCGAGTTTCCTTCTACATGGTTCAGATAGGGAGGATAGACAAAGGGACTACCGGAACTATTAATT F L K Q L Q L K G R C T K S I P P I C F P D G L D N *>

Oak[T4Y,I7R]SFTI-1

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCTAAGTTCACCGTCTGTCTCCTCCTGTGCTTGCTTCTTGCAGCATTTGTTGGGGCGTTTGGATCTGAGCTTTCTGACTCCCACAAGACCACCTTGGTCAATGAAATCGCTGAGAAG TACCGATTCAAGTGGCAGACAGAGGAGGACACGAACGAAGAACGTCGTAAACAACCCCGCAAACCTAGACTCGAAAGACTGAGGGTGTTCTGGTGGAACCAGTTACTTTAGCGACTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTACAAAGAAAGATATTGGATGGAGTGGAAGCTACTTTGGTCACTGATGTCGCCGAGAAGATGTTCCTAAGAAAGATGAAGGCTGAAGCGAAAACTTCTGAAACCGCCGATCAGGTG TACGATGTTTCTTTCTATAACCTACCTCACCTTCGATGAAACCAGTGACTACAGCGGCTCTTCTACAAGGATTCTTTCTACTTCCGACTTCGCTTTTGAAGACTTTGGCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 TTCCTGAAACAGTTGCAGCTCAAAGGAAGATGTTACAAGTCTAGACCTCCTATCTGTTTCCCTGATGGCCTTGATAATTAA AAGGACTTTGTCAACGTCGAGTTTCCTTCTACAATGTTCAGATCTGGAGGATAGACAAAGGGACTACCGGAACTATTAATT F L K Q L Q L K G R C Y K S R P P I C F P D G L D N *>

Os_OakSFTImcrB

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCCAAGTTTACAGTGTGCCTCCTCCTCTGCTTGCTCCTCGCTGCTTTCGTTGGCGCTTTCGGCTCCGAGCTGTCCGACAGCCATAAGACCACACTCGTGAACGAGATCGCCGAGAAG TACCGGTTCAAATGTCACACGGAGGAGGAGACGAACGAGGAGCGACGAAAGCAACCGCGAAAGCCGAGGCTCGACAGGCTGTCGGTATTCTGGTGTGAGCACTTGCTCTAGCGGCTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTCCAGCGCAAGATCCTCGATGGCGTCGAGGCCACTCTCGTGACAGATGTGGCGGAGAAGATGTTCCTCCGGAAGATGAAGGCCGAGGCCAAGACATCCGAGACAGCCGATCAGGTG TACGAGGTCGCGTTCTAGGAGCTACCGCAGCTCCGGTGAGAGCACTGTCTACACCGCCTCTTCTACAAGGAGGCCTTCTACTTCCGGCTCCGGTTCTGTAGGCTCTGTCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 TTCCTCAAGCAGCTCCAACTTAAGGGCCGCTGCACCAAGTCTATCCCTCCAATCTGCCATTTCAGGTGGGACGGCCTCGACAACTGA AAGGAGTTCGTCGAGGTTGAATTCCCGGCGACGTGGTTCAGATAGGGAGGTTAGACGGTAAAGTCCACCCTGCCGGAGCTGTTGACT F L K Q L Q L K G R C T K S I P P I C H F R W D G L D N *>

Os_OakSFTImcrF

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCCAAGTTTACAGTGTGCCTCCTCCTCTGCTTGCTCCTCGCTGCTTTCGTTGGCGCTTTCGGCTCCGAGCTGTCCGACAGCCATAAGACCACACTCGTGAACGAGATCGCCGAGAAG TACCGGTTCAAATGTCACACGGAGGAGGAGACGAACGAGGAGCGACGAAAGCAACCGCGAAAGCCGAGGCTCGACAGGCTGTCGGTATTCTGGTGTGAGCACTTGCTCTAGCGGCTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTCCAGCGCAAGATCCTCGATGGCGTCGAGGCCACTCTCGTGACAGATGTGGCGGAGAAGATGTTCCTCCGGAAGATGAAGGCCGAGGCCAAGACATCCGAGACAGCCGATCAGGTG TACGAGGTCGCGTTCTAGGAGCTACCGCAGCTCCGGTGAGAGCACTGTCTACACCGCCTCTTCTACAAGGAGGCCTTCTACTTCCGGCTCCGGTTCTGTAGGCTCTGTCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 TTCCTCAAGCAGCTCCAACTTAAGGGCCGCTGCACCCACTTCAGGTGGCCAATCTGCCATTTCAGGTGGGACGGCCTCGACAACTGA AAGGAGTTCGTCGAGGTTGAATTCCCGGCGACGTGGGTGAAGTCCACCGGTTAGACGGTAAAGTCCACCCTGCCGGAGCTGTTGACT F L K Q L Q L K G R C T H F R W P I C H F R W D G L D N *>

PM10

CTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGGCCGCTACAGGGCGCTCCCATTC GCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGTTTCGGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAGCG CGACGTAATACGACTCACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCACGTGTCTTGTCCAGGATCCTTAATTAAGCGATCGCGGCGCGCCATGGTTCTAAAAATTAGGTTTAATCATTGCGTCCTCAATGAACCCATGCTATATGTTTTAAA GTTTTTTGTTTTTTGACAATGTTTTTTATTTCTGAGATTGCTCTTAGGATTGAAATTATGTTTGATACTAGAAAACGAAGAAGTAGAGAGTAGTGTATACACGTGTAAAAAATAATAGTTGTGGGAACTTAAGTTGGATTTGAATACTAGGACGAGG CTGGAAGGGTTTCCACTAAGTTGACAAAAATTATTACAAGTGGCAACTAGCTAGGTCTCACAAAGTATTACTAATTAATAGTGGGTCTGTCTGCATACCAACTCTTGCCTAATTTTCAAACACCGCATTCTCTCTTCTTCTCTCCTTCTTCCTCTGG AAACTTCATCGATGTGGACTTCTGTCTCTCAAAAGTCAAGCTCAATTTATCCAATGCATTATAAATACACACTCTCCCTCCCTTCTATTCTTCATTGCATCACATTTCCTCTATAAATTACTCACACCTTATTCCTAACTTCATTTCAACATCCTCT CTCCCACTTACTTCGATTTCATCAATTCCAATAAACTCAACACACTTTTTTACACTCCACACTCTAACCACATACACCGCGGCCGCACAACCTGCAGGATATGAAGATGAAGATGAAATATTTGGTGTGTCAAATAAAAAGCTTGTGTGCTTAAGTT TGTGTTTTTTTCTTGGCTTGTTGTGTTATGAATTTGTGGCTTTTTCTAATATTAAATGAATGTAAGATCTCATTATAATGAATAAACAAATGTTTCTATAAGGCCGGCCTAGGGATAACAGGGTAATGAGCTCAATTGAATTCGGCTTAATTAAGGT ACCTGGAGCACAAGACTGGCCTCATGGGCCTTCCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAACATGGTCATAGCTGTTTCCTTGCGTATTGGGCGCTCTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCG TTCGGGTAAAGCCTGGGGTGCCTAATGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGA CTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCG TTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGG 
CGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTT GTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATT AAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGA GGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGG GAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAA AAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTG AGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATG TAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAA GCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCAC

Bold letters: ACS_promoter‑HspT


Chapter 3

Investigating the diversity and cyclization potential of acyclic peptides from monocots

3.1. Overview

Over 400 unique cyclotides have been discovered from five dicot families (http://cybase.org.au) (Mulvenna et al., 2006b, Wang et al., 2007). Improved high-throughput proteomic and transcriptomic approaches to peptide discovery are expected to increase this number substantially in coming years (Gruber et al., 2008). The cyclotides characterised so far show sequence variability at most positions, with the notable exception of the conserved cysteines required for disulfide bond formation (Wang & Craik, 2018). Because the number and spacing of these cysteines are conserved, data mining for cyclotide-encoding transcripts offers a high-throughput approach to peptide discovery. This strategy is made easier by the increasing number of plant species with genome and transcriptome data available (International Rice Genome Sequencing Project, 2005, Paterson et al., 2009, Arabidopsis Genome Initiative, 2000).

Only a limited number of monocot cyclotide-like genes have been characterized, including a few from economically important crops such as rice (Oryza sativa), maize (Zea mays), wheat (Triticum aestivum) and barley (Hordeum vulgare) (Mulvenna et al., 2006a, Porto et al., 2016, Salehi et al., 2017). These cyclotide-like genes share similar cysteine patterns with the cyclotides, but the encoded peptides are all linear as they normally lack a ligation-competent C-terminal site. Although many of these encoded peptides remain uncharacterized, two cyclotide-like precursor genes from maize, Zmcyc1 and Zmcyc5, have been shown to be responsive to wounding, fungal diseases and stress-related hormones (Salehi et al., 2017). In the case of Panicum laxum, nine linear cyclotide-like molecules were identified and designated as panitides (Nguyen et al., 2013). Disulfide mapping of one purified variant (panitide L3) showed that panitides form a cyclotide-like cystine knot arrangement. Although the functional role of panitides in planta remains unknown, panitides L1, L2, L5 and L6 exhibited cytotoxicity towards HeLa cells in in vitro assays at potencies similar to those of the cyclotides.

Interestingly, linear cyclotide-like peptides have also been observed in dicot cyclotide-producing species, including Viola odorata (Ireland et al., 2006), Oldenlandia affinis (Plan et al., 2007, Mylne et al., 2010), Psychotria leptothyrsa (Gerlach et al., 2010), Hedyotis biflora (Nguyen et al., 2011), Chassalia chartacea (Nguyen et al., 2012) and Petunia x hybrida (Poth et al., 2012). One major difference between the acyclotide and cyclotide precursors is the presence of an Asn or Asp residue at the C-terminus of the peptide domain in the latter, which together with a short, extended C-terminal propeptide is required for asparaginyl endopeptidase (AEP) mediated backbone cyclization (Harris et al., 2015). In lieu of backbone cyclization, many acyclotides have a pyroglutamic acid at their N-terminus, which probably contributes to their stability (Poth et al., 2012, Kersten & Weng, 2018).

Monocot cyclotide-like sequences are investigated in this chapter. A diverse set of monocot species was screened for cyclotide-like genes at the transcriptome level to gain further insight into the distribution of these sequences in monocot plants. When cyclotide-like genes were expressed in planta, a pyroGlu modification at the N-terminus was observed. Additionally, some monocot cyclotide-like genes could be engineered with minimal residue changes to allow backbone cyclization both in vitro and in planta. These results enhance knowledge of cyclotide distribution in the plant kingdom and will aid efficient cyclic peptide production in monocot cereal plants (e.g. rice, maize).

3.2. Materials and methods

3.2.1. Database search and transcriptome analysis

Cyclotide-like genes were retrieved using BLAST+ from the National Centre for Biotechnology Information (NCBI) and the tblastn program from Phytozome (Altschul et al., 1990). BLAST analysis was performed against 10 species of the Poaceae family using the Phytozome database, while a further 13 monocot species were analysed using transcriptome data downloaded from the Sequence Read Archive (SRA) at NCBI (Table 3.2). Species were chosen based on their phylogenetic position as delineated by the Angiosperm Phylogeny Group IV (APG-IV) system (http://www.mobot.org/MOBOT/research/APweb/). For local searches with BLAST+, the raw SRA sequencing data were first trimmed and filtered by FastQC, before being assembled with Trinity (v2.4.0) (Grabherr et al., 2011, Haas et al., 2013). The BLAST queries comprised the prototypic cyclotide amino acid sequences from each of four native cyclotide-producing families (cycloviolacin O2, kalata B1, Phyb A and McoTI-II), as well as panitide L1 (pL1) and the pL1 precursor from the acyclotide-producing species P. laxum (Table 3.1). Potential hits were analysed for open reading frames (ORFs), conserved cysteines and signal peptide sequences using SnapGene (v3.2.1) and SignalP (v4.1 Server).
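The conserved-cysteine check applied to candidate hits was performed manually in this work. As an illustration only, the same criterion can be sketched in a few lines of Python, here applied to the mature query peptides of Table 3.1. The function names are hypothetical and the script is not part of the analysis pipeline described above.

```python
# Query peptides transcribed from Table 3.1 (mature peptide sequences only).
QUERIES = {
    "cycloviolacin O2": "GIPCGESCVWIPCISSAIGCSCKSKVCYRN",
    "kalata B1": "GLPVCGETCVGGTCNTPGCTCSWPVCTRN",
    "Phyb A": "GIGCGESCVWIPCVSAAIGCSCSNKICYRN",
    "McoTI-II": "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD",
    "pL1": "QLPICGETCVLGTCYTPGCRCQYPICVR",
}


def cysteine_positions(seq):
    """Return the 0-based index of every cysteine in a peptide sequence."""
    return [i for i, aa in enumerate(seq.upper()) if aa == "C"]


def loop_lengths(seq):
    """Return the number of residues between consecutive cysteines."""
    pos = cysteine_positions(seq)
    return [b - a - 1 for a, b in zip(pos, pos[1:])]


def has_six_cysteines(seq):
    """True if the sequence carries exactly six cysteines."""
    return len(cysteine_positions(seq)) == 6


for name, seq in QUERIES.items():
    print(name, has_six_cysteines(seq), loop_lengths(seq))
```

A candidate ORF could be passed through the same functions to compare its cysteine count and inter-cysteine loop lengths with those of the query peptides before further inspection.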

Table 3.1 Query sequences of cyclotides and acyclotides for use with tblastn against assembled monocot transcriptomes.

Cyclotide name     Peptide sequence
Cycloviolacin O2   GIPCGESCVWIPCISSAIGCSCKSKVCYRN
Kalata B1          GLPVCGETCVGGTCNTPGCTCSWPVCTRN
Phyb A             GIGCGESCVWIPCVSAAIGCSCSNKICYRN
McoTI-II           GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD
pL1                QLPICGETCVLGTCYTPGCRCQYPICVR
pL1 precursor      MESAKRVACVVALVLLVQLMAAPATMARNVEVENTPLV-
                   GLLDIAKEVNHNQLPICGETCVLGTCYTPGCRCQYPICVR

Table 3.2 Classification of monocot species studied in cyclotide-like sequence BLAST searches.

Order         Family          Genus         Species                       Common name        SRA no.
Alismatid monocots
Alismatales   Araceae         Colocasia     Colocasia esculenta           Taro               SRR873449
Lilioid monocots
Pandanales    Pandanaceae     Pandanus      Pandanus boninensis           Pandanus           DRR001102
Pandanales    Velloziaceae    Xerophyta     Xerophyta villosa             _                  ERR2040709
Liliales      Liliaceae       Lilium        Lilium longiflorum            Lily               ERR260307; ERR578470
Liliales      Liliaceae       Lilium        Lilium davidii var. unicolor  Lily               SRR2924890
Asparagales   Amaryllidaceae  Narcissus     Narcissus tazetta             Daffodil           SRR2477578; SRR2477579
Commelinid monocots
Arecales      Arecaceae       Cocos         Cocos nucifera                Coconut palm       SRR1063404; SRR1063407
Zingiberales  Zingiberaceae   Zingiber      Zingiber officinale           Ginger             SRR5313727
Poales        Typhaceae       Typha         Typha latifolia               Typha              SRR3233335
Poales        Poaceae         Brachypodium  Brachypodium distachyon       Stiff brome        _
Poales        Poaceae         Brachypodium  Brachypodium stacei           _                  _
Poales        Poaceae         Oryza         Oryza sativa                  Rice               _
Poales        Poaceae         Oropetium     Oropetium thomaeum            _                  _
Poales        Poaceae         Panicum       Panicum hallii                Hall's panicgrass  _
Poales        Poaceae         Panicum       Panicum virgatum              Switchgrass        _
Poales        Poaceae         Setaria       Setaria italica               Foxtail millet     _
Poales        Poaceae         Setaria       Setaria viridis               Green foxtail      _
Poales        Poaceae         Sorghum       Sorghum bicolor               Sorghum            _
Poales        Poaceae         Zea           Zea mays                      Maize              _
Poales        Poaceae         Alloteropsis  Alloteropsis semialata        Cockatoo grass     SRR3323242; SRR3323243
Poales        Poaceae         Cenchrus      Cenchrus americanus           _                  SRR8549304; SRR8549305
Poales        Poaceae         Megathyrsus   Megathyrsus maximus           Guinea grass       SRR8164930; SRR8164931
Poales        Poaceae         Panicum       Panicum miliaceum             Proso millet       SRR6660752; SRR6660753

3.2.2. Promoter analysis of rL1

The sequence of the rice cyclotide-like linear 1 (rL1) gene and its flanking regulatory sequences were downloaded from Phytozome (Chr8:26374253..26374372). The 1000 bp sequence upstream of the rL1 gene was analysed using PlantPAN (v2.0) for conserved cis regulatory elements (CREs). The CREs were identified and annotated using SnapGene. For functional classification, dual-function CREs were counted separately in both groups.

3.2.3. Phytohormone treatment of rice seedlings

Plant material

Rice (Oryza sativa L. subsp. japonica) cultivar Nipponbare seedlings were grown in a plant cultivation room at 28 °C with 16 hours of light under 2000 lux of LED illumination supplied by Valoya AP67 spectra lamps (Valoya Oy, Helsinki).

Phytohormone treatment

Six-week-old rice plants were subjected to phytohormone treatment by spraying leaves with salicylic acid (SA) or abscisic acid (ABA), each at 100 µM and 1 mM. Stock solutions (100 mM) of SA and ABA were prepared in ethanol and methanol, respectively. Immediately prior to spraying, the SA and ABA stocks were diluted to the working concentrations using H2O containing 0.1% (v/v) Tween 20 (Promega). Nine replicate plants were sprayed per treatment. After spraying, three plants were randomly chosen from the original nine and analysed after 6 hours, 24 hours and 6 days.
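The dilution of the 100 mM stocks to the two working concentrations follows C1V1 = C2V2. A minimal worked sketch is given below; the final spray volume of 50 mL is a purely illustrative assumption, as the volumes used are not stated in this section.

```python
def stock_volume_ml(stock_mM, working_mM, final_volume_ml):
    """Volume of stock required, from C1 * V1 = C2 * V2."""
    return working_mM * final_volume_ml / stock_mM


STOCK_MM = 100.0   # SA or ABA stock concentration stated in the text
FINAL_ML = 50.0    # hypothetical final volume of spray solution

for working_mM in (0.1, 1.0):   # 100 µM and 1 mM working concentrations
    volume = stock_volume_ml(STOCK_MM, working_mM, FINAL_ML)
    fold = STOCK_MM / working_mM
    print(f"{working_mM} mM: {volume:.2f} mL stock in {FINAL_ML:.0f} mL "
          f"({fold:.0f}-fold dilution)")
```

Under this assumption the two working solutions correspond to 1000-fold and 100-fold dilutions of the stock, respectively.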

RNA extraction, cDNA synthesis and RT-PCR

RNA was extracted from leaves of phytohormone-treated rice seedlings. The methods for RNA extraction, cDNA synthesis and RT-PCR are described in Section 2.2.4. Primers used in the current study are shown in Table 3.3.

3.2.4. Cyclotide-like gene expression in foxtail millet

Plant material

Three accessions of foxtail millet (Setaria italica) were obtained from the Australian Seed Bank Partnership: Angu 18 hao (cultivar, China), CPI 108040 (landrace, China) and Nekoashi (cultivar, Japan). Seeds were germinated and plants grown in a plant cultivation room at 28 °C under 2000 lux LED illumination (16-hour light photoperiod) using Valoya AP67 spectra lamps (Valoya Oy, Helsinki).

DNA & RNA extraction, cDNA synthesis, and PCR

Genomic DNA was extracted from leaves of Angu 18 hao, Nekoashi and CPI 108040. RNA was extracted from leaf, stem and root of CPI 108040. Extraction methods and PCRs are described in Section 2.2.4. Primers used in the current study are shown in Table 3.3.

Table 3.3 Primers used in Chapter 3

Primer name     Primer sequence
qOsGAPDH_Fwd    AAGCCAGCATCCTATGATCAGATT
qOsGAPDH_Rev    CGTAACCCAGAATACCCTTGAGTTT
qRcl_Fwd        GAAGATCATGTGGAGTGGCA
qRcl_Rev        AATGACTTCAGACTTCGCAGG
qOspep1_Fwd     GGCTTGGACACCTTGCGGGG
qOspep1_Rev     TCATGCAATGACTTCAGACTTCGC
qOspep2_Fwd     ATCAGGCTTGGACACCTT
qOspep2_Rev     TCATGCAATGACTTCAGACTTCGC
mLA Fwd-2       ATGGACCCAATGTTA
mLA Rev         TTAATTAGCTAGGCTG
mLB Fwd         ATGGAGAGTGGCAAGA
mLB Rev         TTACTTGTAGCAAACTCT

Fwd: forward primer; Rev: reverse primer. Primers starting with q were designed for RT-PCR.

3.2.5. Cyclotide-like gene expression in N. benthamiana

Vector construction

The pL1 gene from P. laxum and the variant pL1_cyc1 and pL1_cyc2 genes were previously cloned in-house. The rice cyclotide-like gene rL1 was amplified from rice genomic DNA using PCR. DNA encoding the modified rL1 genes rL1_cyc1, rL1_cyc2 and OakrL1, together with mLA, mLB, mLBmLA, OakmLA_cyc and OakpL1_cyc, was synthesized by Integrated DNA Technologies (Singapore) as gene block fragments. Sequences of mLA (Seita.3G339900.1) and mLB (Seita.3G339700.1) were downloaded from Phytozome. All gene block fragments (Supplementary sequences in Section 3.6) were amplified with primers incorporating attB1 and attB2 sites that allowed recombination into the entry vector pDS221 using Gateway™ BP Clonase™ (Invitrogen). Transfer of peptide-encoding genes into the plant expression vector pEAQ-Dest1 (Sainsbury et al., 2009) was achieved using Gateway™ LR Clonase™ (Invitrogen). Sanger sequencing was used to confirm that the desired plasmids were obtained. Confirmed plasmids were then transferred into Agrobacterium tumefaciens (strain LBA4404) by electroporation.

Plant material

N. benthamiana plants were cultivated in Jiffy pellets in the plant cultivation room at 28 °C under 2000 lux LED illumination (16-hour light photoperiod) using Valoya AP67 spectra lamps (Valoya Oy, Helsinki). Replicate plants were used for transient transformation at approximately five weeks old, before flowering.

Agrobacterium infiltration

Agrobacterium harbouring transient expression vectors were cultured from single isolated colonies overnight in LB media supplemented with 50 µg/mL each of rifampicin and kanamycin. Starter cultures were then scaled up to 500 mL and grown for a further two days at 30 °C, with the addition of 20 µM ACS and 2 mM MgSO4 to induce virulence genes and prevent Agrobacterium aggregation. Agrobacterium cells were then pelleted by centrifugation at 4000 RPM for five minutes before resuspension in infiltration buffer (10 mM MES (2-[N-morpholino]ethanesulfonic acid) pH 5.6, 10 mM MgCl2, 100 µM acetosyringone) to an OD600 of 0.5. Each Agrobacterium culture was infiltrated into three leaves of three plantlets by syringe infiltration, and leaves were harvested after six days. The infiltration procedure is shown in Figure 3.1.

Figure 3.1 Schematic diagram of Agro-infiltration in N. benthamiana. N. benthamiana plantlets were grown for five weeks, and three large leaves were infiltrated with Agrobacterium harbouring expression vectors. Infiltrated leaves were harvested after six days of growth.

Peptide extraction and structure confirmation

Peptides were extracted from infiltrated leaves six days after infiltration. The extraction method and MALDI-TOF-MS analysis are described in Section 2.2.5.

3.2.6. In vitro cyclization assay

Peptide synthesis and purification

Peptides rL1_che1 and pL1_che1 were synthesized using Fmoc (9-fluorenylmethoxycarbonyl) solid-phase synthesis methods established in the Craik laboratory (Cheneval et al., 2014). They were purified by reversed-phase high-performance liquid chromatography (RP-HPLC) and characterized by high-resolution mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy.

In vitro peptide cyclization

Recombinant OaAEP1b was produced using E. coli according to established methods (Harris et al., 2015).

The G_pL1_NGLP peptide (31.6 µM) was incubated with recombinant OaAEP1b (2.8 µM) in activity buffer (50 mM sodium acetate, 50 mM NaCl, 1 mM EDTA, pH 5) for either 1 hour or 24 hours at room temperature. Reactions were stopped by adding 5 µL of 5% formic acid. Peptides were desalted and concentrated using C18 ZipTips (Millipore), and then analysed by MALDI-TOF-MS for the presence of linear, cyclic and unprocessed peptide masses.

3.3. Results

3.3.1. Cyclotide-like genes appear restricted to the Poaceae family

To search for cyclotide-like sequences in diverse monocot plant species, 13 RNA-seq data sets were assembled and analysed along with ten sequence data sets available for species within the Phytozome database. Of the 13 RNA-seq data sets obtained from NCBI, nine were from non-Poaceae species deemed economically important. These were chosen based on the monocot taxonomic tree to maximise representation of family and order taxonomic groups across the monocots (Chase et al., 2016).

In total, 167 putative genes with homology to cyclotide-like sequences were identified (Table 3.4). After manual checking for predicted ORFs, conserved cysteines and signal peptides, these were narrowed down to 11 cyclotide-like genes, as illustrated in Figure 3.2. Ten of these were from Poaceae species, with the remaining sequence from the non-Poaceae species C. esculenta. Sequences discovered from the Poaceae family included one from S. viridis (Sevir.3G354700.1), one from O. sativa (rL1), five from S. italica (Seita.3G339600.1, 3G340000.1, 3G339700.1, 3G340100.1, 8G127500.1), one from S. bicolor (Sobic.8G124100.1), and two from Z. mays (GRMZM 2G032198_T01, 2G374405_T01). The sequence from S. viridis and one sequence (8G127500.1) from S. italica are novel and have not previously been identified as cyclotide-like genes. One sequence, termed mLA, from S. italica (Seita.3G339900.1) had a classical cyclotide cysteine pattern and a predicted ORF, but no signal peptide was identified in this sequence (Figure 3.2). Interestingly, this sequence contains Gly at the putative peptide N-terminus and Asn-Ser-Leu at its C-terminus, as observed in some cyclotides from the Violaceae family (Hellinger et al., 2015, Burman et al., 2010).

The cyclotide-like sequence identified in C. esculenta contains ten cysteines and a signal peptide but no stop codon (Figure 3.3A). This cysteine pattern does not resemble those present in cyclotides (Craik et al., 1999), defensins (Ganz & Lehrer, 1995) or conotoxins (Craig, 2000). Instead, it shares high similarity with endochitinase CH25 of Spirodela polyrhiza (Figure 3.3B).

Table 3.4 tblastn results of cyclotide-like peptides in monocots

Species                     Common name        Hits  ORF  Conserved cysteines  Signal peptide
Non-Poaceae
C. esculenta                Taro               25    12   2                    1
P. boninensis               Pandanus           26    13   2                    0
X. villosa                  -                  11    5    0                    0
L. longiflorum              Lily               7     6    0                    0
L. davidii var. unicolor    Lily               9     2    0                    0
N. tazetta                  Daffodil           12    3    0                    0
C. nucifera                 Coconut palm       8     6    0                    0
Z. officinale               Ginger             2     1    1                    0
T. latifolia                Typha              11    4    0                    1
Poaceae
B. distachyon               Stiff brome        0     -    -                    -
B. stacei                   -                  0     -    -                    -
O. sativa                   Rice               1     1    1                    1
O. thomaeum                 -                  0     -    -                    -
P. hallii                   Hall’s panicgrass  0     -    -                    -
P. virgatum                 Switchgrass        0     -    -                    -
S. italica                  Foxtail millet     6     6    6                    5
S. viridis                  Green foxtail      1     1    1                    1
S. bicolor                  Sorghum            1     1    1                    1
Z. mays                     Maize              2     2    2                    2
A. semialata                Cockatoo grass     13    4    0                    0
C. americanus               -                  15    9    0                    0
M. maximus                  Guinea grass       14    9    1                    0
P. miliaceum                Proso millet       3     2    0                    0

ORF: open reading frame


Figure 3.2 Alignment of cyclotide-like sequences from Poaceae with the Oak1 precursor. Based on Oak1, sequences consist of a signal peptide, N-terminal propeptide (NTPP), peptide domain and C-terminal propeptide (CTPP). A Met (highlighted in yellow) in the signal peptide indicates the start of translation. Ala is consistently found at the end of the predicted cleavage sites (green) of cyclotide-like sequences from Poaceae. Conserved cysteines are aligned and shown on a red background. The sequence from S. italica coloured in red does not have a signal peptide, despite its predicted peptide domain having the same cysteine pattern.

Figure 3.3 Cyclotide-like sequence from C. esculenta and alignment with endochitinase CH25 in S. polyrhiza. A. Cysteine-rich sequence from C. esculenta. The identified sequence (grey background) is located in an ORF and starts with Met (light yellow background). The predicted signal peptide cleavage site is between Ala (A) and Glu (E) (light green). Nine cysteines are found in the precursor region (white letters on a red background). B. Alignment of the cysteine-rich sequence in C. esculenta with endochitinase CH25 in S. polyrhiza. The cysteine-rich sequence in C. esculenta shares high similarity with endochitinase CH25 in S. polyrhiza. Nine of the ten cysteines match, the only exception being the third cysteine. Cysteines are numbered 1 to 10 from the N-terminus.

3.3.2. Promoter analysis reveals a possible defence response role for rL1

Given the economic importance of rice, it is of interest to define the role of the cyclotide-like gene rL1 from O. sativa. The genomic sequence spanning 1000 bp upstream of the rL1 gene start site was downloaded from Phytozome. This sequence was subjected to promoter analysis for cis regulatory elements (CREs) that could control expression at that locus. CREs were classified into 11 groups based on their annotated functions, as listed in Table 3.5. These groups were further grouped into three classes, as illustrated in Figure 3.4A. The largest CRE class (45.5%) is a defence-responsive class comprising phytohormone-responsive, pathogen-responsive, environmental stress-responsive and oxygen-responsive CREs. In this class, over half of the CREs are associated with phytohormone responses, including ABA, gibberellin, auxin and ethylene responses. The second largest class comprises CREs associated with general growth. A small number of CREs are relevant to specific functions in seed, light and sugar responses and were grouped as the specific responsive class (13.9%). Overall, the most common CRE group identified was development (86 motifs), closely followed by CREs associated with phytohormone responses (72 motifs). The locations of CREs in the promoter are shown in Figure 3.4B.

Table 3.5 List of cis regulatory elements in the promoter of rL1

Function group
  Cis regulatory element ID   No.

Phytohormone responsive
  B3                          15
  ARF                         1
  EIN3                        4
  CAREOSREP1                  2
  PYRIMIDINEBOXOSRAMY1A       3
  GARE2OSREP1                 2
Phytohormone responsive & Environmental stress responsive
  ABREs                       3
Pathogen responsive
  Homeodomain; TALE           2
  BIHD1OS                     34
Environmental stress responsive
  C2H2                        3
  LEA_5                       3
  RNFG1OS                     2
Oxygen-responsive
  ANAERO1CONSENSUS            5
  ANAERO3CONSENSUS            3
  ANAERO4CONSENSUS            1
Development
  Alpha-amylase               9
  Homeodomain; WOX            1
  MADF                        2
  Myb/SANT; MYB               3
  NAC; NAM                    2
  NF-YB                       34
  TATCCAOSAMY                 1
  LOB                         1
  TCR                         1
Development & Environmental stress responsive
  WRKY                        2
Development & Phytohormone responsive
  AP2; B3; RAV                1
  Homeodomain; HD-ZIP         3
  SBP                         6
  TCP                         18
Development & Seed specific
  MADS box; MIKC; M-type      2
Conserved promoter motifs
  AT-Hook                     13
  TBP                         2
  TATABOXOSPAL                4
DNA binding
  bHLH                        1
  bZIP                        4
Transcriptional activation
  SITEIOSPCNA                 4
  E2F1OSPCNA                  2
Seed-specific
  GLUTAACAOS                  2
  GLUTEBP1OS                  1
  RITA-1                      3
  pGL5-1                      2
  CGACGOSAMY3                 2
  PROLAMINBOXOSGLUB1          2
  AACACOREOSGLUB1             5
  ACGTOSGLUB1                 2
Light responsive
  GATA                        3
Light responsive & Phytohormone responsive
  GT1CONSENSUS                14
Sugar-responsive
  TATCCAYMOTIFOSRAMY3D        2
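The group totals quoted in the text can be cross-checked from Table 3.5 by counting dual-function CREs once in each of their groups, as described in Section 3.2.2. The sketch below is illustrative only: the counts are transcribed from the table, the shortened group labels and function names are hypothetical, and the assignment of four groups to the defence-responsive class follows the description in the text.

```python
from collections import Counter

# (function groups, CRE identifier, motif count), transcribed from Table 3.5.
CRES = [
    (("Phytohormone",), "B3", 15),
    (("Phytohormone",), "ARF", 1),
    (("Phytohormone",), "EIN3", 4),
    (("Phytohormone",), "CAREOSREP1", 2),
    (("Phytohormone",), "PYRIMIDINEBOXOSRAMY1A", 3),
    (("Phytohormone",), "GARE2OSREP1", 2),
    (("Phytohormone", "Environmental stress"), "ABREs", 3),
    (("Pathogen",), "Homeodomain; TALE", 2),
    (("Pathogen",), "BIHD1OS", 34),
    (("Environmental stress",), "C2H2", 3),
    (("Environmental stress",), "LEA_5", 3),
    (("Environmental stress",), "RNFG1OS", 2),
    (("Oxygen",), "ANAERO1CONSENSUS", 5),
    (("Oxygen",), "ANAERO3CONSENSUS", 3),
    (("Oxygen",), "ANAERO4CONSENSUS", 1),
    (("Development",), "Alpha-amylase", 9),
    (("Development",), "Homeodomain; WOX", 1),
    (("Development",), "MADF", 2),
    (("Development",), "Myb/SANT; MYB", 3),
    (("Development",), "NAC; NAM", 2),
    (("Development",), "NF-YB", 34),
    (("Development",), "TATCCAOSAMY", 1),
    (("Development",), "LOB", 1),
    (("Development",), "TCR", 1),
    (("Development", "Environmental stress"), "WRKY", 2),
    (("Development", "Phytohormone"), "AP2; B3; RAV", 1),
    (("Development", "Phytohormone"), "Homeodomain; HD-ZIP", 3),
    (("Development", "Phytohormone"), "SBP", 6),
    (("Development", "Phytohormone"), "TCP", 18),
    (("Development", "Seed"), "MADS box; MIKC; M-type", 2),
    (("Conserved promoter motifs",), "AT-Hook", 13),
    (("Conserved promoter motifs",), "TBP", 2),
    (("Conserved promoter motifs",), "TATABOXOSPAL", 4),
    (("DNA binding",), "bHLH", 1),
    (("DNA binding",), "bZIP", 4),
    (("Transcriptional activation",), "SITEIOSPCNA", 4),
    (("Transcriptional activation",), "E2F1OSPCNA", 2),
    (("Seed",), "GLUTAACAOS", 2),
    (("Seed",), "GLUTEBP1OS", 1),
    (("Seed",), "RITA-1", 3),
    (("Seed",), "pGL5-1", 2),
    (("Seed",), "CGACGOSAMY3", 2),
    (("Seed",), "PROLAMINBOXOSGLUB1", 2),
    (("Seed",), "AACACOREOSGLUB1", 5),
    (("Seed",), "ACGTOSGLUB1", 2),
    (("Light",), "GATA", 3),
    (("Light", "Phytohormone"), "GT1CONSENSUS", 14),
    (("Sugar",), "TATCCAYMOTIFOSRAMY3D", 2),
]

# Groups assigned to the defence-responsive class in the text.
DEFENCE_GROUPS = ("Phytohormone", "Pathogen", "Environmental stress", "Oxygen")


def tally_by_group(cres):
    """Sum motif counts per group; dual-function CREs count in both groups."""
    totals = Counter()
    for groups, _name, count in cres:
        for group in groups:
            totals[group] += count
    return totals


totals = tally_by_group(CRES)
defence_total = sum(totals[g] for g in DEFENCE_GROUPS)
defence_share = 100 * defence_total / sum(totals.values())
print(totals.most_common(2))
print(round(defence_share, 1))
```

With the counts as transcribed, the tally gives 11 function groups, with development and phytohormone responses as the two largest, in line with the figures given in the text.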


Figure 3.4 Promoter analysis of rL1. A. Pie chart of CREs in the rL1 promoter (1000 bp upstream). CREs are divided into three classes based on their functions: defence responsive (blue), general growth relevant (orange) and specific responsive (grey). The number of CREs in each class is shown in brackets. B. Locations of CREs in the rL1 promoter. CREs are coloured according to their functional classification, consistent with the pie chart.

3.3.3. rL1 expression under phytohormone treatments

The rL1 gene from rice has been reported to be expressed in leaf, root and grain, but not in flowers (Mulvenna et al., 2006a). Phytohormone-responsive CREs are abundant within the rL1 promoter, representing the second largest group of CREs. The presence of these CREs suggests that rL1 gene expression might be affected by phytohormone treatments. To test this hypothesis, rice seedlings were treated with SA and ABA. Based on published protocols, two concentrations of SA and ABA (100 µM and 1 mM) were used to treat six-week-old seedlings. Leaf tissue was then harvested at 6 hours, 24 hours and 6 days post treatment. Three primer pairs, qRcl (Mulvenna et al., 2006a), qOspep1 and qOspep2, were tested for amplification of the rL1 gene from wild-type rice cDNA in order to select the most efficient pair. A specific and bright amplicon of the correct size was generated by the qRcl primer pair, as illustrated in Figure 3.5A. Following this positive result, the qRcl primers were used to optimise the annealing temperature. Of the temperatures tested, 62 °C provided the strongest amplification of the target DNA fragment (Figure 3.5B), and consequently this temperature was chosen for all subsequent RT-PCR experiments. Real-time PCR revealed that CT values were very low for the rL1 gene and were not improved upon treatment of rice leaves with phytohormones.


Figure 3.5 Optimization of RT-PCR primers to amplify the rL1 gene. A. Selection among three primer pairs for RT-PCR. cDNA from wild-type rice leaf was used as the PCR template. A specific and bright amplicon of the correct size was observed with the primer pair qRcl (96 bp). Primer dimers were observed with primer pairs qOspep1 and qOspep2. A faint amplicon was observed with primer pair qOspep1 (128 bp). B. Annealing temperature (Ta) optimization with primer pair qRcl. Four annealing temperatures from 55 to 62 °C were tested. M stands for gel ladder.

3.3.4. mLA and mLB expression in S. italica

Two cyclotide-like sequences from S. italica were identified by homology searching, mLA (Seita.3G339900.1) and mLB (Seita.3G339700.1). The mLA peptide naturally contains putative cyclisable N- and C-termini, as for canonical cyclotides, although its precursor does not have a signal peptide. To further examine the occurrence of these peptides in planta, mLA and mLB were both analysed at the transcript and peptide levels. Genomic DNA prepared from three S. italica accessions (Angu 18 hao, Nekoashi and CPI 108040) was used for PCR analysis. The mLA genomic sequence was amplified from all three accessions, while mLB was only amplified from accession CPI 108040, as shown in Figure 3.6A (amplification products were confirmed by sequencing). To test for expression of mLA and mLB transcripts, RNA was extracted from leaf, stem and root of CPI 108040, and cDNA was prepared. However, no transcript was detected for mLA or mLB in leaf, stem or root (Figure 3.6B). In parallel, peptides were extracted from all three S. italica accessions and screened by MALDI-TOF-MS for masses matching the predicted mLA and mLB peptides (Figure 3.6C). However, no putative mLA or mLB peptides were identified in any of the three S. italica accessions.

Figure 3.6 Expression of mLA and mLB in S. italica. A. Genes of mLA and mLB in three accessions of S. italica. mLA was amplified in all three accessions, while mLB was only amplified in CPI 108040. mLA, 300 bp; mLB, 326 bp. B. Transcript expression of mLA and mLB in leaf, stem and root of CPI 108040. Neither mLA nor mLB was amplified; only a primer dimer for mLB was visible at the bottom. C. Peptide expression in S. italica and possible mLA and mLB peptides. The MS values of the predicted mLA and mLB peptides do not match peptides detected in Angu 18 hao (Angu), Nekoashi (Nek) or CPI 108040 (CPI).

3.3.5. Transient expression of rL1 and pL1 peptides in N. benthamiana leaf

Heterologous expression of rL1, mLA and mLB was attempted in N. benthamiana leaf, as no peptides were detected in their corresponding native plants. The precursor gene of pL1 from P. laxum was chosen as it is expressed in its native host tissue (Nguyen et al., 2013). All peptide precursor genes were assembled into the plant expression vector pEAQ-Dest1, which enables high-level recombinant protein production in N. benthamiana leaf. In the case of pL1 expression, a single dominant peptide mass was produced at m/z 3052, corresponding to the full predicted linear pL1 peptide with a pyroglutamyl modification of Gln (pyroGlu) at the N-terminus (Figure 3.7). This modification was also reported in pL1 peptides produced naturally in P. laxum (Nguyen et al., 2013). In the case of rL1, two linear rL1 peptides, differing only by an additional C-terminal residue, were detected at m/z 3519 and 3616 (Figure 3.7). These peptide masses were also consistent with a pyroglutamyl modification of Gln at their N-termini. To confirm that these linear rL1 peptides contained six cysteines, crude peptide extracts were reduced and alkylated, a process that would add 348 Da to the mass if six cysteines were alkylated. Although some of the signal intensity was lost, this mass shift was clearly evident after treatment (Figure 3.7). For the mLA and mLB precursor genes, attempted expression in N. benthamiana did not produce any peptide products.

Figure 3.7 Expression of rL1 and pL1 peptide precursor genes in N. benthamiana. MS spectra of pL1 (m/z 3052) and rL1 (m/z 3519 and 3616) peptides from N. benthamiana leaf. Their sequences are aligned alongside with their MS values. The two rL1 peptides gained 348 Da when they were reduced and alkylated (R&A) (m/z 3867 and 3964). pQ (m/z 128) stands for the pyroglutamyl modification of Gln (pyroGlu).

3.3.6. pL1 is cyclisable in vitro and in planta using cyclization efficient AEPs

Given the apparent stability of the cyclized peptide backbone, it was of interest to see what changes would allow AEP-mediated cyclization of otherwise acyclic monocot peptides. To test the cyclization potential, pL1 and rL1 peptide variants were synthesized with the following changes: the N-terminal Gln residue was changed to the canonical cyclotide Gly, and Asn was added at the C-terminus, followed by the canonical cyclotide motif Gly-Leu-Pro, as illustrated in Figure 3.8A. These modified sequences were named rL1_che1 (from rice) and pL1_che1 (from P. laxum). During synthesis and folding, two properly folded isomers of pL1_che1 were obtained, but no correctly folded rL1_che1 was detected. Therefore only pL1_che1 was examined in the following work. To test whether the pL1_che1 isomers could be cyclized, they were incubated with recombinantly produced OaAEP1b, which acts as a peptide ligase (Harris et al., 2015). Both linear G_pL1_N (m/z 3113) and cyclic G_pL1_N (m/z 3095) were detected after 24-hour incubation of pL1_che1 isomer 1. The higher signal intensity of the cyclic form relative to the linear form suggests that cyclization is favoured over hydrolysis. However, the considerable amount of remaining substrate suggested that cyclization was inefficient and slow (Figure 3.8B).

Figure 3.8 Cyclization of pL1 with recombinant OaAEP1b in vitro. A. Alignment of the chemically synthesized pL1_che1 sequence with native pL1. The predicted cleavage site of OaAEP1b is at the C-terminal Asn-Gly (NG) (red letters with scissors above the sequences). The backbone is closed between the C-terminal Asn and N-terminal Gly (connected with a line). Cysteines are shown in white letters on a red background. B. Spectra of pL1_che1 cyclized in vitro. The MS signals of cyclic G_pL1_N (m/z 3095) and linear G_pL1_N (m/z 3113) were detected after incubating linear pL1_che1 (m/z 3380) with OaAEP1b for 24 hr.

Following this promising result, modified precursors were tested to see if they could be cyclized in planta. Two constructs, pL1_cyc1 and pL1_cyc2, that incorporated the modified N- and C-terminal peptide domains were prepared, as illustrated in Figure 3.9A. Another precursor (OakpL1_cyc) carried the Oak1 precursor protein in which the domain for kB1 was replaced with pL1 (Figure 3.9A). To assist with cyclization, Agrobacterium hosting expression vectors for OaAEP1b or butelase1 and Agrobacterium hosting the substrate precursor peptide genes were co-infiltrated into N. benthamiana leaf. In the case of pL1_cyc1 and pL1_cyc2, no peptides were detected. However, cyclic (m/z 3095) and linear (m/z 3113) G_pL1_N were detected in N. benthamiana leaf extract from the OakpL1_cyc precursor, along with the misprocessed peptides truncated linear pL1_N (m/z 3056) and extended linear G_pL1_NG (m/z 3170) (Figure 3.9B). This successful maturation of cyclic peptides from the engineered precursor suggests that the signal peptide and NTPP of Oak1 are sufficient to deliver the peptide to the plant endomembrane system, thus facilitating the N- and C-terminal processing of pro-peptides. Without AEP co-expression, linear pL1 peptides were predominantly produced (Figure 3.9B). With AEP co-expression, the relative amount of cyclic G_pL1_N detected improved, and butelase1 performed better than OaAEP1b in cyclising G_pL1_N. Reduction, alkylation and backbone linearization digestion were performed to confirm the disulfide bonds and cyclized backbone of G_pL1_N produced in N. benthamiana. The expected mass shift (+348 Da) for the reduced and alkylated cyclic peptide was detected. Subsequent digestion with endoproteinase Glu-C resulted in a further mass shift of 18 Da, which indicated that the G_pL1_N produced in N. benthamiana was cyclized (Figure 3.9B).
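The mass relationships used in this assignment can be checked with simple nominal-mass arithmetic. The sketch below is illustrative only: it assumes integer (nominal) masses, a shift of 58 Da per cysteine on reduction and alkylation (inferred here from the reported +348 Da over six cysteines, not stated as the reagent chemistry), and a single backbone cleavage by Glu-C. The function and key names are hypothetical.

```python
# Nominal mass differences (Da); integer values are an approximation.
WATER = 18          # lost on head-to-tail cyclization, regained on ring opening
GLY_RESIDUE = 57    # one glycine residue
PER_CYS_SHIFT = 58  # assumed shift per Cys after reduction and alkylation
N_CYS = 6           # three disulfide bonds


def expected_species(linear_mz: int) -> dict[str, int]:
    """Return expected m/z values derived from the linear G_peptide_N signal."""
    cyclic = linear_mz - WATER
    reduced_alkylated = cyclic + N_CYS * PER_CYS_SHIFT
    return {
        "linear": linear_mz,
        "cyclic": cyclic,
        "minus_gly": linear_mz - GLY_RESIDUE,  # truncated, no N-terminal Gly
        "plus_gly": linear_mz + GLY_RESIDUE,   # extended, extra C-terminal Gly
        "reduced_alkylated": reduced_alkylated,
        "glu_c_linearized": reduced_alkylated + WATER,
    }


pl1 = expected_species(3113)  # linear G_pL1_N as reported in the text
for name, mz in pl1.items():
    print(f"{name}: m/z {mz}")
```

Under these assumptions the values reproduce the signals reported for OakpL1_cyc (m/z 3095, 3056, 3170, 3443 and 3461).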

Figure 3.9 Cyclization of pL1 in planta. A. Alignment of pL1 sequences used for in planta cyclization. The predicted cleavage site is at the C-terminal Asn-Gly (NG) (highlighted in red letters with scissors above the sequences). The backbone (connected with a line) was closed between the N-terminal Gly/Gln (G/Q) and the C-terminal Asn (N). In pL1_cyc1 and pL1_cyc2, Thr (T) replaced Val (V) at the C-terminus (highlighted in red letters). Cysteines are aligned in white letters with red background. The OakpL1_cyc precursor was made by replacing kB1 (green) with pL1_cyc (navy) in the Oak1 gene. B. Spectra of OakpL1_cyc cyclization in planta. Truncated linear pL1_N (m/z 3056), cyclic G_pL1_N (m/z 3095), linear G_pL1_N (m/z 3113) and linear G_pL1_NG (m/z 3170) were detected from the OakpL1_cyc construct either with or without butelase1 and OaAEP1b. After reduction and alkylation, the MS signal of cyclic G_pL1_N (m/z 3095) shifted by 348 Da to m/z 3443. After enzyme digestion, the MS signal shifted by a further 18 Da to m/z 3461.

mLA is cyclisable in planta

The mLA peptide stands out from other cyclotide-like sequences in the Poaceae family because its sequence naturally contains putative cyclisable N- and C-termini. Like cyclotides, the mLA peptide comprises an N-terminal Gly, a C-terminal Asn and a short CTPP, which are the key elements for peptide cyclization. To test the cyclization potential of the mLA peptide, the native mLA precursor was used alongside two constructs in which the peptide domain of mLA was inserted into the mLB and Oak1 precursors (Figure 3.10A). To assist cyclization, butelase1 and OaAEP1b were transiently co-expressed alongside the three precursor genes in N. benthamiana leaves. However, no peptides were detected in transgenic lines harbouring the native mLA and mLBmLA precursors. Only cyclic mLA (m/z 3204) was detected, and only from transgenic lines harbouring both OakmLA_cyc and butelase1 (Figure 3.10B). To confirm the disulfide bonds and cyclized backbone of this cyclic mLA produced in N. benthamiana, reduction, alkylation and backbone linearization digestion were carried out. After reduction and alkylation, the expected mass shift (+348 Da) corresponding to three disulfide bonds was detected, but with very low intensity (Figure 3.10B). As the amount of reduced and alkylated cyclic mLA was low, no peptide was detected following endoproteinase Glu-C digestion.

Figure 3.10 Cyclization of mLA in planta. A. Alignment of peptide sequences of mLA, mLBmLA and OakmLA_cyc. The mLA peptide starts with Gly at the N-terminus and ends with Asn at the C-terminus (connected with a line), followed by a short CTPP. The predicted cleavage site is at the C-terminal Asn-Gly (scissors above the sequences). mLBmLA and OakmLA_cyc were made by inserting the peptide domain of mLA into the mLB and Oak1 precursors. The example shows kB1 (green) replaced with mLA (purple) in the Oak1 precursor. B. Spectra of mLA cyclization in planta. Cyclic mLA (m/z 3204) was detected when co-expressed with butelase1 in N. benthamiana leaves. After reduction and alkylation, the MS signal of cyclic mLA (m/z 3204) shifted by 348 Da to m/z 3552.

3.4. Discussion

3.4.1. Distribution of cyclotide-like genes in monocot lineages

To gain insight into the distribution of cyclotide-like sequences in monocot plants, a diverse set of monocot species was screened for cyclotide-like genes. The cyclotide-like genes identified share similar cysteine patterns with cyclotides, but they typically lack ligation-competent C-terminal sites. The economically and environmentally important non-Poaceae species taro, lily, daffodil, coconut, ginger and cattails were chosen as representative species to cover a maximum number of taxonomic groups within the monocots. The lack of cyclotide-like sequences in these species indicates that the Poaceae is the sole monocot family with cyclotide-like gene signatures. This suggestion is consistent with previous reports (Mulvenna et al., 2006a, Salehi et al., 2017, Nguyen et al., 2013), in which cyclotide-like sequences have been observed only in taxa that are members of the Poaceae. However, with the ever-increasing number of genomic and transcriptomic datasets being released, driven by decreasing sequencing costs, it is conceivable that cyclotide-like sequences could be discovered in a wider range of monocot plants.

3.4.2. Expression of monocot cyclotide-like genes in their native plants

As important crops worldwide, rice and millet are useful plants in which to analyse the expression of cyclotide-like genes and determine whether they function in plant defence. The cyclotide-like gene rL1 is expressed throughout the rice plant, with the exception of the flower (Mulvenna et al., 2006a). Similarly, the cyclotide-like genes Zmcyc1 and Zmcyc5 from Z. mays are expressed in the whole plant, with the highest expression in leaf tissue (Salehi et al., 2017). Both of these cyclotide-like genes from Z. mays can be regulated by wounding, phytohormones and fungi. However, these cyclotide-like genes were only detected at the transcript level.

Analysis of the CREs in the promoter sequence of rL1 predicted that gene expression is regulated by phytohormones, as the majority of the CREs identified were ABA responsive. On this basis, it was worth determining whether ABA, as well as SA, affects rL1 expression. However, neither rL1 transcript induction nor peptide was detected following ABA and SA treatments. For rL1 transcript detection, the RT-PCR experimental parameters may need further optimisation to detect low rL1 expression. Further investigation using other phytohormone treatments, or fusing the rL1 promoter to a more sensitive reporter protein such as luciferase, is worth pursuing. Assuming it is expressed at some point in rice, it would also be informative to test the function of the encoded rL1 peptide. Unfortunately, the peptide could not be purified from its native source, nor could it be produced in sufficient quantities in N. benthamiana, nor chemically synthesized. In previous studies, only panitides from P. laxum were discovered as cyclotide-like peptides in monocots (Nguyen et al., 2013). In the current study, mLA and mLB peptides were not detected in three accessions of foxtail millet. Together, the evidence available suggests that cyclotide-like genes may express gene products only under specific stress. There is scope for further work with induction experiments and functional characterization.

3.4.3. Pyroglutamyl modification of linear cyclotide-like peptides

An interesting finding from the current study was that the N-termini of rL1 and pL1 had a post-translational pyroglutamyl modification when produced in N. benthamiana leaves. This characteristic also occurs in the panitides from P. laxum (Nguyen et al., 2013). When acyclic peptides from the Poaceae family are compared to one another, an N-terminal Gln is present or predicted to be present in the cyclotide-like sequences discovered thus far (Figure 3.11A). Moreover, pyroGlu is also present in acyclotides from dicot species of the Solanaceae and Cucurbitaceae families (Figure 3.11B). An acyclotide, Phyb M, from Petunia hybrida shares the same cysteine-loop pattern and N-terminal pyroGlu as cyclotide-like sequences from the Poaceae (Poth et al., 2012). Although the squash trypsin inhibitors from the Cucurbitaceae have a cysteine-loop pattern different from that of the Poaceae cyclotide-like sequences, they contain the same pyroGlu at their N-termini, as recorded previously and illustrated in Figure 3.11B (Hara et al., 1989, Hamato et al., 1992, Mar et al., 1996). Interestingly, N-terminal pyroGlu is known to enhance the stability of peptides, and thus this particular modification may be a viable alternative mode of stabilization that made cyclization unlikely to develop in the Poaceae through evolution. Among cytotoxic T lymphocyte peptides, those with an N-terminal Glu are more stable than their corresponding Gln homologues (Beck et al., 2001). Small peptides discovered from Lycium barbarum also displayed an N-terminal pyroGlu motif, which presumably contributed to their stability in vivo (Kersten & Weng, 2018). These observations suggest that pyroGlu modification might stabilise linear cyclotide-like peptides, as they lack a stable cyclized backbone.

Figure 3.11 Novel cyclotide-like sequences with Gln at N-termini. A. Alignment of cyclotide-like sequences in the Poaceae family. Cyclotide-like genes from P. laxum, O. sativa, S. italica and S. viridis have a conserved Gln (Q in red) at their N-termini. B. Alignment of acyclotides from dicot plants. PyroGlu was detected in acyclotides from species in the Solanaceae (Poth et al., 2012) and the Cucurbitaceae families (Hara et al., 1989, Hamato et al., 1992, Mar et al., 1996). Phyb M from the Solanaceae has a different cysteine-loop pattern from acyclotides of the Cucurbitaceae family, but shares the same pattern as cyclotide-like sequences in the Poaceae family. Cysteines are aligned in white letters with red background.

3.4.4. Cyclization of cyclotide-like peptides in vitro and in planta

Reduced pL1 was able to convert into the native oxidized form at pH 8.0 (Nguyen et al., 2013). In the current study, chemically synthesized pL1 was cyclized by recombinant OaAEP1b in vitro. Additionally, the same homologue was cyclized in N. benthamiana when co-expressed with cyclization-efficient AEPs. This work is the first to report the cyclization of monocot cyclotide-like sequences both in vitro and in planta. These successes open the possibility of obtaining monocot cyclotide-like peptides via two pathways. Unfortunately, no chemosynthesis route for producing rL1 has been successful. One hypothesis is that the bracelet-like topology of rL1 causes synthesis failures, as observed in the laboratory when synthesis of bracelet cyclotides is attempted, which has been attributed to folding issues (Leta Aboye et al., 2008). By contrast, rL1 can be produced in N. benthamiana leaves. This observation demonstrates the potential of plant expression systems as viable alternatives to chemosynthesis in the laboratory.

For the cyclization of monocot cyclotide-like peptides, the mLA peptide was of particular interest because it has Gly and Asn residues at its N- and C-termini, respectively. These amino acids are suitable for AEP-mediated backbone cyclization and are the most common at the termini of prototypical cyclotides. Moreover, the C-terminal propeptide of mLA, SLAN, has a similar sequence to those of some acyclotides from the Violaceae (Figure 3.12). Collectively, these sequences have a short C-terminal propeptide that includes a conserved Ser-Leu dipeptide (Simonsen et al., 2005, Burman et al., 2010, Zarrabi et al., 2013). This observation indicates that mLA could be cyclized if processed by ligase-capable AEPs. In the current study, mLA was indeed cyclized when the mLA domain was inserted into the Oak1 precursor and co-expressed with butelase1 in N. benthamiana. However, no mLA peptide was detected in experiments involving the transiently expressed native mLA precursor or the mLBmLA precursor. The successful detection of cyclic mLA from the OakmLA gene suggests that the Oak1 precursor can direct the mLA peptide to the subcellular compartment required for AEP-mediated processing and so facilitate cyclization. Although a mass matching that of cyclic mLA is detectable in N. benthamiana, the amount is quite low. This low expression hindered further tests to show the presence of three disulfide bonds through reduction and alkylation, and hindered attempts to determine whether the molecule was cyclic using endoproteinase Glu-C treatment. In the future, scaled-up or optimised expression of mLA in N. benthamiana may produce enough mLA to further study its structure and function.
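The terminal features discussed above (an N-terminal Gly, a C-terminal Asn and a short C-terminal propeptide) can be written as a simple check. The following is a minimal sketch, not a predictor of cyclization: the function names are hypothetical, and the domain and CTPP boundaries are assumptions read from the translations in Section 3.6. As the mLA results show, the precursor context also matters, so passing this check is not sufficient for cyclization.

```python
def cyclization_features(domain: str, ctpp: str) -> dict[str, bool]:
    """Report the terminal features associated with AEP-mediated cyclization."""
    return {
        "n_terminal_gly": domain.startswith("G"),
        "c_terminal_asn": domain.endswith("N"),
        "has_ctpp": len(ctpp) > 0,
    }


def has_cyclisable_termini(domain: str, ctpp: str) -> bool:
    """True only if all three terminal features are present."""
    return all(cyclization_features(domain, ctpp).values())


# (peptide domain, C-terminal propeptide); boundaries are assumptions.
candidates = {
    "mLA (native)": ("GGIQCGESCVWIPCITAAIGCSCQNRVCSRN", "SLAN"),
    "pL1 (native)": ("QLPICGETCVLGTCYTPGCRCQYPICVR", ""),
    "pL1 in OakpL1_cyc": ("GLPICGETCVLGTCYTPGCRCQYPICVRN", "GLPSLAA"),
}

for name, (domain, ctpp) in candidates.items():
    print(name, has_cyclisable_termini(domain, ctpp))
```

Native pL1 fails the check because it begins with Gln and lacks both a C-terminal Asn and a propeptide, which is why its termini were modified in the engineered constructs.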


Figure 3.12 Alignment of acyclotides from Violaceae with mLA. A conserved Asn-Ser-Leu (NSL in red) was observed in both mLA and acyclotides from Violaceae. The putative AEP cleavage site is at Asn-Ser (NS), highlighted with scissors above the sequences. Cysteines are aligned in white letters with red background.

In the current study, butelase1 was highly efficient at cyclization of heterologous precursors in planta. In the case of OakpL1_cyc, the ratio of cyclic to linear peptide was higher when co-expressed with butelase1 than with OaAEP1b. In the case of OakmLA_cyc, the cyclic peptide was only detected when co-expressed with butelase1. Both of these peptide expression cassettes were designed using the Oak1 precursor, which presumably carries signals that direct the precursor peptides to AEP-containing organelles. Unfortunately, this hybrid design did not achieve better cyclization when co-expressed with OaAEP1b. This might be due to the broader substrate range and higher catalysis rate of butelase1 compared to OaAEP1b (Nguyen et al., 2016a, Nguyen et al., 2016b).

Overall, the results in this chapter have demonstrated that (i) monocot cyclotide-like genes are likely restricted to the Poaceae family, (ii) the N-terminal pyroGlu modification might stabilise linear cyclotide-like peptides, and (iii) monocot cyclotide-like genes can be re-engineered with minimal residue changes to allow backbone cyclization. These studies have laid the groundwork for further study of cyclotide-like genes in monocots. The insight gained from this work is beneficial for the development of monocot cereals (e.g. rice, maize) as efficient hosts for the production of cyclic peptides.

3.5. References

Altschul SF, Gish W, Miller W, et al., 1990. Basic local alignment search tool. Journal of Molecular Biology 215, 403-410.
Arabidopsis Genome Initiative, 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.
Beck A, Bussat MC, Klinguer-Hamour C, et al., 2001. Stability and CTL activity of N-terminal glutamic acid containing peptides. The Journal of Peptide Research 57, 528-538.
Burman R, Gruber CW, Rizzardi K, et al., 2010. Cyclotide proteins and precursors from the genus Gloeospermum: filling a blank spot in the cyclotide map of Violaceae. Phytochemistry 71, 13-20.
Chase MW, Christenhusz M, Fay M, et al., 2016. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society 181, 1-20.
Cheneval O, Schroeder CI, Durek T, et al., 2014. Fmoc-based synthesis of disulfide-rich cyclic peptides. The Journal of Organic Chemistry 79, 5538-5544.
Chiche L, Heitz A, Gelly JC, et al., 2004. Squash inhibitors: from structural motifs to macrocyclic knottins. Current Protein and Peptide Science 5, 341-349.
Craig A, 2000. The characterization of conotoxins. Journal of Toxicology: Toxin Reviews 19, 53-93.
Craik DJ, Daly NL, Bond T, et al., 1999. Plant cyclotides: A unique family of cyclic and knotted proteins that defines the cyclic cystine knot structural motif. Journal of Molecular Biology 294, 1327-1336.
Ganz T, Lehrer RI, 1995. Defensins. Pharmacology & Therapeutics 66, 191-205.
Gerlach SL, Burman R, Bohlin L, et al., 2010. Isolation, characterization, and bioactivity of cyclotides from the Micronesian plant Psychotria leptothyrsa. Journal of Natural Products 73, 1207-1213.
Grabherr MG, Haas BJ, Yassour M, et al., 2011. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nature Biotechnology 29, 644.
Gruber CW, Elliott AG, Ireland DC, et al., 2008. Distribution and evolution of circular miniproteins in flowering plants. The Plant Cell 20, 2471-2483.
Haas BJ, Papanicolaou A, Yassour M, et al., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols 8, 1494.
Hamato N, Takano R, Kamei-Hayashi K, et al., 1992. Purification and Characterization of Serine Proteinase Inhibitors from Gourd (Lagenaria leucantha RUSBY var. Gourda MAKINO) Seeds. Bioscience, Biotechnology, and Biochemistry 56, 275-279.
Hara S, Makino J, Ikenaka T, 1989. Amino acid sequences and disulfide bridges of serine proteinase inhibitors from bitter gourd seeds. The Journal of Biochemistry 105, 88-92.
Harris KS, Durek T, Kaas Q, et al., 2015. Efficient backbone cyclization of linear peptides by a recombinant asparaginyl endopeptidase. Nature Communications 6, 10199.

Hellinger R, Koehbach J, Soltis DE, et al., 2015. Peptidomics of circular cysteine-rich plant peptides: analysis of the diversity of cyclotides from Viola tricolor by transcriptome and proteome mining. Journal of Proteome Research 14, 4851-4862.
International Rice Genome Sequencing Project, 2005. The map-based sequence of the rice genome. Nature 436, 793-800.
Ireland DC, Colgrave ML, Nguyencong P, et al., 2006. Discovery and characterization of a linear cyclotide from Viola odorata: implications for the processing of circular proteins. Journal of Molecular Biology 357, 1522-1535.
Kersten RD, Weng JK, 2018. Gene-guided discovery and engineering of branched cyclic peptides in plants. Proceedings of the National Academy of Sciences 115, E10961-E10969.
Leta Aboye T, Clark RJ, Craik DJ, et al., 2008. Ultra-stable peptide scaffolds for protein engineering - synthesis and folding of the circular cystine knotted cyclotide cycloviolacin O2. ChemBioChem 9, 103-113.
Mar RI, Carver JA, Sheil MM, et al., 1996. Primary structure of trypsin inhibitors from Sicyos australis. Phytochemistry 41, 1265-1274.
Meyer P, Saedler H, 1996. Homology-dependent gene silencing in plants. Annual Review of Plant Biology 47, 23-48.
Mulvenna JP, Mylne JS, Bharathi R, et al., 2006a. Discovery of cyclotide-like protein sequences in graminaceous crop plants: ancestral precursors of circular proteins? The Plant Cell 18, 2134-2144.
Mulvenna JP, Wang C, Craik DJ, 2006b. CyBase: a database of cyclic protein sequence and structure. Nucleic Acids Research 34, D192-D194.
Mylne JS, Wang CK, Van Der Weerden NL, et al., 2010. Cyclotides are a component of the innate defense of Oldenlandia affinis. Peptide Science 94, 635-646.
Nguyen GK, Hemu X, Quek JP, et al., 2016a. Butelase-mediated macrocyclization of d-amino-acid-containing peptides. Angewandte Chemie International Edition 55, 12802-12806.
Nguyen GK, Qiu Y, Cao Y, et al., 2016b. Butelase-mediated cyclization and ligation of peptides and proteins. Nature Protocols 11, 1977-1988.
Nguyen GKT, Lian Y, Pang EWH, et al., 2013. Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. Journal of Biological Chemistry 288, 3370-3380.
Nguyen GKT, Lim WH, Nguyen PQT, et al., 2012. Novel cyclotides and uncyclotides with highly shortened precursors from Chassalia chartacea and effects of methionine oxidation on bioactivities. Journal of Biological Chemistry 287, 17598-17607.
Nguyen GKT, Zhang S, Wang W, Wong CTT, Ngan TKN, Tam JP, 2011. Discovery of a linear cyclotide from the bracelet subfamily and its disulfide mapping by top-down mass spectrometry. Journal of Biological Chemistry 286, 44833-44844.

Paterson AH, Bowers JE, Bruggmann R, et al., 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551-556.
Plan MRR, Göransson U, Clark RJ, et al., 2007. The cyclotide fingerprint in Oldenlandia affinis: elucidation of chemically modified, linear and novel macrocyclic peptides. ChemBioChem 8, 1001-1011.
Porto WF, Miranda VJ, Pinto MF, et al., 2016. High-performance computational analysis and peptide screening from databases of cyclotides from Poaceae. Peptide Science 106, 109-118.
Poth AG, Colgrave ML, Philip R, et al., 2011. Discovery of cyclotides in the Fabaceae plant family provides new insights into the cyclization, evolution, and distribution of circular proteins. ACS Chemical Biology 6, 345-355.
Poth AG, Mylne JS, Grassl J, et al., 2012. Cyclotides associate with leaf vasculature and are the products of a novel precursor in petunia (Solanaceae). Journal of Biological Chemistry 287, 27033-27046.
Sainsbury F, Thuenemann EC, Lomonossoff GP, 2009. pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnology Journal 7, 682-693.
Salehi H, Bahramnejad B, Majdi M, 2017. Induction of two cyclotide-like genes Zmcyc1 and Zmcyc5 by abiotic and biotic stresses in Zea mays. Acta Physiologiae Plantarum 39, 131.
Simonsen SM, Sando L, Ireland DC, et al., 2005. A continent of plant defense peptide diversity: cyclotides in Australian Hybanthus (Violaceae). The Plant Cell 17, 3176-3189.
Wang CK, Craik DJ, 2018. Designing macrocyclic disulfide-rich peptides for biotechnological applications. Nature Chemical Biology 14, 417-427.
Wang CK, Kaas Q, Chiche L, et al., 2007. CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering. Nucleic Acids Research 36, D206-D210.
Zarrabi M, Dalirfardouei R, Sepehrizade Z, et al., 2013. Comparison of the antimicrobial effects of semipurified cyclotides from Iranian Viola odorata against some of plant and human pathogenic bacteria. Journal of Applied Microbiology 115, 367-375.

3.6. Supplementary sequences

rL1

10 20 30 40 50 60 70 80 90 100 110 120 ATGTGGAGTGGCAAGAAGGCAGCTTGTATGGTGGTGGCCATGACCACAATACTGATTCTGCAGCTGACAGTAACACAGGTAGTATTGGCAATCCGCTCACCACAGGAGGTGGAACTAGAT TACACCTCACCGTTCTTCCGTCGAACATACCACCACCGGTACTGGTGTTATGACTAAGACGTCGACTGTCATTGTGTCCATCATAACCGTTAGGCGAGTGGTGTCCTCCACCTTGATCTA M W S G K K A A C M V V A M T T I L I L Q L T V T Q V V L A I R S P Q E V E L D>

130 140 150 160 170 180 190 200 210 220 230 240 GGTTTCATTACTACGGTAAAATCAGACTGCCTTACTGTCTCCCAAACGGAAGTGGATCAGGCTTGGACACCTTGCGGGGAAACTTGTCTTGTTTTACCATGTTATACAGCTCGGATTGGC CCAAAGTAATGATGCCATTTTAGTCTGACGGAATGACAGAGGGTTTGCCTTCACCTAGTCCGAACCTGTGGAACGCCCCTTTGAACAGAACAAAATGGTACAATATGTCGAGCCTAACCG G F I T T V K S D C L T V S Q T E V D Q A W T P C G E T C L V L P C Y T A R I G>

250 260 270 280 290 300 TGTCTATGTGTTAACAGCGTATGTATGCTACCTGTGCCTGCGAAGTCTGAAGTCATTGCATGA ACAGATACACAATTGTCGCATACATACGATGGACACGGACGCTTCAGACTTCAGTAACGTACT C L C V N S V C M L P V P A K S E V I A *>

pL1

10 20 30 40 50 60 70 80 90 100 110 120 ATGGAGAGCGCCAAGAGAGTTGCTTGCGTGGTGGCCCTAGTGCTGCTTGTGCAGCTGATGGCAGCTCCGGCCACCATGGCACGCAATGTGGAAGTGGAAAACACTCCTCTCGTGGGCTTA TACCTCTCGCGGTTCTCTCAACGAACGCACCACCGGGATCACGACGAACACGTCGACTACCGTCGAGGCCGGTGGTACCGTGCGTTACACCTTCACCTTTTGTGAGGAGAGCACCCGAAT M E S A K R V A C V V A L V L L V Q L M A A P A T M A R N V E V E N T P L V G L>

130 140 150 160 170 180 190 200 210 220 230 TTGGATATTGCTAAGGAGGTTAACCACAACCAATTACCAATTTGCGGAGAAACTTGTGTTTTGGGCACATGCTACACTCCTGGTTGCAGGTGCCAGTATCCTATCTGTGTTAGGTGA AACCTATAACGATTCCTCCAATTGGTGTTGGTTAATGGTTAAACGCCTCTTTGAACACAAAACCCGTGTACGATGTGAGGACCAACGTCCACGGTCATAGGATAGACACAATCCACT L D I A K E V N H N Q L P I C G E T C V L G T C Y T P G C R C Q Y P I C V R *>

pL1_cyc1

10 20 30 40 50 60 70 80 90 100 110 120 ATGGAGAGCGCCAAGAGAGTTGCTTGCGTGGTGGCCCTAGTGCTGCTTGTGCAGCTGATGGCAGCTCCGGCCACCATGGCACGCAATGTGGAAGTGGAAAACACTCCTCTCGTGGGCTTA TACCTCTCGCGGTTCTCTCAACGAACGCACCACCGGGATCACGACGAACACGTCGACTACCGTCGAGGCCGGTGGTACCGTGCGTTACACCTTCACCTTTTGTGAGGAGAGCACCCGAAT M E S A K R V A C V V A L V L L V Q L M A A P A T M A R N V E V E N T P L V G L>

130 140 150 160 170 180 190 200 210 220 230 240 TTGGATATTGCTAAGGAGGTTAACCACAACCAATTACCAATTTGCGGAGAAACTTGTGTTTTGGGCACATGCTACACTCCTGGTTGCAGGTGCCAGTATCCTATCTGTACTCGAAACGGT AACCTATAACGATTCCTCCAATTGGTGTTGGTTAATGGTTAAACGCCTCTTTGAACACAAAACCCGTGTACGATGTGAGGACCAACGTCCACGGTCATAGGATAGACATGAGCTTTGCCA L D I A K E V N H N Q L P I C G E T C V L G T C Y T P G C R C Q Y P I C T R N G>

250 260 CTTCCTTCTCTTGCTGCATGA GAAGGAAGAGAACGACGTACT L P S L A A *>

pL1_cyc2

10 20 30 40 50 60 70 80 90 100 110 120 ATGGAGAGCGCCAAGAGAGTTGCTTGCGTGGTGGCCCTAGTGCTGCTTGTGCAGCTGATGGCAGCTCCGGCCACCATGGCACGCAATGTGGAAGTGGAAAACACTCCTCTCGTGGGCTTA TACCTCTCGCGGTTCTCTCAACGAACGCACCACCGGGATCACGACGAACACGTCGACTACCGTCGAGGCCGGTGGTACCGTGCGTTACACCTTCACCTTTTGTGAGGAGAGCACCCGAAT M E S A K R V A C V V A L V L L V Q L M A A P A T M A R N V E V E N T P L V G L>

130 140 150 160 170 180 190 200 210 220 230 240 TTGGATATTGCTAAGGAGGTTAACCACAACGGATTACCAATTTGCGGAGAAACTTGTGTTTTGGGCACATGCTACACTCCTGGTTGCAGGTGCCAGTATCCTATCTGTACTCGAAACGGT AACCTATAACGATTCCTCCAATTGGTGTTGCCTAATGGTTAAACGCCTCTTTGAACACAAAACCCGTGTACGATGTGAGGACCAACGTCCACGGTCATAGGATAGACATGAGCTTTGCCA L D I A K E V N H N G L P I C G E T C V L G T C Y T P G C R C Q Y P I C T R N G>

250 260 CTTCCTTCTCTTGCTGCATGA GAAGGAAGAGAACGACGTACT L P S L A A *>

OakpL1_cyc

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCTAAGTTCACCGTCTGTCTCCTCCTGTGCTTGCTTCTTGCAGCATTTGTTGGGGCGTTTGGATCTGAGCTTTCTGACTCCCACAAGACCACCTTGGTCAATGAAATCGCTGAGAAG TACCGATTCAAGTGGCAGACAGAGGAGGACACGAACGAAGAACGTCGTAAACAACCCCGCAAACCTAGACTCGAAAGACTGAGGGTGTTCTGGTGGAACCAGTTACTTTAGCGACTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTACAAAGAAAGATATTGGATGGAGTGGAAGCTACTTTGGTCACTGATGTCGCCGAGAAGATGTTCCTAAGAAAGATGAAGGCTGAAGCGAAAACTTCTGAAACCGCCGATCAGGTG TACGATGTTTCTTTCTATAACCTACCTCACCTTCGATGAAACCAGTGACTACAGCGGCTCTTCTACAAGGATTCTTTCTACTTCCGACTTCGCTTTTGAAGACTTTGGCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 330 340 350 360 TTCCTGAAACAGTTGCAGCTCAAAGGATTACCAATTTGCGGAGAAACTTGTGTTTTGGGCACATGCTACACTCCTGGTTGCAGGTGCCAGTATCCTATCTGTGTTCGAAACGGTCTTCCT AAGGACTTTGTCAACGTCGAGTTTCCTAATGGTTAAACGCCTCTTTGAACACAAAACCCGTGTACGATGTGAGGACCAACGTCCACGGTCATAGGATAGACACAAGCTTTGCCAGAAGGA F L K Q L Q L K G L P I C G E T C V L G T C Y T P G C R C Q Y P I C V R N G L P>

370 TCTCTTGCTGCATGA AGAGAACGACGTACT S L A A *>

mLA

10 20 30 40 50 60 70 80 90 100 110 120 ATGGACCCAATGTTACTGAAGGGAAAAATGAAAATTTCAATGGCACTGAAGGGATTTCATCAAGCACCGAGGGAATTGGTGGTTCTGCAGCTCATGGCAGCTCCGATGGCCATGGCCCGC TACCTGGGTTACAATGACTTCCCTTTTTACTTTTAAAGTTACCGTGACTTCCCTAAAGTAGTTCGTGGCTCCCTTAACCACCAAGACGTCGAGTACCGTCGAGGCTACCGGTACCGGGCG M D P M L L K G K M K I S M A L K G F H Q A P R E L V V L Q L M A A P M A M A R>

130 140 150 160 170 180 190 200 210 220 230 240 TCGCTGCCGGATACCACCCCCGTCCTGAGCTTAAACAGGATTGCAAGGGAATTTGCTTGGCTGGGTGGTATCCAGTGTGGGGAAAGTTGTGTTTGGATACCGTGCATTACTGCTGCCATC AGCGACGGCCTATGGTGGGGGCAGGACTCGAATTTGTCCTAACGTTCCCTTAAACGAACCGACCCACCATAGGTCACACCCCTTTCAACACAAACCTATGGCACGTAATGACGACGGTAG S L P D T T P V L S L N R I A R E F A W L G G I Q C G E S C V W I P C I T A A I>

250 260 270 280 290 GGTTGCAGCTGTCAGAACAGAGTTTGCTCCAGGAACAGCCTAGCTAATTAA CCAACGTCGACAGTCTTGTCTCAAACGAGGTCCTTGTCGGATCGATTAATT G C S C Q N R V C S R N S L A N *>

mLB

10 20 30 40 50 60 70 80 90 100 110 120 ATGGAGAGTGGCAAGAGGGCTACCGGTGTCGTGGTATTGGTGGCCATGATGGTGGTTCTGCAGCTCATGGCAGCTCCGATGGCCATGGCCCGCTCGGATAGCACCCCCGTCCTGAGCTTA TACCTCTCACCGTTCTCCCGATGGCCACAGCACCATAACCACCGGTACTACCACCAAGACGTCGAGTACCGTCGAGGCTACCGGTACCGGGCGAGCCTATCGTGGGGGCAGGACTCGAAT M E S G K R A T G V V V L V A M M V V L Q L M A A P M A M A R S D S T P V L S L>

130 140 150 160 170 180 190 200 210 220 230 240 AACAGGATTGCAAGGGAATTTGCTTCGCAGGGTGGTATCCCGTGTGGGGAAAGTTGTTTCTTGATACCGTGCGTTACTGCTGCCATCGGTTGCAGCTGTCAGGACAGAGTTTGCTACAAG TTGTCCTAACGTTCCCTTAAACGAAGCGTCCCACCATAGGGCACACCCCTTTCAACAAAGAACTATGGCACGCAATGACGACGGTAGCCAACGTCGACAGTCCTGTCTCAAACGATGTTC N R I A R E F A S Q G G I P C G E S C F L I P C V T A A I G C S C Q D R V C Y K>

TAA ATT *>

mLBmLA

10 20 30 40 50 60 70 80 90 100 110 120 ATGGAGAGTGGCAAGAGGGCTACCGGTGTCGTGGTATTGGTGGCCATGATGGTGGTTCTGCAGCTCATGGCAGCTCCGATGGCCATGGCCCGCTCGGATAGCACCCCCGTCCTGAGCTTA TACCTCTCACCGTTCTCCCGATGGCCACAGCACCATAACCACCGGTACTACCACCAAGACGTCGAGTACCGTCGAGGCTACCGGTACCGGGCGAGCCTATCGTGGGGGCAGGACTCGAAT M E S G K R A T G V V V L V A M M V V L Q L M A A P M A M A R S D S T P V L S L>

130 140 150 160 170 180 190 200 210 220 230 240 AACAGGATTGCAAGGGAATTTGCTTGGCTGGGTGGTATCCAGTGTGGGGAAAGTTGTGTTTGGATACCGTGCATTACTGCTGCCATCGGTTGCAGCTGTCAGAACAGAGTTTGCTCCAGG TTGTCCTAACGTTCCCTTAAACGAACCGACCCACCATAGGTCACACCCCTTTCAACACAAACCTATGGCACGTAATGACGACGGTAGCCAACGTCGACAGTCTTGTCTCAAACGAGGTCC N R I A R E F A W L G G I Q C G E S C V W I P C I T A A I G C S C Q N R V C S R>

250 AACAGCCTAGCTAATTAA TTGTCGGATCGATTAATT N S L A N *>

OakmLA_cyc

10 20 30 40 50 60 70 80 90 100 110 120 ATGGCTAAGTTCACCGTCTGTCTCCTCCTGTGCTTGCTTCTTGCAGCATTTGTTGGGGCGTTTGGATCTGAGCTTTCTGACTCCCACAAGACCACCTTGGTCAATGAAATCGCTGAGAAG TACCGATTCAAGTGGCAGACAGAGGAGGACACGAACGAAGAACGTCGTAAACAACCCCGCAAACCTAGACTCGAAAGACTGAGGGTGTTCTGGTGGAACCAGTTACTTTAGCGACTCTTC M A K F T V C L L L C L L L A A F V G A F G S E L S D S H K T T L V N E I A E K>

130 140 150 160 170 180 190 200 210 220 230 240 ATGCTACAAAGAAAGATATTGGATGGAGTGGAAGCTACTTTGGTCACTGATGTCGCCGAGAAGATGTTCCTAAGAAAGATGAAGGCTGAAGCGAAAACTTCTGAAACCGCCGATCAGGTG TACGATGTTTCTTTCTATAACCTACCTCACCTTCGATGAAACCAGTGACTACAGCGGCTCTTCTACAAGGATTCTTTCTACTTCCGACTTCGCTTTTGAAGACTTTGGCGGCTAGTCCAC M L Q R K I L D G V E A T L V T D V A E K M F L R K M K A E A K T S E T A D Q V>

250 260 270 280 290 300 310 320 330 340 350 360 TTCCTGAAACAGTTGCAGCTCAAAGGTGGTATCCAGTGTGGGGAAAGTTGTGTTTGGATACCGTGCATTACTGCTGCCATCGGTTGCAGCTGTCAGAACAGAGTTTGCTCCAGGAACGGC AAGGACTTTGTCAACGTCGAGTTTCCACCATAGGTCACACCCCTTTCAACACAAACCTATGGCACGTAATGACGACGGTAGCCAACGTCGACAGTCTTGTCTCAAACGAGGTCCTTGCCG F L K Q L Q L K G G I Q C G E S C V W I P C I T A A I G C S C Q N R V C S R N G>

370 380 CTTCCTAGTTTGGCCGCATAA GAAGGATCAAACCGGCGTATT L P S L A A *>
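Each block in the sequence maps above lists a position ruler, the coding strand, its complementary strand and the one-letter translation. The sketch below shows how one block can be checked for internal consistency. It assumes the standard genetic code and uses the final block of OakpL1_cyc as the example; the helper names are hypothetical and are not part of the original analysis.

```python
# Standard genetic code, laid out with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return dna.translate(COMPLEMENT)


def translate(dna: str) -> str:
    """Translate in frame 1; '*' marks a stop codon, trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


block = "TCTCTTGCTGCATGA"             # final block of OakpL1_cyc
print(complement(block))              # AGAGAACGACGTACT
print(" ".join(translate(block)))     # S L A A *
```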


Chapter 4

Understanding the plasticity of seeds for the expression of grafted cyclic peptides

4.1. Overview

Seeds are the quiescent reproductive units that contain the nutrients needed for a young plant to grow and establish itself. In a typical seed, the majority of the biomass is storage tissue, either endosperm (monocots) or cotyledons (dicots), that is rich in proteins, carbohydrates, and lipids. Because of the nature of these storage tissues, it is reasonable to envisage that they could be a storage location for disulfide-rich cyclic peptides. This is the case for sunflower trypsin inhibitor-1 (SFTI-1), which is a 14 amino acid backbone-cyclized peptide found naturally in sunflower (Helianthus annuus) seeds (Luckett et al., 1999). Its small size, structural stability, and amenability to chemical synthesis have led to its common use as a parent scaffold for engineering applications. Potent and stable SFTI-1 analogues have been engineered against a wide range of therapeutically relevant serine proteases, including matriptase, cathepsin G, kallikreins and the proteasome (Łęgowska et al., 2009, Gitlin-Domagalska et al., 2017, Riley et al., 2018, Swedberg et al., 2018). Additionally, SFTI-1 can penetrate cell membranes, which extends potential applications to intracellular targets (D’Souza et al., 2014).

The SFTI-1 peptide is encoded by the PawS1 gene, which also encodes an albumin storage protein. The maturation of both albumin and SFTI-1 requires the action of resident asparaginyl endopeptidases (AEPs). In the case of the SFTI-1 peptide, it undergoes a maturation process including the formation of a disulfide bridge and enzyme-mediated head-to-tail cyclization (Mylne et al., 2011). Despite AEPs playing important roles in the maturation of SFTI-1, the exact sunflower AEP responsible for the backbone cyclization of SFTI-1 remains unknown.

Like SFTI-1, cyclotides discovered within the seeds of gac (Momordica cochinchinensis) display potent trypsin inhibition activity (Hernandez et al., 2000, Heitz et al., 2001, Chan et al., 2013). These seed-derived cyclotides have less sequence similarity with the other two cyclotide subfamilies (bracelet and Möbius), but retain the conserved three disulfide bonds arranged in a knotted topology. Like SFTI-1, the prototypic M. cochinchinensis cyclotide McoTI-II is reported to have very efficient cell-penetrating properties (Cascales et al., 2011). The presence of these divergent structures in the seeds of sunflower and M. cochinchinensis suggests that the seed provides an environment conducive to cysteine-rich cyclic peptide production, and this could be exploited for cyclotide study and production more widely.

In this chapter, experiments designed to gain insights into the plasticity of seeds for production of engineered cyclic peptides are described. First, the cyclization efficiencies of a number of AEPs from sunflower were tested on engineered cyclic peptides. Second, seed-specific promoters were used to drive the expression of genes encoding cyclic peptides and AEPs to test the plasticity of accumulation of cyclic peptides in Arabidopsis seeds. Lastly, a homozygous cyclization efficient transgenic line exhibiting high expression of a cyclization capable AEP was selected to improve the production of cyclic peptides in seeds.

4.2. Materials and methods

4.2.1 Expression of AEPs in N. benthamiana

Vector construction

Entry clones in the form of a pDS221 vector with selected sunflower AEP genes, HaAEP1 (KJ147147), HaAEP2 (MH115430.1) and HaAEP3 (XM_022164834), together with OaAEP1b (KR259378) from O. affinis, were provided by Mark Jackson. These were recombined into the Gateway® compatible destination vector pEAQ-Dest1 using the method described for transient expression in N. benthamiana leaf in Section 3.2.5.

Plant material and Agrobacterium infiltration

Methods for N. benthamiana cultivation and for infiltrating Agrobacterium harbouring vectors carrying HaAEP1, HaAEP2 and HaAEP3 into N. benthamiana leaf were described previously in Section 3.2.5.

4.2.2 Seed-specific expression of cyclic peptides in Arabidopsis

Vector construction

To test the production of cyclic peptides in Arabidopsis seeds, precursor genes for kB1 and SFTI-1 (Oak1 and PawS1 respectively), and engineered variants, were cloned into the Gateway® compatible pOH123 vector (Figure 4.1) using LR Clonase™ (Invitrogen). This recombination approach placed the peptide precursor gene of interest under the regulatory control of the Arabidopsis seed-specific oleosin gene (AT2G25890) promoter and heat shock protein terminator (HspT) (Nagaya et al., 2009). Genes encoding the cyclization efficient AEPs, OaAEP1b (LQ854891.1) and butelase1 (Nguyen et al., 2014), were likewise assembled into the pOH123 vector. The inclusion of the Bar gene in pOH123 allowed selection of transgenic Arabidopsis by application of the herbicide Basta. Assembled vectors were then transferred into Agrobacterium tumefaciens strain EHA105 by electroporation.

Plant material and cultivation

Two lines of Arabidopsis thaliana were used in the current study: the wild-type Columbia (Col-0) and a βVPE (AT1G62710) knockout (βvpe) line (CS67915 from the Arabidopsis Biological Resource Centre). To ensure uniform germination, seeds were stratified in the dark at 4 °C for 3 days before transfer to a dedicated plant growth room (22 °C, 18/6 h photoperiod). To provide a humid environment, germinating seeds were covered with plastic domes until the seedlings were one week old. The initial inflorescence shoots were trimmed to encourage secondary bolts. These secondary inflorescence shoots were used for floral dip transformation when they were ~8 cm tall.


Figure 4.1 Map of the pOH123 vector. The oleosin promoter and heat shock protein terminator (HspT) flank the attR1 and attR2 sites. Genes of cyclic peptides and AEPs replaced the chloramphenicol resistance gene (Cmr) and ccdB (a killer gene encoding a gyrase poison to bacterial cells) between the attR1 and attR2 sites by Gateway cloning. The Basta resistance (bar) gene is expressed under the control of the cauliflower mosaic virus (CaMV) 35S promoter and terminator.

Floral dip transformation

The floral dip transformation followed the protocol of Weigel and Glazebrook with minor differences (Weigel & Glazebrook, 2002). Agrobacterium harbouring expression vectors were cultured from single isolated colonies overnight in LB media supplemented with 50 µg/mL each of rifampicin and kanamycin. Starter cultures were then scaled up to 100 mL and grown for a further two days at 30 °C. Cultures were then pelleted by centrifugation at 4000 RPM for five minutes before being resuspended to an OD600 of 0.5 in infiltration buffer (1/2 MS, 5% sucrose, 50 µL/L Silwet L-77). For each transformation, two or three pots of plants were inverted to submerge the inflorescences in the Agrobacterium suspension for one minute (Figure 4.2). After submersion, plants were laid on their side and wrapped with cling wrap, and were allowed to recover in dim light for 24 hours. Plants were then returned to normal growing conditions until maturity, upon which seeds were harvested (T1 seeds). To select transgenic events, T1 seeds were sprinkled on soil, stratified and germinated as before; upon germination, the emerging seedlings were sprayed with dilute Basta (280 µM glufosinate-ammonium final concentration) every day. When Basta positive seedlings were distinguishable from moribund seedlings, they were transferred to new pots and cultivated to maturity. Seeds harvested from T1 plants (T2 seeds) were used for genome and peptide analysis.
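The reagent quantities implied by this protocol can be worked out with a few lines of arithmetic. The following is a minimal sketch only: the function names are hypothetical, the molar mass of glufosinate-ammonium (198.16 g/mol, from the formula C5H15N2O4P) is an assumed value that does not appear in this chapter, and the resuspension volume assumes that OD600 scales linearly with cell density.

```python
# Sketch: reagent amounts for the floral dip buffer and the Basta spray.
# All function names are illustrative; they are not part of any protocol software.

# Assumed molar mass of glufosinate-ammonium (C5H15N2O4P); not stated in the text.
GLUFOSINATE_AMMONIUM_MW = 198.16  # g/mol


def silwet_volume_ul(buffer_volume_ml, ul_per_litre=50.0):
    """Volume of Silwet L-77 (µL) for a given buffer volume at 50 µL/L."""
    return ul_per_litre * buffer_volume_ml / 1000.0


def sucrose_mass_g(buffer_volume_ml, percent_w_v=5.0):
    """Mass of sucrose (g) for a given buffer volume at 5% (w/v)."""
    return percent_w_v * buffer_volume_ml / 100.0


def glufosinate_mg_per_litre(concentration_um=280.0):
    """Convert a glufosinate-ammonium concentration from µM to mg/L."""
    # µmol/L * g/mol = µg/L; divide by 1000 to obtain mg/L.
    return concentration_um * GLUFOSINATE_AMMONIUM_MW / 1000.0


def resuspension_volume_ml(culture_od600, culture_volume_ml, target_od600=0.5):
    """Buffer volume (mL) needed to resuspend a pellet to the target OD600.

    Assumes OD600 is proportional to cell density over the range used.
    """
    return culture_od600 * culture_volume_ml / target_od600
```

For example, under these assumptions a 100 mL culture at a hypothetical OD600 of 2.0 would be resuspended in 400 mL of buffer, which would contain 20 g of sucrose and 20 µL of Silwet L-77.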


Figure 4.2 Schematic diagram of floral dip transformation and Basta screening in Arabidopsis. Floral dip transformation starts with submerging the inflorescence in the Agrobacterium suspension for a minute. After immersion, plants were placed on their side and covered with cling wrap for 24 hours in the dark to preserve humidity. Plants were then allowed to grow to maturity, and seeds (T1) were collected for Basta selection. To select transgenic events, T1 seeds were sprinkled on soil, and the emerging seedlings were sprayed with dilute Basta. When Basta positive seedlings were distinguishable from moribund seedlings, they were transferred to new pots and cultivated to maturity. Seeds harvested from T1 plants (T2 seeds) were used for genome and peptide analysis.

Segregation analysis

Segregation analysis was performed by counting the Basta resistance ratios of T2 plants derived from self-pollinated T1 plants. T2 seeds were germinated on Basta (280 µM) soaked Whatman filter paper placed on vermiculite. Resistant and susceptible plantlets were scored from images of the plantlets using ImageJ (1.49v).
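Whether a count of resistant and susceptible plantlets is consistent with the 3:1 ratio expected for a single-locus insertion can be checked with a chi-square goodness-of-fit test. The thesis does not state that such a test was applied, so the sketch below is an illustration rather than the method used; the function names and the 0.05 significance threshold are assumptions. The example counts (83:28 and 61:0) are those reported in Figure 4.9B.

```python
import math


def chi_square_3_to_1(resistant, susceptible):
    """Chi-square goodness-of-fit statistic and p-value against a 3:1 ratio.

    With two classes there is one degree of freedom, for which the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    total = resistant + susceptible
    expected_resistant = 0.75 * total
    expected_susceptible = 0.25 * total
    statistic = (
        (resistant - expected_resistant) ** 2 / expected_resistant
        + (susceptible - expected_susceptible) ** 2 / expected_susceptible
    )
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def fits_single_locus(resistant, susceptible, alpha=0.05):
    """True if the counts do not deviate significantly from 3:1."""
    _, p_value = chi_square_3_to_1(resistant, susceptible)
    return p_value > alpha


# Counts taken from Figure 4.9B.
print(fits_single_locus(83, 28))  # segregating line, close to 3:1
print(fits_single_locus(61, 0))   # non-segregating line, rejects 3:1
```

A line that rejects 3:1 because every plantlet is resistant, as in the second call, is a candidate homozygous line, whereas a rejection with an excess of resistant plantlets that still segregates would instead point to more than one insertion locus.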

4.2.3 DNA and RNA extraction and PCR

DNA & RNA extraction, cDNA synthesis

DNA was extracted from leaves of Basta resistant lines. The method of DNA extraction was described previously in Section 2.2.4. Primers used in the current study are shown in Table 4.1. Primers Oleop_Fwd and HspT_Rev were designed to amplify the sequence between the oleosin promoter and heat shock protein terminator in Basta resistant lines. Primer kB1_Fwd and either N_GLDN_Rev or D_GLDN_Rev were used to screen Basta resistant lines harbouring Oak1_GLDN and Oak[N29D]kB1_GLDN. RNA was extracted from seeds of homozygous OaAEP1b lines. Methods of RNA extraction and cDNA synthesis were described in Section 2.2.4.

Semi-quantitative PCR (semi-qPCR)

Equal amounts of the cDNA prepared above were used as templates for semi-qPCR. Reactions were prepared with Taq DNA Polymerase (Invitrogen™) using an annealing temperature (Ta) of 55 °C and an extension time of 60 s, with cruciferin A (Cra1, AT5G44120) as the internal control gene. Amplification products after 26 and 28 PCR cycles were analysed on an agarose gel. Primers used for semi-qPCR are shown in Table 4.1.

Table 4.1 Primers used in Chapter 4

Primer name     Primer sequence
qOaAEP_Fwd      GGAAGCTTGTGAGTCGGGTAGCATG
qOaAEP_Rev      CATTAGCAGGGTTAGAACCCATATAG
qCra1_Fwd       ATGGCTCGAGTCTCTTCTC
qCra1_Rev       TTAAGCTGCAGCCACCCTTGG
Oleop_Fwd       CGTGTCTTTGAATAGACTCC
HspT_Rev        AATTCATAACACAACAAGCC
kB1_Fwd         ATGCGGCCAAACTAGGAA
N_GLDN_Rev      ATTATCAAGGCCATTGCGTG
D_GLDN_Rev      ATTATCAAGGCCATCGCGTG

Fwd: Forward primer; Rev: Reverse primer. Primers starting with q were designed for RT-PCR.
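The two variant-specific reverse primers in Table 4.1 differ at a single base, and this can be related to the Asn or Asp codon they are intended to discriminate. The sketch below reverse-complements each primer and translates it with the standard genetic code. The reading-frame offset of 2 is an assumption, chosen because it recovers the GLDN tail described in the text; the helper names are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, offset=0):
    """Translate complete codons of seq starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def mismatch_positions(a, b):
    """Zero-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Primer sequences copied from Table 4.1.
N_GLDN_REV = "ATTATCAAGGCCATTGCGTG"
D_GLDN_REV = "ATTATCAAGGCCATCGCGTG"

# Frame offset 2 is an assumption (see lead-in).
print(translate(reverse_complement(N_GLDN_REV), offset=2))
print(translate(reverse_complement(D_GLDN_REV), offset=2))
print(mismatch_positions(N_GLDN_REV, D_GLDN_REV))
```

In this frame the sense strand covered by N_GLDN_Rev reads Arg-Asn-Gly-Leu-Asp-Asn and that covered by D_GLDN_Rev reads Arg-Asp-Gly-Leu-Asp-Asn, so the single mismatch falls in the codon at the AEP processing site.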

4.2.4 Peptide extraction and MALDI-TOF-MS analysis

Methods for peptide extraction from six-day old infiltrated N. benthamiana leaf and T2 transgenic Arabidopsis seeds, and for preparation for MALDI-TOF-MS analysis, were described in Section 2.2.5.

4.3. Results

4.3.1. Sunflower AEPs are poor ligases in Nicotiana benthamiana

As SFTI-1 is produced naturally in sunflower seeds, sunflower AEPs might be expected to be the most efficient at cyclising SFTI-1. To test this hypothesis, a transient gene expression approach was used to compare cyclization efficiencies of sunflower AEPs in N. benthamiana leaves. OaAEP1b and three sunflower (Helianthus annuus) AEP genes, HaAEP1, HaAEP2 and HaAEP3, were assembled into the pEAQ-Dest1 vector, which allows high level transient expression in N. benthamiana (Peyret & Lomonossoff, 2013). To maximise SFTI-1 precursor expression in leaf, the Oak1 gene, which performs well for kB1 production in N. benthamiana (Jackson et al., 2019), was modified to include SFTI-1 in place of the kB1 cyclotide domain (Figure 4.3A). Additionally, the C-terminal propeptide (CTPP) was altered so that it resembled the sequence that is naturally adjacent to SFTI-1 in the PawS1 precursor. Finally, two variants were tested, one in which SFTI-1 ends with an asparagine residue (Asn) and one in which it ends with the native aspartic acid residue (Asp). The rationale for testing both was that certain AEPs have altered preferences for these two residues (Harris et al., 2015). All vectors that were constructed are displayed in Figure 4.3A.

The modified peptide precursor genes, OakSFTI-1_GLDN and Oak[D14N]SFTI-1_GLDN, were co-infiltrated separately with each of the three sunflower AEP genes, and with the OaAEP1b gene for comparison. At six days post infiltration, leaf tissue was pulverized and peptides were extracted for analysis. MS signals representing cyclic and linear SFTI-1 or [D14N]SFTI-1, and associated truncated peptides, were observed for all co-expressed AEPs (Figure 4.3B). However, none of the three sunflower AEPs tested cyclised SFTI-1 as efficiently as OaAEP1b when judged by the ratio of cyclic to linear peptide produced. Additionally, for all HaAEPs, more linear and truncated peptides were observed from Oak[D14N]SFTI-1_GLDN than from OakSFTI-1_GLDN, which harbours the native Asp at the C-terminus of SFTI-1.

Figure 4.3 Testing the efficiency of AEPs for cyclization of SFTI-1 in planta. A. Peptide expression constructs for SFTI-1 expression in leaf. Constructs OakSFTI-1_GLDN and Oak[D14N]SFTI-1_GLDN were designed by replacing the kB1 domain (green) in Oak1 with that of SFTI-1 (purple), followed by a short C-terminal propeptide (GLDN, maroon). B. MALDI MS spectra of recombinant peptides with tested AEPs. Peptide expression constructs were co-expressed with HaAEP1, HaAEP2, HaAEP3 and OaAEP1b as a positive control. For OakSFTI-1_GLDN, cyclic SFTI-1 (m/z 1513) was only observed upon co-expression with OaAEP1b, along with truncated peptides SFTI-FPD (m/z 1172), SFTI-GR (m/z 1319) and SFTI-D (m/z 1417). For the expression of Oak[D14N]SFTI-1_GLDN, mass signals for both linear (m/z 1530) and cyclic [D14N]SFTI-1 (m/z 1512) were observed upon co-expression with all tested AEPs, as well as truncated peptides [D14N]SFTI-FPN (m/z 1172), [D14N]SFTI-GR (m/z 1318) and [D14N]SFTI-1 (m/z 1417). C. Relative yields of recombinant peptides with tested AEPs. The relative yield of cyclic SFTI-1 is much higher than that of the linear SFTI-1 related peptides when OakSFTI-1_GLDN was co-expressed with OaAEP1b. In contrast, a low relative yield of cyclic [D14N]SFTI-1 compared to linear peptides is observed upon co-expression of Oak[D14N]SFTI-1_GLDN with all AEPs tested. Relative yields were calculated from the MS signal intensity relative to an internally spiked peptide control.
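The relative yield described in panel C, and the cyclic-to-linear ratio used in the text to compare AEPs, both reduce to simple ratios of peak intensities. The following sketch illustrates the arithmetic only: the function names are hypothetical and the intensities are invented placeholder values, not measurements from this study.

```python
def relative_yields(peak_intensities, spike_intensity):
    """Express each peptide peak relative to the spiked internal control.

    peak_intensities maps a peptide label to its MS signal intensity.
    """
    if spike_intensity <= 0:
        raise ValueError("internal control intensity must be positive")
    return {
        label: intensity / spike_intensity
        for label, intensity in peak_intensities.items()
    }


def cyclic_to_linear_ratio(yields, cyclic_key="cyclic", linear_key="linear"):
    """Ratio of cyclic to linear peptide; infinite if no linear peptide is seen."""
    linear = yields[linear_key]
    if linear == 0:
        return float("inf")
    return yields[cyclic_key] / linear


# Placeholder intensities for illustration only (arbitrary units).
example = relative_yields({"cyclic": 3000.0, "linear": 1000.0}, spike_intensity=2000.0)
print(example, cyclic_to_linear_ratio(example))
```

Normalising to a spiked control allows spectra from different samples to be compared, although MALDI signal intensity is only a semi-quantitative proxy for peptide abundance.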

To gain insights into why the three sunflower AEPs were poor ligases in N. benthamiana, their sequences were aligned with known ligase- and protease-type AEPs (Figure 4.4). Recent structural studies of AEP ligases have pointed to three structural determinants that contribute to cyclization or hydrolysis activity, namely the ligase-activity determinants (LAD1 and LAD2) and the marker of ligase activity (MLA) (Jackson et al., 2018, Hemu et al., 2019). Alignments of the ligase-type AEPs C. ternatea butelase1, O. affinis OaAEP1b, V. yedoensis VyPAL2 and Petunia PxAEP3b with the sunflower HaAEPs revealed very little sequence homology within the LAD1, LAD2 and MLA regions. Instead, the sequences of the three tested sunflower AEPs are more similar to the protease-type AEP, PxAEP3a.

Figure 4.4 Alignment of the LAD and MLA regions in HaAEPs. The ligase-type AEPs butelase1, OaAEP1b, VyPAL2 and PxAEP3b share similar variants (red letters) in LAD1 and LAD2. The three selected HaAEPs share similarity with the protease-type AEP (PxAEP3a) at LAD1 and LAD2 (navy letters). Ligase-type AEPs share either a deletion or hydrophobic residues (red frame) in the MLA region. The three selected HaAEPs share neither of these features with ligase-type AEPs.

4.3.2. Expression of cyclic peptides in Arabidopsis seeds

To test for recombinant production of cyclic peptides in plant seeds, stable Arabidopsis transgenic lines harbouring a series of peptide precursors and AEP genes were prepared. To restrict expression to the seed, the vector pOH123 was used, because it carries the oleosin seed specific promoter (Figure 4.5A). Initially, a co-transformation strategy was used with the aim of obtaining transgenic lines harbouring both the cyclic peptide and the OaAEP1b genes, where these were introduced on separate pOH123 expression vectors. Agrobacterium cultures harbouring each vector were mixed 1:1 prior to floral dipping. Both wild type Arabidopsis Col-0 and a βvpe mutant line were transformed. The βVPE isoform represents the most highly expressed seed specific Arabidopsis AEP, and by targeting this mutant, a reduced level of background endogenous AEP activity (e.g. cleavage of heterologous cyclic peptide precursors) was expected.

Basta resistant transgenic lines were selected and are summarised in Table 4.2. In the case of OakSFTI-1_GLDN, no transgenic lines were obtained in either the Col-0 or βvpe mutant expression background. Likewise for Oak[D14N]SFTI-1_GLDN, no transgenic βvpe lines were obtained. However, in this case several Col-0 lines were obtained and tested by PCR screening for transgene cassette integration. For this test, primers were used that spanned the oleosin promoter and Hsp terminator (common to both AEP and peptide precursor constructs; Figure 4.5A). Despite transforming Arabidopsis with equal volumes of Agrobacterium culture harbouring the peptide and the OaAEP1b expression cassettes, the majority of transgenic events selected contained only a single cassette, as indicated by amplification of a single band of either OaAEP1b or the peptide precursor gene in question (Figure 4.5B). Some Basta resistant lines produced no bands, possibly indicating a complex integration event where the T-DNA was disrupted. None contained both cassettes as evidenced by PCR.

Table 4.2 Summary of transgenic lines by co-infiltration

                              Basta resistant         Peptide precursor transgene
Expression vector (pOH123)    Col-0    βvpe           Col-0    βvpe
Oak1                          13       8              8        4
Oak1_GLDN                     7        5              3        2
Oak[N29D]kB1_GLDN             16       5              10       2
OakSFTI-1_GLPSLAA             8        3              5        1
OakSFTI-1_GLDN                0        0              -        -
Oak[D14N]SFTI-1_GLDN          23       0              8        -
Oak[T4Y,I7R]SFTI-1            18       [illegible]    13       9
Paw[T4Y,I7R]SFTI-1            22       7              18       5


Figure 4.5 Co-transformation approach for the expression of cyclic peptide precursors and OaAEP1b in Arabidopsis seeds. A. Expression vectors. In all cases, the oleosin promoter (navy) and the heat shock protein terminator (grey) were used to regulate and restrict expression to the Arabidopsis seed. PCR primer sites for screening of transgene integration are indicated by arrows. B. PCR assay for transgene integration. PCR amplicons for either precursor peptide gene cassettes or OaAEP1b cassettes are indicated. Amino acid sequences for kB1/[N29D]kB1 (green), SFTI-1/[D14N]SFTI-1 (purple) and an engineered SFTI-1 based plasmin inhibitor peptide [T4Y,I7R]SFTI-1 are displayed. SP: signal peptide sequence (blue), NTPP: N-terminal propeptide (orange), NTR: N-terminal repeat (yellow), linker (brown), CTPP: C-terminal propeptide (GLPSLAA (red) is originally from Oak1 and the shortened GLDN (maroon) is originally from PawS1). Expected PCR amplicon sizes are as follows: OaAEP1b, 1426 bp; Oak1, 725 bp; Oak1_GLDN or Oak[N29D]kB1_GLDN, 716 bp; OakSFTI-1_GLPSLAA, 680 bp; Oak[D14N]SFTI-1_GLDN or Oak[T4Y,I7R]SFTI-1, 671 bp; and Paw[T4Y,I7R]SFTI-1, 614 bp. Basta resistant lines of Oak1_GLDN and Oak[N29D]kB1_GLDN were only tested with peptide specific primers.
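The expected amplicon sizes listed in this caption can be used to assign an observed gel band to a cassette. The sketch below is illustrative only: the function name and the 30 bp tolerance are assumptions, since the resolution of the gels used is not stated. It returns every cassette whose expected size lies within the tolerance of the estimated band size.

```python
# Expected amplicon sizes (bp) copied from the Figure 4.5 caption.
EXPECTED_AMPLICONS_BP = {
    "OaAEP1b": 1426,
    "Oak1": 725,
    "Oak1_GLDN / Oak[N29D]kB1_GLDN": 716,
    "OakSFTI-1_GLPSLAA": 680,
    "Oak[D14N]SFTI-1_GLDN / Oak[T4Y,I7R]SFTI-1": 671,
    "Paw[T4Y,I7R]SFTI-1": 614,
}


def candidate_cassettes(band_bp, tolerance_bp=30):
    """Cassettes whose expected amplicon is within tolerance of the band.

    tolerance_bp is an assumed value reflecting limited gel resolution.
    """
    return sorted(
        name
        for name, size in EXPECTED_AMPLICONS_BP.items()
        if abs(size - band_bp) <= tolerance_bp
    )


print(candidate_cassettes(1430))  # only the OaAEP1b cassette is close
print(candidate_cassettes(720))   # several peptide cassettes are indistinguishable
```

Because the peptide precursor amplicons differ by only a few base pairs, band size alone separates OaAEP1b from the peptide cassettes but cannot distinguish similar peptide cassettes from one another, which is consistent with the use of peptide specific primers for some lines.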

4.3.3. Expression of kB1 and its analogues in Arabidopsis seeds

To determine whether kB1 could be correctly folded and cyclized in Arabidopsis seeds, three expression cassettes were constructed to express Oak1, Oak1_GLDN and Oak[N29D]kB1_GLDN (Figure 4.6A). While Oak1 contained the native GLPSLAA CTPP, Oak1_GLDN and Oak[N29D]kB1_GLDN were designed to include a shorter CTPP encoding GLDN, as is present on the SFTI-1 precursor protein PawS1. As Asn and Asp residues in the C-terminus are known to be recognized by AEPs, Oak1_GLDN and Oak[N29D]kB1_GLDN were used to test their effects on AEP mediated processing.

PCR based screening for transgene integration generated eight Col-0 lines and four βvpe lines carrying the Oak1 transgene, three Col-0 lines and four βvpe lines carrying the Oak1_GLDN transgene and ten Col-0 lines and two βvpe lines carrying the Oak[N29D]kB1_GLDN transgene. Peptides were extracted from T2 seeds and analysed by MALDI-TOF-MS. In the Col-0 background, peptides detected from the Oak1 and Oak1_GLDN genes showed a similar pattern, with cyclic (m/z 2891), linear (m/z 2909) and a truncated (kB1–G, m/z 2852) kB1, as illustrated in Figure 4.6B. Only minor signals were detected representing correctly cyclized kB1, suggesting that endogenous Arabidopsis AEPs are inefficient for peptide cyclization. Compared to Oak1 and Oak1_GLDN, Oak[N29D]kB1_GLDN transformants produced more linear extended kB1 peptides, including kB1+G (m/z 2967), +GL (m/z 3080), +GLD (m/z 3195) and +GLDN (m/z 3309), but almost no cyclic [N29D]kB1 was observed. Along with these extended kB1 peptides, a relatively low signal intensity was observed for the full length linear kB1. This low signal suggests that Arabidopsis AEPs might be less active at processing at the Asp site in Oak[N29D]kB1_GLDN. Without efficient AEP activity, carboxypeptidase activity on the kB1 precursor substrate would result in a higher level of extended peptides that were trimmed back from the C-terminal tail (Figure 4.6B & C). On this basis, it appears that the endogenous βVPE plays a key role in the processing of the kB1 C-terminus, but with the AEP preferring hydrolysis over backbone cyclization. Interestingly, in both Col-0 and βvpe backgrounds, a small proportion of cyclic kB1 (m/z 2891) mass was detected, suggesting that other endogenous Arabidopsis AEPs have a low level of peptide ligase ability. Comparatively, the Col-0 and βvpe expression backgrounds differed in that βvpe seed exhibited greater misprocessing of Oak1 and Oak1_GLDN.


Figure 4.6 Expression of Oak1 in Arabidopsis seeds. A. Oak1 variants designed for expression in seeds. Variations included the CTPP (GLPSLAA (red) or GLDN (maroon)) and Asn or Asp (purple) at position 29 of kB1 (green). B. MALDI spectra of kB1 related peptides produced in Col-0 seeds. Cyclic (m/z 2891), linear (m/z 2909) and a truncated (kB1–G, m/z 2852) kB1 were detected from Oak1 and Oak1_GLDN. For Oak[N29D]kB1_GLDN, linear kB1 and extended linear peptides were readily observed, including kB1+G (m/z 2967), +GL (m/z 3080), +GLD (m/z 3195) and +GLDN (m/z 3309). C. MALDI spectra of kB1 related peptides produced in βvpe seeds. Only linear and/or C-terminal extended linear kB1 peptides were detected upon Oak1 and Oak[N29D]kB1_GLDN expression. A small MS signal for cyclic kB1 (m/z 2891) was detected for Oak1_GLDN expression in addition to the linear kB1 related peptides.

4.3.4. Expression of SFTI-1 and its analogues in Arabidopsis seeds

To study SFTI-1 production in Arabidopsis seeds, three expression cassettes, OakSFTI-1_GLPSLAA, OakSFTI-1_GLDN and Oak[D14N]SFTI-1_GLDN, were prepared. However, no transgenic lines carrying the OakSFTI-1_GLDN transgene were obtained. For OakSFTI-1_GLPSLAA, five Col-0 lines and one βvpe line were generated, while eight Col-0 lines and zero βvpe lines of Oak[D14N]SFTI-1_GLDN were obtained. Peptides were extracted from T2 seeds and analysed by MALDI-TOF-MS. No SFTI-1 peptide was detected from any transgenic lines (Col-0 lines #5, #7, #8, #9, #10, βvpe line #1) harbouring the OakSFTI-1_GLPSLAA transgene (Figure 4.7). By contrast, both cyclic and linear [D14N]SFTI-1 were detected from six lines (Col-0 lines #4, #11, #17, #19, #21, #22) carrying the Oak[D14N]SFTI-1_GLDN transgene, while two lines (Col-0 lines #8, #9) showed no detectable peaks in the range of 1400 to 1640 Da.

Figure 4.7 SFTI-1 and [D14N]SFTI-1 expression in Arabidopsis seeds. For lines harbouring the OakSFTI-1_GLPSLAA transgene, there were no detectable SFTI-1 MS signals. For Oak[D14N]SFTI-1_GLDN lines, both cyclic (m/z 1512) and linear (m/z 1530) [D14N]SFTI-1 MS signals were readily detected across multiple independent transgenic lines. Representative resistant lines are shown in this figure.

4.3.5. Expression of a plasmin inhibitor based on SFTI-1 in Arabidopsis seeds

To test the plasticity of seeds to express a potential therapeutic cyclic peptide, a recently developed SFTI-1 analogue with potent plasmin inhibition activity was chosen (Swedberg et al., 2018). This inhibitor has two amino acid differences from SFTI-1, with a Tyr replacing the Thr at position four and an Arg replacing the Ile at position seven (Figure 4.8A). This modified SFTI-1 peptide was previously produced in N. benthamiana leaves using a transient expression approach (Jackson et al., 2019). To test if stable seed expression is also a viable production approach, the plasmin inhibitor peptide, [T4Y,I7R]SFTI-1, was inserted into both the Oak1 and PawS1 precursor gene sequences, replacing the kB1 domain and SFTI-1 domain respectively. After transformation and selection, 13 Col-0 lines and nine βvpe lines of Oak[T4Y,I7R]SFTI-1, and 18 Col-0 lines and five βvpe lines of Paw[T4Y,I7R]SFTI-1 were obtained. Peptides from T2 seeds were extracted and analysed using MALDI-TOF-MS. Both cyclic (m/z 1618) and linear (m/z 1636) [T4Y,I7R]SFTI-1 MS signals were detected from ten Col-0 lines and eight βvpe lines harbouring Oak[T4Y,I7R]SFTI-1. For Paw[T4Y,I7R]SFTI-1, 11 Col-0 lines and three βvpe lines produced detectable peptides. Representative MALDI spectra of Col-0 and βvpe lines of Oak[T4Y,I7R]SFTI-1 and Paw[T4Y,I7R]SFTI-1 are shown in Figure 4.8B. In Col-0 lines, accumulation of linear [T4Y,I7R]SFTI-1 predominated over that of cyclic [T4Y,I7R]SFTI-1 for both expression cassettes. However, a higher proportion of cyclic [T4Y,I7R]SFTI-1 was detected from Oak[T4Y,I7R]SFTI-1 in βvpe lines, while only low MS signal intensities of peptides were detected from Paw[T4Y,I7R]SFTI-1 in βvpe lines (five lines analysed). Peaks at 1601 and 1652 Da are unknown peptides which might be post-translationally modified [T4Y,I7R]SFTI-1 analogues.

Figure 4.8 Expression of an SFTI-1 engineered plasmin inhibitor in Arabidopsis seeds. A. Sequences of SFTI-1 and [T4Y,I7R]SFTI-1. [T4Y,I7R]SFTI-1 is a potent plasmin inhibitor with two amino acid differences from SFTI-1 (purple): at the 4th position Tyr replaced Thr (blue) and at the 7th position Arg replaced Ile (blue). B. Representative MALDI spectra of [T4Y,I7R]SFTI-1 expression in Col-0 and βvpe lines. Cyclic (m/z 1618) and linear (m/z 1636) [T4Y,I7R]SFTI-1 were detected from both expression cassettes, Oak[T4Y,I7R]SFTI-1 and Paw[T4Y,I7R]SFTI-1. Peaks at 1601 and 1652 Da were detected as unknown peptides.
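The cyclic and linear m/z values quoted for SFTI-1 and its variants follow from the peptide sequence: backbone cyclization removes one water (about 18 Da) relative to the linear form, and the single disulfide bond removes two hydrogens. The sketch below computes monoisotopic [M+H]+ values on that basis. The SFTI-1 sequence (GRCTKSIPPICFPD) is taken from the published literature rather than from this chapter, the residue masses are standard monoisotopic values, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues that occur in SFTI-1
# and the variants discussed here.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "P": 97.05276, "T": 101.04768,
    "C": 103.00919, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "K": 128.09496, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056      # H2O, present in linear but not backbone-cyclic peptides
HYDROGEN = 1.00783    # two hydrogens are lost per disulfide bond
PROTON = 1.00728      # charge carrier for [M+H]+

# Assumed from the literature; the sequence is not written out in this chapter.
SFTI1 = "GRCTKSIPPICFPD"


def substitute(sequence, position, residue):
    """Return sequence with the residue at a 1-based position replaced."""
    return sequence[:position - 1] + residue + sequence[position:]


def mh_plus(sequence, cyclic, disulfides=1):
    """Monoisotopic [M+H]+ of an oxidised peptide, cyclic or linear."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence)
    if not cyclic:
        mass += WATER
    mass -= 2 * HYDROGEN * disulfides
    return mass + PROTON


D14N = substitute(SFTI1, 14, "N")
PLASMIN_INHIBITOR = substitute(substitute(SFTI1, 4, "Y"), 7, "R")

for name, seq in [("SFTI-1", SFTI1), ("[D14N]SFTI-1", D14N),
                  ("[T4Y,I7R]SFTI-1", PLASMIN_INHIBITOR)]:
    print(name, round(mh_plus(seq, cyclic=True), 2), round(mh_plus(seq, cyclic=False), 2))
```

Under these assumptions the calculated values agree with the reported m/z values for the cyclic and linear forms (1513, 1512/1530 and 1618/1636) to within one mass unit.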

4.3.6. Obtaining stable OaAEP1b transgenic lines

Due to the low co-transformation efficiency, an alternative strategy to obtain transgenic lines harbouring dual expression cassettes was undertaken. Firstly, an OaAEP1b expression line was chosen from experiments in Section 4.3.2 where OaAEP1b was present as a single copy with strong expression. This parent line could then be used in either crossing experiments with peptide precursor expressing Arabidopsis transformants, or in subsequent re-transformation experiments. To undertake these studies, Basta resistant T1 plants were first screened to select single copy insertion lines, whose progeny would segregate at a ratio of 3:1 (Figure 4.9A). T2 seeds collected from these single copy lines were then screened to select homozygous lines, which would show no segregation (100% Basta resistance). T3 seeds collected from the selected homozygous lines were then screened by semi-quantitative PCR to identify lines exhibiting high expression.
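The reasoning behind the homozygous screen can be made quantitative. For a single dominant insertion, one third of the resistant progeny of a hemizygous parent are expected to be homozygous, and a hemizygous plant yields an entirely resistant sample of n seedlings with probability 0.75^n. The sketch below illustrates this under standard Mendelian assumptions; the function names and the 1% error threshold are illustrative choices, not values from this study.

```python
import math

# Among resistant progeny (AA and Aa in a 1:2 ratio) of a selfed Aa parent,
# the expected fraction that is homozygous is one third.
EXPECTED_HOMOZYGOUS_FRACTION = 1.0 / 3.0


def prob_hemizygous_all_resistant(n_seedlings):
    """Probability that a hemizygous (Aa) parent gives n resistant seedlings
    and no susceptible ones, assuming 3:1 Mendelian segregation."""
    return 0.75 ** n_seedlings


def min_seedlings_to_score(max_error=0.01):
    """Smallest n for which an all-resistant sample from a hemizygous
    parent has probability below max_error."""
    return math.ceil(math.log(max_error) / math.log(0.75))


print(min_seedlings_to_score(0.01))
print(prob_hemizygous_all_resistant(61))
```

Under these assumptions, scoring 17 or more seedlings keeps the chance of mistaking a hemizygous line for a homozygous one below 1%.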

Using the above segregation strategy, nine out of 12 lines from the Col-0 background and five out of nine lines from the βvpe mutant background were found to segregate at a 3:1 ratio (Figure 4.9B). From these, five OaAEP1b expression lines in the Col-0 background and three from the βvpe background displayed a Basta-resistance phenotype exclusively, and were therefore retained as lines that were homozygous for a single copy of OaAEP1b (Figure 4.9B). As the location of a transgene is known to affect its expression level, semi-quantitative PCR was used to select a line with high expression. The seed storage protein gene cruciferin A (Cra1) was used as the control gene to normalise expression strength of OaAEP1b. Based on the brightness of amplification bands, Arabidopsis transformants #2-2 in the βvpe background and #3-2 in the Col-0 background were selected as the best OaAEP1b expression lines (Figure 4.9C). Lines #7-1, #9-3 and #11-1 in the Col-0 background showed no amplification in 28-cycle PCR, suggesting that their expression was silenced.
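Normalising the OaAEP1b band to the Cra1 control band amounts to taking a ratio of band intensities for each line and ranking the lines by that ratio. In this study the comparison was made by eye from band brightness, so the sketch below is a hypothetical illustration of how densitometry values (for example from ImageJ) could be handled; the line names and intensities are invented placeholders.

```python
def normalised_expression(target_band, control_band):
    """Target band intensity divided by the control (Cra1) band intensity."""
    if control_band <= 0:
        raise ValueError("control band intensity must be positive")
    return target_band / control_band


def rank_lines(band_intensities):
    """Rank lines from highest to lowest normalised expression.

    band_intensities maps a line name to (target_band, control_band).
    """
    ratios = {
        line: normalised_expression(target, control)
        for line, (target, control) in band_intensities.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Placeholder densitometry values (arbitrary units), not measured data.
example = {
    "line_A": (900.0, 1000.0),
    "line_B": (0.0, 950.0),    # no target band: candidate silenced line
    "line_C": (400.0, 1000.0),
}
print(rank_lines(example))
```

A line with a clear control band but no target band, like the placeholder line_B, would be interpreted as silenced rather than as a failed reaction.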

Figure 4.9 Selection of OaAEP1b expressing Arabidopsis transgenic lines. A. Schematic diagram of segregation analysis. After floral dip transformation, progeny of Basta resistant lines (T1 plants) were screened to select single copy insertion lines (Aa), which would show a 3:1 segregation ratio (AA & Aa / aa = 3:1). Seeds collected from those single copy insertion lines were then screened to select homozygous lines (AA), which would show no further segregation (100% Basta resistance). Their seeds (T3 seeds) were collected as the homozygous transgenic lines. The circled cross denotes self-pollination. B. Segregation analysis of OaAEP1b transgenic lines. Seeds for segregation analysis were germinated on wet filter paper with vermiculite underneath. The numbers of Basta resistant (green) and non-resistant (yellow) seedlings were used to calculate the segregation ratio. For example, as the segregation ratio is 83:28 (~ 3:1) in segregation 1, line OaAEP1b Col-0 #3 was selected as a single copy insertion line. In segregation 2, a progeny of line OaAEP1b Col-0 #3, OaAEP1b Col-0 #3-2, was selected as a homozygous line as there was no further segregation (61:0). C. Semi-quantitative PCR of OaAEP1b transgenic lines. 26 and 28 PCR cycles were used to compare the brightness of amplification products. The seed storage protein gene cruciferin A (Cra1) was used as the control gene. Arabidopsis transformant #2-2 in the βvpe background and #3-2 in the Col-0 background showed the brightest bands in both 26- and 28-cycle PCRs.

4.4. Discussion

Plant seeds are natural reservoirs of storage proteins and provide an ideal environment to produce and accumulate heterologous peptides and proteins. Moreover, some cyclic peptides, like SFTI-1 and MCoTI, are naturally produced in seeds (Luckett et al., 1999, Chan et al., 2013). Therefore, this study examined whether cyclization enzymes already present in sunflower could aid the accumulation of seed-derived cyclic peptides. The plasticity of cyclic peptide production in seeds was also assessed.

4.4.1. Cyclization efficiency of sunflower AEPs

Co-expression of ligase-type AEPs, such as OaAEP1b from O. affinis and butelase1 from C. ternatea, has been shown to increase the yield of heterologously expressed cyclic peptides in plants (Poon et al., 2017, Jackson et al., 2019). Here, the potential of three sunflower AEPs (HaAEP1, 2 and 3) to cyclise SFTI-1 and its variant [D14N]SFTI-1 was tested in N. benthamiana leaves. There were no obvious improvements in the yield of cyclic SFTI-1 or [D14N]SFTI-1 with any of the sunflower AEPs tested, suggesting that none of these AEPs are suited to cyclise the tested peptide precursors in leaf. Subsequent sequence analysis revealed that, unlike OaAEP1b or butelase1, all three tested HaAEPs share more structural similarities with hydrolysis-preferring AEPs than with ligase-type AEPs. In addition, other studies indicated that HaAEP1 is a pH-dependent ligase (Elliott et al., 2014, Haywood et al., 2018). Considering that leaf cells and seed tissue are vastly different cell types, there are confounding differences that may inhibit the cyclization activity of these sunflower AEPs in leaf. Indeed, for the three sunflower AEPs tested, it is unclear whether their signal peptide and propeptide regions provide suitable elements for efficient vacuolar targeting in N. benthamiana leaf cells. In this context, it would be interesting to test the co-expression of each of these three sunflower AEPs in the seeds of the well-established model Arabidopsis.

4.4.2. Co-transformation in floral dip transformation

To understand the plasticity of seeds for producing cyclic peptides, precursor genes of cyclic peptides and AEP ligase genes were co-transformed into Arabidopsis. Co-transformation of expression cassettes delivered on separate T-DNAs was used because this approach greatly simplified gene cloning. However, in the current study the co-transformation efficiency was very poor. Only transgenic lines carrying a single expression cassette, either for a precursor peptide gene or for an AEP ligase gene, were obtained. One reason for this could be the concentration of Agrobacterium inoculant used, as co-transformation efficiency may be higher when a higher density of Agrobacterium is used (OD600 = 2~3) (Stuitje et al., 2003). The low co-transformation frequency observed might also be due to an immune-like reaction, as plant cells may become resistant after the first successful T-DNA is integrated (De Buck et al., 1998). The virulence of Agrobacterium strains and the physiology of plants can also affect the co-transformation efficiency (Ghedira et al., 2013). Alternative strategies to obtain lines co-expressing cyclic peptide genes with AEP ligase genes include hybridisation, transformation of dual expression cassettes, or stacking of transgenes by sequential transformation (Weigel & Glazebrook, 2002). In the current study, a few peptide expressing lines and a stable OaAEP1b expressing line were obtained. These lines are now ready for combining cyclic peptide precursor and AEP transgenes in gene stacking experiments via hybridisation or secondary transformation. As co-expression with a ligase-type AEP increased cyclic peptide yield in N. benthamiana leaves (Poon et al., 2017), the expression and accumulation of cyclic peptides in seeds may improve as well. Such an expression platform will provide substantial information about cyclic peptide production in seeds.

4.4.3. Expression of cyclic peptides in Arabidopsis seeds

To begin to understand seed plasticity and cyclic peptide production, it was worthwhile to investigate whether endogenous Arabidopsis seed AEPs may facilitate production of cyclic products. Overall, the peptides detected were either full-length linear, truncated or extended linear forms. In the case of kB1, a small proportion of MS signal was attributed to the cyclic mass, suggesting that one or several endogenous Arabidopsis AEPs have some ability to perform peptide ligation in the seed. When the Oak1 gene was introduced into the Arabidopsis βVPE knockout line (Shimada et al., 2003), the cyclic mass was still present, suggesting that an endogenous AEP other than βVPE may be active. Indeed, both αVPE (AT2G25940) and δVPE (AT3G20210) are expressed in seeds at some point during seed development (http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi).

As previously reported, the C-terminal tail of peptide precursor proteins plays a key role in determining the ligation efficiency of ligase-type AEPs (Harris et al., 2015). The residues GLDN that naturally flank SFTI-1 in the PawS1 precursor protein have been used in other studies for both in vitro and in planta cyclization (Haywood et al., 2018, Jackson et al., 2019), while GLPSLAA from Oak1 is commonly used to produce kB1 in planta (Poon et al., 2017, Jackson et al., 2018). Here, both GLDN and GLPSLAA flanking residues were tested, but no major differences were observed between the two in the case of kB1. For SFTI-1, only GLPSLAA was tested because no GLDN lines were obtained. However, no masses for SFTI-1 related peptides were detected in any of the five transgenic GLPSLAA lines. When the GLDN tail was used with the [D14N]SFTI-1 variant, transgenic lines were obtained from which both cyclic and linear peptides were detected. The change from Asp to Asn in SFTI-1, rather than the difference in C-terminal tails, is the likely reason for this success, as AEPs prefer to act on Asn over Asp residues. Further work is required to confirm whether these contrasting results are due to the tail residues, the Asp to Asn exchange, or inefficient transgene expression.
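The difficulty in separating the effect of the tail from that of the Asp to Asn exchange can be illustrated with a short sketch. The Python below is illustrative only and was not part of the analysis in this thesis: it records the two SFTI-1 peptide/tail combinations for which MS data are described above and lists the pairs of conditions that differ in exactly one factor. The function name `single_factor_pairs` and the data layout are assumptions introduced for this illustration.

```python
from itertools import combinations

# Conditions with MS data, as described in the text: (peptide, C-terminal tail).
# The outcome strings paraphrase the text; nothing here is new data.
ms_data = {
    ("SFTI-1", "GLPSLAA"): "no SFTI-1 related masses detected",
    ("[D14N]SFTI-1", "GLDN"): "cyclic and linear peptides detected",
}


def single_factor_pairs(conditions):
    """Return pairs of conditions that differ in exactly one factor.

    Only such pairs allow an outcome difference to be attributed to a
    single factor (peptide variant or tail).
    """
    pairs = []
    for a, b in combinations(sorted(conditions), 2):
        n_differing = sum(x != y for x, y in zip(a, b))
        if n_differing == 1:
            pairs.append((a, b))
    return pairs


# With the available data the two conditions differ in both factors,
# so no single-factor comparison is possible.
print(single_factor_pairs(ms_data))

# Hypothetical: if SFTI-1 with the GLDN tail had yielded lines with MS data,
# two informative comparisons would become available.
hypothetical = set(ms_data) | {("SFTI-1", "GLDN")}
print(single_factor_pairs(hypothetical))
```

Under these assumptions, the sketch suggests that obtaining SFTI-1 lines with the GLDN tail would be one way to supply the missing single-factor comparisons.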

Asparaginyl endopeptidases are known to process substrates at Asn or Asp sites, which correlates with the strict sequence conservation observed at the processing site of cyclotides and SFTI-1 (Mulvenna et al., 2006, Shafee et al., 2015). Although cyclotide variants exist that contain either an Asn or an Asp, native SFTI-1 contains only an Asp. Recent studies indicated that some AEP ligases, like OaAEP1b, can function in peptide ligation at either Asn or Asp residues, while other ligases, like butelase-1, prefer only Asn (Poon et al., 2017). Consequently, an SFTI-1 variant carrying the D14N residue exchange was tested. Expression of this modified variant resulted in a higher MS signal for cyclic [D14N]SFTI-1 than for linear [D14N]SFTI-1, suggesting that [D14N]SFTI-1 may be processed more efficiently by endogenous Arabidopsis AEPs. This is in contrast to recent research in N. benthamiana, where only SFTI-1, but not [D14N]SFTI-1, was observed in leaf cells (Jackson et al., 2018). Together these results highlight that not all peptides have structures amenable to expression in leaf. This was supported by the fact that [D14N]SFTI-1 instead accumulated in seeds, suggesting that post-translational stability in leaf may be an issue and that the peptide is more stable in the seed environment.
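Because cyclic and linear forms are distinguished here by mass, a minimal sketch of the underlying arithmetic may be helpful. It assumes the published SFTI-1 sequence GRCTKSIPPICFPD (not restated in this section), standard monoisotopic residue masses, and one disulfide bond; the function and variable names are hypothetical. Backbone cyclization removes one water molecule relative to the linear peptide, and the Asp to Asn exchange lowers the mass by about 0.98 Da. The linear species detected in planta may carry extra or missing residues, which this sketch does not model.

```python
# Monoisotopic residue masses (Da) for the amino acids present in SFTI-1,
# plus Asn for the D14N variant.
RESIDUE_MASS = {
    "G": 57.02146, "R": 156.10111, "C": 103.00919, "T": 101.04768,
    "K": 128.09496, "S": 87.03203, "I": 113.08406, "P": 97.05276,
    "F": 147.06841, "D": 115.02694, "N": 114.04293,
}
WATER = 18.01056        # H2O, present in the linear form only
TWO_H = 2 * 1.007825    # two hydrogens are lost per disulfide bond


def peptide_mass(seq, cyclic=False, disulfides=0):
    """Monoisotopic neutral mass of a peptide.

    A linear peptide is the sum of its residues plus one water; a
    head-to-tail cyclic peptide is the sum of its residues only.
    """
    mass = sum(RESIDUE_MASS[aa] for aa in seq)
    if not cyclic:
        mass += WATER
    return mass - disulfides * TWO_H


SFTI1 = "GRCTKSIPPICFPD"        # assumed: published SFTI-1 sequence
SFTI1_D14N = SFTI1[:13] + "N"   # Asp14 replaced by Asn

for name, seq in [("SFTI-1", SFTI1), ("[D14N]SFTI-1", SFTI1_D14N)]:
    cyclic = peptide_mass(seq, cyclic=True, disulfides=1)
    linear = peptide_mass(seq, cyclic=False, disulfides=1)
    print(f"{name}: cyclic {cyclic:.2f} Da, linear {linear:.2f} Da")
```

The 18 Da difference between the two forms is what allows a cyclic product to be assigned from MS data alone, although confirmation by other methods is still advisable.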

A potent plasmin inhibitor based on SFTI-1, denoted [T4Y,I7R]SFTI-1, was developed as a promising antifibrinolytic drug candidate (Swedberg et al., 2018). This inhibitor has been produced with a yield of up to 56.5 ± 10.8 µg/g DW in N. benthamiana leaf (Jackson et al., 2019). In that study, the Oak1 and PawS1 precursors were both engineered to encode [T4Y,I7R]SFTI-1. When relying on endogenous AEP activity, there appeared to be no significant difference in the peptide accumulation profiles between these two precursors when they were expressed in the Col-0 background. This finding demonstrated that both Oak1 and PawS1 precursors encoding an SFTI-like molecule will lead to cyclic peptide accumulation in Arabidopsis seeds. However, the observation of a higher ratio of cyclic/linear [T4Y,I7R]SFTI-1 when expressed in βvpe lines suggests that endogenous βVPE expression may be counterproductive. To test this hypothesis, more βvpe lines expressing target peptides will need to be analysed. Furthermore, it might be beneficial to co-express OaAEP1b to further improve the production of [T4Y,I7R]SFTI-1 in seeds. Other strategies similar to those demonstrated with transient expression in N. benthamiana (Jackson et al., 2019), such as the expression of a precursor protein with tandem repeats of [T4Y,I7R]SFTI-1, could also help improve the yield of [T4Y,I7R]SFTI-1 in seeds. Further potential approaches to improving cyclic peptide yield in seeds include reducing seed storage proteins (Kawakatsu et al., 2010), fusing genes with strong expression signal propeptides (Fujiwara et al., 2016) or decreasing zein synthesis (Huang et al., 2006).

In summary, cyclic peptides have a wide range of structural and functional characteristics. Thus far, successful heterologous expression of cyclic peptides of interest is still subject to case-by-case variation. The mechanisms underpinning misprocessing, including AEP activity, substrate stability, and differences in cellular environment, remain to be elucidated and are a subject for future work.

4.5. References

Cascales L, Henriques ST, Kerr MC, et al., 2011. Identification and characterization of a new family of cell-penetrating peptides. Journal of Biological Chemistry 286, 36932-36943.

Chan LY, He W, Tan N, et al., 2013. A new family of cystine knot peptides from the seeds of Momordica cochinchinensis. Peptides 39, 29-35.

D’souza C, Henriques ST, Wang CK, et al., 2014. Structural parameters modulating the cellular uptake of disulfide-rich cyclic cell-penetrating peptides: MCoTI-II and SFTI-1. European Journal of Medicinal Chemistry 88, 10-18.

De Buck S, Jacobs A, Van Montagu M, et al., 1998. Agrobacterium tumefaciens transformation and cotransformation frequencies of Arabidopsis thaliana root explants and tobacco protoplasts. Molecular Plant-Microbe Interactions 11, 449-457.

Elliott AG, Delay C, Liu H, et al., 2014. Evolutionary origins of a bioactive peptide buried within preproalbumin. The Plant Cell 26, 981-995.

Fujiwara Y, Yang L, Takaiwa F, et al., 2016. Expression and purification of recombinant mouse interleukin-4 and -6 from transgenic rice seeds. Molecular Biotechnology 58, 223-231.

Ghedira R, De Buck S, Nolf J, et al., 2013. The efficiency of Arabidopsis thaliana floral dip transformation is determined not only by the Agrobacterium strain used but also by the physiology and the ecotype of the dipped plant. Molecular Plant-Microbe Interactions 26, 823-832.

Gitlin-Domagalska A, Dębowski D, Łęgowska A, et al., 2017. Design and chemical syntheses of potent matriptase-2 inhibitors based on trypsin inhibitor SFTI-1 isolated from sunflower seeds. Peptide Science 108, e23031.

Harris KS, Durek T, Kaas Q, et al., 2015. Efficient backbone cyclization of linear peptides by a recombinant asparaginyl endopeptidase. Nature Communications 6, 10199.

Haywood J, Schmidberger JW, James AM, et al., 2018. Structural basis of ribosomal peptide macrocyclization in plants. eLife 7, e32955.

Heitz A, Hernandez J-F, Gagnon J, et al., 2001. Solution structure of the squash trypsin inhibitor MCoTI-II. A new family for cyclic knottins. Biochemistry 40, 7973-7983.

Hernandez JF, Gagnon J, Chiche L, et al., 2000. Squash trypsin inhibitors from Momordica cochinchinensis exhibit an atypical macrocyclic structure. Biochemistry 39, 5722-5730.

Hemu X, El Sahili A, Hu S, et al., 2019. Structural determinants for peptide-bond formation by asparaginyl ligases. Proceedings of the National Academy of Sciences 116, 11737-11746.

Huang S, Frizzi A, Florida CA, et al., 2006. High lysine and high tryptophan transgenic maize resulting from the reduction of both 19- and 22-kD α-zeins. Plant Molecular Biology 61, 525-535.

Jackson M, Gilding E, Shafee T, et al., 2018. Molecular basis for the production of cyclic peptides by plant asparaginyl endopeptidases. Nature Communications 9, 2411.

Jackson MA, Yap K, Poth A, et al., 2019. Rapid and scalable plant based production of a potent plasmin inhibitor peptide. Frontiers in Plant Science 10, 602.

Kawakatsu T, Hirose S, Yasuda H, et al., 2010. Reducing rice seed storage protein accumulation leads to changes in nutrient quality and storage organelle formation. Plant Physiology 154, 1842-1854.

Łęgowska A, Dębowski D, Lesner A, et al., 2009. Introduction of non-natural amino acid residues into the substrate-specific P1 position of trypsin inhibitor SFTI-1 yields potent chymotrypsin and cathepsin G inhibitors. Bioorganic & Medicinal Chemistry 17, 3302-3307.

Luckett S, Garcia RS, Barker J, et al., 1999. High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds. Journal of Molecular Biology 290, 525-533.

Mulvenna JP, Wang C, Craik DJ, 2006. CyBase: a database of cyclic protein sequence and structure. Nucleic Acids Research 34, D192-D194.

Mylne JS, Colgrave ML, Daly NL, et al., 2011. Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nature Chemical Biology 7, 257.

Nagaya S, Kawamura K, Shinmyo A, et al., 2009. The HSP terminator of Arabidopsis thaliana increases gene expression in plant cells. Plant and Cell Physiology 51, 328-332.

Nguyen GK, Wang S, Qiu Y, et al., 2014. Butelase 1 is an Asx-specific ligase enabling peptide macrocyclization and synthesis. Nature Chemical Biology 10, 732.

Peyret H, Lomonossoff GP, 2013. The pEAQ vector series: the easy and quick way to produce recombinant proteins in plants. Plant Molecular Biology 83, 51-58.

Poon S, Harris KS, Jackson MA, et al., 2017. Co-expression of a cyclizing asparaginyl endopeptidase enables efficient production of cyclic peptides in planta. Journal of Experimental Botany 69, 633-641.

Riley BT, Ilyichova O, Harris JM, et al., 2018. Cyclic Peptide Serine Protease Inhibitors Based on the Natural Product SFTI-1. In: Janetka JW, Benson RM (eds.) Extracellular Targeting of Cell Signaling in Cancer: Strategies Directed at MET and RON Receptor Kinase Pathways, Chapter 10 pp. 277-306.

Shafee T, Harris K, Anderson M, 2015. Biosynthesis of cyclotides. In: Craik DJ (ed.) Advances in Botanical Research Plant Cyclotides. Elsevier, Chapter 8 pp. 227-269.

Shimada T, Yamada K, Kataoka M, et al., 2003. Vacuolar processing enzymes are essential for proper processing of seed storage proteins in Arabidopsis thaliana. Journal of Biological Chemistry 278, 32292-32299.

Stuitje AR, Verbree EC, Van Der Linden KH, et al., 2003. Seed-expressed fluorescent proteins as versatile tools for easy (co) transformation and high-throughput functional genomics in Arabidopsis. Plant Biotechnology Journal 1, 301-309.

Swedberg JE, Wu G, Mahatmanto T, et al., 2018. Highly potent and selective plasmin inhibitors based on the sunflower trypsin inhibitor-1 scaffold attenuate fibrinolysis in plasma. Journal of Medicinal Chemistry 62, 552-560.

Weigel D, Glazebrook J, 2002. Arabidopsis: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.

Note: Credits of images used in page http://labs.mcdb.lsa.umich.edu/labs/pichersky/arabi/arabi.html https://commons.wikimedia.org/wiki/File:Arabidopsis_thaliana_sl2.jpg http://www.aecageco.com/Products/Worlds-Best-Bird-Food-Bulk/Grey-Stripe-Sunflower-Seed-25-lb-.html https://www.freepng.ru/png-gaocqz/

Chapter 5

Conclusions and future directions

5.1. Conclusions

Three aims were explored in the current study to assess the potential of plants for the production of cyclic peptides using four plant expression systems. For the first time, rice suspension cells were used as a system for producing the prototypical cyclotide kB1. This work also reports the first successful heterologous expression of a cyclic peptide from a dicot in a monocot cell, yielding a product structurally equivalent to the native peptide. Moreover, it demonstrates that rice seeds have promising potential for the production and accumulation of cyclic peptides. The capability of cyclising engineered monocot cyclotide-like peptides using a transient leaf expression system was demonstrated for the first time. Finally, the flexibility of seeds for the production of various cyclic peptides was explored, and a stable seed-specific OaAEP1b expression line was created in Arabidopsis as a platform to further explore seed production of cyclic peptides.

In Chapter 1, the literature was reviewed to provide background information about cyclic peptides and the potential for plant molecular pharming to provide green alternatives in cyclic peptide production. Plant-derived cysteine-rich cyclic peptides have been developed as promising drug scaffolds and agricultural agents over the last two decades. At the same time, plant expression systems have been developed to enable recombinant protein/peptide production in various systems, such as leaf, seed and cell expression systems. Bringing these advances together by developing in planta systems for cyclic peptides is an ideal strategy to achieve production in an environmentally friendly, cost-effective and efficient manner.

In Chapter 2, two rice-based approaches to cyclic peptide production were investigated: suspension cells and seeds. No naturally occurring monocot cyclic peptides have been reported to date, and thus the hypothesis that monocots are able to produce cyclic peptides was tested here. Natural and engineered cyclic peptide production was tested in suspension cells. In the case of cyclic kB1, the production yield reached 64.21 µg/g (DW) with the assistance of a cyclization efficient enzyme, OaAEP1b. The heterologous kB1 was shown to be structurally equivalent to native kB1 extracted from O. affinis by co-elution, reduction, alkylation, enzyme digestion and NMR. A curious phenotype, the up-regulation of endogenous OsAEPs when OaAEP1b was co-expressed, might limit the yield of kB1. Although the observation of mature kB1 production in rice was exciting, the engineered SFTI-1 candidates tested could not be produced in rice suspension cells, showing that there are limits to this system. In rice seeds, kB1 and SFTI-1 were produced with yields of up to 1.05 µg/g (DW) and 0.165 µg/g (DW), respectively. Although these yields are not of the same magnitude as in suspension cells, rice seed is a promising bioreactor for producing and accumulating cyclic peptides when considering the seed biomass and stable storage of heterologous peptides.
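To put the difference in magnitude into numbers, the yields quoted above can be compared directly. The following Python sketch is illustrative only: the dictionary keys and the helper `fold_difference` are names introduced here, and the comparison ignores differences in peptide, tissue and the co-expression of OaAEP1b, so it should be read as rough orientation rather than a like-for-like comparison.

```python
# Yields quoted in the text, in µg per g dry weight (DW).
YIELDS_UG_PER_G_DW = {
    "kB1, rice suspension cells (with OaAEP1b)": 64.21,
    "kB1, rice seeds": 1.05,
    "SFTI-1, rice seeds": 0.165,
}


def fold_difference(reference, other):
    """Return how many times larger `reference` is than `other`."""
    return reference / other


reference_name = "kB1, rice suspension cells (with OaAEP1b)"
reference_yield = YIELDS_UG_PER_G_DW[reference_name]

for name, value in YIELDS_UG_PER_G_DW.items():
    if name == reference_name:
        continue
    fold = fold_difference(reference_yield, value)
    print(f"{reference_name} vs {name}: about {fold:.0f}-fold higher")
```

On these figures the suspension cell yield of kB1 is roughly 60-fold higher than the kB1 yield in seeds, which is why seed biomass and storage stability, rather than yield per gram, are the main arguments for the seed system.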

In Chapter 3, the occurrence of cyclotide-like genes in a phylogenetically broad range of monocot plants was surveyed. Transcriptome analysis supported the findings of previous studies of cyclotide-like sequences in monocots (Nguyen et al., 2013, Salehi et al., 2017), in which they are apparently restricted to the Poaceae family. A pyroGlu modification at the N-terminus was observed in linear cyclotide-like peptides when the cyclotide-like genes, rL1 and pL1, were expressed in N. benthamiana. This modification has been proposed to stabilise the linear peptides, and has also been observed in some linear cyclotides from dicots (Poth et al., 2012, Hara et al., 1989). These discoveries provide alternative paths to stabilise small peptides and widen the application of cyclisable scaffolds based on cyclotide-like peptides. In addition, this was the first report of the cyclization of engineered cyclotide-like peptides in planta, including mLA from S. italica and pL1 from P. laxum.

In Chapter 4, production flexibility in a seed system was explored. First, three sunflower HaAEPs were assessed to identify potential cyclization efficient AEPs. Sequence alignment analysis showed that all of these HaAEPs are protease-type AEPs, and co-expressing them showed no obvious improvement in peptide cyclization in N. benthamiana leaf. This result supported previous studies of recombinant HaAEP1, which suggested that these HaAEPs might be pH-dependent ligases with differing activity in seeds versus leaf cells (Elliott et al., 2014, Haywood et al., 2018). Consequently, various engineered cyclic peptides were tested for production in Arabidopsis seeds. A potent plasmin inhibitor based on SFTI-1, [T4Y,I7R]SFTI-1, was detected in seeds in both cyclic and linear forms, even without co-expressing OaAEP1b. Attempts were made to co-express OaAEP1b with peptide precursor genes in order to increase the yield, as was shown previously in N. benthamiana. However, these failed due to the low co-transformation efficiency of cyclic peptide precursor and AEP genes. To circumvent this problem, a homozygous AEP transgenic line exhibiting high expression was created, which can be used in the future as a yield-boosting platform to further explore cyclic peptide production in seeds. All in all, these studies contribute valuable advances for the selection of biofactory hosts, tissue specificity, and the genetic modifications required to produce cyclic peptides efficiently in plants.

5.2. Future directions

Important novel insights were gained in this thesis with regard to the development of plants as systems for the heterologous production of cyclic peptides. Consequently, a number of potential future research directions are outlined in this section to further optimise production and to better understand cyclic peptides in plants.

5.2.1. Transformation and production in plants

Following this study of cyclic peptide production in rice, Arabidopsis and N. benthamiana, strategies ranging from gene modification to production scale-up are suggested that would further enhance the production of cyclic peptides. These strategies could include:

1. Further gene modification and expression vector design, including co-expression of cyclization and maturation enzymes (e.g. C-terminal processing enzymes, such as OaAEP1b, and N-terminal processing enzymes, such as OaRD21A), insertion of tandem repeats of cyclic peptides in precursor genes (e.g. kalata-type peptides in a multimeric cyclotide domain precursor), fusion of cyclic peptides to endogenous proteins (e.g. glutelin or oleosin), and knock-out of endogenous AEPs (e.g. OsVPE1 and OsVPE2, which showed a significant increase when OaAEP1b was expressed).
2. The rice suspension cell system could be further scaled up in a bioreactor with controlled temperature, medium, and gas exchange (e.g. Sartorius® benchtop bioreactor with a range of 10-20 L for suspension cultures) to achieve high-yield production. Another strategy to improve yield would be to decouple cell growth from production by using metabolically regulated promoters (e.g. the sugar-sensitive RAmy3D promoter (Corbin et al., 2016)). Coupling product expression to sugar levels in the bioreactor minimizes interference with cell growth and enables the production of potentially cytotoxic cyclic peptides.
3. Expand the diversity of peptides tested for expression. It would be particularly interesting to produce additional kB1 and MCoTI analogues, as these were shown to retain cell-penetrating ability (Contreras et al., 2011, D’Souza et al., 2014, Henriques et al., 2015), creating the opportunity to produce peptide drugs for intracellular targets.
4. Therapeutic cyclic peptides produced in rice seed could be further developed into orally delivered drugs, especially for epitopes involved in immunotherapies, e.g. for rheumatoid arthritis, HIV and allergies.
5. The cyclotide-like sequences (e.g. rL1 and mLA) from Chapter 3 are interesting candidates that need to be fully functionally characterised. These intriguing peptides were produced in experiments from this work. In the future they could be utilised in Poaceae cereals to confer a useful function, such as plant defence in rice and maize. Furthermore, they might be engineered to be cyclized in planta.
6. Following the experiments presented here for the production of cyclic peptides in Arabidopsis seeds, various precursor genes of engineered cyclic peptides could be similarly assessed. It is proposed that this may be accomplished by either crossing with, or transforming into, an OaAEP1b stable expression line to facilitate production of cyclic peptides in seeds. In addition, the cyclization efficiency of HaAEPs should be assessed in Arabidopsis seeds, as this may represent a preferred environment for peptide cyclization by these AEPs.

5.2.2. Cyclotide-like peptides in monocots and the monocot bioreactor

Monocot cyclotide-like peptides were shown here to be cyclized in vitro and in planta. It would be advantageous to produce these peptides and discover their function. They might be good candidates as pesticides or fungicides and should be characterised further for possible biological applications more broadly. However, the inconsistent chemical synthesis of these peptides (e.g. rL1) and their low expression in both native and heterologous plant systems were obstacles to understanding their function, as the lack of adequate starting material hinders functional testing. Despite this limitation, understanding their function would provide information for developing promising candidates into agricultural applications or therapeutic drugs. If no cellular toxicity is observed from these cyclotide-like peptides, they would be ideal candidates for development as cyclization scaffolds for grafting.

Following the expression of monocot cyclotide-like peptides in N. benthamiana, a pyro-Glu post-translational modification was observed at their N-termini. This was hypothesized to assist with peptide stability, as it has also been reported in other small peptides (Beck et al., 2001). This hypothesis could be tested using monocot cyclotide-like sequences to clarify the contribution pyro-Glu may make to cyclic peptide stability under extremes of heat and pH and in the presence of proteolytic enzymes.

The bioinformatic analysis presented in the current work for monocot plants, together with published reports (Mulvenna et al., 2006a, Salehi et al., 2017, Nguyen et al., 2013), indicates that cyclotide-like genes are restricted to the Poaceae family in monocots. Broader transcriptome sequencing and exploration of cyclotide-like genes in Poaceae species would provide more information on their distribution and sequence diversity. Moreover, developing other cereals with attractive agronomic traits (e.g. high-biomass maize) to produce cyclic peptides is another promising direction, as these Poaceae cereals might also possess the machinery to readily produce cyclic peptides.

5.2.3. Cyclization efficient AEPs

The studies reported here found that cyclization efficient AEPs increased the yield of cyclic peptides in planta, consistent with previous reports (Poon et al., 2017, Jackson et al., 2019). It would be useful to mine cyclization efficient AEPs from a phylogenetically broad range of native cyclotide-producing plants by transcriptome analysis, particularly as such analyses are becoming more routine and cost effective. Alternatively, AEPs from the same species as the cyclic peptides might provide specific processing (e.g. precursor or seed specific) and greatly increase cyclization efficiency. Moreover, highly cyclization efficient AEPs could be designed based on analysis of a variety of cyclization efficient AEPs (e.g. [C247A]OaAEP1b). It would also be innovative to use CRISPR/Cas9 to modify endogenous AEPs through homologous recombination so that they become cyclization efficient. In this case, there may be no need to introduce exogenous cyclization efficient AEPs; instead, key regions (e.g. LAD1, LAD2, and MLA) could be edited to the states found in cyclization efficient AEPs.

In conclusion, the work reported in this thesis has provided several important insights into heterologous cyclic peptide production in planta, and there remains promise for many exciting future developments.

5.3. References

D’souza C, Henriques ST, Wang CK, et al., 2014. Structural parameters modulating the cellular uptake of disulfide-rich cyclic cell-penetrating peptides: MCoTI-II and SFTI-1. European Journal of Medicinal Chemistry 88, 10-18.

Elliott AG, Delay C, Liu H, et al., 2014. Evolutionary origins of a bioactive peptide buried within preproalbumin. The Plant Cell 26, 981-995.

Hara S, Makino J, Ikenaka T, 1989. Amino acid sequences and disulfide bridges of serine proteinase inhibitors from bitter gourd seeds. The Journal of Biochemistry 105, 88-92.

Haywood J, Schmidberger JW, James AM, et al., 2018. Structural basis of ribosomal peptide macrocyclization in plants. eLife 7, e32955.

Henriques ST, Huang YH, Chaousis S, et al., 2015. The prototypic cyclotide kalata B1 has a unique mechanism of entering cells. Chemistry & Biology 22, 1087-1097.

Jackson MA, Yap K, Poth A, et al., 2019. Rapid and scalable plant based production of a potent plasmin inhibitor peptide. Frontiers in Plant Science 10, 602.

Nguyen GKT, Lian Y, Pang EWH, et al., 2013. Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. Journal of Biological Chemistry 288, 3370-3380.

Poon S, Harris KS, Jackson MA, et al., 2017. Co-expression of a cyclizing asparaginyl endopeptidase enables efficient production of cyclic peptides in planta. Journal of Experimental Botany 69, 633-641.

Poth AG, Mylne JS, Grassl J, et al., 2012. Cyclotides associate with leaf vasculature and are the products of a novel precursor in petunia (Solanaceae). Journal of Biological Chemistry 287, 27033-27046.

Rehm FB, Jackson MA, De Geyter E, et al., 2019. Papain-like cysteine proteases prepare plant cyclic peptide precursors for cyclization. Proceedings of the National Academy of Sciences 116, 7831-7836.

Salehi H, Bahramnejad B, Majdi M, 2017. Induction of two cyclotide-like genes Zmcyc1 and Zmcyc5 by abiotic and biotic stresses in Zea mays. Acta Physiologiae Plantarum 39, 131.

Note: Credits of images used in page http://labs.mcdb.lsa.umich.edu/labs/pichersky/arabi/arabi.html https://commons.wikimedia.org/wiki/File:Arabidopsis_thaliana_sl2.jpg
