Understanding : Transcriptomics, Molecular Evolution and Pest Control

Submitted By

Amol Bharat Ghodke

Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy

JUNE 2018

School of Biosciences, Faculty of Science

The University of Melbourne

Melbourne, Australia

Abstract

Aphids exhibit fascinating biological features including parthenogenesis, symbiosis, altruism and host-plant preference, all of which would be better understood if genetic tools and molecular biological techniques were applied to them. Aphids are also agricultural pests that vector plant viruses, and new approaches to control them are required. This thesis addresses questions motivated by an interest in the biology of aphids and a desire to reduce the agricultural impact of aphids. It does so through transcriptomic analyses and RNA interference (RNAi) technology.

I examined the ways in which the aphid transcriptome changes with host plant, between tissues, and within and between species. The three aphid species studied (the green peach aphid, Myzus persicae; the mustard aphid, Lipaphis erysimi; and the cabbage aphid, Brevicoryne brassicae) are all pests of economically important brassica crops (such as cabbage, cauliflower, mustard and canola). These data may provide insights into the way different aphid species deal with plant secondary compounds such as glucosinolates. These data also allowed me to examine the structure, function and evolution of the myrosinase enzymes that have allowed some aphid species to develop an anti-predator ‘mustard bomb’.

RNAi has been suggested as a way to specifically target pests that would be more ‘environmentally friendly’ than conventional insecticides. I experimentally assessed the feasibility of orally delivered RNAi to control aphids and the potential of this technology to be developed as a functional genomic tool. RNAi was fed to aphids via artificial diets at various concentrations and with various delivery agents, and via transgenic Arabidopsis thaliana plants that I created to produce dsRNAs corresponding to aphid genes. These studies led me to suggest that more work needs to be done to limit the digestion of orally delivered RNAi by RNase enzymes of the aphid gut, and to more carefully characterize factors that may affect within-species variation in RNAi efficacy.


Declaration

This is to certify that:

1. The thesis comprises only my original work towards the PhD except where indicated.

2. Due acknowledgement has been made in the text to all other material used.

3. The thesis is fewer than 100,000 words in length, exclusive of tables, maps, bibliographies, and appendices as approved by the Research Higher Degrees Committee.

SIGNATURE

(Amol Bharat Ghodke)


Preface

Tackling large problems using science often requires team efforts, and so funding agencies often support large international collaborations. The so-called ‘Australia: India Grand-Challenge’ scheme funded a research program within which this PhD was nested. It was entitled Caterpillar and Aphid Resistance in Brassica (CARiB). This was a joint project among six major institutes, three from Australia and three from India. It included The University of Melbourne, the University of Queensland and the CSIRO from Australia, and the International Centre for Genetic Engineering and Biotechnology, the National Research Centre on Plant Biotechnology and the Indian Agricultural Research Institute from India.

The CARiB project's main focus was to develop transgenic cultivars of cabbage, canola, cauliflower and mustard that would be resistant to aphids as well as caterpillars. The project had three main parts: 1) development of a new Bt (Bacillus thuringiensis) gene-based control strategy for caterpillars, 2) development of transgenic plant lines with specific landing sites for targeted gene insertion and 3) development of RNAi (RNA interference) technology to control aphid species.

This thesis falls within the third objective and focussed on aphids and the development of RNAi directed against them. The specific aim of the third objective was to find superior gene targets for RNAi technology that would greatly affect the fitness of the pest aphid species. The technology development aimed to achieve universal effectiveness, with applicability against different aphid species. Thus, there were several criteria that needed to be considered. The targeted gene sequences needed to be conserved among the pest aphids, but ideally would minimize the chance of affecting other organisms in the environment. They were to be designed to minimize the chance of RNAi resistance arising, and therefore polymorphism within species needed to be considered. Target genes also needed to affect aphid fecundity or viability regardless of environmental variation such as host plants. This justified substantial transcriptomic analyses of the pest aphid species.

This thesis describes my contribution to the CARiB project. I performed many RNAi assays and generated two major transcriptome datasets. It also describes how I utilized these data to address some specific questions about aphid biology. This included characterizing differences between generalist and specialist aphids, and in particular the ways they handle the toxicity of plant-generated secondary metabolites.

I hope the reader will enjoy this short scientific tour of the world of aphids and appreciate the need to deepen our understanding of them.


Acknowledgements

This thesis is the result of an arduous yet momentous journey spanning a considerable number of years of my life. As happens with every journey, a number of people have been instrumental in shaping its story. My supervisors, Charles Robin and John Golz, along with my committee members, Philip Batterham (Chair), Derek Russell and Owain Edwards, have been the principal architects of my PhD journey. Charlie, you not only guided me as a supervisor but nourished me as a researcher and helped me develop the attitude towards science that I needed. John, you introduced me to the world of Arabidopsis and granted me every freedom in your lab. I am also grateful to Phil Batterham and Derek Russell for their invaluable support and guidance within and beyond the research. Owain Edwards, even though ours was a long-distance relationship, it only flourished with time, and I had a wonderful experience collaborating with you and your group. Dr. Keshav Kranthi and Dr. Sandhya Kranthi, without your encouragement and support I would not be doing a PhD at The University of Melbourne. Thank you all for your timely advice and all the support that you have provided during this period. You are the people who have kept my hunger for research and knowledge alive. I cannot thank you enough, gentlemen!

I would also like to thank my colleagues at the School of Biosciences, the Bio21 Institute and CSIRO. You made this journey more enjoyable for me with your fun-loving but supportive attitude. Alex Fournier-Level, I feel blessed for your charismatic company during my PhD, which helped me explore some unknown territories of science. I am grateful to Nancy Endersby, Jill Williams and Venessa White, three Wonder-Women who pulled me out of several different apocalyptic situations and helped me without any expectations.

I would like to thank Ronald Lee, Heng-lip Yeap, Mike Murry, Tom Walsh and Andrew Warden for their generous advice during my PhD. During my short time at CSIRO, Canberra, I met my dear colleague Chris Coppin. I am thankful to you, Chris, for teaching me some cool molecular biology methods and tricks.

Huge thanks to my fellow lab members, especially Rob, Sue, Caitlyn, Llewellyn, Paul, Rebecca S, Pontus, Jack and Kat, as well as Rahul and Rebecca J from the Hoffman lab, who shared my journey with me not just intellectually but emotionally as well. I will forever be thankful to you all for those hilarious moments that made us laugh and made this ride a lot more fun. I would like to mention the special contribution of Rob, Rahul, Sue, Alex and Rebecca J, who helped me tackle completely unknown territory: bioinformatics. Through all the anger, joy and sorrow I felt during this bioinformatics adventure, you all made it worthwhile. This thesis wouldn’t have been possible without you guys.

The story of this journey would remain incomplete without mentioning the many friends who made this tough period of my life a lot happier. The gang of ‘Breakers’ in India, Snehal-Rohit, Tejas-Super, Mona-Gaurav, Rohan-Nikita, Avinash-Rohini, Sachin and Shweta: you guys pulled me out of my sadness every now and then and helped me stand strong in difficult times. I am also grateful for the remote encouragement provided by Dipak, Sachin, Bala sir, Sandip sir and Varsha. My PhD travel would not have been possible without you guys. Thanks!

Finally, big thanks to my grandfather (Baba), Aai-Dada, Bhau-Mothi aai, Raju kaka – Kaki, lovely Mamas and Mamis and all uncles and aunts, Manisha, Kavita Rajshree (Sisters), Swapnil, Suhas, Sangram and Sanchit (Brothers) and the Ghodkes for their unconditional love, unshakable belief and great patience. A simple thank you is not enough, and I will always feel blessed to have you in my life. I hope I have lived up to your expectations, and I hope to make this life more fruitful with your support and blessings.

Sincerely

Amol Bharat Ghodke


Contents

Abstract ...... II

Declaration ...... III

Preface...... IV

Acknowledgements ...... VI

List of figures ...... XIII

List of Tables ...... XIX

Chapter I ...... 1

Introduction ...... 1

1.1 Introduction ...... 2

1.1.1 Aphids ...... 2

1.1.2 Aphid life cycle ...... 5

1.1.3 Digestive system ...... 9

1.1.4 Bacteriocytes ...... 11

1.1.4.1 Genome analysis of Buchnera aphidicola ...... 13

1.1.5 Genome analysis of aphids ...... 14

1.1.6 Aphid reproduction and B. aphidicola ...... 15

1.1.7 The Buchnera and Aphid symbiosis from an evolutionary perspective ...... 16

1.1.8 Generalists and specialists ...... 17

1.1.9 Evolution of brassica plants ...... 18

1.1.10 Glucosinolates driven Defence mechanism in plants ...... 20


1.2 References ...... 23

Chapter II ...... 29

Transcriptome studies and aphid phylogeny ...... 29

2.1 Introduction ...... 30

2.1.1 Transcriptome study I ...... 33

2.1.2 Transcriptome study II ...... 35

2.1.3 Transcriptome-based phylogenetics ...... 36

2.2 Materials and Methods ...... 38

2.2.1 rearing and maintenance for Experiment 1 ...... 38

2.2.2 Aphid phylogeny analysis ...... 38

2.2.3 Aphid ...... 42

2.2.4 Library preparation and sequencing...... 42

2.2.5 Quality control for NGS data ...... 42

2.2.6 Gene expression analysis ...... 43

2.3 Results ...... 49

2.3.1 Aphid phylogeny ...... 49

2.3.2 Host Range ...... 52

2.3.3 Initial data screening ...... 56

2.3.3.1 Tissue transcriptome data ...... 56

2.3.3.2 Aphid-host transcriptome ...... 61

2.4 Discussion ...... 65


2.5 References ...... 69

Chapter III ...... 73

Potential of RNAi technology to control aphids and its limitations ...... 73

3.1 Foreword to manuscript ...... 74

3.2 Manuscript ...... 76

Introduction ...... 78

Results ...... 82

Discussion ...... 89

Methods...... 93

References ...... 100

Acknowledgements ...... 104

Author Contribution Statement ...... 104

Additional information...... 104

Figure legends ...... 105

Figures...... 107

Supporting Information – ...... 113

3.3 Supplementary material ...... 115

Chapter IV ...... 126

Glucosinolate detoxification in generalist and specialist aphid species ...... 126

4.1 Introduction ...... 127

4.1.1 Insect species differ in how they deal with glucosinolates ...... 127


4.1.2 Sequestration and the Mustard bomb of B. brassicae...... 132

4.2 Material and Methods ...... 136

4.2.1 Insect rearing and maintenance ...... 136

4.2.2 Gene family sequence retrieval ...... 136

4.2.3 Sequence analysis ...... 137

4.3 Results ...... 139

4.3.1 Detoxification Multigene Family gene sets ...... 139

4.3.2 Transcript abundance of GSTs within tissues of different species...... 147

4.3.3 Differences in the number of transcripts with similar type of expression pattern in different tissues of different aphid species ...... 169

4.4 Discussion ...... 171

4.5 References ...... 184

Chapter V ...... 191

Role of the myrosinase in glucosinolate detoxification and aphid evolution ...... 191

5.1 Introduction ...... 192

5.1.1 Myrosinase in plants ...... 192

5.1.2 Myrosinase in aphids ...... 193

5.1.3 Crystal structure of the aphid myrosinase ...... 194

5.1.4 Interaction of myrosinase and glucosinolate...... 195

5.1.5 Aim and hypotheses ...... 196

5.2 Material and Methods ...... 198

5.2.1 Myrosinase gene search ...... 198


5.2.2 Myrosinase isoform search ...... 199

5.2.3 Myrosinase isoform expression analysis ...... 200

5.2.4 Positive selection study ...... 202

5.2.5 Myrosinase structure and docking study...... 202

5.2.6 Identification of the amino acid residues responsible for the substrate specificity ...... 205

5.3 Results ...... 206

5.3.1 Preliminary myrosinase gene search in the transcriptome data ...... 206

5.3.2 Expression analysis of different isoforms of myr genes ...... 213

5.3.3 Tests for positive selection...... 216

5.3.4 3D myrosinase structure analysis...... 223

5.3.5 Substrate docking into the protein model ...... 224

5.3.6 Substrate preference ...... 228

5.3.7 Identification of the amino acid residues involved in substrate specificity ...... 236

5.4 Discussion ...... 238

5.5 References ...... 245

Chapter VI ...... 249

Discussion and future prospectus ...... 249

6.1 General Discussion and future prospectus ...... 250

6.2 References ...... 255


List of figures

Figure 1.1.1 The reproductive output of M. persicae gynoparae. Initially sexual females are born. Then males (3) or parthenogenetic females (2) may (or may not-1) be born after the sexual females and subsequently the offspring may alternate between males and parthenogenetic females...... 6

Figure 1.1.2 General life cycle of aphids ...... 8

Figure 1.1.3 The aphid digestive tract can be divided into 4 different regions based on pH. Figure adapted from Cristofoletti et al. (2003). FG: foregut; V1-V4: sections of ventriculus; R: rectum (hindgut). Parentheses refer to averages of at least four determinations (reproducible within 0.2 pH units) carried out in isolated gut contents...... 10

Figure 1.1.4 The location of bacteriocyte cells (red) and Buchnera aphidicola (black dots) within an aphid ...... 12

Figure 1.1.5 The fundamental structure of a glucosinolate compound ...... 19

Figure 1.1.6 Formation of different toxic compounds from glucosinolates by the action of myrosinase ...... 21

Figure 2.3.1 Insect phylogenetic tree showing evolution of aphid species with bootstrap support. Red coloured nodes showing the calibration constraints used for the time tree analysis. Numbers on the branch showing bootstrap values...... 50

Figure 2.3.2 Insect phylogenetic tree showing evolution of aphid species generalists vs specialists. A timetree inferred using the Reltime method [1] and the Dayhoff w/freq. model [2]. The timetree was computed using 7 calibration constraints. The estimated log likelihood value is -457930.9164. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.9148)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 12.2456% sites). The analysis involved 25 amino acid sequences. All positions with less than 95% site coverage were eliminated. That is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. Red dot points representing nodes with 95% confidence interval for estimated divergent time. There were a total of 27267 positions in the final dataset. Evolutionary analyses were conducted in MEGA7. X-axis – million years ...... 51

Figure 2.3.3 Host Plant species from different plant families associated with four major aphid species (M. persicae, B. brassicae, L. erysimi and A. pisum) ...... 54

Figure 2.3.4 Host plant species from Brassicaceae family associated with four major aphid species (M. persicae, B. brassicae, L. erysimi and A. pisum) ...... 55

Figure 2.3.5 The number of reads in the tissue samples of B. brassicae and L. erysimi. bcyt – bacteriocyte, gut – mid gut, whole – whole body, brevi – B. brassicae and lip – L. erysimi. R1 – forward read, R2 – reverse read, digit between the two underscores represents biological replicate number for each sample. The Y-axis shows millions of reads...... 57


Figure 2.3.6 The number of reads of tissue samples of M. persicae. bcyt – bacteriocyte, mgut – mid gut, whole – whole body. R1 – forward read, R2 – reverse read, digit after tissue code represents biological replicate number for each sample. The y-axis is in millions of reads. .. 58

Figure 2.3.7 BUSCO analysis showing presence of gene groups from transcriptome data of M. persicae, B. brassicae and L. erysimi. n: total number of BUSCO reference gene groups. .... 59

Figure 2.3.8 MDS plot for green – whole body (WB), blue – bacteriocyte (Bcyt), red – mid- gut (MG) tissues, from A) M. persicae, B) B. brassicae and C) L. erysimi ...... 61

Figure 2.3.9 The number of reads from each M. persicae sample fed on different host plants. Strain names: Bun, LUP, C61, Host plants: cab – cabbage, can – canola, cau – cauliflower, lup – lupin, mus – mustard, rad – radish. The digit in each name represents the biological replicate of the sample. The y-axis is in millions of reads...... 63

Figure 2.3.10 MDS plot based on Voom expression values showing differential behaviour of three M. persicae strains when fed on different host plants. Host plants: cab-cabbage, cau- cauliflower, mus-mustard, can-canola, lup-lupin, rad-radish. M. persicae strains: Bun-Bunbury strain, C61 – Victorian strain, LUP – Lupin strain...... 64

Figure 4.1.1 Process of desulfonation. The glucosinolate present in the gut gets desulfonated by sulfatase enzyme. The desulfoglucosinolate cannot react with myrosinase and hence does not produce toxic products. The and myrosinase action (adapted from (Ratzka et al. 2002)); Glu- Glucose molecule ...... 128

Figure 4.1.2 Mechanism of glucosinolate detoxification in Plutella xylostella, Bemisia tabaci and the larva of the sawfly Athalia rosae showing upregulation of the sulfatases enzyme in the MG that removes the sulfate group from the glucosinolate molecule and stops the synthesis of toxic products like isothiocyanates...... 129

Figure 4.1.3 Mechanism of glucosinolate detoxification in Pieris brassicae or H. virescens, showing the upregulation of the CYP P450s, GSTs, CCEs and ABC transporters in lepidopterans capable of feeding on brassica family plants...... 130

Figure 4.1.4 Interaction of glucosinolate and myrosinase enzyme in B. brassicae ...... 133

Figure 4.2.1 Representation for In-paralogues explanation Image from - https://genomevolution.org/wiki/index.php/File:Otu.png ...... 137

Figure 4.3.1 Blast hits for the A. pisum GSTB (XP_008188180.1) sequence showing maximum similarity with gram negative bacteria (Family: Enterobacteriaceae) sequences suggesting possible bacterial sequence contamination in the A. pisum genome assembly...... 140

Figure 4.3.2 Maximum-likelihood tree showing manually annotated GST complements of M. persicae and A. pisum. D. melanogaster GST sequences were used for classification purpose. The tree also contains M. persicae GST related transcripts from the ‘corset’ assembly. Tree was rooted with D. melanogaster GST-zeta...... 143

Figure 4.3.3 Maximum-likelihood unrooted tree showing manually annotated GST complements of M. persicae and GST transcript sequences retrieved from corset analysis of RNAseq data...... 144

Figure 4.3.4 Maximum likelihood tree showing relationship between GST-like transcripts generated from tissue transcriptome dataset of M. persicae, B. brassicae, L. erysimi. A. pisum sequences were generated by manual annotation on genome scaffolds. D. melanogaster sequences were used as a reference for GST classification. The red square highlights the GST- epsilon clade and illustrates that they are not present in aphids...... 146

Figure 4.3.5 Glutathione S-transferases (GST) cladogram showing relationship of expressed GST clusters from M. persicae, B. brassicae and L. erysimi with A. pisum and D. melanogaster’s annotated GST types. Bar graph on the right showing Expression level of GST transcripts in MG compared to WB tissue. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi. Red box showing clusters exclusively showing down regulation in M. persicae MG but upregulated in B. brassicae and L. erysimi only. Purple box showing clusters upregulated in only M. persicae MG but down regulated in B. brassicae and L. erysimi MG...... 148

Figure 4.3.6 Cladogram showing Cyp P450 related transcripts classification based on the well characterised P450 genes from Drosophila melanogaster. Red- Cyp6; green-Cyp315; Brandeis blue-Cyp49; Bright green-Cyp301; Blue-Cyp306; Fluorescent blue-Cyp4g; purple-Cyp305a; dark olive green-Cyp303a; dark salmon-Cyp4aa; Dark orange-Cyp18a ...... 152

Figure 4.3.7 Cytochrome P-450s (p450s) dendrogram showing relationship of expressed P450 clusters from M. persicae, B. brassicae and L. erysimi with D. melanogaster’s annotated P450 types. Bar graph on the right showing Expression level of p450s transcripts in MG compared to WB tissue. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi. Red box indicating clusters that show opposite expression patterns in the MG and Bcyt tissues of M. persicae, B. brassicae and L. erysimi...... 158

Figure 4.3.8 Sulfatase transcripts phylogenetic tree showing their differential expression level in MG compared to WB in three aphid species. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi...... 159

Figure 4.3.9 Quinone reductase transcripts phylogenetic tree showing their differential expression level in MG compared to WB in three aphid species. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi ...... 160

Figure 4.3.10 Cladogram showing different categories of ABC transporter like transcripts from M. persicae, B. brassicae, L. erysimi. The categories created based on the well annotated ABC transporter sequences from D. melanogaster. Blue-ABCA; cyan-ABCB; fluorescent green- ABCC; fluorescent yellow-ABCD; green-ABCE; copper-ABCF; red-ABCG; yellow-ABCH ...... 163

Figure 4.3.11 ABC Transporter transcripts phylogenetic tree showing their differential expression level in MG compared to WB in three aphid species. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi ...... 168

Figure 4.4.1 Proposed model for molecular mechanism of glucosinolate detoxification in specialist aphid species. GST-D and ABCC are predicted to be involved in the detoxification of aliphatic glucosinolate whereas Cyp6 and ABCB genes are hypothesized to be involved in the indolic glucosinolates detoxification...... 181


Figure 4.4.2 Proposed model for molecular mechanism of glucosinolate detoxification in generalist aphid species. GST-D and ABCC showed downregulation (grey color) in the MG tissue. Suggesting the reason for excretion of aliphatic glucosinolates in the honeydew without processing. Whereas Cyp6 and ABCB genes are hypothesized to be involved in the indolic glucosinolates detoxification...... 182

Figure 5.1.1 Crystal structure of a myrosinase monomer from B. brassicae (PDB – 1WGC) orientated to show the active site location. A) A ribbon diagram showing the architecture of a protein monomer with alpha helices and beta sheets, B) A space-filling model showing a map of the hydrophobic (red) and hydrophilic (blue) surfaces of the Myr protein...... 194

Figure 5.1.2 Representative chemical structures of three different types of glucosinolate, A) Aliphatic glucosinolate, B) Indolic glucosinolate, C) Aromatic glucosinolate ...... 195

Figure 5.2.1 Flowchart explaining the retrieval of myr gene. Pipeline was performed individually on each species dataset ...... 200

Figure 5.2.2 Bioinformatics pipeline used for the myr gene targeted analysis ...... 201

Figure 5.3.1 Preliminary phylogenetic tree showing different myrosinases retrieved from the differential gene expression study ...... 208

Figure 5.3.2 Mapping of B. brassicae reads against the well characterized aphid mustard bomb myrosinases. The mapping of all the reads from WB, visualized with the IGV software, are shown in the top panel. The MG reads are in the middle panel and the Bcyt in the bottom panel. Each read is represented as a thick grey pointed line and any nucleotide differences compared to the reference are represented as coloured line on read. Read coverage is also graphed for every sample with label “coverage”. The WB sample fastq reads show the presence of reads similar to the Bbre_myr gene (reference), the reads from MG and Bcyt sample show completely diverged myr isoforms...... 209

Figure 5.3.3 Mapping of L. erysimi reads against the well characterized aphid mustard bomb myrosinases. The mapping of all the reads from WB, visualized with the IGV software, are shown in the top panel. The MG reads are in the middle panel and the Bcyt in the bottom panel. Each read is represented as a thick grey pointed line and any nucleotide differences compared to the reference are represented as coloured line on read. Read coverage is also graphed for every sample with label “coverage”. The WB sample fastq reads show the presence of reads similar to the Bbre_myr gene (reference) with species-specific changes and full coverage, whereas the reads from the MG and Bcyt samples show completely diverged myr isoforms...... 210

Figure 5.3.4 Mapping of M. persicae reads against the well characterized aphid mustard bomb myrosinases. The mapping of all the reads from WB, visualized with the IGV software, are shown in the top panel. The MG reads are in the middle panel and the Bcyt in the bottom panel. Each read is represented as a thick grey pointed line and any nucleotide differences with the reference are represented as coloured line on read. Read coverage is also graphed for every sample with label “coverage”. The reads from the MG, Bcyt and WB samples show no read coverage in any tissue sample, suggesting no gene expression...... 211

Figure 5.3.5 Phylogenetic tree of glycoside hydrolase family sequences showing distribution of all the myrosinase like isoforms found in B. brassicae, L. erysimi and M. persicae. Five clades were formed with well characterised Bbre_myr-like sequence forming clade 1 with similar sequences from other aphid species...... 212

Figure 5.3.6 Phylogenetic tree with differentially expressed myrosinase-like sequence isoforms of M. persicae, B. brassicae and L. erysimi with myrosinase and glycoside hydrolase family sequences from different insects showing five different clades. Two myr-like isoforms found in B. brassicae are located in clade 1...... 215

Figure 5.3.7 Myrosinase alignment of clade 1 and clade 2. Residues with black background showing position of the glucose binding site, residues with blue background showing position of aglycone binding site, residues with green background showing sites found under positive selection based on MEME analysis. Residues highlighted with red star showing positions altered in in-silico mutation experiment. Red highlighted amino acids represent conserved residues across all species examined...... 220

Figure 5.3.8 Myrosinase alignment of clade 1. Residues with black background showing position of the glucose binding site, residues with blue background showing position of aglycone binding site, residues with green background showing sites found under positive selection based on MEME analysis. Residues highlighted with red star showing positions altered in in-silico mutation experiment. Red highlighted amino acids represent conserved residues across all species examined...... 222

Figure 5.3.9 The graph showing structural differences between myr clade 1, 2, 3 and 4. Graph showing Clade 1 and 2 are profoundly diverged from clade 3 and 4 based on RMSD values...... 224

Figure 5.3.10 The overlaid image of the docking experiment performed using the artificial substrate, CGTi and the protein 1W9B of S. alba and the published docking state of the 1W9B protein from the PDB database. The near perfect superimposition of two substrates, (PDB structure in blue, and docked pose in red) shows the validity of docking method used in this study...... 226

Figure 5.3.11 The overlaid image of the docking experiment performed using the artificial substrate, CGTi and the protein brevi_wb_2399 of B. brassicae and the published docking state of the 1W9B protein from PDB database. The close resemblance of the two superimposed substrates (blue from PDB structure and red docked pose) suggests the validity of the docking method used in this study...... 227

Figure 5.3.12 The docking of GRMi and GBMi with Bbra_1a. a) The alignment faces the glucose molecules (yellow arrow) of the reference substrate, CGTi (green), and glucose of GRMi (red) towards glucose binding site (black residues), while the aglycone of the glucosinolate molecule points towards the aglycone binding site (blue residues). b) The alignment faces the glucose moiety of CGTi (green) towards the glucose binding site (black residues) whereas glucose molecule (yellow arrow) of GBMi (red) does not align with the glucose molecule of reference molecule, CGTi...... 231

Figure 5.3.13 The docking of GRMi and GBMi on Bbra_2. a) Showing orientation of the glucose and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue). b) Showing alignment of the glucose moiety of reference substrate, CGTi (green) facing towards glucose binding site (black coloured residues) and glucose of GBMi (red) aligning to the glucose group of reference molecule, CGTi. Yellow arrow showing position of the glucose molecule of CGTi and GBMi...... 231

Figure 5.3.14 The docking of GRMi and GBMi on Lery_1. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue) b) Showing alignment of the glucose moiety of reference substrate, CGTi (green) facing towards glucose binding site (black coloured residues) and glucose of GBMi (red) aligning to the glucose group of reference molecule, CGTi. Yellow arrow showing position of the glucose molecule of GBMi...... 233

Figure 5.3.15 The docking of GRMi and GBMi on Lery_2. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) facing in the middle of glucose binding site – black, aglycone binding site – blue. b) Showing alignment of the glucose moiety of reference substrate, CGTi (green) facing towards glucose binding site (black coloured residues) but glucose of GBMi (red) pointing towards aglycone binding site, yellow arrow showing position of the glucose molecule of GBMi...... 233

Figure 5.3.16 The docking of GRMi and GBMi on Mper_1. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue) b) Showing alignment of the glucose molecule of GBMi (red) aligning to the glucose binding site (shown in black). Yellow arrow showing position of the glucose molecule of GRMi and GBMi. ... 235

Figure 5.3.17 The docking of GRMi and GBMi on Mper_2. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue) b) Showing alignment of the glucose molecule of GBMi (red) aligning to the glucose binding site (shown in black). Yellow arrow showing position of the glucose molecule of GRMi and GBMi. ... 235

Figure 5.3.18 The docking of GBMi on Bbra_1a and in-silico mutated version of Bbra_1a. a) the docking poses of GBMi ligand with original Bbra_1a showing orientation of the glucose (indicated by yellow arrow) of GBMi ligand (red) not showing its orientation towards its expected glucose binding site, whereas the glucose molecule of reference is oriented to glucose binding site (glucose binding site – black, aglycone binding site – blue). b) the docking poses of GBMi with in-silico mutated Bbra_1a showing alignment of the glucose molecule of GBMi (red) aligning to the glucose binding site (shown in black). Yellow arrow showing position of the glucose molecule of reference (green) and GBMi (red) ...... 237

Figure 5.4.1 Hydrophobic surface maps of the myrosinase enzymes from A) Brevicoryne brassicae and B) Myzus persicae. The active site of M. persicae is markedly more hydrophobic than the active site of B. brassicae. The black rectangle marks the location of the active site. Red indicates highly hydrophobic regions, white indicates neutral regions and blue indicates hydrophilic regions...... 244

List of Tables

Table 2.1.1 Summary of different transcriptome studies performed on aphids...... 31

Table 2.2.1 Insect species used for phylogenetic analysis with dataset reference ...... 40

Table 2.2.2 The number of transcripts generated by the Trinity analysis and the predicted gene numbers for each insect species ...... 44

Table 2.3.1 The total number of up and down regulated transcripts found in bacteriocyte and midgut tissue samples of M. persicae, B. brassicae and L. erysimi. The whole-body sample was used as a reference for comparison. Log fold change (LFC) cut off used: 2 LFC and -2 LFC...... 60

Table 4.3.1 Gene loss and gain events in GST gene family within M. persicae and A. pisum...... 142

Table 4.3.2 The number of GST genes in aphid species classified by subfamily...... 145

Table 4.3.3 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with GST ...... 149

Table 4.3.4 The number of Cyp P450 genes in aphid species classified by subfamily...... 151

Table 4.3.5 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with P450s. #N/A – transcript not differentially expressed...... 155

Table 4.3.6 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with sulfatases ...... 159

Table 4.3.7 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with quinone reductase ...... 160

Table 4.3.8 The number of ABC transporter genes in aphid species classified by subfamily...... 162

Table 4.3.9 The total number of ABC transporters showing enrichment in the MG and Bcyt tissues of the aphid species ...... 162

Table 4.3.10 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with ABC transporters ...... 165

Table 4.3.11 Table showing number of genes from each sample of each species showing transcript abundance as compared to WB samples. Numbers in the bracket representing transcripts showing upregulation in that tissue but downregulation in other tissue of same species...... 170

Table 5.2.1 All the Myrosinase-like genes used in myrosinase tree formation with NCBI accession numbers...... 198

Table 5.3.1 Differentially expressed transcripts from M. persicae, B. brassicae and L. erysimi in the MG and Bcyt tissues. Values are the LFC (Log Fold Change) of transcripts in the tissues as compared to the WB tissue sample. All LFC values show significant changes (p<0.05)...... 214

Table 5.3.2 The affinity score generated between ligand and Myr protein from B. brassicae, L. erysimi and M. persicae. Affinity score in kcal/mol ...... 229

Chapter I

Introduction

1.1 Introduction

“In a season the potential descendants of one female aphid contain more substance than 500 million stout men” – Thomas Huxley (1858)

The incredible power of reproduction is undoubtedly the main reason behind the success of aphids as organisms. Aphids are emerging as primary pests of several different crops and thus pose a major threat to the agricultural sector. Aphid control is further complicated by the group's diversity, with more than 4400 aphid species known. Although only a few hundred cause actual agricultural damage, it is important to understand their biology to manage emerging pests. Comparative study is a powerful way to understand similarities and differences between species. This thesis will take the reader on a short journey to understand three economically important aphid species, Myzus persicae, Brevicoryne brassicae and Lipaphis erysimi. The results chapters of this thesis address several aphid-related questions, covering agricultural applications as well as the biological similarities and differences between these aphid species. The thesis first assesses the potential of RNA interference (RNAi) to control aphids in agricultural settings, and then sheds light on the categorisation of generalist and specialist aphid species using transcriptome analysis. It also attempts to elucidate the molecular evolution underpinning a particularly interesting adaptation of the two specialist aphid species: their adoption of a ‘mustard bomb’ as an anti-predator defence mechanism. Finally, the thesis concludes by commenting on possible future studies that will be important to broaden our understanding of these economically important and biologically amazing insects.

1.1.1 Aphids

Aphids have a typical insect body structure with distinct head, thorax and abdominal sections. The head has two compound eyes and two antennae for sensory purposes. The body is oval shaped and soft, and two special tube-like structures called cornicles or siphuncles occur on the posterior side of the abdomen. These cornicles have a role in aphid defence by secreting chemicals such as triacylglycerols (Kennedy and Stroyan 1959). Perhaps the most elaborate external structures are the aphid mouthparts, which are used to pierce plant tissue and then suck phloem sap.

Although this piercing and sucking behaviour of aphids is not as destructive as the chewing behaviour of other insects (e.g. caterpillars), they can cause significant damage to healthy plants because of the huge number of aphids that can infest a given plant. Such infestations can occur because of the fast, asexual clonal reproduction (also known as parthenogenesis) which can take place in aphids. However, feeding is not the only way that aphids destroy plants: they are also the major vectors of several different plant viruses. In fact, crop loss due to aphid-vectored viruses is believed to be greater than that caused by direct feeding damage. In Australia alone, the estimated economic loss caused by all major aphid species (Myzus persicae, Aphis craccivora, Acyrthosiphon kondoi and Aulacorthum solani) across all grain crops was calculated to be around $240.5 million/year due to aphid feeding pressures and $481.5 million/year due to the vectoring of viruses (Valenzuela and Hoffmann 2013, Valenzuela and Hoffmann 2015).

The aphid species of interest in this thesis are three pests of Brassicaceae crops: Myzus persicae, Lipaphis erysimi and Brevicoryne brassicae. They can damage crops such as canola to the extent that 33% of a crop is lost. In Australia, such losses are typically due to B. brassicae and L. erysimi (Berlandier et al. 2010). The loss is greater if the infestation starts during the flowering or budding stage and has been estimated to cost around $63.8 million/year across Australia (Valenzuela and Hoffmann 2015). In contrast, M. persicae has the ability to cause huge losses in the early life stages of plants (Edwards et al. 2008). Beet western yellows virus (BWYV) is one of the major viruses causing damage to canola crops. M. persicae is considered the major vector of BWYV, and B. brassicae and L. erysimi minor vectors.

For several different crops, aphid attack is of secondary importance, because lepidopteran insects with their destructive feeding cause greater immediate loss. To avoid this loss, scientists developed transgenic crops using bacterial genes from Bacillus thuringiensis. These genes encode crystal proteins and are hence named cry genes. The Cry protein is an endotoxin that paralyses the lepidopteran insect gut and eventually kills the caterpillar through starvation. However, since the introduction of genetically modified crops like Bt (Bacillus thuringiensis) cotton to control lepidopteran insects, there has been less competition for food between lepidopterans and hemipterans. This has benefited aphids overall, and aphids have now emerged as primary pests (Zhao et al. 2011, Hagenbucher et al. 2013). As Bt-brassica plants are being developed, it is possible that aphids will increase in importance to brassica crop yields in years to come.

Until now, the most effective way of controlling aphids has been the use of chemical insecticides. Several different contact-based insecticides do not show consistent efficacy against aphids, partly because aphids are located underneath leaves where insecticide spray cannot easily reach (Edwards et al. 2008). Nonetheless several different types of insecticides are being used intensively against aphids in the agriculture sector. Several aphid populations have now developed resistance in different regions of the world, e.g. A. gossypii (O'Brien and Graves 1992), A. pisum (Sadeghi et al. 2009), L. erysimi (Chen and Wu 2005). Anthon (1955) first reported development of resistance in M. persicae against organophosphate insecticides, and since then this species has also developed resistance against synthetic pyrethroids, carbamates and neonicotinoids (Bass et al. 2014, Watt 2016). Chemical pesticides are well known for their deleterious effects on the environment, human health, food and water quality and soil health, as well as their effects on non-target organisms (Aktar et al. 2009). This situation demands a new technology for aphid control that will minimally affect the environment, avoid toxicity to humans and other organisms, and be less prone to resistance development. We are best placed to find such a technology if we first understand the biology of aphids. The following sections will explain important and interesting aspects of aphid biology that could help us to develop a future control strategy.

1.1.2 Aphid life cycle

Most aphid species are host-specific, meaning they prefer to feed only on plants from a specific family. But there are also a few species, such as Myzus persicae, that can feed on hosts from multiple plant families. A minority of aphid species have one primary host on which they spend autumn, winter and spring and a secondary host during summer. These aphids are categorised as heteroecious (having a primary and secondary host), in contrast with monoecious species, which have a single host plant species. The secondary and primary hosts are rarely closely related; the secondary is often herbaceous and the primary woody. While woody trees are believed to be the ancestral aphid host, the herbaceous secondary hosts are thought to be nutritionally superior to the primary host. The primary host most often serves as a place to survive through harsh conditions, largely by changing the mode of reproduction. One experimental aphid species used in this thesis, M. persicae, is heteroecious, whereas the other two aphid species, B. brassicae and L. erysimi, are monoecious. The following section will explain the different modes of reproduction present among aphids, namely the holocyclic and anholocyclic life cycles. The description provided focuses on the well-studied life cycle of M. persicae, which undergoes both types of life cycle depending on its location.

Holocyclic life cycle

This life cycle is observed in areas where cold winters (<4°C) occur. Gynoparae (parthenogenetic females that produce sexual females) migrate to their primary host (in the case of M. persicae, Prunus persica, the peach) for production of sexual females (Hille Ris Lambers 1946). It has been suggested that 5-15 oviparae (sexual females) are produced per gynopara. The gynoparae also produce sons which are typically apterous (lack wings) but are occasionally alate (winged). The development of males from gynoparae is triggered by day length. Three different patterns (Figure 1.1.1) have been observed in the production of M. persicae males and females (Hales and Mittler 1988).

Figure 1.1.1 The reproductive output of M. persicae gynoparae. Initially sexual females are born. After the sexual females, no further morphs (pattern 1), parthenogenetic females (pattern 2) or males (pattern 3) may be born, and subsequently the offspring may alternate between males and parthenogenetic females.

These patterns are differentiated mainly by the production of different types of aphids: male, sexual female and parthenogenetic female. In the first pattern, no parthenogenetic females are produced, whereas a small number are produced in the other two. The second type produces parthenogenetic females mostly in the middle of their life cycle, while in the third type they are mostly produced at the end. Also, the number of parthenogenetic females produced with the second reproductive pattern is higher than with the third. Sexual females are always produced at the beginning of reproduction, though their number varies according to reproductive pattern.

The sexual females mate with males and, because they lay eggs, they are called oviparae. Eggs laid by oviparae immediately go into a diapause state which helps them to become extremely cold resistant; the eggs will not hatch at winter ambient temperatures. Each female lays 50-60 viable eggs. The hatching of eggs is nearly synchronised with the vegetational phenology of the peach (Ronnebeck 1950, Ronnebeck 1952), which allows the newborn ‘fundatrix’ to feed on swelling buds (Ward 1934). Winged (alate) individuals form when the population density crosses a threshold and then they travel to different secondary host plants.

The development of an M. persicae colony on a secondary host is complex and depends on different environmental factors including suitable temperature, availability of the preferred host and presence of predators. Each alate aphid deposits around 20 offspring (Broadbent 1949) across 7 to 10 different plants. The offspring that develop on these plants are always apterous (wingless). These apterae then reproduce parthenogenetically for several generations until the next winter (Emden et al. 1969) (Figure 1.1.2).

Figure 1.1.2 General life cycle of aphids

Anholocyclic life cycle

In this type of life cycle, parthenogenesis is the only form of reproduction. Aphids can survive when mean temperature remains above 4°C (Heie and Petersen 1961). M. persicae shows an anholocyclic life cycle in warm-temperate to tropical regions. The presence of the primary host also has an impact on the determination of aphid life cycle, although it has been reported that in temperate regions even the presence of the primary host fails to induce holocycly. Despite generations of anholocyclic reproduction, some individuals within the population can retain the ability to produce sexual forms (Daiber and Schöll 1959) (Figure 1.1.2).

The aphid life cycle is not possible without proper nutrition. Also, an aphid's nutrition will determine how well it performs in terms of reproductive output, and hence nutrition will at one level dictate the extent to which aphids can damage a crop. One of the main drivers of aphid biodiversity is their utilization of food resources. To understand this, we therefore need to understand the aphid's nutritional and digestive systems.

1.1.3 Digestive system

Aphids have piercing and sucking mouthparts which are ideal for feeding on a liquid diet of plant phloem sap. The main component of phloem sap is sucrose, with other carbohydrates like raffinose, stachyose and verbascose also present. Plant phloem sap also contains several essential amino acids (phenylalanine, threonine, histidine, arginine, tryptophan, methionine, leucine, isoleucine, valine) and inorganic components such as sodium (Na), phosphorus (P), sulfur (S) and potassium (K). Plant phloem sap usually contains low levels of proteins (Fukumorita and Chino 1982).

Although gut morphology is simple in aphids, after entering the aphid’s gut, sap goes through a complex digestive environment. The pH of the digestive tract varies between sections (Figure 1.1.3). The aphid gut differs from that of most other insects in containing a network of apical lamellae where other species possess organised structures of microvilli. This network is thickest in ventricle one, and becomes progressively thinner through ventricles two, three and four (Cristofoletti et al. 2003).

Figure 1.1.3 The aphid digestive tract can be divided into four regions based on pH. Figure adapted from Cristofoletti et al. (2003). FG: foregut; V1-V4: sections of ventriculus; R: rectum (hindgut). Values in parentheses are averages of at least four determinations (reproducible within 0.2 pH units) carried out in isolated gut contents.

Considering the nature of aphid food, the main challenge that the aphid digestive system faces is undoubtedly the exceptionally high concentration of sugar in the diet. The high sugar content of the phloem sap results in high osmotic pressure (2-5 MPa), which can be a dehydration threat (Downing 1978, Wilkinson et al. 1997, Fisher et al. 2000, Ponder et al. 2000). Aphids are equipped with two protein classes to face this situation, aquaporins (AQP) and sucrases (SUC) (Shakesby et al. 2009). AQP are water channels which allow the movement of water through membranes in several different physiological processes across a diversity of organisms including bacteria, plants and animals (Engel and Stahlberg 2002). They not only allow the passive movement of water across the membrane but also enable small solutes like ammonia, anions and CO2 to diffuse. An in-depth study performed by Shakesby et al. (2009) showed an almost 3-fold upregulation of AQP transcripts in the gut as compared to bacteriocytes (Bcyt), embryo, fat body and head in M. persicae. Specifically, the AQP was mainly expressed in the V1 and V4+R regions. Shakesby et al. (2009) reported that RNA interference could be used to silence AQP transcripts, leading to elevated osmotic pressure compared to the control treatment.
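To give a sense of scale for the 2-5 MPa range quoted above, the following minimal sketch applies the ideal van 't Hoff relation (osmotic pressure = c·R·T) to a non-dissociating solute such as sucrose. The function names, the temperature of 298.15 K and the example concentrations are illustrative assumptions and are not taken from the cited studies; real phloem sap contains other solutes and is concentrated enough that the ideal relation is only a rough approximation.

```python
# Rough, idealised estimate of osmotic pressure from solute concentration.
# Assumption: van 't Hoff relation for a dilute, non-dissociating solute.
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def osmotic_pressure_mpa(molarity, temperature_k=298.15):
    """Ideal osmotic pressure (MPa) for a solute concentration in mol/L."""
    concentration_mol_m3 = molarity * 1000.0  # mol/L -> mol/m^3
    return concentration_mol_m3 * R * temperature_k / 1e6  # Pa -> MPa


def molarity_for_pressure(pressure_mpa, temperature_k=298.15):
    """Solute concentration (mol/L) giving the stated ideal osmotic pressure."""
    return pressure_mpa * 1e6 / (R * temperature_k) / 1000.0


if __name__ == "__main__":
    # Hypothetical concentrations chosen only to bracket the quoted range.
    for molarity in (0.5, 1.0, 2.0):
        print(f"{molarity:.1f} mol/L -> {osmotic_pressure_mpa(molarity):.2f} MPa")
    for pressure in (2.0, 5.0):
        print(f"{pressure:.1f} MPa -> {molarity_for_pressure(pressure):.2f} mol/L")
```

Under these assumptions, pressures of 2-5 MPa correspond to roughly 0.8-2.0 mol/L of sucrose-equivalent solute, which illustrates why converting sucrose into fewer, larger oligosaccharide molecules lowers the osmotic load.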

The other mechanism involved in the balancing of osmotic pressure is sugar transformation by sucrase (SUC). SUC shows differential expression throughout the V2-V4 region of the A. pisum gut (Price et al. 2007). This enzyme is involved in the breakdown of the sucrose present in the phloem sap and produces monosaccharide sugars like glucose and fructose. The glucose is then used to synthesize oligosaccharides which help to reduce the osmotic pressure (Cristofoletti et al. 2003, Price et al. 2007). Three different genes with roles in sugar metabolism and maintenance of osmotic pressure, AQP, SUC and sugar transporters (ST), were silenced simultaneously in M. persicae using RNAi technology. The treatment resulted in a significant elevation of osmotic pressure. The change in osmotic pressure in M. persicae then significantly affected the weight and fecundity of the insects (Tzin et al. 2015).

The phloem sap of the majority of plants contains little proteinaceous material (Ziegler 1975), and plant phloem sap does not contain all 20 amino acids (Douglas 2006). Aphids therefore struggle to obtain essential amino acids and a source of nitrogen. To remedy this situation, aphids have evolved a special partnership. The obligate symbiotic bacterium, Buchnera aphidicola, supplies all the essential amino acids to aphids. These bacteria live in specialized cells in the aphid body called bacteriocytes.

1.1.4 Bacteriocytes

A defining feature of aphids is their symbiotic relationship with the intracellular bacterium B. aphidicola. B. aphidicola are housed in bacteriocytes which are clustered in two multicellular lobes in the abdominal cavity (Figure 1.1.4) that extend along the midgut and then join above the hindgut.

B. aphidicola is a member of the phylum Proteobacteria, which is a group of bacteria that includes the well-known Escherichia coli. B. aphidicola provides aphids with essential amino acids that are lacking in the plant phloem sap (Shigenobu et al. 2000, Baumann 2005).

There are 60-80 bacteriocytes present in a single aphid and each contains thousands of Buchnera cells (Wilkinson and Douglas 1998). Interestingly, there are higher numbers of bacteriocytes in aphid embryos than in adults.

Figure 1.1.4 The location of bacteriocyte cells (red) and Buchnera aphidicola (black dots) within an aphid

Most of the bacteriocytes are spherical, but bacteriocytes located in close proximity to the embryo and other structures in the aphid haemocoel are irregular in shape (Wilkinson and Douglas 1998). Buchnera is present in almost all aphid species, but some exceptional species possess other symbionts such as fungi (Novakova et al. 2013, Vogel and Moran 2013). Further analysis of B. aphidicola will provide information about the molecular, cellular and intercellular mechanisms that underlie this symbiotic association. Perhaps this symbiosis offers an Achilles heel that could lead to novel aphid control strategies.

1.1.4.1 Genome analysis of Buchnera aphidicola

The B. aphidicola genome was first sequenced in 2003 (van Ham et al. 2003). To date there are four Buchnera genome sequences available at www.buchnera.org. B. aphidicola genomes range in size from 450 to 641 kb, and 83–88% of each genome is coding sequence, encoding 504-545 proteins. There are two plasmids present in a B. aphidicola cell along with its circular genomic DNA. One plasmid carries genes involved in leucine biosynthesis (Bracho et al. 1995), while the other carries trpEG, which codes for enzymes regulating tryptophan biosynthesis (Lai et al. 1994). Genes present in B. aphidicola are involved in several functions essential for aphids, making their symbiotic association necessary. The B. aphidicola genome codes for enzymes that synthesize the essential amino acids that aphids cannot obtain in sufficient quantity from phloem sap. In contrast, genes responsible for the synthesis of non-essential amino acids are absent from B. aphidicola (Shigenobu et al. 2000). Several metabolic pathways are split between B. aphidicola and aphids, as neither has the full gene complement needed to produce and process certain essential metabolites. For example, glutamine is re-circulated in aphids to provide nitrogen to B. aphidicola cells, allowing production of essential amino acids, while pantothenate is synthesized in B. aphidicola but metabolized to pantothenate-Coenzyme A in aphids.

Although several metabolites are synthesized by B. aphidicola and transported to the aphid, there are limited numbers of transporter genes present in the endosymbiont genome. However, B. aphidicola has maintained complete gene sets for fundamental metabolic pathways like glycolysis, cysteine biosynthesis and the pentose phosphate pathway, though not for the TCA cycle. It is also important to note that the B. aphidicola genome has been prone to a relatively high mutation rate during evolution. The B. aphidicola genome has a relatively small number of immune genes (such as restriction enzymes and methylation machinery) and yet there is little evidence of phage activity, presumably because its cytosolic location and endosymbiotic relationship provide it with sufficient protection.

1.1.5 Genome analysis of aphids

To date six aphid genome sequences are available on www.bipaa.genouest.org/is/aphidbase. The first aphid species to have its genome sequenced was Acyrthosiphon pisum (the pea aphid). It was selected as a model aphid by the International Aphid Genomics Consortium (IAGC 2010) because it has a low chromosome number (n=4) and a medium-sized genome, and is a well-studied aphid species in terms of physiology and symbiosis (Tagu et al. 2010). The total size of the genome assembly is 464 Mb. The GC content is 29.6%, which is lower than that of any other insect sequenced to date. At the time of sequencing, only 30% - 55% of the pea aphid genes were shared with other characterized insects. The International Aphid Genomics Consortium reported extensive gene duplication in A. pisum, with 2,459 gene families showing duplication. The aphid has also lost genes relative to other insects, including those required to synthesize selenoproteins, which is a pathway conserved in almost all other animals (Chapple and Guigó 2008).

The genome of the pea aphid provides some interesting signs of the symbiotic relationship with B. aphidicola. For instance, there are cases of apparent lateral gene transfer from prokaryotes to aphids. A total of 12 aphid genes show similarity to LD-carboxypeptidase A (Ldc-A), N-acetylmuramoyl-L-alanine amidase (AmiD) and rare lipoprotein A (RlpA), which are of bacterial origin (Nakabachi et al. 2005, Nikoh and Nakabachi 2009).

Having obligate bacterial symbionts within their cells makes aphids special organisms but raises questions about their immune system. How can bacteria survive in the presence of the aphid’s immune system? Genome analysis revealed that there are immune system pathways that are missing from the aphid’s genome. For example, members of the immune deficiency (IMD) pathway, namely IMD, dFADD, Dredd and Relish, are missing from aphids. Peptidoglycan recognition protein, which is important for the detection of pathogens, triggers the IMD pathway and has a role in the Toll pathway, is also missing from aphids. In a proteomic study of bacterial- and fungal-challenged aphids, some antimicrobial peptides (AMP), which are usually found in other insects after infection, were completely absent (Altincicek et al. 2008, Gerardo et al. 2010). This major loss of immune system components in aphids has been attributed to their microbe-free natural diet (phloem sap), and an absolute dependency on their bacterial symbiont for survival and reproduction (IAGC 2010).

The detoxification system of aphids also shows loss of genes as compared to other herbivorous insects, although there are even fewer detoxification genes in Apis mellifera (Claudianos et al. 2006, HGSC 2006, IAGC 2010). Aphids have a sugar-rich diet which needs to be distributed around the body to maintain osmotic pressure. The aphid genome shows high numbers of sugar uniporters facilitating transport of sugar (glucose, fructose) to the hemolymph via epithelial cells. Aphids have lost the protective and digestive peritrophic membrane in the gut lumen, presumably because of their diet (Cristofoletti et al. 2003). A special chitin protein necessary for the synthesis of insect peritrophic membranes is also missing from the aphid genome.

1.1.6 Aphid reproduction and B. aphidicola

During reproduction, B. aphidicola cells are transferred maternally from the bacteriocytes to the embryo (Brough and Dixon 1990). A recent study of this maternal transmission provides a detailed analysis of B. aphidicola transmission and infection dynamics using fluorescence microscopy (Koga et al. 2012). After extensive electron microscopic analysis, the authors proposed that exo-/endocytotic transfer processes may be responsible for B. aphidicola transfer. There is as yet no evidence at the molecular level to confirm this hypothesis. This transport may occur in response to signalling from B. aphidicola cells. Genome analysis of B. aphidicola suggests that the flagellum of the bacteria may serve as a transporter structure (Kubori et al. 1998, Young et al. 1999, Shigenobu et al. 2000). The B. aphidicola genome is missing the filament gene (fliC) required for motility and also the genes responsible for chemotaxis (Shigenobu et al. 2000), suggesting that the existing flagellum-related assembly may serve as a source of signals to initiate bacterial transfer to the developing embryo. It has also been reported that there are very few genes in B. aphidicola that code for outer membrane proteins (Shigenobu et al. 2000). This suggests that the movement of B. aphidicola from bacteriocytes to embryo could take place due to signalling between bacteriocytes and embryos alone, a possibility which could be explored by searching bacteriocyte-specific gene expression profiles.

Due to their symbiotic relationship, the presence of Buchnera has a strong impact on the aphid’s health and survival. Therefore, it is interesting to study the biology of B. aphidicola and aphid symbiosis in order to develop strategies to control aphids in agricultural contexts.

1.1.7 The Buchnera and Aphid symbiosis from an evolutionary perspective

In comparison with free-living bacteria, endosymbiotic populations of B. aphidicola are very small. This leads to the accumulation of mildly deleterious mutations, with an unusually low ratio of synonymous to nonsynonymous mutations (Moran 1996). The faster rate of genome evolution in B. aphidicola has been attributed to this increased rate of fixation of mildly deleterious mutations. Because B. aphidicola reproduces clonally, its genome has been evolving without recombination. Also, the [A+T] content of the B. aphidicola genome is exceptionally high, a characteristic shared by most endosymbionts (Moran 1996).

Aphids usually reproduce clonally, preventing gene segregation. It is interesting to examine how this reproductive strategy is reflected in aphid evolution. Aphids have different defining features in comparison to other insects. Several aphid phylogenetic studies failed to answer questions regarding the grouping of different aphid species (Heie et al. 1987, Wojciechowski 1992, Dohlen and Moran 2000, Martinez-Torres et al. 2001, Ortiz-Rivas and Martinez-Torres 2010). Novakova et al. (2013) showed that genes derived from B. aphidicola are also helpful in determining the phylogenetic relationships among aphids.

Although almost all aphid species possess B. aphidicola symbionts, different species vary in many respects, including morphology, behaviour, secondary symbionts and, most importantly, their preferred host plants. As mentioned earlier in the section Aphid life cycle, aphid species may either have one preferred host or may feed on several different host plants. The following section will try to categorise aphid species based on host preferences and consider how these preferences affect aphid biology. As explained in the Preface of this thesis, the main focus of this project is the species M. persicae, B. brassicae and L. erysimi and plants from the Brassicaceae family, and the following section will therefore concentrate on these.

1.1.8 Generalists and specialists

Feeding habits are important criteria by which insect species can be categorized. Some insects, including many aphids, feed on a variety of plants that produce diverse phytochemicals.

Insects feeding on such plants ingest phytochemicals that may be toxic or nutritious. Therefore, these insects have preferences for host plants or sites to feed on or to lay their eggs. This host/site preference presumably enhances the insects’ long-term fitness (Scheirs et al. 2000).

The insect-host association can lead to speciation events during the evolutionary process and can create insect species that feed exclusively on specific types of plants. Insects feeding on a single type of host plant are called “monophagous”, and those feeding on a few types are called “oligophagous”. On the other hand, when an insect can exploit a large number of different plants as hosts it is termed “polyphagous”. For simplicity, oligophagous insects in this study will be referred to as “specialists”, as they feed predominantly on members of only one plant family, while polyphagous insects will be referred to as “generalists”. An important fact about generalist insects is that they have the ability to tolerate toxic plant secondary metabolites (PSMs) from different hosts.

Approximately 10% of insects are considered generalists in the sense of feeding on plants belonging to more than three families (Bernays and Graham 1988). This suggests most insects are specialists.

Aphid species can also be classed as generalists and specialists. Among aphids, >75% of species are specialists or monophagous (Schoonhoven et al. 2005). Of the aphid species used in this study, M. persicae is known as a generalist feeding on more than 400 plant species, whereas B. brassicae and L. erysimi are specialist aphid species which feed exclusively on Brassicaceae family plants. The following section explores the key differences between these species’ adaptation to Brassicaceae family host plants and the role of different aphid tissues in this process.

1.1.9 Evolution of brassica plants

The Brassicaceae plant family contains more than 350 genera and 3000 species. It includes horticultural crops like cabbage, cauliflower, mustard, radish and capers as well as broad-acre crops like canola. These plants are well known for their ability to produce sulfur-containing compounds formally described as β-thioglucoside-N-hydroxysulfates (Figure 1.1.5) but more commonly known as “glucosinolates” (Ettlinger and Kjaer 1968, Fenwick et al. 1983).

R–C(–S–Glu)=N–OSO3⁻

Figure 1.1.5 The fundamental structure of a glucosinolate compound

Glucosinolates are responsible for the characteristic sharp taste of mustard seeds. The bulk of glucosinolate production takes place in plant species belonging to the Brassicaceae, Capparaceae and Caricaceae families, though it also occurs in 13 other plant families (Fahey et al. 2001). The Brassicaceae family evolved around 32 million years ago and has the highest number of different types of glucosinolates (Edger et al. 2015). Plants from the order Brassicales alone can synthesise more than 130 different types of glucosinolate (Fahey et al. 2001, Halkier and Gershenzon 2006, Mithen et al. 2010, Collett et al. 2014). Within the Brassicaceae family, 30-40 different types of glucosinolates can be found (Halkier and Gershenzon 2006). There are three broad classes of glucosinolate compounds: aliphatic, aromatic and indolic. The -R groups present in these glucosinolate structures are derived from methionine, phenylalanine and tryptophan respectively (Velasco et al. 2008).

A phylogenetic study based on the 15 different glucosinolate-producing plant families shows that they diverged around 92 million years ago and that specific plant families gained the ability to produce indolic glucosinolates using tryptophan as a substrate after this point. Before this divergence, plants could only use phenylalanine and branched chain amino acids in glucosinolate synthesis. The ability to produce glucosinolates was preserved in the Brassicaceae plant family. The glucosinolate biosynthesis process has evolved sequentially in the substrates it acts on: phenylalanine → phenylalanine + tryptophan → phenylalanine + tryptophan + methionine. Several steps within these pathways were preserved during the evolutionary process (Edger et al. 2015). The glucosinolate compounds are the precursors of toxic chemical compounds like isothiocyanates, which act as chemical deterrents for herbivorous insects.

1.1.10 Glucosinolate-driven defence mechanism in plants

Glucosinolates contain sulfur and nitrogen and are derived from sugar and amino acids like tryptophan, phenylalanine and methionine. Usually, glucosinolates are water-soluble anions. Glucosinolates exert their antifeedant effect as part of a “two-component defence system” of plants and act as precursors for various toxic compounds. In this system, the precursor compound requires specific enzymatic activity to be converted into a defensive compound (Wittstock et al. 2004, Bak et al. 2006, Morant et al. 2008). Plants contain an enzyme called β-thioglucosidase (IUBMB Enzyme Nomenclature number EC 3.2.1.147), also known as “myrosinase”, which acts specifically on glucosinolates to convert them into toxic compounds like isothiocyanates, thiocyanates, nitriles and epithioalkanes. Myrosinases are present in the vacuoles of special idioblast cells called “myrosin cells” (Guignard 1890, Harnischfeger 1974). In intact tissue, myrosin cells are physically separated from the glucosinolate-containing cells. Upon insect herbivory, tissue disruption allows myrosinase to encounter the glucosinolate molecules; the reaction liberates a glucose molecule and produces an unstable thiohydroximate-O-sulfonate, also known as an aglucon (Figure 1.1.6). The subsequent release of sulfate from the aglucon can produce isothiocyanates, nitriles or elemental sulfur, depending on the pH of its surroundings (Benn 1977, Bones and Rossiter 1996). At neutral pH, this reaction produces isothiocyanates, whereas at lower pH it yields nitriles. In the presence of an epithiospecifier protein, myrosinase can produce epithioalkanes, another less toxic product (Figure 1.1.6; Tookey 1973).

Figure 1.1.6 Formation of different toxic compounds from glucosinolates by the action of myrosinase

An understanding of aphids' importance to agriculture and economics, aphid biology, aphids’ symbiotic association with B. aphidicola, the relationship between aphids and their host Brassicaceae plants, and the special features of the Brassicaceae and their interaction with different aphid species sets up the narrative for this thesis. Building on the earlier findings described above, I have performed several experiments, including transcriptome analysis, bioassays for testing new methods of aphid control and various modern bioinformatic analyses, to understand interesting facets of comparative aphid biology. The structure of the thesis is as follows:

Chapter II describes the two major transcriptomic experiments that are drawn on to varying degrees in subsequent chapters in this thesis. It also describes the use of this transcriptomic data to improve our understanding of the phylogeny of the key aphid species.

Chapter III describes my experiments exploring the potential of RNA interference (RNAi) technology to control aphids and the hurdles that need to be surmounted for this approach to work.

Chapter IV uses a comparative approach to elucidate glucosinolate detoxification in generalist and specialist aphid species and proposes a mechanistic model.

Chapter V examines different types of myrosinase enzymes present in generalist and specialist aphid species – their sequence variation, structure, role in the evolution of aphids and possible substrate specificity based on in silico analysis.

Chapter VI provides a general discussion and future prospects.

1.2 References

Aktar, M. W., et al. (2009). "Impact of pesticides use in agriculture: their benefits and hazards." Interdisciplinary Toxicology 2(1): 1-12.

Altincicek, B., et al. (2008). "Wounding-mediated gene expression and accelerated viviparous reproduction of the pea aphid, Acyrthosiphon pisum." Insect Mol Biol 17(6): 711-716.

Anthon, E. W. (1955). "Evidence for Green Peach Aphid Resistance to Organo-Phosphorous Insecticides." Journal of Economic Entomology 48(1): 56-57.

Bak, S., et al. (2006). "Cyanogenic glycosides: a case study for evolution and application of cytochromes P450." Phytochemistry Reviews 5(2): 309-329.

Bass, C., et al. (2014). "The evolution of insecticide resistance in the peach potato aphid, Myzus persicae." Insect Biochem Mol Biol 51(Supplement C): 41-51.

Baumann, P. (2005). "Biology of Bacteriocyte-Associated Endosymbionts of Plant Sap-Sucking Insects." Annual Review of Microbiology 59: 155-189.

Benn, M. (1977). "Glucosinolates." Pure and Applied Chemistry 49(2): 197-210.

Berlandier, F., et al. (2010). "Aphid management in canola crops." Farm note, Department of Agriculture and Food, Govt of Western Australia.

Bernays, E. and M. Graham (1988). "On the Evolution of Host Specificity in Phytophagous Arthropods." Ecology 69(4): 886-892.

Bones, A. M. and J. T. Rossiter (1996). "The myrosinase-glucosinolate system, its organisation and biochemistry." Physiologia Plantarum 97(1): 194-208.

Bracho, A. M., et al. (1995). "Discovery and molecular characterization of a plasmid localized in Buchnera sp. bacterial endosymbiont of the aphid Rhopalosiphum padi." Journal of Molecular Evolution 41(1): 67-73.

Broadbent, L. (1949). "Factors affecting the activity of alatae of the aphids Myzus persicae (Sulzer) and Brevicoryne brassicae (L.)." Annals of applied Biology 36(1): 40-62.

Brough, C. N. and A. F. Dixon (1990). "Ultrastructural features of egg development in oviparae of the vetch aphid Megoura viciae Buckton." Tissue and Cell 22(1): 13.

Chapple, C. E. and R. Guigó (2008). "Relaxation of selective constraints causes independent selenoprotein in insect genomes." PLoS ONE 3(8): e2968.

Chen, G. and G. Wu (2005). "Resistance to seven insecticides and analysis of enzymatic characteristics in Lipaphis erysimi (Homoptera: ) in Fuzhou, China." Journal of Fujian Agriculture and Forestry University(Natural Science Edition) 34(2): 204-207.

Claudianos, C., et al. (2006). "A deficit of detoxification enzymes: pesticide sensitivity and environmental response in the honeybee." Insect Mol Biol 15(5): 615-636.

Collett, M. G., et al. (2014). "Could Nitrile Derivatives of Turnip (Brassica rapa) Glucosinolates Be Hepato- or Cholangiotoxic in Cattle?" Journal of agricultural and food chemistry 62(30): 7370-7375.

Cristofoletti, P. T., et al. (2003). "Midgut adaptation and digestive enzyme distribution in a phloem feeding insect, the pea aphid Acyrthosiphon pisum." Journal of Insect Physiology 49(1): 11-24.

Daiber, C. and S. Schöll (1959). "Further notes on the overwintering of the green peach aphid, Myzus persicae (Sulzer)." South Africa. J Entomol Soc South Afr 22: 494-520.

Dohlen, C. D. V. and N. A. Moran (2000). "Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation." Biological Journal of the Linnean Society 71(4): 689-717.

Douglas, A. E. (2006). "Phloem-sap feeding by animals: problems and solutions." Journal of Experimental Botany 57(4): 747-754.

Downing, N. (1978). "Measurements of the osmotic concentrations of stylet sap, haemolymph and honeydew from an aphid under osmotic stress." Journal of Experimental Biology 77(1): 247-250.

Edger, P. P., et al. (2015). "The butterfly plant arms-race escalated by gene and genome duplications." Proceedings of the National Academy of Sciences 112(27): 8362-8366.

Edwards, O. R., et al. (2008). "Insecticide resistance and implications for future aphid management in Australian grains and pastures: a review." Australian Journal of Experimental Agriculture 48(12): 1523-1530.

Emden, H. F. V., et al. (1969). "The Ecology of Myzus persicae." Annual Review of Entomology 14(1): 197-270.

Engel, A. and H. Stahlberg (2002). Aquaglyceroporins: Channel proteins with a conserved core, multiple functions, and variable surfaces. International Review of Cytology, Academic Press. 215: 75-104.

Ettlinger, M. G. and A. Kjaer (1968). "Sulfur compounds in plants." Recent Advances in Phytochemistry(1): 59-144.

Fahey, J. W., et al. (2001). "The chemical diversity and distribution of glucosinolates and isothiocyanates among plants." Phytochemistry 56(1): 5-51.

Fenwick, G. R., et al. (1983). "Glucosinolates and their breakdown products in food and food plants." C R C Critical Reviews in Food Science and Nutrition 18(2): 123-201.

Fisher, J. W., et al. (2000). Long-distance transport. Biochemistry and Molecular Biology of Plants. American Society of Plant Physiologists, Citeseer.

Fukumorita, T. and M. Chino (1982). "Sugar, Amino Acid and Inorganic Contents in Rice Phloem Sap." Plant and Cell Physiology 23(2): 273-283.

Gerardo, N. M., et al. (2010). "Immunity and other defenses in pea aphids, Acyrthosiphon pisum." Genome Biol 11(2): R21.

Guignard, L. (1890). "Recherches sur la localisation des principes actifs des Crucifères." Journal of Botany 4(22): 385-395.

Hagenbucher, S., et al. (2013). "Pest trade-offs in technology: reduced damage by caterpillars in Bt cotton benefits aphids." Proceedings of the Royal Society B: Biological Sciences 280(1758).

Hales, D. F. and T. E. Mittler (1988). "Male production by aphids prenatally treated with precocene: Prevention by short-term kinoprene treatment." Arch Insect Biochem Physiol 7(1): 29-36.

Halkier, B. A. and J. Gershenzon (2006). "Biology and biochemistry of glucosinolates." Annual Review of Plant Biology 57(1): 303-333.

Harnischfeger, G. (1974). "Studies on photosynthetic pigments using fluorescence at liquid nitrogen temperature: Evidence for light induced pigment alignment in the photosynthetic apparatus." Berichte der Deutschen Botanischen Gesellschaft 87(3): 483-491.

Heie, O., et al. (1987). "Paleontology and phylogeny." Aphids: Their Biology, Natural Enemies and Control, Vol. 2a: 367-391.

Heie, O. and B. Petersen (1961). "Investigations on Myzus persicae Sulz." Aphis fabae: 7-52.

HGSC, T. H. G. S. C. (2006). "Insights into social insects from the genome of the honeybee Apis mellifera." Nature 443: 931.

Hille Ris Lambers, D. (1946). "The hibernation of Myzus persicae Sulzer and some related species, including a new one." Bull Entomol Res 37: 197-199.

IAGC, I. A. G. C. (2010). "Genome sequence of the pea aphid Acyrthosiphon pisum." PLoS Biol 8(2): e1000313.

Kennedy, J. S. and H. L. G. Stroyan (1959). "Biology of Aphids." Annual Review of Entomology 4(1): 139-160.

Koga, R., et al. (2012). "Cellular mechanism for selective vertical transmission of an obligate insect symbiont at the bacteriocyte-embryo interface." Proc Natl Acad Sci U S A 109(20): E1230-1237.

Kubori, T., et al. (1998). "Supramolecular structure of the salmonella typhimurium type III protein secretion system." Science 280(5363): 602-605.

Lai, C. Y., et al. (1994). "Amplification of trpEG: adaptation of Buchnera aphidicola to an endosymbiotic association with aphids." Proceedings of the National Academy of Sciences 91(9): 3819-3823.

Martinez-Torres, D., et al. (2001). "Molecular systematics of aphids and their primary endosymbionts." Mol Phylogenet Evol 20(3): 437-449.

Mithen, R., et al. (2010). "Glucosinolate biochemical diversity and innovation in the Brassicales." Phytochemistry 71(17): 2074-2086.

Moran, N. A. (1996). "Accelerated evolution and Muller's rachet in endosymbiotic bacteria." Proceedings of the National Academy of Sciences Evolution 93: 2873-2878.

Morant, A. V., et al. (2008). "β-Glucosidases as detonators of plant chemical defense." Phytochemistry 69(9): 1795-1813.

Nakabachi, A., et al. (2005). "Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera." Proc Natl Acad Sci U S A 102(15): 5477-5482.

Nikoh, N. and A. Nakabachi (2009). "Aphids acquired symbiotic genes via lateral gene transfer." BMC Biol 7: 12.

Novakova, E., et al. (2013). "Reconstructing the phylogeny of aphids (Hemiptera: Aphididae) using DNA of the obligate symbiont Buchnera aphidicola." Mol Phylogenet Evol 68(1): 42-54.

O'Brien, P. J. and J. B. Graves (1992). "Insecticide resistance and reproductive biology of Aphis gossypii Glover." Southwestern Entomologist 17: 115-122.

Ortiz-Rivas, B. and D. Martinez-Torres (2010). "Combination of molecular data support the existence of three main lineages in the phylogeny of aphids (Hemiptera: Aphididae) and the basal position of the subfamily Lachninae." Mol Phylogenet Evol 55(1): 305-317.

Ponder, K. L., et al. (2000). "Difficulties in location and acceptance of phloem sap combined with reduced concentration of phloem amino acids explain lowered performance of the aphid Rhopalosiphum padi on nitrogen deficient barley (Hordeum vulgare) seedlings." Entomologia Experimentalis et Applicata 97(2): 203-210.

Price, D. R. G., et al. (2007). "Molecular characterisation of a candidate gut sucrase in the pea aphid, Acyrthosiphon pisum." Insect Biochem Mol Biol 37(4): 307-317.

Ronnebeck, W. (1950). "On the spring development of the green Peach Aphid (Myzus persicae Sulzer) on the primary host with respect to its importance as a virus vector in the Potato field." Zeitschrift fur Pflanzenkrankheiten, Pflanzenpathologie und Pflanzenschutz 57(9-10): 351-357.

Ronnebeck, W. (1952). "(German title.) Experiment on the reduction of virus infection of potato plants." Nachrichtenblatt des Deutschen Pflanzenschutzdienstes 4: 189-190.

Sadeghi, A., et al. (2009). "Evaluation of the Susceptibility of the Pea Aphid, Acyrthosiphon pisum, to a Selection of Novel Biorational Insecticides using an Artificial Diet." Journal of Insect Science 9(65): 1-8.

Scheirs, J., et al. (2000). "Optimization of Adult Performance Determines Host Choice in a Grass Miner." Proceedings: Biological Sciences 267(1457): 2065-2069.

Schoonhoven, L. M., et al. (2005). Insect-plant biology. New York, Oxford University Press on Demand.

Shakesby, A. J., et al. (2009). "A water-specific aquaporin involved in aphid osmoregulation." Insect Biochem Mol Biol 39(1): 1-10.

Shigenobu, S., et al. (2000). "Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS." Nature 407: 8.

Tagu, D., et al. (2010). "The anatomy of an aphid genome: from sequence to biology." C R Biol 333(6-7): 464-473.

Thomas Huxley, H. (1858). "On the agamic reproduction and morphology of Aphis – Part I." Transactions of the Linnean Society London 22: 193-219.

Tookey, H. L. (1973). "Crambe Thioglucoside Glucohydrolase (EC 3.2.3.1): Separation of a Protein Required for Epithiobutane Formation." Canadian Journal of Biochemistry 51(12): 1654-1660.

Tzin, V., et al. (2015). "RNA interference against gut osmoregulatory genes in phloem-feeding insects." Journal of Insect Physiology 79: 105-112.

Valenzuela, I. and A. A. Hoffmann (2013). Scoping study for further R&D on managing aphids and virus transmission and economic impact of IPM in grain production zones. Barton, ACT, Australia, Grains Research Development Corporation.

Valenzuela, I. and A. A. Hoffmann (2015). "Effects of aphid feeding and associated virus injury on grain crops in Australia." Austral Entomology 54(3): 292-305.

van Ham, R. C., et al. (2003). "Reductive genome evolution in Buchnera aphidicola." Proc Natl Acad Sci U S A 100(2): 581-586.

Velasco, P., et al. (2008). "Comparison of Glucosinolate Profiles in Leaf and Seed Tissues of Different Brassica napus Crops." Journal of the American Society for Horticultural Science 133(4): 551-558.

Vogel, K. J. and N. A. Moran (2013). "Functional and Evolutionary Analysis of the Genome of an Obligate Fungal Symbiont." Genome Biol Evol 5(5): 891-904.

Ward, K. (1934). "The green peach aphid (Myzus persicae Sulzer) in relation to the peach in Victoria and the measures investigated for its control." J. Agric., Victoria 32: 97-104.

Watt, S. (2016). Green peach aphid confirmed resistant to another insecticide.

Wilkinson, T., et al. (1997). "Honeydew sugars and osmoregulation in the pea aphid Acyrthosiphon pisum." Journal of Experimental Biology 200(15): 2137-2143.

Wilkinson, T. L. and A. E. Douglas (1998). "Host cell allometry and regulation of the symbiosis between pea aphids, Acyrthosiphon pisum, and bacteria, Buchnera." Journal of Insect Physiology 44(7-8): 7.

Wittstock, U., et al. (2004). "Successful herbivore attack due to metabolic diversion of a plant chemical defense." Proc Natl Acad Sci U S A 101(14): 4859-4864.

Wojciechowski, W. (1992). Studies on the systematic system of aphids (Homoptera, Aphidinea), Katowice.

Young, G. M., et al. (1999). "A new pathway for the secretion of virulence factors by bacteria: The flagellar export apparatus functions as a protein-secretion system." Proceedings of National Academy of Science:Microbiology 96: 6456–6461.

Zhao, J. H., et al. (2011). "Benefits of Bt cotton counterbalanced by secondary pests? Perceptions of ecological change in China." Environmental Monitoring and Assessment 173(1): 985-994.

Ziegler, H. (1975). Nature of transported substances. Transport in Plants I, Springer: 59-100.

Chapter II

Transcriptome studies and aphid phylogeny

2.1 Introduction

Transcriptomics is a recent and powerful scientific approach that studies global gene expression patterns in biological samples. This approach provides researchers with a powerful tool to understand diverse biological processes in model and non-model organisms alike. For aphids, which are not genetic model organisms, transcriptomics provides an unparalleled molecular insight into their unique and sometimes bizarre biology. In this chapter, I lay the foundation for further chapters by describing the two transcriptomic experiments that I have performed. The initial motivation and justification for these experiments was to identify genes that could be targeted with an RNAi-based insecticide. However, the designs also aimed to elucidate certain aspects of aphid biology and evolution.

Gene expression analysis was previously dominated by microarray-based techniques. The biggest downside of microarray technology is the need for prior knowledge of the sequences of interest (Schena et al. 1995, Bumgarner 2013). Microarray technology is based on hybridisation of the transcripts with specifically designed probes complementary to the genes of interest (Schena et al. 1995, Pozhitkov et al. 2007). Today, the central place once held by microarray analysis has been claimed by the newer, digital approach of RNA-sequencing (RNA-seq), which can produce sequencing data without any prior knowledge about the organism (Nelson 2001, Wang et al. 2009). RNA-seq directly sequences all the cDNAs present in a sample (Morozova et al. 2009, Wang et al. 2009). The main difference between microarrays and RNA-seq is the dynamic range of RNA-seq, which allows quantitation of abundance ranging from a single transcript to a large number of transcripts. Moreover, the direct count-based abundance measured by RNA-seq is far more accurate than microarray measurements, which are based on hybridisation intensity and fluorescence. The cost of RNA-seq is dropping rapidly (von Bubnoff 2008) and it is providing researchers with a powerful tool to understand diverse biological processes in model and non-model organisms. There are several different types of transcriptome data that researchers use to address different biological questions: for example, biological samples which have undergone specific treatments, different types of cells or tissues, and samples from different life stages of an organism. Table 2.1.1 summarises studies performed on different aphid species to answer different biological questions.

Table 2.1.1 Summary of different transcriptome studies performed on aphids

| Author/year | Species | Samples | Treatment | Sequencing Technology | Details |
|---|---|---|---|---|---|
| Nakabachi et al. (2005) | Acyrthosiphon pisum | Bacteriocyte | - | ABI 3700 capillary sequencer | Single, >100bp |
| Burke and Moran (2011) | Acyrthosiphon pisum | Clone without Regiella insecticola, Clone with Serratia symbiotica | Cured with ampicillin treatment, Infection with Serratia symbiotica using microinjection | Illumina Genome Analyzer II platform | Single, 25-36bp |
| Hansen and Moran (2011) | Acyrthosiphon pisum | Bacteriocyte, whole body (without bacteriocyte and embryo) | - | Illumina Genome Analyzer II | 74bp, single |
| Eyres et al. (2016) | Acyrthosiphon pisum | Head | Feeding on Vicia faba and Ononis spinosa | Illumina HiSeq 2000 | 75bp, paired end |
| Ji et al. (2016) | Myzus persicae | Nymph, adult | Fed on wild type Arabidopsis for 4 days (nymph) and 8 days (adult) | Illumina HiSeq 2500 | 125bp, paired end |
| Thorpe et al. (2016) | Myzus persicae, Myzus cerasi, Rhopalosiphum padi | Head, body without head | - | Illumina HiSeq | 100bp, paired end |
| Bandopadhyay et al. (2013) | Lipaphis erysimi | Wingless whole body | Samples collected at different time points after infestation | Illumina Genome Analyzer IIx | 100bp, paired |
| Liu et al. (2012) | Aphis glycines | Gut, whole body | - | Illumina GAII sequencing platform | Single, 75bp |
| Li et al. (2013) | Aphis gossypii | Full body | Fed on cotton seedling, fed in summer | Illumina HiSeq 2000 | 150bp, paired end |
| Nan et al. (2016) | Aphis craccivora | Head, winged adult | - | Illumina (Solexa) GAII | - |
| Anathakrishnan et al. (2014) | Diuraphis noxia | Gut | Two strains fed on wheat for different hours | Sanger sequencing | 250bp, paired end |
| Zhang et al. (2013) | Sitobion avenae | Gut | Different instars on wheat plant before and after feeding | Illumina HiSeq 2000 | Paired end |
| Shang et al. (2016) | Toxoptera citricida | First instar, second instar, third instar winged, third instar wingless, fourth instar winged, fourth instar wingless, winged adults, wingless adults | - | Illumina HiSeq 2000 | Paired end, 100bp |

The transcriptome studies performed on various aphid species so far show that researchers have focused on different aspects of aphid biology including different life stages, different body parts and interactions of aphids with their obligate and facultative symbionts (Table 2.1.1).

This thesis focuses on three aphid species, M. persicae, B. brassicae and L. erysimi. Although several transcriptome studies have been performed on M. persicae, transcriptome studies on L. erysimi and B. brassicae are rare. Hence, such studies will shed light on the similarities and differences between these species, which will be an asset for future studies. These aphid species are of great economic importance in agriculture. One of the aims of the current study is to control aphids on Brassicaceae crop plants. Therefore, the three above-mentioned aphid species were selected due to their ability to feed on Brassicaceae family crop plants.

The aim of the project is to standardise RNA interference (RNAi) technology that can simultaneously be useful against all three aphid species. RNAi needs an effective gene target. Several rules for the selection of an RNAi target were devised based on analysis of the transcriptome dataset, which are as follows: 1) the target must be expressed in a tissue that is easily accessible to incoming dsRNA from the diet, 2) the expression level of the selected gene must be moderate, 3) the target sequence must not show predicted off-target interactions with the sequences of other insect species, 4) the gene must play an important role in the physiology of the insect and 5) the target must be expressed constitutively in the organism without the need for any inducer. The mode of RNAi treatment investigated in this study was dietary. Therefore, to find a gene target that satisfies the above-mentioned criteria, it is important to study the transcriptome of the insect’s digestive system. In addition, bacteriocyte tissue was chosen for transcriptome analysis as it is key for the survival of almost all aphids due to their obligate symbiotic association. The following section provides a detailed rationale for the selection of midgut (MG) and bacteriocyte (Bcyt) tissues to perform transcriptome studies for RNAi target selection.
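The five selection rules listed above can be read as a simple filter over candidate genes. The Python sketch below is only an illustration of that logic and is not the screening procedure used in this thesis: the `Candidate` fields, the gene names, the expression values and the bounds chosen for "moderate" expression (`MIN_CPM`, `MAX_CPM`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    midgut_cpm: float      # expression in the diet-accessible tissue (hypothetical units)
    off_target_hits: int   # predicted matches in other insect species
    essential: bool        # plays an important physiological role
    constitutive: bool     # expressed without the need for an inducer


# Hypothetical bounds for "moderate" expression; not values from this thesis.
MIN_CPM, MAX_CPM = 10.0, 1000.0


def passes_rules(c: Candidate) -> bool:
    """Return True only if the candidate satisfies all five rules."""
    return (
        c.midgut_cpm > 0                         # rule 1: expressed in an accessible tissue
        and MIN_CPM <= c.midgut_cpm <= MAX_CPM   # rule 2: moderate expression level
        and c.off_target_hits == 0               # rule 3: no predicted off-target hits
        and c.essential                          # rule 4: physiologically important
        and c.constitutive                       # rule 5: constitutively expressed
    )


# Invented candidates, one failing a different rule each time.
candidates = [
    Candidate("geneA", 5000.0, 0, True, True),   # fails rule 2 (too highly expressed)
    Candidate("geneB", 120.0, 0, True, True),    # satisfies every rule
    Candidate("geneC", 80.0, 3, True, True),     # fails rule 3 (off-target hits)
    Candidate("geneD", 0.0, 0, True, False),     # fails rules 1, 2 and 5
]
shortlist = [c.gene for c in candidates if passes_rules(c)]
print(shortlist)
```

In practice each field would have to be derived from a separate analysis (tissue transcriptomes for rules 1 and 2, sequence searches for rule 3, and so on); the sketch only shows how the rules combine.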

2.1.1 Transcriptome study I

This study was performed on three different aphid species, 1) Myzus persicae, 2) Brevicoryne brassicae and 3) Lipaphis erysimi. For each of these species, the MG and the Bcyt were dissected. Whole-body transcriptomes of these aphid species were also generated to serve as a comparison.

33 | P a g e

Midgut (MG)

The midgut is the most important part of the aphid digestive system (Chapter 1, Section 1.1.3). The main reasons for selecting the MG for this study are as follows. If transgenic plants are used to deliver aphidicidal dsRNA (double-stranded RNA), then the MG will be one of the first tissues reached and thus potentially interfered with. If the dsRNA can be delivered directly to the target tissue, there is a greater chance of achieving efficient silencing of the target gene. Therefore, selecting a target gene showing elevated expression in the MG would result in a dsRNA construct better able to reach the target with minimum degradation, as well as reducing the requirement for inter-tissue and inter-cellular movement of the dsRNA.

Bacteriocyte (Bcyt)

The obligatory association of Buchnera aphidicola and aphids is essential to the success of aphids in nature (Chapter 1, Section 1.1.4). The Bcyt tissue was selected for the current transcriptome study because (1) transcriptomics could elucidate the molecular basis of the symbiosis between Buchnera and aphids, (2) the Bcyt tissue is located close to the gut tissue and hence it would theoretically be easy for the dsRNA to reach the target tissue and (3) as B. aphidicola is associated exclusively with aphids, this approach provides an opportunity to target biology unique to aphids, potentially allaying fears about new control agents and increasing the chance of avoiding off-target dsRNA effects. The contention behind this third reason, that society may more readily endorse pest control measures that target biology unique to aphids, was never intended to be tested within the scope of this thesis, yet the potential to discover a novel control mechanism helped motivate the collection of the bacteriocyte transcriptome dataset.

Whole-body (WB)

The whole-body sample was selected to identify genes differentially expressed in the target tissues (MG and Bcyt). An alternative strategy, of sampling aphids that had their midgut and bacteriocytes removed, was considered but ultimately avoided. Not only were whole bodies easier to work with, they were also likely to yield fewer spurious results, so that the list of differentially expressed genes would represent a more realistic gene set with fewer false positives.
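As a rough illustration of how a whole-body reference can be used to flag tissue-enriched genes, the Python sketch below computes a log2 fold change between a tissue and the whole body and keeps genes above a cut-off. It is a simplified stand-in, not the differential expression analysis performed in this thesis: it ignores replicates and statistical testing, and the gene names, expression values, pseudocount and `min_lfc` threshold are all assumptions made for the example.

```python
import math


def log2_fold_change(tissue, whole_body, pseudocount=1.0):
    """log2 ratio of tissue to whole-body expression; the pseudocount avoids division by zero."""
    return math.log2((tissue + pseudocount) / (whole_body + pseudocount))


def tissue_enriched(tissue_expr, whole_body_expr, min_lfc=1.0):
    """Return genes whose tissue expression exceeds the whole-body value by at least min_lfc (log2)."""
    enriched = {}
    for gene, tissue_value in tissue_expr.items():
        lfc = log2_fold_change(tissue_value, whole_body_expr.get(gene, 0.0))
        if lfc >= min_lfc:
            enriched[gene] = lfc
    return enriched


# Invented normalised expression values for three hypothetical genes.
midgut = {"geneA": 7.0, "geneB": 3.0, "geneC": 0.0}
whole_body = {"geneA": 1.0, "geneB": 3.0, "geneC": 15.0}
result = tissue_enriched(midgut, whole_body)
print(result)
```

Because the whole body still contains the tissue of interest, fold changes computed this way are conservative, which is consistent with the expectation of fewer false positives described above.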

The second transcriptome study performed in this thesis aimed to understand how the transcriptome of different M. persicae strains is affected by host plant changes.

2.1.2 Transcriptome study II

The design was partly influenced by a study of the Helicoverpa armigera transcriptome which showed that an insect's gene expression profile changes considerably depending upon what it is eating (Pearce et al. 2017). It seemed plausible that the transcriptome of M. persicae could be dramatically affected by host plant changes, and so six plants were tested. An additional design feature of this study was that three different strains of M. persicae were profiled.

In terms of the development of an RNAi based insecticide, the design of this second transcriptomic experiment would help to identify target genes that are stably expressed in the insect when fed on different plants and thus available as a target irrespective of the host plant.

The experiment would also give some indication of strain-specific expression differences, which could be useful for choosing a broadly effective target. I decided to use this study for RNAi target selection in the advanced stage of target screening, meaning that for the primary RNAi target screening I focused only on tissue-specific transcriptomes.

The transcriptome studies hold wide-ranging potential due to their non-targeted nature. Hence, the data not only allow checking of the proposed hypothesis but can also speak to secondary biological questions. I will use these data not just to find potential RNAi targets (Chapter III) but also to address biological questions related to glucosinolate detoxification (Chapters IV and V).


Before analysing the actual transcriptome data, it was important to develop a picture of the relationships between the experimental aphid species, M. persicae, B. brassicae and L. erysimi. This was achieved by producing a phylogenetic tree based on sequences from previously published transcriptome data from different aphid species.

2.1.3 Transcriptome-based phylogenetics

Kim et al. (2011) studied aphid phylogeny using the DNA sequence of the nuclear gene EF1α and four mitochondrial genes, namely COI, COII, 12S/16S and cytochrome B (CytB). The aim of the study was to understand the diversification and host associations of different aphid species. The phylogenetic and molecular clock analysis of 80 species of the tribe Aphidini, with 7 outgroup aphid species, found that the aphids and their hosts show co-diversification. The aphid secondary hosts evolved between ~10-35 million years ago. This suggests that aphids before this time were restricted to their modern primary hosts (Kim et al. 2011). The study also showed that the aphid tribe Macrosiphini (to which the three aphid species studied in this thesis belong) diverged from its sister tribe Aphidini about 60-65 million years ago. This split was contemporaneous with the divergence of the Rosaceae plant family. Present-day aphids are either monoecious, surviving on only one type of host plant throughout their life, or heteroecious, changing host plant as part of their life cycle. The multiple hosts of heteroecious aphids are usually of completely different types, with secondary hosts generally of greater nutritional value. This improvement in nutrition drove a burst of aphid speciation. Whether the monoecious and heteroecious groups are monophyletic is still debated, as indeed is the monophyly of some aphid taxa (von Dohlen et al. 2006, Kim et al. 2011).

Therefore, along with the application-oriented RNAi-based control strategy, this thesis also addresses questions related to aphid biology more broadly. One focus is understanding the differences between generalist and specialist aphid species, particularly those that feed on Brassicaceae. One way in which transcriptomics can help in this endeavour is to provide a rich set of characters that can be used in phylogenetic analyses. Such an approach has been championed by the 1KITE consortium (Misof et al. 2014). The scope of my analysis was not as taxonomically broad. However, I set out to combine the transcriptomics of aphid pests of Brassicaceae crops, which I would generate, with data available from other aphid species. At the time of analysis, transcriptomes from 15 species were available. The three species that I was interested in are known to be phylogenetically close to each other.


2.2 Materials and Methods

2.2.1 Insect rearing and maintenance for Experiment 1

The experiments were performed on a Myzus persicae strain collected by Dr. Paul Umina from Bona Vista Road, Warragul, Victoria, Australia (38°13’01.6”S; 145°58’19.5”E; Collection date: 22/03/2012, Host plant: Raphanus raphanistrum). Brevicoryne brassicae and Lipaphis erysimi lab strains were received from Dr. Owain Edwards, CSIRO, Western Australia, Australia. All strains of aphids were maintained at 20°C with a 12 h/12 h dark/light period on radish plants (Raphanus sativus) grown from seed. At the start of these experiments, a clonal aphid colony was established from a single aphid to produce a genetically homogeneous population.

2.2.2 Aphid phylogeny analysis

The phylogenetic relationship of the aphid species to one another and other insect species was studied using a combination of whole genome ortholog predictions with RefSeq genesets (11 species) and RNA-seq data (14 species) where whole body transcriptomes were available on the Sequence Read Archive (SRA) in NCBI (Table 2.2.1).

Previously, Rane et al. (2017) had identified 4,117 genes in different species from Hemiptera, Coleoptera, Diptera, Lepidoptera and Hymenoptera that were placed in 1:1 orthologous groups (1 gene per species). RNA-Seq reads for the aphid species, both those generated in this study and those retrieved from the NCBI SRA, were then assembled into consensus transcripts using Trinity (Grabherr et al. 2011) and filtered to identify high-quality transcripts (Table 2.2.2) following recommendations by the 1KITE project (Misof et al. 2014). The protein-coding regions in each transcript were then identified by means of ab initio gene prediction using GeneMarkS-T (Tang et al. 2015). Following the transcript-derived gene prediction, the 4,117 true insect orthologs were used to identify the most likely orthologs in each of the 14 remaining species using Smith-Waterman alignments and the S’ score for higher sensitivity (Rane et al. 2017). Genes were then selected for phylogenomic analysis if (a) they showed the highest similarity in each of the 14 species and (b) their length was greater than the lower end of the median average deviation of gene lengths in the 11 RefSeq species. This led to the identification of 303 high-quality one-to-one orthologs across all 25 species. Codon alignments of all 303 genes were then generated using MAFFT (v7.310, CBRC) for protein sequence alignment (Katoh and Standley 2013) and Pal2nal (v1.4) for nucleotide alignment at the codon level (Suyama et al. 2006). The best models for each codon were estimated using ModelFinder (Kalyaanamoorthy et al. 2017), which enables free rate variation and finds best-fit partitioning schemes, and a consensus phylogeny was estimated using IQ-TREE (v1.6.0b4) (Nguyen et al. 2015).

The phylogeny was then used as a prior for estimating the divergence times with nine calibration points (Supplementary file 1) using models enabled in MEGA7 (Kumar et al. 2016). A timetree was inferred using the RelTime method (Tamura et al. 2012) and the Dayhoff w/freq. model (Schwarz and Dayhoff 1979). The timetree was computed using 7 calibration constraints (Figure 2.3.1). The estimated log likelihood value is -457863.75. A discrete Gamma distribution was used to model evolutionary rate differences among sites (6 categories (+G, parameter = 0.9246)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 0.0047% sites). All positions with less than 95% site coverage were eliminated. That is, each site retained for phylogenetic analysis had fewer than 5% of the sequences with alignment gaps, missing data, and/or ambiguous bases. There were a total of 27,267 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 (Kumar et al. 2016).
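The 95% site-coverage rule described above can be illustrated with a short Python sketch. This is not the MEGA7 implementation; the function name, the set of symbols treated as gaps or ambiguous residues, and the toy alignment are assumptions made for illustration only.

```python
from typing import Dict, List

# Assumed symbols for alignment gaps, missing data and ambiguous residues.
GAP_OR_AMBIGUOUS = set("-?Xx")


def filter_sites_by_coverage(alignment: Dict[str, str],
                             min_coverage: float = 0.95) -> Dict[str, str]:
    """Keep alignment columns in which at least `min_coverage` of the
    sequences carry an unambiguous residue (hypothetical helper)."""
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(names)
    n_cols = lengths.pop()

    kept_columns: List[int] = []
    for col in range(n_cols):
        informative = sum(
            1 for name in names if alignment[name][col] not in GAP_OR_AMBIGUOUS
        )
        if informative / n_seqs >= min_coverage:
            kept_columns.append(col)

    return {
        name: "".join(alignment[name][c] for c in kept_columns) for name in names
    }


# Toy alignment of 25 sequences: column 0 is complete, column 1 has one gap
# (96% coverage, kept) and column 2 has two problem sites (92%, dropped).
toy = {f"sp{i}": "MKL" for i in range(25)}
toy["sp0"] = "M-X"
toy["sp1"] = "MK-"
filtered = filter_sites_by_coverage(toy)
```

With 25 taxa, as in this analysis, a column survives the filter only if at most one sequence has a gap, missing data or an ambiguous residue at that position.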


Table 2.2.1 Insect species used for phylogenetic analysis with dataset reference

Species name                  NCBI accession number / dataset

Hemiptera – Aphids
Myzus persicae                Transcriptome Data
Lipaphis erysimi              Transcriptome Data
Brevicoryne brassicae         Transcriptome Data
Acyrthosiphon pisum           Current RefSeq geneset (as of November 2016)
Macrosiphum euphorbiae        SRR965347, SRR965348
Sitobion avenae               SRR1023729
Myzus cerasi                  ERR983168
Rhopalosiphum padi            SRR5133459, SRR5133472
Hyalopterus amygdali          SRR1509877
Aphis nerii                   SRR2500734, SRR2500537
Aphis glycines                SRR2181917
Aphis citricidus              SRR3216095
Tamalia coweni                SRR2559332, SRR2559237
Tamalia inquilinus            SRR2558771, SRR2559384
Schizaphis graminum           SRR3188117, SRR3188116

Hemiptera – other
Diaphorina citri              Current RefSeq geneset (as of November 2016)
Pachypsylla venusta           Current RefSeq geneset (as of November 2016)
Homalodisca vitripennis       Current RefSeq geneset (as of November 2016)
Halyomorpha halys             Current RefSeq geneset (as of November 2016)
Oncopeltus fasciatus          Current RefSeq geneset (as of November 2016)
Frankliniella occidentalis    Current RefSeq geneset (as of November 2016)

Coleoptera
Tribolium castaneum           Current RefSeq geneset (as of November 2016)

Diptera
Drosophila melanogaster       Current RefSeq geneset (as of November 2016)

Lepidoptera
Bombyx mori                   Current RefSeq geneset (as of November 2016)

Hymenoptera
Apis mellifera                Current RefSeq geneset (as of November 2016)


2.2.3 Aphid Dissection

The midgut (MG) and bacteriocytes (Bcyt) of aphids from three species, M. persicae, B. brassicae and L. erysimi, were dissected. All dissections were performed in diethyl pyrocarbonate (DEPC) treated phosphate buffered saline under a dissection microscope against a dark background. Around 1,000 second- and third-instar individuals of each species were dissected. The dissected tissues were immediately transferred to fresh ice-cold Trizol® solution (Ambion) and stored at -80°C until further processing.

2.2.4 Library preparation and sequencing

For each species, dissected material from each tissue was pooled into three tubes to form the three biological replicates for the tissue sample under study. Similarly, three pools containing ten aphids each were placed in separate tubes to be used as whole body (WB) samples, resulting in a total of nine tubes per species. RNA isolation was performed using a DirectZol™ RNA MiniPrep (Zymo Research) according to the manufacturer’s instructions. The quality of isolated RNA was checked on an agarose gel and quantified using a Qubit™ fluorometric quantitation (ThermoFisher) system. 3 µg of RNA was supplied to the Australian Genome Research Facility (AGRF) for library preparation using poly-A selection. The sample libraries were barcoded, pooled and sequenced on a HiSeq 2500 system (Illumina) with 100 bp read length. The 9 M. persicae samples were sequenced in one lane and the 18 samples from B. brassicae and L. erysimi were sequenced together in a single lane. The two runs were performed on different occasions with the same settings.

2.2.5 Quality control for NGS data

All FASTQ files were first checked with the FastQC software (Andrews 2010) for length distribution, quality of each base, duplication level, overrepresented sequences and adapter contamination. All files were then processed with Trimmomatic software (Bolger et al. 2014) to remove short sequences, adaptor contamination and 3 nucleotides from the leading and trailing ends. All FASTQ files were processed with the same filtering parameters.

2.2.6 Gene expression analysis

Amongst the three experimental organisms, M. persicae is the only species that has a reference genome sequence. Consequently, de novo transcriptome analysis was performed for all samples, including M. persicae, in order to achieve uniform processing. First, the reference dataset was prepared by de novo transcriptome assembly using Trinity software with default settings (Haas et al. 2013). The Trinity de novo analysis generated a different number of transcripts for each species (Table 2.2.2). The output of the Trinity assembly was then merged with the A. pisum NCBI transcript sequence dataset to generate the combined reference set.

Aphid tissue transcriptome

The NCBI transcripts provide reference boundaries for each exon and a possible gene structure to the Corset program (Davidson and Oshlack 2014). Using this information, Corset merges clusters that are not differentially expressed. This merged FASTA file was used as a reference for mapping all the corresponding FASTQ read samples. The Corset analysis was performed using BAM files generated by the BWA software with default settings (Li and Durbin 2009). All differentially expressed clusters were generated with the number of reads assigned to them by the Corset software. While performing read counting, Corset analyses the different isoforms of the same gene and defines the longest isoform as a possible gene structure. Here, when provided with the previously annotated A. pisum transcripts, Corset uses them as a reference gene structure to identify the boundaries of differentially expressed transcripts/isoforms. All the novel transcripts in the dataset are kept separate. The sequence coverage was analysed using BUSCO, with the Insecta_odb9 database as a reference. The FASTA sequences were then retrieved and functionally annotated using an array of software including Interproscan (Jones et al. 2014) and BLASTx (Altschul et al. 1990). The pipeline is graphically explained in Figure 2.2.1 and Figure 2.2.2. The read data for each sample were used as input for edgeR analysis with default settings (Robinson et al. 2010). Differential gene expression analysis was performed for each aphid species individually using the WB transcriptome as the reference sample. All the expression values provided in this study are statistically significant (p<0.05).

Table 2.2.2 The number of transcripts generated by the Trinity analysis and the predicted gene numbers for each insect species

Species name                  Assembled transcripts    Genes predicted for phylogenetic analysis

Hemiptera – Aphids
Myzus persicae                47096                    15418
Lipaphis erysimi              26056                    14894
Brevicoryne brassicae         24674                    13796
Acyrthosiphon pisum           -                        16782
Macrosiphum euphorbiae        47624                    46987
Sitobion avenae               24111                    18882
Myzus cerasi                  43439                    23563
Rhopalosiphum padi            76886                    17389
Hyalopterus amygdali          47482                    52267
Aphis nerii                   58839                    44697
Aphis glycines                39068                    17127
Aphis citricidus              42014                    21804
Tamalia coweni                31746                    25305
Tamalia inquilinus            28749                    27838
Schizaphis graminum           185936                   170219

Hemiptera – other
Diaphorina citri              -                        19311
Pachypsylla venusta           -                        14390
Homalodisca vitripennis       -                        33019
Halyomorpha halys             -                        11374
Oncopeltus fasciatus          -                        19615

Coleoptera
Tribolium castaneum           -                        16524

Diptera
Drosophila melanogaster       -                        13953

Lepidoptera
Bombyx mori                   -                        15100

Hymenoptera
Apis mellifera                -                        15314

Figure 2.2.1 The formation of the reference sequence dataset from transcriptome data without genome sequence availability

Figure 2.2.2 The differential gene expression analysis pipeline using Corset

Experiment 2: The transcriptional response to host plants

The aphid-host plant transcriptome experiment was carried out in collaboration with Dr. Owain Edwards and Dr. Anna Simonsen at CSIRO, Perth, Australia. The experiment was performed with three different M. persicae strains: 1) BUN (Bunbury strain, collected from Bunbury, Western Australia), 2) LUP (an M. persicae strain that is adapted to feed on lupin plants, collected from Western Australia), and 3) C61 (an M. persicae strain, collected from Victoria, Australia, that has been maintained in the laboratory for many years and is known to have the ability to form sexual stages). Both the BUN and C61 strains were maintained on radish for ~10 years whereas the LUP strain was maintained on lupin for ~10 years. These aphid strains were then fed on six different host plants: cabbage, cauliflower, canola, mustard, radish and lupin (Lupinus albus). All three strains were maintained on each host plant for seven generations. All treatments were replicated 5 times (18 treatment combinations, n = 5 per treatment combination, total n = 90). Each replicate used a growing plant in a pot in a pest-free greenhouse chamber. After two weeks of initial growth, 10 nymphs (1 day old) were released onto each plant. All the plants were separated from each other using mesh cloth. All the replicates were distributed in a randomised block design across two large mesh boxes in the greenhouse. Aphids were maintained and allowed to reproduce clonally for approximately 3-4 weeks (roughly 3 generations). 50 aphids were harvested from each replicate.
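The factorial layout described above (3 strains x 6 host plants x 5 replicates) can be enumerated with a few lines of Python. This is only a sketch of the bookkeeping; the variable and function names are hypothetical and it does not reproduce the randomised block allocation used in the greenhouse.

```python
from itertools import product

STRAINS = ["BUN", "LUP", "C61"]
HOST_PLANTS = ["cabbage", "cauliflower", "canola", "mustard", "radish", "lupin"]
REPLICATES_PER_COMBINATION = 5


def enumerate_design(strains, hosts, n_reps):
    """Return one (strain, host, replicate) tuple per potted plant."""
    return [
        (strain, host, rep)
        for strain, host in product(strains, hosts)
        for rep in range(1, n_reps + 1)
    ]


design = enumerate_design(STRAINS, HOST_PLANTS, REPLICATES_PER_COMBINATION)
n_combinations = len({(strain, host) for strain, host, _ in design})
print(n_combinations, len(design))  # prints: 18 90
```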

RNA was extracted from each sample using a standard Trizol™ protocol. cDNA libraries were prepared for sequencing from each sample using an Illumina library kit. Two libraries failed during preparation, hence only 88 libraries were used in the final analysis.

Aphid-host transcriptome libraries were sequenced using the Hiseq 2500 single-end sequencing protocol with 100 bp read length. A genome-guided expression analysis was performed, with M. persicae genome scaffolds used for read mapping. The mapping software, STAR (Dobin et al. 2013), also produced a read count data file for each sample; these were then used for expression analysis with the Voom program (Law et al. 2014) to obtain normalised expression values from the count files, and multi-dimensional scaling (MDS) plots were generated to visualise the overall differences between treatments.


2.3 Results

2.3.1 Aphid phylogeny

To understand the evolutionary relationship between aphid species, the three experimental species, twelve other aphid species for which sequence data were available and nine insect species from other orders were used for phylogenetic analysis (Table 2.2.2). A phylogenetic tree based on x phylogenetically informative characters was generated from 304 genes with 1:1 orthology across the dataset. This tree broadly agreed with the species tree created by the 1KITE project for the species present in both (Misof et al. 2014). The twelve aphids formed a well-supported monophyletic clade sister to two psyllid species (Diaphorina citri, Pachypsylla venusta) (Figure 2.3.1). All three genera represented by multiple species (Tamalia, Aphis and Myzus) were retrieved as monophyletic groups. The Myzus species were basal within a clade that contained the two other focal species of this study, Lipaphis erysimi and Brevicoryne brassicae, and three other aphid species: the relatively well characterized pea aphid (Acyrthosiphon pisum), the potato aphid (Macrosiphum euphorbiae) and the English grain aphid (Sitobion avenae). L. erysimi and B. brassicae were found to be sister species and A. pisum, a Fabaceae specialist, was identified as an outgroup to these Brassicaceae specialists.

To place a time scale on this phylogeny, dates derived from the literature were placed on the relevant nodes (citations). This analysis suggests that the common ancestor of aphids lived about 146 million years ago (Figure 2.3.2), whereas the three experimental species used in the current study, M. persicae, B. brassicae and L. erysimi, were estimated to have diverged around 43 million years ago. The tree also shows that the three well-known specialist aphid species, B. brassicae, L. erysimi and A. pisum, diverged from each other approximately 20 million years ago.

Figure 2.3.1 Insect phylogenetic tree showing evolution of aphid species with bootstrap support. Red coloured nodes show the calibration constraints used for the time tree analysis. Numbers on the branches show bootstrap values.

Figure 2.3.2 Insect phylogenetic tree showing evolution of aphid species, generalists vs specialists. A timetree was inferred using the RelTime method (Tamura et al. 2012) and the Dayhoff w/freq. model (Schwarz and Dayhoff 1979). The timetree was computed using 7 calibration constraints. The estimated log likelihood value is -457930.9164. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.9148)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 12.2456% sites). The analysis involved 25 amino acid sequences. All positions with less than 95% site coverage were eliminated. That is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. Red dots represent nodes with a 95% confidence interval for the estimated divergence time. There were a total of 27267 positions in the final dataset. Evolutionary analyses were conducted in MEGA7. X-axis: millions of years.

2.3.2 Host Range

I set out to quantify the ‘generalist’ versus ‘specialist’ descriptors given to the three Brassicaceae-feeding aphid species studied in this chapter and their close relative, the model aphid species Acyrthosiphon pisum. This was done by examining the Host Plant Catalog of Aphids (Holman 2009), which reports the 133 plant families used as host plants for almost all aphid species.

M. persicae was found on at least two species of 104 plant families, with the most commonly used host species (species numbers are in brackets) belonging to Asteraceae (180), followed by Brassicaceae (132), Rosaceae (68), Solanaceae (68) and Fabaceae (53). B. brassicae was found on a total of 12 different plant families with an extreme preference for Brassicaceae (163) as a host, followed by Resedaceae (5) and Solanaceae (2). L. erysimi was recorded on only 6 plant families, with the greatest number of host plants in Brassicaceae (102), followed by Asteraceae (4) and Solanaceae (3). A. pisum was associated almost exclusively with the Fabaceae family: although it was recorded on 12 different plant families, it showed a strong preference for Fabaceae (216), followed by Ulmaceae (12), Asteraceae (4) and Brassicaceae (3) (Figure 2.3.3). This analysis shows that M. persicae, B. brassicae and L. erysimi preferred Brassicaceae family plants (Figure 2.3.3).

To understand whether there is any pattern of host preference within Brassicaceae, all the plant species from this family associated with the above-mentioned aphid species were retrieved and categorised by genus. The number of plant species from each genus preferred by each aphid species was plotted (Figure 2.3.4) but no pattern was found for host species preference within Brassicaceae.

Given these data, there are a number of ways to map host feeding preferences on to the phylogenetic tree created above. The three Brassicaceae-feeding species do not form a monophyletic clade; rather, the generalist species, Myzus persicae, is an outgroup to a group of specialist species with varied host specialisations. Perhaps the generalist M. persicae exhibits the ancestral state of polyphagy, and all the specialist species show a derived feeding pattern. Host specialization may therefore have been associated with the loss of genes associated with feeding on diverse plants. A formal alternative is that M. persicae evolved to be a generalist, a process which may have been associated with gene duplications and neofunctionalization in its lineage. To explore these differences and identify these possible gene loss/gain events it is important to study gene expression patterns in tissue samples from these species.

Figure 2.3.3 Host plant species from different plant families associated with four major aphid species (M. persicae, B. brassicae, L. erysimi and A. pisum). Chart title: Host plant species associated with aphid species from different plant families. X-axis: host plant family; y-axis: number of species.

Figure 2.3.4 Host plant species from Brassicaceae family associated with four major aphid species (M. persicae, B. brassicae, L. erysimi and A. pisum). Chart title: Host plant species associated with aphids from Brassicaceae family. X-axis: name of the genus from Brassicaceae family; y-axis: number of plant species from each genus.

2.3.3 Initial data screening

2.3.3.1 Tissue transcriptome data

The Hiseq 2500 based sequencing generated FASTQ files with paired-end data, and the FASTQ file analysis showed high confidence in the base calls (Phred scores above 30) and low adaptor contamination. 98% of the sequence data were retained after Trimmomatic was used to clean the ends of reads, and the resulting sequence lengths averaged 99.86 nucleotides across all samples. The GC content ranged from 37-40% and there were no poor-quality sequences present in any of the samples. FastQC analysis showed failures in sequence duplication and k-mer content. This means the data had more than 50% non-unique sequences and thus failed the sequence duplication test, whereas the k-mer test fails with any imbalanced k-mer with a binomial p-value < 10^-5.

Figure 2.3.5 The number of reads in the tissue samples of B. brassicae and L. erysimi. bcyt – bacteriocyte, gut – mid gut, whole – whole body, brevi – B. brassicae and lip – L. erysimi. R1 – forward read, R2 – reverse read; the digit between two underscores represents the biological replicate number for each sample. X-axis: sample name. The y-axis shows millions of reads.

Figure 2.3.6 The number of reads in the tissue samples of M. persicae. bcyt – bacteriocyte, mgut – mid gut, whole – whole body. R1 – forward read, R2 – reverse read; the digit after the tissue code represents the biological replicate number for each sample. X-axis: sample name. The y-axis is in millions of reads.

All the paired FASTQ files for each sample were then mapped to the reference FASTA file for further analysis. The mapping percentage ranged from 97% to 98% in all the samples. After the BAM file generation, Corset analysis identified differentially expressed transcripts from the total transcripts present in each species (M. persicae – 47096, B. brassicae – 26056, L. erysimi – 24674). The edgeR analysis was performed using a counts-per-million (CPM) cut-off of one for at least three samples in each set of replicate samples. This threshold was conservative and was chosen so as to reduce false positive calls of differentially expressed genes.
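The CPM filter described above can be sketched in plain Python to make the rule explicit. This is an illustration only and not the edgeR code used in the analysis: the function names and the toy counts are hypothetical, the cut-off is assumed to be strict (CPM > 1), and every library is assumed to contain at least one read.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale raw read counts by each sample's library size (total reads)."""
    n_samples = len(next(iter(counts.values())))
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [1e6 * row[j] / library_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts: Dict[str, List[int]],
                          min_cpm: float = 1.0,
                          min_samples: int = 3) -> Dict[str, List[int]]:
    """Keep genes whose CPM exceeds `min_cpm` in at least `min_samples` samples."""
    cpm = counts_per_million(counts)
    return {
        gene: counts[gene]
        for gene, values in cpm.items()
        if sum(value > min_cpm for value in values) >= min_samples
    }


# Toy example with four samples (columns) and three genes (rows).
toy_counts = {
    "high": [1_000_000, 1_000_000, 1_000_000, 1_000_000],
    "mid": [3, 3, 3, 0],   # above the cut-off in three samples: kept
    "low": [3, 3, 0, 0],   # above the cut-off in only two samples: dropped
}
kept = filter_low_expression(toy_counts)
```

Requiring the cut-off to be met in at least three samples matches the number of biological replicates per tissue, so a gene expressed in only one tissue is still retained.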

The BUSCO analysis was performed to measure the completeness of the transcriptome. The analysis shows the presence of ~95% of single-copy gene groups from Insecta in all three species (Figure 2.3.7). This suggests that the transcriptomes presented in this study are comparable to a certain extent.

Figure 2.3.7 BUSCO analysis showing presence of gene groups from transcriptome data of M. persicae, B. brassicae and L. erysimi. n: total number of BUSCO reference gene groups.


Total number of genes showing differential expression.

Table 2.3.1 shows the total number of genes showing differential gene expression in the tissue samples of each species with the WB sample as a reference.

Table 2.3.1 The total number of up and down regulated transcripts found in bacteriocyte and midgut tissue samples of M. persicae, B. brassicae and L. erysimi. The whole-body sample was used as a reference for comparison. Log fold change (LFC) cut off used: 2 LFC and -2 LFC.

Species                  Bacteriocytes        Mid gut
                         Up       Down        Up       Down
Myzus persicae           1553     4797        1887     5016
Brevicoryne brassicae    2639     3486        2217     4314
Lipaphis erysimi         1070     1538        1168     1843
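As a hedged illustration of how the up and down counts in Table 2.3.1 relate to the stated thresholds, the sketch below classifies transcripts by log fold change and p-value. It is not the edgeR workflow itself: the function name and example values are hypothetical, and treating the cut-offs as inclusive (LFC >= 2 or <= -2) is an assumption.

```python
from typing import Dict, List, Tuple


def classify_by_lfc(results: Dict[str, Tuple[float, float]],
                    lfc_cutoff: float = 2.0,
                    alpha: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split transcripts into up- and down-regulated lists.

    `results` maps a transcript ID to (log fold change vs. whole body, p-value).
    """
    up: List[str] = []
    down: List[str] = []
    for transcript, (lfc, p_value) in results.items():
        if p_value >= alpha:
            continue  # not statistically significant
        if lfc >= lfc_cutoff:
            up.append(transcript)
        elif lfc <= -lfc_cutoff:
            down.append(transcript)
    return up, down


# Made-up values for four transcripts in one tissue-vs-whole-body comparison.
example = {
    "t1": (3.1, 0.001),   # up-regulated
    "t2": (-2.5, 0.010),  # down-regulated
    "t3": (1.2, 0.001),   # significant but below the LFC cut-off
    "t4": (4.0, 0.200),   # large change but not significant
}
up_regulated, down_regulated = classify_by_lfc(example)
```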

The tissue analysis was performed individually for each species using the edgeR program. The Multidimensional Scaling (MDS) plot (Figure 2.3.8) shows that the dissected tissues from all three species are biologically different from one another, suggesting that there is minimal tissue contamination. Moreover, the overlap of the biological replicates from each tissue suggests that the differences between tissues are biologically relevant.

Figure 2.3.8 MDS plot for green – whole body (WB), blue – bacteriocyte (Bcyt), red – mid-gut (MG) tissues, from A) M. persicae, B) B. brassicae and C) L. erysimi

2.3.3.2 Aphid-host transcriptome

For the host-plant response transcriptomic experiments, single-end sequence data were generated on the Hiseq 2500. Single-end sequencing was chosen because a reference genome was available for M. persicae. The FASTQ file quality analysis was performed using FastQC software and showed high base quality, with Phred scores above 30, and low adaptor contamination. The Trimmomatic analysis cleaned the FASTQ files, retaining on average 98% of sequence data. The Trimmomatic-processed FASTQ files were used for differential gene expression analysis. The total number of reads from different samples varied from 578,270 to 13,262,702 (Figure 2.3.9). The GC content of the sequences in the FASTQ files ranged from 37-43%. The average sequence length was 100 nucleotides in all samples. There were no poor-quality sequences present in any of the samples. The FastQC analysis showed failures on the sequence duplication and k-mer content metrics.

Figure 2.3.9 The number of reads from each M. persicae sample fed on different host plants. Strain names: Bun, LUP, C61. Host plants: cab – cabbage, can – canola, cau – cauliflower, lup – lupin, mus – mustard, rad – radish. The digit in each name represents the biological replicate of the sample. X-axis: sample names. The y-axis is in millions of reads.

For each of the three M. persicae strains, four or five replicate samples were generated from each of the six plants. Genome-guided read mapping was performed using the M. persicae genome. The expression values across all the samples clearly showed three different clusters corresponding to the three different strains. This clear M. persicae strain effect contrasts with the effects of host plant use, which at a broad level show little uniformity between strains (Figure 2.3.10).
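One way to make "clustering by strain rather than by host plant" concrete is to compare mean pairwise distances between expression profiles within and between groups. The sketch below uses plain Euclidean distance on invented values; the MDS plot itself uses a different distance and was produced with other software, so this is only an illustration. The sample labels follow the host_strain.replicate convention of Figure 2.3.9, and all numbers are hypothetical.

```python
import math
from itertools import combinations


def parse_sample(name):
    """Split a label such as 'cab_Bun.1' into (host, strain, replicate)."""
    host, rest = name.split("_", 1)
    strain, rep = rest.split(".", 1)
    return host, strain, int(rep)


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distances(profiles, key):
    """Mean pairwise distance within and between groups.

    profiles: dict mapping sample name -> list of expression values
    key: 0 groups samples by host plant, 1 groups them by strain
    """
    within, between = [], []
    for s1, s2 in combinations(sorted(profiles), 2):
        d = euclidean(profiles[s1], profiles[s2])
        same = parse_sample(s1)[key] == parse_sample(s2)[key]
        (within if same else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical log-expression values for three genes in six samples.
profiles = {
    "cab_Bun.1": [5.0, 1.0, 1.0],
    "rad_Bun.1": [5.2, 1.1, 0.9],
    "cab_C61.1": [1.0, 5.0, 1.0],
    "rad_C61.1": [1.1, 5.3, 1.2],
    "cab_LUP.1": [1.0, 1.0, 5.0],
    "rad_LUP.1": [0.9, 1.2, 5.1],
}
print("by strain (within, between):", mean_distances(profiles, key=1))
print("by host   (within, between):", mean_distances(profiles, key=0))
```

In this toy data the within-strain distance is much smaller than the between-strain distance, while grouping by host shows no such separation, which is the pattern described for the three strains.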

Figure 2.3.10 MDS plot based on voom expression values showing the differential behaviour of the three M. persicae strains when fed on different host plants. Host plants: cab – cabbage, cau – cauliflower, mus – mustard, can – canola, lup – lupin, rad – radish. M. persicae strains: Bun – Bunbury strain, C61 – Victorian strain, LUP – Lupin strain.


2.4 Discussion

Previous studies of aphid phylogeny have used morphological characters (Heie et al. 1987), nucleotide sequences of single nuclear genes (Ortiz-Rivas and Martinez-Torres 2010), mitochondrial sequences (Von Dohlen 2000, Martinez-Torres et al. 2001) and sequences from Buchnera aphidicola genomes, which are argued to evolve in concert with their aphid hosts (Novakova et al. 2013). In this chapter, I tried to resolve aphid relationships using 304 gene ortholog sequences derived from the transcriptome datasets. While the taxa sampled in the aforementioned studies do not overlap, similarities and differences between their findings and my analysis emerge.

The species of most interest in this thesis are the three species that are Brassicaceae pests, and analysis of my gene set gives strong support for B. brassicae and L. erysimi being more closely related to each other than to M. persicae. Furthermore, these two Brassicaceae specialists form a trichotomy with A. pisum, a Fabaceae specialist. In contrast, the study of Buchnera sequences by Novakova et al. (2013) resulted in four trees, depending on whether the genes were concatenated or treated separately, which codons were included in the analysis, whether amino acids or nucleotides were used, and whether Bayesian or maximum likelihood phylogenetic methods were employed. Each tree shows a different relationship between M. persicae, A. pisum and Lipaphis pseudobrassicae ([Mp,[Ap,Lp]] or [Mp,Ap,Lp] or [Lp,[Mp,Ap]] or [Ap,[Mp,Lp]]). Therefore, the phylogenetic signal from Novakova et al. (2013) is not as strong as that of the tree based on 304 amino acid sequences presented here.

M. persicae, B. brassicae, L. erysimi and A. pisum are all members of the tribe Macrosiphini. Sitobion avenae and Macrosiphum euphorbiae also belong to this taxon, and all six species form a monophyletic clade in my analyses. Macrosiphini has previously been recovered as paraphyletic (von Dohlen et al. 2006, Kim et al. 2011, Novakova et al. 2013); these contrasting results can be explained by the different species sampled. Support for the tree shown here comes from Ortiz-Rivas and Martinez-Torres (2010), who presented a phylogeny based on the EF1 gene that also retrieved M. persicae as an outgroup of S. avenae and A. pisum.

There is, however, a difference between this study and those of Novakova et al. (2013) and Ortiz-Rivas and Martinez-Torres (2010), which report that the Aphidini tribe is monophyletic. Surprisingly, Aphidini appears paraphyletic in the current analysis, with Schizaphis graminum as an outgroup.

The species phylogeny can be used to address many outstanding questions about aphid biology, but the key focus of this thesis is the evolution of host plant use. A careful analysis of the Host Plant Catalogue of Aphids (Holman 2009) shows that B. brassicae and L. erysimi have clear preferences (with minor exceptions) for plants of the Brassicaceae family, whereas M. persicae is found on host plants from numerous plant families (Figure 2.3.3). Thus, there is little doubt about the accuracy of the ‘generalist’ versus ‘specialist’ categorization of these species.

The tissue transcriptome data generated from the various samples are of good quality. The MDS (multi-dimensional scaling) plot generated from the tissue samples showed gene expression differences between them, suggesting that the tissue dissection was successful. Fewer reads were obtained from B. brassicae and L. erysimi tissue samples than from M. persicae tissue samples. This is because the 9 M. persicae tissue samples were sequenced in a separate RNA-seq run in a single lane, whereas the 18 B. brassicae and L. erysimi tissue samples shared a single lane. Overall, the sequence reads achieved good coverage for all the aphid species. The Corset program was used for the analysis. Corset uses an algorithm that compares the abundance of each transcript between samples and retains a transcript for differential expression analysis only if it shows a possible differential expression signal and is expressed above a pre-set threshold. This strategy reduces false-positive differential expression signals. Corset also removes splice isoforms that do not show differential expression between samples.

The data generated from the RNA-seq aphid-host bioassays showed variation between samples (Figure 2.3.9). Generating four or five replicates for each treatment helps to reduce noise in the data. The bioassay was performed by randomly choosing M. persicae strains and their treatments. RNA extraction from all the samples was performed at the same time. During library preparation, all the samples were distributed randomly in the 96-well plate to avoid any positional effects. RNA sequencing was performed at the same time for all the samples in a single lane of an Illumina HiSeq instrument. These controls suggest that the observed treatment effects can likely be attributed directly to the respective treatments.
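The random placement of samples in a 96-well plate can be sketched as follows. This is an illustration only, not the procedure actually used: the function `randomize_plate`, the seed, and the full 6 x 3 x 5 sample list are assumptions (some replicates are absent from the real dataset).

```python
import random


def randomize_plate(samples, rows="ABCDEFGH", cols=12, seed=None):
    """Assign each sample to a distinct, randomly chosen well of a plate."""
    wells = [f"{r}{c}" for r in rows for c in range(1, cols + 1)]
    if len(samples) > len(wells):
        raise ValueError("more samples than wells")
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    chosen = rng.sample(wells, len(samples))
    return dict(zip(samples, chosen))


# Hypothetical full design: 6 host plants x 3 strains x 5 replicates.
hosts = ["cab", "can", "cau", "lup", "mus", "rad"]
strains = ["Bun", "C61", "LUP"]
samples = [f"{h}_{s}.{r}" for h in hosts for s in strains for r in range(1, 6)]

layout = randomize_plate(samples, seed=1)
print(len(layout), "samples placed; cab_Bun.1 is in well", layout["cab_Bun.1"])
```

Recording the seed alongside the layout allows the plate map to be regenerated later if positional effects need to be checked.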

The MDS plot generated from the expression values from the aphid-host bioassay indicated differential responses of each M. persicae strain (Figure 2.3.10). Different host plants synthesize different phytochemicals, which can induce various defensive mechanisms in feeding insects. Equally, compounds of nutritional value can induce specialized digestive mechanisms that allow insects to utilize them. All treatments with the same host plant species were expected to produce similar expression patterns in the tested aphids because, despite their different geographical origins, the strains belong to the same species. Clustering according to host plant in the MDS plot was therefore predicted (Pearce et al. 2017). The results were unexpected: the expression profiles of the LUP and Bun strains clustered tightly regardless of host plant treatment. The C61 strain showed a slightly looser clustering pattern, which can be attributed to occasional induction of the sexual cycle in this strain. In contrast, LUP and Bun are known to be exclusively parthenogenetic strains (Owain Edwards, CSIRO, personal communication). A sexual life cycle results in a more diverse population than is found in purely parthenogenetic strains. The different behaviour of the M. persicae strains was therefore surprising and points to a new scenario in aphid biology.

The M. persicae strains used in this study were examined for their responses to the different hosts. Most of the host plants tested are from the Brassicaceae family and have similar secondary metabolite profiles. This similarity, combined with a clonal genetic background, may keep the transcriptional response largely unchanged within a strain. The sexual clone used in this study shows a more versatile response to different host plants, suggesting that genetic background could be playing a role. Under natural conditions these strains may favour one type of food over another depending on availability; hence the different behaviour of these strains does not necessarily indicate directional selection. Although these strains do not show phenotypic plasticity towards the host plants used in this study, this needs to be confirmed using more diverse types of host plants. Extensive further work is also needed to establish the effects of both plasticity and directional selection.

In conclusion, this chapter has drawn on two transcriptomic datasets that were generated during the course of the project. In one, data were obtained from three species under identical conditions, and in the second, host-plant usage was examined. Combined, these data provide an insight into the evolutionary relationship between the experimental aphid species. In the subsequent chapters, these transcriptomic datasets are interrogated at the gene and gene family level.


2.5 References

Altschul, S. F., et al. (1990). "Basic local alignment search tool." Journal of Molecular Biology 215(3): 403-410.

Anathakrishnan, R., et al. (2014). "Comparative gut transcriptome analysis reveals differences between virulent and avirulent Russian wheat aphids, Diuraphis noxia." Arthropod-Plant Interactions 8(2): 79-88.

Andrews, S. (2010). "FastQC: a quality control tool for high throughput sequence data." from http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

Bandopadhyay, L., et al. (2013). "Identification of Genes Involved in Wild Crucifer Rorippa indica Resistance Response on Mustard Aphid Lipaphis erysimi Challenge." PLoS ONE 8(9): e73632.

Bolger, A. M., et al. (2014). "Trimmomatic: a flexible trimmer for Illumina sequence data." Bioinformatics 30(15): 2114-2120.

Bumgarner, R. (2013). "DNA microarrays: Types, Applications and their future." Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.] 0 22: Unit-22.21.

Burke, G. R. and N. A. Moran (2011). "Responses of the pea aphid transcriptome to infection by facultative symbionts." Insect Mol Biol 20(3): 357-365.

Davidson, N. M. and A. Oshlack (2014). "Corset: enabling differential gene expression analysis for de novo assembled transcriptomes." Genome Biol 15(7): 410.

Dobin, A., et al. (2013). "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29(1): 15-21.

Eyres, I., et al. (2016). "Differential gene expression according to race and host plant in the pea aphid." Mol Ecol. 25.

Grabherr, M. G., et al. (2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome." Nat Biotech 29(7): 644-652.

Haas, B. J., et al. (2013). "De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis." Nat Protoc 8.

Hansen, A. K. and N. A. Moran (2011). "Aphid genome expression reveals host-symbiont cooperation in the production of amino acids." Proc Natl Acad Sci U S A 108(7): 2849-2854.

Heie, O., et al. (1987). "Paleontology and phylogeny." Aphids: Their Biology, Natural Enemies and Control, Vol. 2a: 367-391.


Holman, J. (2009). Host Plant Catalog of Aphids, Springer Netherlands.

Ji, R., et al. (2016). "Transcriptome Analysis of Green Peach Aphid (Myzus persicae): Insight into Developmental Regulation and Inter-Species Divergence." Front Plant Sci 7(1562).

Jones, P., et al. (2014). "InterProScan 5: genome-scale protein function classification." Bioinformatics 30(9): 1236-1240.

Kalyaanamoorthy, S., et al. (2017). "ModelFinder: fast model selection for accurate phylogenetic estimates." Nat Meth 14(6): 587-589.

Katoh, K. and D. M. Standley (2013). "MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability." Mol Biol Evol 30(4): 772-780.

Kim, H., et al. (2011). "Macroevolutionary Patterns in the Aphidini Aphids (Hemiptera: Aphididae): Diversification, Host Association, and Biogeographic Origins." PLoS ONE 6(9): e24749.

Kumar, S., et al. (2016). "MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets." Mol Biol Evol 33(7): 1870-1874.

Law, C. W., et al. (2014). "voom: precision weights unlock linear model analysis tools for RNA-seq read counts." Genome Biol 15(2): R29.

Li, H. and R. Durbin (2009). "Fast and accurate short read alignment with Burrows–Wheeler transform." Bioinformatics 25(14): 1754-1760.

Li, Z.-Q., et al. (2013). "Ecological Adaption Analysis of the Cotton Aphid (Aphis gossypii) in Different Phenotypes by Transcriptome Comparison." PLoS ONE 8(12): e83180.

Liu, S., et al. (2012). "Deep Sequencing of the Transcriptomes of Soybean Aphid and Associated Endosymbionts." PLoS ONE 7(9): e45161.

Martinez-Torres, D., et al. (2001). "Molecular systematics of aphids and their primary endosymbionts." Mol Phylogenet Evol 20(3): 437-449.

Misof, B., et al. (2014). "Phylogenomics resolves the timing and pattern of insect evolution." Science 346(6210): 763-767.

Morozova, O., et al. (2009). "Applications of new sequencing technologies for transcriptome analysis." Annual review of genomics and human genetics 10: 135-151.

Nakabachi, A., et al. (2005). "Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera." Proc Natl Acad Sci U S A 102(15): 5477-5482.


Nan, S., et al. (2016). "All 37 Mitochondrial Genes of Aphid Aphis craccivora Obtained from Transcriptome Sequencing: Implications for the Evolution of Aphids." PLoS ONE 11(6).

Nelson, N. J. (2001). "Microarrays have arrived: gene expression tool matures." Journal of the National Cancer Institute 93(7): 492-494.

Nguyen, L.-T., et al. (2015). "IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies." Mol Biol Evol 32(1): 268-274.

Novakova, E., et al. (2013). "Reconstructing the phylogeny of aphids (Hemiptera: Aphididae) using DNA of the obligate symbiont Buchnera aphidicola." Mol Phylogenet Evol 68(1): 42-54.

Ortiz-Rivas, B. and D. Martinez-Torres (2010). "Combination of molecular data support the existence of three main lineages in the phylogeny of aphids (Hemiptera: Aphididae) and the basal position of the subfamily Lachninae." Mol Phylogenet Evol 55(1): 305-317.

Pearce, S. L., et al. (2017). "Genomic innovations, transcriptional plasticity and gene loss underlying the evolution and divergence of two highly polyphagous and invasive Helicoverpa pest species." BMC Biol 15(1): 63.

Pozhitkov, A. E., et al. (2007). "Oligonucleotide microarrays: widely applied—poorly understood." Briefings in Functional Genomics and Proteomics 6(2): 141-148.

Rane, R. V., et al. (2017). "Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes." BMC Genomics 18(1): 673.

Robinson, M., et al. (2010). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data." Bioinformatics 26.

Schena, M., et al. (1995). "Quantitative monitoring of gene expression patterns with a complementary DNA microarray." Science: 467-467.

Schwarz, K. and M. Dayhoff (1979). Matrices for detecting distant relationships Atlas of protein sequences. M. Dayhoff, National Biomedical Research Foundation. 5.

Shang, F., et al. (2016). "Differential expression of genes in the alate and apterous morphs of the brown citrus aphid, Toxoptera citricida." Scientific Reports 6: 32099.

Suyama, M., et al. (2006). "PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments." Nucleic Acids Res. 34.

Tamura, K., et al. (2012). "Estimating divergence times in large molecular phylogenies." Proc Natl Acad Sci U S A. 109.


Tang, S., et al. (2015). "Identification of protein coding regions in RNA transcripts." Nucleic Acids Research 43(12): e78-e78.

Thorpe, P., et al. (2016). "Comparative transcriptomics and proteomics of three different aphid species identifies core and diverse effector sets." BMC Genomics. 17.

von Bubnoff, A. (2008). "Next-Generation Sequencing: The Race Is On." Cell 132(5): 721-723.

Von Dohlen, C. (2000). "Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation." Biological Journal of the Linnean Society 71(4): 689-717.

von Dohlen, C. D., et al. (2006). "A test of morphological hypotheses for tribal and subtribal relationships of (Insecta: Hemiptera: Aphididae) using DNA sequences." Mol Phylogenet Evol 38(2): 316-329.

Wang, Z., et al. (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10(1): 57-63.

Zhang, M., et al. (2013). "Identifying potential RNAi targets in grain aphid (Sitobion avenae F.) based on transcriptome profiling of its alimentary canal after feeding on wheat plants." BMC Genomics 14(1): 560.


Chapter III

Potential of RNAi technology to control aphids and its limitations

3.1 Foreword to manuscript

This manuscript was submitted to Scientific Reports.

Aphids are emerging as primary pests of several economically important crop plants, including brassica crops such as cabbage, cauliflower, canola and mustard. Insecticide-mediated control has several flaws: resistance development, the killing of beneficial insects, deterioration of soil, water and air quality, and hazards to human and domestic health. This situation demands new technology that has fewer of these flaws. To address this problem, several institutes from India and Australia teamed up and designed a project, Caterpillar and Aphids Resistance in Brassica (CARiB). The aim of the project was to develop a technology for producing transgenic brassica crops such as cabbage, cauliflower, canola and mustard that are resistant to caterpillar and aphid attack. The caterpillar control strategy was based on Bt toxin technology, whereas RNAi technology was proposed for controlling aphids. The following manuscript describes the efforts made to understand the potential of RNAi technology to control aphids and notes possible limitations of the technology.

The manuscript that makes up this chapter mainly focuses on the work performed to evaluate the potential of RNA interference (RNAi) technology to control aphids. All the experiments were performed on Myzus persicae. In the first part of the manuscript, Dr. Owain Edwards' group tested several gene targets by feeding aphids the corresponding dsRNAs. The attempt was unsuccessful, so I carried the project forward by testing several RNAi targets that had previously been reported as successful. Several methods of dsRNA delivery were used: naked dsRNA, dsRNA with transfection reagent, dsRNA expressed in recombinant bacteria, and dsRNA expressed in transgenic plants. Bioassays of variable lengths were performed to see the effects of the treatments on different generations of M. persicae. None of the treatments was effective. Further investigation was therefore carried out to find the cause of the failure. Several hypotheses were tested, including dsRNA instability in the saliva, in the diet and in the gut. The dsRNA was found to be extremely unstable in the presence of dissected gut tissues. This observation was then supported by the tissue transcriptome analysis of the midgut and whole body of M. persicae. dsRNase enzymes were found to be enriched in the midgut tissue, suggesting that dsRNases could be a hurdle in the application of RNAi technology in agriculture.

The manuscript was prepared and submitted to Scientific Reports. The editor reported that the three reviewers ‘found our work of interest’ even though the results we presented were generally negative. We were invited to revise and resubmit the manuscript after addressing their concerns. Their main concern, and the one we have decided to address by performing further experiments (after the submission of this thesis), is whether a compelling enough case has been made that dsRNA is degraded by dsRNases in the aphid midgut. As this was a positive result that neatly explained the outcomes of previous experiments, we perhaps overemphasized it, and one reviewer questioned whether it should be in the title of our manuscript. That dsRNases could be affecting dsRNA efficiency was suggested to me by scientists I met at the International Entomology Congress (where I presented much of this work), and there is also some support for it in the literature (cited below). In the experiments described below I show that aphid midguts rapidly degrade dsRNA and that genes orthologous to dsRNases from other species are abundantly expressed in the aphid midgut. However, to make the case more compelling we intend to repeat the dsRNA degradation experiments with a battery of dsRNase inhibitors. One reviewer suggested we use RNAi against RNases, and we may try that as well.


3.2 Manuscript

Title:

Midgut RNases limit the efficacy of orally delivered RNAi in the Green Peach Aphid, Myzus persicae.

Authors: Amol Bharat Ghodke1, Robert Trygve Good1, John Golz1, Derek A. Russell2, Owain Edwards3, Charles Robin1*

1. School of BioSciences, The University of Melbourne, Australia

2. The Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Australia

3. Ecosystems Science, CSIRO, Perth, Australia

* Corresponding author

Prepared for submission to Scientific Reports


Abstract

Myzus persicae is a major pest of many crops including canola and brassica vegetables, partly because it vectors plant viruses. Previous reports indicate that double-stranded RNA delivered to aphids by injection, artificial diet or transgenic plants has knocked down target genes and caused phenotypic effects. While these studies suggest that RNA interference (RNAi) might be used to suppress aphid populations, none have shown effects sufficient for field control. The current study analyses the efficacy of dsRNA against previously reported gene targets in Australian Myzus persicae strains. No silencing effect was observed when dsRNA was delivered in artificial diet with or without transfection reagents. dsRNA produced in planta also failed to induce significant RNAi in M. persicae. Transcriptome analyses of the midgut suggested other potential targets including the Ferritin heavy chain transcripts, but they also could not be knocked down. Here we show that dsRNA is rapidly degraded by midgut secretions of Myzus persicae. Analysis of the transcriptome of the M. persicae midgut revealed that an ortholog of RNases from other insects was abundant. This work suggests that RNases are a significant barrier to adopting RNAi-based pest control strategies.


Introduction

In 2007, two independent studies reported the successful use of dietary-delivered RNA interference (RNAi) to impair pest insect growth (Baum et al. 2007, Mao et al. 2007). These studies held out the prospect of a wealth of new possible transgene-based insecticides that could be used to make numerous crops resistant to specific insect pests (Gordon and Waterhouse 2007). As RNAi works via sequence identity, transgenes can be designed to selectively target pest species without affecting other organisms in the environment (Fire et al. 1998, Whyard et al. 2009, Good et al. 2016). One of the studies reported that beetles (Diabrotica virgifera virgifera) fed an artificial diet containing double-stranded RNA (dsRNA) had their target genes silenced in a sequence-dependent fashion and their development disrupted (Baum et al. 2007). It also showed that transgenic plants expressing dsRNA were less damaged by beetles than control plants. The other study reported that plants producing a particular dsRNA significantly reduced the growth of Helicoverpa armigera caterpillars (Lepidoptera) reared upon them (Mao et al. 2007). In the ten years that have followed, hundreds of studies have examined the utility of dietary RNAi against various insect pests, with widely varying success. While RNAi seems to work well in some species (e.g. some beetles), the success is inconsistent in other species (Huvenne and Smagghe 2010, Scott et al. 2013, Joga et al. 2016, Shukla et al. 2016). For example, a review of more than 150 experiments using dsRNA found that RNAi worked much better in the family Saturniidae than in other Lepidoptera and that effectiveness depended on dsRNA concentration and the particular genes being targeted (Terenius et al. 2011).

The first publication of successful RNAi in aphids reported that microinjecting short interfering RNAs (siRNAs) into adult pea aphids (Acyrthosiphon pisum) knocked down the transcripts of C002, a gene that is usually highly expressed in the salivary glands (Mutti et al. 2006). The transcript level of the targeted gene dropped after three days, and the knockdown ultimately led to insect death by eight days, compared to 16 days for aphids injected with a control siRNA sequence. Further studies reported that C002 is mainly expressed in a subset of cells within the salivary gland and that reduced C002 levels prevent the aphid stylet from reaching the phloem and initiating feeding (Mutti et al. 2008). Microinjection of dsRNA into pea aphids, like that of siRNAs, also reduced the transcript abundance of the target genes calreticulin and cathepsin, by 41% and 35% respectively; however, in this case there was no phenotype associated with these knockdowns (Jaubert-Possamai et al. 2007).

Dietary dsRNA has been delivered to aphids via artificial diet droplets placed between two sheets of parafilm (Whyard et al. 2009, Sapountzis et al. 2014). Using this approach, pea aphids fed dsRNA homologous to vATPase (a gene successfully used in the initial beetle study (Baum et al. 2007)) exhibited a ~32% reduction in transcript abundance after three days and, remarkably, greater than 50% aphid mortality (Whyard et al. 2009). Another study reported that dsRNA directed to the aquaporin transcript and delivered via artificial diet sachets reduced transcripts of this gene by more than 50% (Shakesby et al. 2009). Although this treatment did not affect aphid weight, a reduction in the osmotic pressure of the hemolymph was observed, a specific phenotype expected from the knockdown of this gene.

Some strains of the green peach aphid, Myzus persicae, can feed on Nicotiana benthamiana, a plant for which Agrobacterium-mediated transient transfection protocols are well established. Leaf discs can be transiently transfected with vectors that express dsRNA and then fed to aphids. Two genes targeted by this approach, C002 and Rack1 (the latter of which encodes a gut protein), displayed a 30-40% reduction in transcript level, and the treatment resulted in a moderate reduction in the number of nymphs produced by treated aphids (Pitino et al. 2011). Another study used transient transfection of N. benthamiana discs to knock down three aphid genes (aquaporin, sucrase and a sugar transporter) individually and in combination, and found that the combined treatment had a greater effect on hemolymph osmotic pressure and body weight than the individual dsRNA treatments (Tzin et al. 2015).


Delivery of dietary dsRNA to aphids has also been achieved using plants that express dsRNA transgenes stably integrated into their genomes (Terenius et al. 2011, Mathers et al. 2017). M. persicae fed on Arabidopsis thaliana plants expressing dsRNA directed against C002 and Rack1 displayed a 50-60% knockdown of their transcripts (Pitino et al. 2011). In contrast to the earlier pea aphid microinjection study on C002 (Huvenne and Smagghe 2010), there was no observed mortality, but there was a significant effect on aphid fecundity (Pitino et al. 2011). This fecundity effect elicited by C002 dsRNA has been reported in a second study by the same research group but was only detected when the aphids were exposed to transformed plants over several generations (Shakesby et al. 2009).

Given the apparent success of dsRNAs in inducing gene knockdowns, we set out to develop dsRNA transgenes that target aphid pests of brassica crops including cabbage, cauliflower and canola. Three closely related species of aphids - M. persicae, Brevicoryne brassicae, and Lipaphis erysimi - damage these crops directly and vector plant viruses. If these species could be controlled through plant-delivered RNAi, farmers of brassicaceous crops might reduce their use of conventional insecticides, which are not only expensive but pollute the environment and threaten human health. While the studies listed above identified genes that could be targeted, we initially sought novel candidates with the hope that we might elicit a greater, and therefore more effective, knockdown and hence a stronger phenotypic response. We sought genes with the following criteria:

(1) were likely to elicit a strong phenotypic effect even if the knockdown was only partial (30-80% was the range of target gene knockdown reported by most previous studies).

(2) were not too highly expressed so that targeted knockdown could have the greatest effect per unit of dietary dsRNA.

(3) were unlikely to have a regulatory feedback mechanism that could counteract the effects of RNAi.

(4) were expressed in tissues where RNAi was most likely to generate the strongest knockdown (i.e. the gut which is proximal to the site of orally delivered dsRNA entry).

(5) possessed nucleotide sequences that enable the design of dsRNAs specific to particular pest species or groups.

Our initial studies focussed on (i) genes whose knockdown was known to produce dosage-dependent phenotypes in model insects and (ii) generating a transcriptome of the M. persicae midgut so that novel targets could be identified. However, following our failure to generate phenotypes or significant transcript knockdown, we resorted to targeting genes successfully used by others. Here we report that dietary delivered dsRNAs have little or no effect on target gene abundance, even when different dsRNA sources were used (different genes, produced by in vitro transcription or synthesized commercially), the dsRNAs were delivered via multiple methods (artificial diet, artificial diet with transfection reagents, transgenic A. thaliana plants) and knockdown was assessed with multiple assays (including digital PCR and quantitative real-time PCR). We also report that M. persicae guts have high RNase activity and that transcripts orthologous to silkworm dsRNases are abundantly expressed in their guts. We cannot fully reconcile our results with those previously published and warn that dietary delivery of dsRNA to aphids, particularly M. persicae, yields variable results, unlike the situation in some Coleoptera species where dietary dsRNA works robustly (Li et al. 2013, Li et al. 2016).


Results

Preliminary screens of novel dsRNA targets only marginally affect aphid weight

Given the previous reports that dietary dsRNAs may only have modest effects against aphids (Pitino et al. 2011, Bhatia et al. 2012, Mao and Zeng 2012, Guo et al. 2014, Sapountzis et al. 2014, Mathers et al. 2017), and knowing that dsRNA rarely knocks down 100% of the transcripts of the targeted gene (Mutti et al. 2006, Mutti et al. 2008), we fed M. persicae various dsRNAs homologous to genes known to have dose-dependent phenotypes in other organisms. In particular, we targeted ribosomal protein genes (RpS13, RpS5a, Rp19a) that produce Minute phenotypes in Drosophila, and genes involved in endosome recycling that have proven to be effective targets in beetles (Katanin60, Snf7, Vps2 (Baum et al. 2007)). These dsRNAs were generated by first amplifying the gene from M. persicae genomic DNA and then using the product for in vitro transcription, using the available M. persicae sequence data to design the PCR primers(2017). dsRNAs at a concentration of 7.5 ng/µL were then added to artificial diet and placed between two layers of parafilm, thereby creating sachets for aphids to feed upon. Each sachet was sufficient to maintain five aphids for at least five days. For each of these dsRNAs, we performed ten replicate experiments (i.e. a total of 50 aphids per dsRNA) and then weighed the aphids. We saw no significant difference between these treatments and the negative controls in most comparisons (Fig. 1a & 1b). While there was a significant difference between the control diet and Rp19a in the direction consistent with an RNAi effect (Student's t-test, p = 0.02), the other significant effect (control vs Katanin60) was in the other direction (p = 0.03). We therefore sought to feed M. persicae dsRNAs homologous to those genes that have been shown to affect aphids in previous studies.


Mp_vATPase-like_dsRNA fed to M. persicae in the artificial diet does not affect aphid mortality, size or alter target gene abundance, even when delivered with transfection agents.

The M. persicae ortholog of the pea aphid vATPase, which was reportedly knocked down by dietary delivery of dsRNA (Whyard et al. 2009), was identified using the cited 684nt mRNA sequence (XM_00194689) as a BLAST query against the M. persicae genome (2017). This identified a gene encoding a 215-amino acid protein spread across three exons (MYZPE13164_0_v1.0_000035590.1). Phylogenetic analysis confirmed the orthologous relationship between this M. persicae sequence and the pea aphid (Acyrthosiphon pisum) sequence. However, it also revealed that this was a paralog, not an ortholog, of the vATPase targeted by dsRNA experiments in other species (Whyard et al. 2009), and so from here on we refer to it as vATPase-like (Fig. S1).

A 185nt vATPase-like dsRNA sequence, produced by in vitro transcription from a plasmid, was combined with artificial aphid diet, placed between two layers of parafilm and fed to aphids. After 12 days, the mortality of the aphids placed on the food, the number of offspring they produced and their size were assessed. These data were compared to those from a control group that had been fed a 700nt dsRNA derived from a Green Fluorescent Protein (GFP) gene.

No significant difference was observed between treatment and control for insect size (p=0.24; Fig. 2b), mortality (p=0.39) or fecundity (p=0.81). Furthermore, we did not detect any knockdown of the vATPase-like transcript (p=0.96; Fig. 2a).

Transfection reagents have been used to improve the knockdown of target genes in insects (Cancino-Rodezno et al. 2010, Singh et al. 2013, Murphy et al. 2016). We compared four commercially available transfection products. Two were lethal to aphids at the doses we used (Ribojuice and Fugene). While aphids survived having Lipofectamine and Happyfect in their diets, they were marginally smaller (10-13% for the Lipofectamine treatment and 23% for the Happyfect treatment) than those without transfection reagent in the diet, irrespective of the dsRNA used. There was no additional effect on the size of the aphids resulting from the presence of the candidate Mp_vATPase-like dsRNA added to their diet (Happyfect: p=0.38, Lipofectamine: p=0.94). Furthermore, an ANOVA of the Mp_vATPase-like transcript abundance revealed no silencing of the gene (ANOVA: p=0.96; Fig. 2a).

Mp_C002_dsRNA fed to M. persicae in the artificial diet does not knock down the target or affect aphid fecundity, even when the experiment spans two generations.

We designed a dsRNA corresponding to 432 nt of the M. persicae C002 gene reported in the literature (Coleman et al. 2015) and had it commercially synthesized (Genolution Inc.). Aphids were fed for 11 days on diet containing this dsRNA, or a negative control dsRNA (designed against vATPase from Drosophila melanogaster), which had no sequence similarity to M. persicae genes. Nymphs born from these aphids were transferred to new diet sachets maintaining the same treatment and reared for another 12 days. Thus, the second-generation cohort would have developed within mothers that were reared on the dsRNA. The presence of Mp_C002_dsRNA did not reduce fecundity compared with the control diet, either in the first (p=0.37) or the second (p=0.77) generation (Fig. S2). As C002 knockdown has been shown to contribute to mortality by disrupting the aphid’s ability to find and access phloem with its stylets, phenotypic differences between treatment and control might not be expected in our artificial diet experiments (Mutti et al. 2006, Pitino et al. 2011). However, we also did not detect a significant knockdown of C002 transcript expression in the Mp_C002_dsRNA treatment relative to the Dm_vATPase_dsRNA control treatment (Student’s t-test, p=0.19, n=4).

Arabidopsis plants expressing anti-aphid dsRNA have only a minor impact on aphid numbers even in multi-generation experiments.

Ultimately, if field control of aphids is to be elicited by dietary dsRNA then it will most likely be delivered via transgenic plants. We transformed Arabidopsis thaliana plants of the Columbia ecotype with two transgene constructs, each placed downstream of the strong CaMV 35S promoter: (i) one designed to express exactly the same Mp_C002_dsRNA sequence as reported previously (Coleman et al. 2015), and (ii) one designed to express a 747nt dsRNA with sequence similarity to four M. persicae genes (hereafter called ‘BestBet’). The BestBet construct (Fig. S3) contained a concatemer of ~100 bp fragments from four M. persicae genes likely to elicit RNA silencing based on the literature (148nt of C002 (Mutti et al. 2006, Coleman et al. 2015), 132nt of vATPase-like (Whyard et al. 2009), 154nt of acetylcholine esterase (Guo et al. 2014), and 136nt of the snf7 ortholog (Baum et al. 2007)). Inverted copies of the combined sequence were cloned on either side of a plant intron, so that a single 400nt hairpin RNA would be produced in plants.

Ten 1-2 day old aphid nymphs were placed onto the plants of each genotype. After eleven days, ten 1-2 day old nymphs were collected from the plants and transferred to fresh plants of the same genotype to start the second-generation cohort. The number of aphids present per plant was counted after 23 days of feeding for both the initial and the second-generation cohorts (Fig. 3a & 3b). While insect numbers on the Mp_C002_dsRNA plants did not differ significantly from the empty vector controls in either generation, the BestBet plants did show a reduction in aphid numbers at a marginal significance level in the second generation (p=0.025; Fig. 3b).

We used digital PCR to assess the extent of knockdown. Fig. 3c depicts the extent of knockdown of the C002 gene in aphids reared for one and two generations on each of the plant genotypes. In the first generation, the variance in the ‘empty vector’ measurement is large and no significant effect of plant genotype is observed between any of the treatments. There was no significant knockdown of C002 observed in the second generation either. Measurement of the other three genes targeted by the BestBet construct showed that only the ACE gene had lower expression than the control, and that difference was not statistically significant (MpC002: p=0.57, MpSnf7: p=0.82, MpACE: p=0.39, MpvATPase_like: p=0.47, n=3; Fig. 3d).

Ferritin satisfies the criteria for a good dsRNA target gene, yet Mp_FeHC_dsRNA fed to M. persicae in the artificial diet does not knock down the target or affect aphid number.

MpC002 is a salivary gland protein, MpACE functions at neural synapses, MpvATPase-like is not well characterized and the other genes we examined are expressed broadly across tissues. We therefore sought potential novel dsRNA targets that show aphid midgut expression, as the cells of the midgut may be most accessible to dietary dsRNA and thus not require systemic spread of dsRNA following feeding. RNA-seq was performed on triplicate samples from M. persicae midguts and triplicate samples from whole aphids. An average of 29,374,419 reads was obtained from the midgut samples, and 33,489,745 reads from the whole-body samples. De novo transcriptome analysis was performed using the Corset pipeline (Davidson and Oshlack 2014). The gene that attracted our attention from this analysis was that encoding the ferritin heavy chain (Mp_FeHC), as it was enriched in the midgut (2.4x) relative to the whole body and was distinct from the ferritin genes of non-aphid species. This second point was confirmed using the taxon-specific dsRNA design tool Offtarget finder (Good et al. 2016), which showed that few 21mers in the M. persicae ferritin heavy chain gene are found in other invertebrates, with the exception of aphids and their relatives (Fig. 4a). By these criteria, dsRNA targeting Mp_FeHC transcripts could easily be designed to be aphid-specific.
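The 21mer comparison described above can be illustrated with a minimal sketch. This is not the implementation of the Offtarget finder tool; the function names are hypothetical, and a real screen would also need to consider reverse complements and near matches.

```python
def kmers(seq, k=21):
    """Return the set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(target, other, k=21):
    """Return the k-mers found in both the target and the other sequence."""
    return kmers(target, k) & kmers(other, k)


def fraction_shared(target, other, k=21):
    """Fraction of the target's k-mers that also occur in the other sequence."""
    target_kmers = kmers(target, k)
    if not target_kmers:
        return 0.0
    return len(target_kmers & kmers(other, k)) / len(target_kmers)
```

A candidate dsRNA region would be considered more taxon-specific, under this simplified view, the smaller the fraction of its 21mers that occur in transcripts of non-target species.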

Another reason to focus on ferritin as an RNAi target is that aphids may be sensitive to changes in its abundance. Iron is essential and yet toxic at high doses, and ferritin is thought to play a key role in its homeostasis. Whereas in mammals ferritin is considered an iron storage protein, in insects it is believed to be involved in iron transport (Pham and Winzerling 2010). For M. persicae, early studies developing artificial diets revealed that trace amounts of dietary iron are indeed essential (Dadd et al. 1967). Our experiments confirm that if we leave iron out of the artificial diet, aphids halve in size (Fig. 4b; p=0.0001) and do not reproduce. Ferritin levels in aphids do not change on a diet lacking iron (p=0.72, n=3, ten aphids per replicate). Furthermore, if iron levels in the artificial diet are elevated four-fold, the diet becomes toxic to aphids within two days.

Therefore, we tried to manipulate ferritin levels by feeding aphids commercially synthesized Mp_FeHC_dsRNA in artificial diet. However, no effect was observed on ferritin heavy chain transcript abundance (p=0.52, n=3) and while there was a reduction in aphid fecundity it was not statistically significant (p=0.10, n=3, Fig. 4c).

The transcript abundance of dsRNA machinery in M. persicae

The transcriptome sequences afforded us the ability to confirm that transcripts corresponding to the RNAi machinery were present in the gut (and whole body) of the M. persicae strain used in our experiments. All the expected genes were present and most were expressed at about the same levels in the midgut as in the whole body (Table 1). We include in this analysis genes implicated in the spread of dsRNA from cell to cell. While SID1 is important in the systemic spread of RNAi in C. elegans, it may not play a role in many insects, either because it is not present in the genome (e.g. Drosophila melanogaster) or because other proteins, such as scavenger receptors associated with clathrin-dependent endocytosis, may perform the function. The latter seem to take on the role in the desert locust, Schistocerca gregaria, which shows a strong RNAi response (Wynant et al. 2014). We note that in the M. persicae transcriptomes analysed here, the transcripts of the scavenger receptors are much lower in the gut relative to the rest of the body (Table 1).

dsRNase transcripts are abundant in aphid midguts and dsRNase activity is high in midgut extracts

A previous study reporting the failure of RNAi treatment in pea aphids, Acyrthosiphon pisum, found that aphid saliva was capable of degrading dsRNA (Christiaens et al. 2014). We repeated the same experiment with M. persicae. This showed no significant effect of saliva on the integrity of dsRNA after 4 days of incubation (Fig. 5b). However, we were motivated to examine dsRNase activity in aphid midguts. Five midguts were dissected, rinsed in PBS and immediately transferred to fresh RNase-free water containing double-stranded RNA at a concentration of 50ng/µL. The samples were vortexed briefly and incubated at room temperature. Analysis of dsRNA concentration after 15, 30 and 60 minutes of incubation showed that the dsRNA was rapidly degraded (Fig. 5c).

The enzymes responsible for dsRNA degradation have been identified at the sequence level in Bombyx mori (Arimatsu et al. 2007). A search for M. persicae homologs of this B. mori sequence identified three transcript clusters with high identity, each of which is highly abundant in the midgut and enriched in the midgut relative to the whole-body samples (Table 2; Fig. 5a). These map to a single gene in the M. persicae genome (official Myzus assembly ID: MYZPE13164_G006_v1.0_000023850.1), and this gene is therefore a candidate to encode the dsRNase enzyme that degrades dietary dsRNA and may prevent the effective application of environmental RNAi to this species of aphid.


Discussion

In contrast to a previous study on pea aphids (Acyrthosiphon pisum), we were unable to elicit gene knockdown by feeding aphids vATPase-like dsRNA in an artificial diet, even though we used higher concentrations of dsRNA (6 ng/µl in the pea aphid study (Whyard et al. 2009) versus 37.5 ng/µl in the present study). Similarly, we were unable to elicit an effect by supplementing the dsRNA with various transfection reagents. Our experiments were not a strict replication of the pea aphid study because the aphid species differed, so it is possible that the ineffectiveness of dsRNA towards M. persicae reflects species differences.

The function of vATPase-like is not well understood, but it is expressed at low levels in the whole body and gut transcriptome datasets that we generated, as confirmed by qRT-PCR and digital PCR assays. It is possible that its role is more critical in A. pisum than in M. persicae.

The C002 gene, which encodes a salivary protein, has been targeted by dsRNA in both A. pisum and M. persicae. In A. pisum, injected siRNA interferes with the ability of the aphids to feed on plants and consequently results in mortality (Mutti et al. 2006). In M. persicae, in planta delivered dsRNA against C002 did not impact upon mortality but was reported to significantly reduce reproductive output. The impact on fecundity has been reported in two studies from the same research group, with the second study only observing an impact in a multigenerational experiment. Thus, it has been attributed to a transgenerational effect where nymphs born to aphids fed upon dsRNA plants were affected by maternally ingested dsRNA (Coleman et al. 2015). In the experiments described here, Mp_C002_dsRNA delivered through artificial diet did not affect reproductive output even when feeding spanned two generations. We also saw no decrease in reproductive output of aphids reared for two generations on transgenic A. thaliana plants (selected to produce high Mp_C002_dsRNA levels) relative to control plants of the same genetic background. Nor did we see significant knockdown of C002 transcripts in aphids reared on these plants, even when we assessed this by the potentially more accurate digital PCR technique.

We did observe that feeding on our multigene composite dsRNA ‘BestBet’ plants produced reduced numbers of aphids relative to those fed on ‘empty vector’ control plants, at a marginal significance level (p=0.025). Given the number of statistical tests performed, this may be a type I error (occasionally we expect a p-value less than the significance threshold even if the null hypothesis is true). Our analysis of the transcripts of the four genes targeted by this construct revealed that those from MpACE, which encodes acetylcholine esterase, were the only ones with reduced abundance, albeit not significantly. The expression levels of all four targets were analysed from the tissue transcriptome data. The only difference that is evident between these transcripts is the low level of MpACE compared with the other transcripts. This could suggest that genes expressed at low levels may serve as better targets, but this needs to be tested with different targets and different dose levels (Mulot et al. 2016).
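The concern about type I error across many tests can be made concrete with a short sketch. It assumes the tests are independent, which is a simplification, and the number of tests used in the usage note is hypothetical rather than a count taken from this study.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests
```

For example, with a hypothetical ten independent tests at a threshold of 0.05, `familywise_error_rate(0.05, 10)` is about 0.40, and a Bonferroni-corrected per-test threshold would be 0.005, which a p-value of 0.025 would not pass.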

We also selected a gut-enriched target gene for RNAi study. Targeting such a gene should give incoming dsRNA access to the target transcripts immediately after it enters the digestive system. The midgut-enriched ferritin heavy chain transcript was selected from the tissue transcriptome study based on the five selection criteria described in the introduction (i.e. partial knockdown might yield a changed phenotype, it is not overly expressed, feedback mechanisms have not been observed, it is expressed in the midgut, and it contains aphid-specific sequences). In D. melanogaster, knockdown of ferritin resulted in iron deficiency, iron accumulation in the gut and neuronal damage (Tang and Zhou 2013). The commercially synthesized Mp_FeHC_dsRNA did not show any effect on M. persicae survival, size, fecundity or expression level of Mp_FeHC (Fig. 4c), suggesting that dietary dsRNA does not elicit a silencing response within the gut tissue or digestive system of M. persicae.


In general, the evidence that dietary delivered dsRNA can elicit a robust response in M. persicae is far from compelling in our experimental analyses. We have shown that transcripts for all the proteins known to be required for RNAi to work are expressed in the midguts of M. persicae, although the scavenger receptors that may play a role in the systemic spreading of dsRNA in other organisms, such as the desert locust(Wynant et al. 2014) are reduced in the gut relative to the whole body.

Perhaps an explanation for the failure of dsRNA to elicit knockdown effects lies in the apparent abundance of dsRNases in the gut of the aphids we have studied. We demonstrated that dsRNA incubated with aphid guts is quickly degraded. Such an effect has also been observed in the desert locust, where injected dsRNA reportedly works robustly, but dsRNA delivered by feeding is ineffective (Wynant et al. 2014). Furthermore, a recent study reported that dsRNA was completely degraded when incubated with midgut homogenate of the cotton boll weevil (Anthonomus grandis) (Gillet et al. 2017). We identified dsRNases in the gut transcriptome that were homologous to dsRNases that had been biochemically characterized in the silk moth, Bombyx mori. Moth, locust, beetle and aphid RNases all cluster together in phylogenies relative to outgroups, supporting the proposition that these sequences are indeed responsible for the rapid disappearance of dsRNA when incubated with the gut.

How then do we interpret the results of previous studies showing effective RNAi in aphids? If gut RNases are degrading dsRNA, there is no issue with respect to the microinjection studies because injected dsRNA is not exposed to the gut dsRNases. The feeding studies, via artificial diet or in planta, are less easy to explain. Perhaps, however, work on another species of locust (Locusta migratoria) provides an answer (Sugahara et al. 2017). It was recently reported that geographically defined strains of locust differed in their susceptibility to dsRNA-mediated RNAi. Furthermore, by crossing different strains the authors present evidence that the variation in susceptibility has a genetic basis and that the resistant form was probably dominant to the susceptible forms. So, it is possible that the M. persicae strains used in this study (Bona Vista and C61), which were collected in Australia, have genetic variants that make them more resistant to RNAi than M. persicae strains used by other researchers. Another argument for strain-to-strain variation was posed by Swevers et al. (2013), who suggest that some insects could harbour viruses that interfere with the RNAi process.

The possibility that RNAi-resistant forms already exist at high frequencies in field populations of pests suggests that dietary delivered dsRNA is unlikely to be an effective pest control method unless more elaborate approaches to eliciting RNAi are used. Variation in dsRNA susceptibility between aphid strains warrants further study because it may suggest strategies to make RNAi generally effective.

Finally, what are the prospects for dsRNA technology in aphids? If there are some strains of M. persicae that are susceptible to dsRNA then there is still hope that RNAi could be the much-needed functional genomics tool, as once identified, these strains could become the subject of such studies. In contrast, given that strains already exist that are recalcitrant to the effects of RNAi, naked dsRNA is not likely to be useful as an insecticide, as these strains will be rapidly selected for in any field application, especially given the parthenogenetic nature of aphid reproduction. Thus, although microinjection was outside the scope of this study, because the aim was to develop a pest control strategy applicable to the field, there are now two motivations to reconsider microinjection studies. Firstly, microinjection could be used to determine the function of genes in a laboratory setting. Secondly, since dsRNA is injected into the hemocoel, where there is less dsRNase activity, microinjection would provide a test of the RNase hypothesis.

Further experiments that should now be considered include repeating the feeding experiments with a larger number of replicates, because that could potentially identify significant differences with small effect sizes. Furthermore, understanding the effect of RNAi treatment on individual insects instead of groups of insects could also be useful to determine the variation in RNAi resistance within a single population. In particular, a factor that needs to be carefully explored is the way in which RNAi potency changes with the dsRNA dose consumed by individual aphids (Bilgi et al. 2017). For example, dye could be included in the food sachets and aphids sorted by the amount of dye (and therefore dsRNA) they have ingested.
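The relationship between effect size and the number of replicates needed can be sketched with the standard normal approximation for a two-sided, two-sample comparison. The function name and the default values for alpha and power are assumptions chosen for illustration, not parameters used in this study.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2,
    where d is the difference between group means in units of the
    within-group standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)
```

Under these assumptions, detecting a difference of one standard deviation needs roughly 16 replicates per group, whereas a difference of half a standard deviation needs roughly 63, which illustrates why small effects are easily missed with three to ten replicates.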

Recent work suggests that there is some hope that dsRNA can be protected from RNases with a protein that also helps transport it into insect cells (Gillet et al. 2017). In transgenic plants, ribonucleoprotein particles (consisting of dsRNA and the proteins that protect and direct it) would need to be directed to the phloem in a way that does not interfere with normal plant physiological processes. Alternatively, perhaps there are prospects for ribonucleoprotein particle sprays that could be absorbed through the aphid cuticle and thereby minimize the effect of midgut RNases.

Methods

Insect Rearing

Apart from the initial experiments reported in Fig. 1a & 1b, which used the C61 strain, the experiments were performed on a M. persicae strain collected by Dr. Paul Umina from Bona Vista Rd, Warragul, VIC, AU (38°13'01.6"S 145°58'19.5"E; collection date: 22/03/2012; host plant: Raphanus raphanistrum). Aphids were maintained at 20˚C with a 12h/12h dark/light period on radish plants (Raphanus sativus). At the start of these experiments, the aphid colony was established from a single aphid to avoid natural diversity within the population.

NextGen sequencing and new target selection

After feeding on the dsRNA-containing diet, dsRNAs enter the digestive system (gut) of the insect. This motivated us to identify genes that are differentially expressed in the midgut (MG) of the insect as compared to the whole body (WB). The midguts of about 1000 M. persicae (2nd and 3rd instar) were dissected out over several days. Each day, dissected guts were immediately transferred to fresh ice-cold Trizol® solution and stored at -80°C until further processing. On the day of RNA isolation, dissected samples were randomly combined into three pools. Similarly, three groups of ten 2nd and 3rd instar whole-body M. persicae were placed in separate tubes. RNA isolation was performed using a DirectZol® RNA isolation kit. The quality of isolated RNA was checked on an agarose gel and the RNA was quantified using the Qubit™ system. For each sample, 3 µg of RNA was supplied to the Australian Genome Research Facility (AGRF) for library preparation using poly-A selection. The sample libraries were pooled and sequenced on a HiSeq 2500 system with 100bp read length.

Differential gene expression analysis was performed using the standard edgeR analysis pipeline with the following software: FastQC, Trimmomatic, STAR and edgeR. FastQC and edgeR were used with default settings. Supplementary Data-2 outlines the modified commands used for STAR and Trimmomatic. A list of genes was generated showing differentially expressed transcripts in the midgut of M. persicae compared to whole-body samples.
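As a rough illustration of what a midgut enrichment value represents, the sketch below computes a fold enrichment from library-size-scaled counts. It is a deliberately simplified stand-in and not the edgeR method, which applies its own normalization and a negative binomial model; the function names and the example counts are hypothetical.

```python
def counts_per_million(count, library_size):
    """Scale a raw read count by library size (reads per million)."""
    return count * 1_000_000 / library_size


def fold_enrichment(midgut_counts, midgut_libs, body_counts, body_libs):
    """Ratio of mean midgut CPM to mean whole-body CPM for one gene."""
    mg = [counts_per_million(c, n) for c, n in zip(midgut_counts, midgut_libs)]
    wb = [counts_per_million(c, n) for c, n in zip(body_counts, body_libs)]
    return (sum(mg) / len(mg)) / (sum(wb) / len(wb))
```

With hypothetical triplicate counts of 24 reads per million in the midgut and 10 per million in the whole body, the function returns a 2.4-fold enrichment.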

Effect of Iron (Fe) on development of M. persicae

Artificial aphid diets (Kunkel 1977, Prosser and Douglas 1992) containing three different concentrations of iron (FeCl3.6H2O) were fed to M. persicae: no iron in the diet, a normal diet with the recommended iron concentration, and a diet with four times the recommended iron concentration. Three replicates of ten 1-2 day old nymphs were fed on the respective diets for 12 days. The insects were observed for phenotypic changes, fecundity and mortality.

dsRNA preparation

dsRNA was generated using two different methods. The M. persicae orthologs of Drosophila melanogaster Rps13, Rps5a, Snf7, Katanin60, Vps2, and Rp19a were identified by BLAST against the available databases, and specific PCR primers were designed and used for amplification. dsRNAs from these sequences, as well as dsMPvATPase-like and dsGFP, were synthesized for artificial diet incorporation using the MEGAscript® T7 in vitro transcription kit (Ambion®). The MP_vATPase-like_dsRNA was 185bp long. It was generated using PCR with oligonucleotide primers (Supplementary Data 1).

Mp_C002_dsRNA and Dm_vATPase_dsRNA were synthesized commercially by Genolution, Inc. to ensure that any effects, or lack of them, could not be attributed to contaminants that may be present in in vitro transcribed dsRNA. The sequence for the C002 dsRNA of M. persicae (Mp_C002_dsRNA) was obtained from Coleman et al. (2015). For the synthesis of Mp_C002_dsRNA, the fragment length was 496bp after deleting 64bp from the start and 150bp from the end. Drosophila melanogaster vATPase dsRNA (Dm_vATPase_dsRNA) was synthesized as a negative control.

Artificial diet bioassay

Artificial diet bioassays for Rps13, Rps5a, Snf7, Katanin60, Vps2 and Rp19a were carried out for 5 days. dsRNA of each candidate gene was fed via artificial diet at a concentration of 7.5ng/µl. There were 10 replicates of each treatment with 5 insects per replicate. Mortality of insects was recorded on day 3 and day 5. On day 5, all the live insects from each replicate were weighed together on a microbalance.

Artificial diet bioassays were carried out using Mp_vATPase_dsRNA and Mp_C002_dsRNA to explore their potential to affect the fitness of M. persicae. The final concentration of dsRNA in the diet was 37.5 ng/µl for Mp_vATPase_dsRNA and 50 ng/µl for Mp_C002_dsRNA. GFP_dsRNA (37.5 ng/µl) and Dm_vATPase_dsRNA (50 ng/µl) were used as negative controls for Mp_vATPase_dsRNA and Mp_C002_dsRNA respectively.

Clear acrylic pipes (25 mm × 35 mm) open at both ends were used as cages to perform the bioassay. One end of the cage was closed by stretching two layers of parafilm over it. A group of ten 1-2 day old M. persicae nymphs was carefully transferred into the cage using a paint brush. The other end of the cage was sealed with parafilm layers containing diet with or without dsRNA. All the cages were incubated at 20˚C with a 12/12 hr day/night photoperiod. Observations on the survival and fecundity of M. persicae were recorded after 12 days.

To identify any transgenerational effects of Mp_C002_dsRNA, M. persicae were monitored for 12 days (with three changes of diet sachets) in the cages with or without dsRNA-containing diets. All the newborn nymphs were then transferred to new cages with fresh diet of the same type as the previous generation. Every treatment was repeated three times (n=3). The number of aphids was monitored over 12 days.

We also tested transfection reagents (Happyfect® by Tecrea, Fugene® by Promega, Ribojuice® by Millipore and Lipofectamine® by Thermo Fisher Scientific) for their potential to enhance the delivery of dsRNA via artificial diet. Toxicity of each transfection reagent was determined by feeding aphids 2 µl of transfection reagent mixed in 100 µl of diet for 12 days. During all the artificial diet bioassays, the diet was changed every four days, or earlier if bacterial or fungal contamination was found in the diet.

M. persicae size analysis

Aphid fecundity and weight are directly proportional to size (Leather and Dixon 1984). We developed a macro script for ImageJ software that determines the size of the insect in mm² (Supplementary Data-6).

Relative gene expression analysis of samples from the artificial diet experiment

M. persicae samples were collected from all the treatments at the end of the bioassay for each generation. The insects were separated based on their morphology (alate or wingless), and morphologically similar insects from each replicate were pooled. Only wingless insects were carried forward for expression analysis. Mp_C002_dsRNA treated replicates had 10, 10, 10, 9 and 6 insects in five biological replicates, whereas Dm_vATPase_dsRNA treated replicates had 10, 9, 9, 9 and 6 insects. Total RNA was extracted from each sample using a Direct-zol™ RNA kit. The first strand of cDNA was synthesized using M-MuLV reverse transcriptase (NEB) according to the manufacturer’s instructions. Gene expression analysis was then performed for each sample using real-time PCR. The ∆∆Ct method of relative quantification was used to determine the difference in expression of target genes between test and control samples (Livak and Schmittgen 2001). The primers for five different housekeeping reference genes (GDPH, RpL7, RpS3, Actin, Tubulin; Supplementary Data-1) were tested for their efficiency and stable expression. RpL7 was selected based on its stable expression and primer efficiency (Slope=1.9).
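The ∆∆Ct calculation of Livak and Schmittgen (2001) can be sketched as follows. The function name and argument names are hypothetical, and the sketch assumes that both primer pairs amplify with close to 100% efficiency (a doubling per cycle), which is the assumption behind the base of 2.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene (test vs control) as 2^-ddCt.

    Each Ct is first normalized to the reference gene in the same sample
    (dCt), then the control dCt is subtracted from the test dCt (ddCt).
    """
    dct_test = ct_target_test - ct_ref_test
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl
    ddct = dct_test - dct_ctrl
    return 2.0 ** (-ddct)
```

A value of 1 indicates no change relative to the control treatment, and a value of 0.5 corresponds to a 50% knockdown (the target Ct rising by one cycle relative to the reference gene).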

Development of transgenic Arabidopsis expressing dsRNA

The Mp_C002_dsRNA fragment was designed using primers described by Coleman et al. (2015). The Mp_C002_dsRNA fragment of 710bp was amplified and cloned into an RNA hairpin-producing vector, pL4440, then sub-cloned into the binary vector pMLBART (Fig. S3).

The multigene ‘BestBet’ dsRNA construct was created by concatenating sequences from four genes with orthologs shown in the literature to elicit a dsRNA response: Snf7 (Baum et al. 2007), vATPase-like (Whyard et al. 2009), C002 (Mutti et al. 2006, Coleman et al. 2015, Zhang et al. 2015) and AChE (Guo et al. 2014); the sequence is available in Supp. Data-1. This was synthesized by Biomatik with specific restriction sites at both ends. The synthesized sequence was initially cloned into an RNA hairpin-producing vector, pL4440, then sub-cloned into pMLBART (Supplementary Figure 3).


Mp_C002_dsRNA and Mp_BestBet_dsRNA constructs were introduced into Agrobacterium tumefaciens strain C58 and then used to transform A. thaliana plants (Col-0 ecotype) using the floral dip method (Clough and Bent 1998). Seeds from the dipped plants were sown in potting soil and germinating seedlings were sprayed with phosphinothricin (Basta: Bayer) to select transformants. Lines were established from these individual plants by letting them self-pollinate. Seeds from each of these transformant plants (T1) were harvested and sown, and the seedlings were also exposed to Basta screening. Lines displaying a 3:1 ratio of Basta-resistant to Basta-sensitive seedlings were likely to possess a single transgene insertion. Basta-resistant individuals were subsequently screened for homozygosity in the next generation. This screening procedure resulted in 10 Mp_C002_dsRNA and 6 Mp_BestBet_dsRNA lines that were confirmed to have the transgene by PCR using insert-specific primers and subsequently screened for expression levels of dsRNA at the seedling stage. The three highest expressing lines from each construct were selected for further analysis.
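Whether an observed segregation ratio is compatible with 3:1 is commonly checked with a chi-square goodness-of-fit test. The sketch below is one way to do this; the text does not state that such a test was applied here, and the function name and seedling counts in the usage note are hypothetical.

```python
import math


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square goodness-of-fit test against a 3:1 resistant:sensitive ratio.

    Returns (statistic, p_value) with one degree of freedom.
    """
    total = resistant + sensitive
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    stat = ((resistant - expected_r) ** 2 / expected_r
            + (sensitive - expected_s) ** 2 / expected_s)
    # With one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For instance, a hypothetical line with 75 resistant and 25 sensitive seedlings fits 3:1 exactly (statistic 0, p = 1), whereas 50 resistant and 50 sensitive seedlings would be rejected as a single-insertion line.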

All transgenic and non-transgenic Arabidopsis thaliana plants were maintained at 20˚C under a 24h photoperiod.

Transgenic Plant bioassay

The selected A. thaliana plants were sown individually in small cups. After four weeks of growth, a group of ten 1-2 day old M. persicae nymphs was released on each plant. Every plant was then caged in a separate plastic box and maintained at 20˚C with a 12/12 hr day/night photoperiod. This group of treatments was regarded as the first generation. On the 10th-11th day from the start of the experiment, M. persicae started producing new nymphs. A group of newly born, 1-2 day old nymphs was then released onto a fresh plant of the same genotype. This group of treatments was regarded as the second generation. Plants harbouring insects of both generations were maintained in plastic containers for 23 days. The number of aphids on each plant was then counted and used as the fecundity measure. A parallel set of plants was set up for both generations so that ten insects could be harvested on the fourth day from the start of the experiment for RNA isolation and digital PCR analysis.

Effect of M. persicae saliva on dsRNA stability in artificial diet

The effect of M. persicae saliva was analysed by feeding insects on a diet containing Mp_C002_dsRNA at 50 ng/µl. A diet containing Mp_C002_dsRNA was also incubated under the same conditions without insects to assess any diet-dsRNA interaction and its effect on dsRNA stability. The dsRNA-containing diet was incubated with or without insects for 2 days and for 4 days. Diet samples were then used for cDNA synthesis and digital PCR to quantify the number of dsRNA copies remaining in the solution.

Effect of dissected gut tissues on dsRNA stability

To examine dsRNA degradation in gut tissue, five midguts were dissected from 2nd and 3rd instar M. persicae. The dissected guts were washed with PBS and immediately transferred into fresh RNase-free water containing dsRNA at a concentration of 50 ng/µl. The samples were vortexed briefly and incubated at room temperature. Subsamples were collected after 15 min, 30 min and 60 min of incubation and stored at -80°C. A sample without midguts was used as the experimental control. The collected samples were used for cDNA synthesis and digital PCR for absolute quantification of the dsRNA copies in each sample.
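Absolute quantification by digital PCR generally rests on a Poisson correction of the fraction of positive partitions. The following sketch shows that general calculation only; the partition counts and the partition volume are hypothetical, and the instrument software used in this study may apply additional corrections.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean number of template copies per partition."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    # lambda = -ln(fraction of partitions that are negative)
    return -math.log(1.0 - positive / total)


def copies_per_microlitre(positive, total, partition_volume_ul):
    """Template concentration in the reaction, in copies per microlitre."""
    return copies_per_partition(positive, total) / partition_volume_ul


if __name__ == "__main__":
    # Hypothetical run: 5000 positive partitions out of 10000,
    # with an assumed partition volume of 0.001 microlitres.
    print(copies_per_microlitre(5000, 10000, 0.001))
```

The correction matters most at high template concentrations, where a single positive partition is increasingly likely to hold more than one copy.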

Phylogenetic analysis

Multiple sequence alignment was performed with MAFFT (Katoh and Standley 2013). Trees were built with FastTreeMP using the maximum-likelihood NNI method. We used default settings except for the following options, as recommended by the FastTree website (Price et al. 2010): -pseudo -spr 4 -mlacc 2 -slownni.
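The alignment and tree-building steps could be scripted along the following lines. This is a minimal sketch under several assumptions: the file names are hypothetical, MAFFT is run with its default settings, the input is assumed to be a protein alignment (FastTree expects the -nt flag for nucleotides), and the FastTree options are written with the spellings the program accepts (-pseudo, -spr, -mlacc, -slownni).

```python
import subprocess


def mafft_command(fasta_in):
    """Argument list for a default MAFFT run; the alignment is written to stdout."""
    return ["mafft", fasta_in]


def fasttree_command(alignment):
    """Argument list for FastTreeMP with the options listed in the text."""
    return ["FastTreeMP", "-pseudo", "-spr", "4", "-mlacc", "2", "-slownni", alignment]


def run_to_file(command, out_path):
    """Run a command and capture its standard output in out_path."""
    with open(out_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


if __name__ == "__main__":
    # Hypothetical file names; both programs must be installed and on the PATH.
    run_to_file(mafft_command("dsrnase_proteins.fasta"), "dsrnase_aligned.fasta")
    run_to_file(fasttree_command("dsrnase_aligned.fasta"), "dsrnase.tree")
```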


References

1 Baum, J. A. et al. Control of coleopteran insect pests through RNA interference. Nature biotechnology 25, 1322-1326, doi:10.1038/nbt1359 (2007).
2 Mao, Y.-B. et al. Silencing a cotton bollworm P450 monooxygenase gene by plant-mediated RNAi impairs larval tolerance of gossypol. Nat Biotech 25, 1307-1313, doi:10.1038/nbt1352 (2007).
3 Gordon, K. H. J. & Waterhouse, P. M. RNAi for insect-proof plants. Nat Biotech 25, 1231-1232 (2007).
4 Fire, A. et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811, doi:10.1038/35888 (1998).
5 Whyard, S., Singh, A. D. & Wong, S. Ingested double-stranded RNAs can act as species-specific insecticides. Insect biochemistry and molecular biology 39, 824-832, doi:10.1016/j.ibmb.2009.09.007 (2009).
6 Good, R. T. et al. OfftargetFinder: a web tool for species-specific RNAi design. Bioinformatics 32, 1232-1234, doi:10.1093/bioinformatics/btv747 (2016).
7 Scott, J. G. et al. Towards the elements of successful insect RNAi. Journal of Insect Physiology 59, 1212-1221, doi:10.1016/j.jinsphys.2013.08.014 (2013).
8 Huvenne, H. & Smagghe, G. Mechanisms of dsRNA uptake in insects and potential of RNAi for pest control: A review. Journal of Insect Physiology 56, 227-235, doi:10.1016/j.jinsphys.2009.10.004 (2010).
9 Shukla, J. N. et al. Reduced stability and intracellular transport of dsRNA contribute to poor RNAi response in lepidopteran insects. RNA Biology 13, 656-669, doi:10.1080/15476286.2016.1191728 (2016).
10 Joga, M. R., Zotti, M. J., Smagghe, G. & Christiaens, O. RNAi efficiency, systemic properties, and novel delivery methods for pest insect control: What we know so far. Frontiers in Physiology 7, 553, doi:10.3389/fphys.2016.00553 (2016).
11 Terenius, O. et al. RNA interference in Lepidoptera: an overview of successful and unsuccessful studies and implications for experimental design. J Insect Physiol 57, 231-245, doi:10.1016/j.jinsphys.2010.11.006 (2011).
12 Mutti, N. S., Park, Y., Reese, J. C. & Reeck, G. R. RNAi knockdown of a salivary transcript leading to lethality in the pea aphid, Acyrthosiphon pisum. J Insect Sci 6, 1-7, doi:10.1673/031.006.3801 (2006).
13 Mutti, N. S. et al. A protein from the salivary glands of the pea aphid, Acyrthosiphon pisum, is essential in feeding on a host plant. Proceedings of the National Academy of Sciences of the United States of America 105, 9965-9969, doi:10.1073/pnas.0708958105 (2008).
14 Jaubert-Possamai, S. et al. Gene knockdown by RNAi in the pea aphid Acyrthosiphon pisum. BMC biotechnology 7, 63, doi:10.1186/1472-6750-7-63 (2007).
15 Sapountzis, P. et al. New insight into the RNA interference response against cathepsin-L gene in the pea aphid, Acyrthosiphon pisum: Molting or gut phenotypes specifically induced by injection or feeding treatments. Insect biochemistry and molecular biology 51, 20-32, doi:10.1016/j.ibmb.2014.05.005 (2014).
16 Shakesby, A. J. et al. A water-specific aquaporin involved in aphid osmoregulation. Insect biochemistry and molecular biology 39, 1-10, doi:10.1016/j.ibmb.2008.08.008 (2009).


17 Pitino, M., Coleman, A. D., Maffei, M. E., Ridout, C. J. & Hogenhout, S. A. Silencing of aphid genes by dsRNA feeding from plants. PLoS ONE 6, e25709, doi:10.1371/journal.pone.0025709 (2011).
18 Tzin, V. et al. RNA interference against gut osmoregulatory genes in phloem-feeding insects. Journal of Insect Physiology 79, 105-112, doi:10.1016/j.jinsphys.2015.06.006 (2015).
19 Mathers, T. C. et al. Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species. Genome biology 18, 27, doi:10.1186/s13059-016-1145-3 (2017).
20 Li, J., Wang, X.-P., Wang, M.-Q., Ma, W.-H. & Hua, H.-X. Advances in the use of the RNA interference technique in Hemiptera. Insect Science 20, 31-39, doi:10.1111/j.1744-7917.2012.01550.x (2013).
21 Li, H. et al. Systemic RNAi in western corn rootworm, Diabrotica virgifera virgifera, does not involve transitive pathways. Insect Science, n/a-n/a, doi:10.1111/1744-7917.12382 (2016).
22 Bhatia, V., Bhattacharya, R., Uniyal, P. L., Singh, R. & Niranjan, R. S. Host Generated siRNAs Attenuate Expression of Serine Protease Gene in Myzus persicae. PLoS ONE 7, e46343, doi:10.1371/journal.pone.0046343 (2012).
23 Mao, J. & Zeng, F. Feeding-Based RNA Intereference of a Gap Gene Is Lethal to the Pea Aphid, Acyrthosiphon pisum. PLoS ONE 7, e48718, doi:10.1371/journal.pone.0048718 (2012).
24 Guo, H. et al. Plant-Generated Artificial Small RNAs Mediated Aphid Resistance. PLoS ONE 9, e97410, doi:10.1371/journal.pone.0097410 (2014).
25 Mutti, N. S. et al. A protein from the salivary glands of the pea aphid, Acyrthosiphon pisum, is essential in feeding on a host plant. Proceedings of the National Academy of Sciences 105, 9965-9969, doi:10.1073/pnas.0708958105 (2008).
26 Myzus persicae (ed Bioinformatics Platform for Agroecosystem Arthropods) (Bioinformatics Platform for Agroecosystem Arthropods https://bipaa.genouest.org/is/, 2017).
27 Singh, A. D., Wong, S., Ryan, C. P. & Whyard, S. Oral delivery of double-stranded RNA in larvae of the yellow fever mosquito, Aedes aegypti: Implications for pest mosquito control. Journal of Insect Science 13, 69-69, doi:10.1673/031.013.6901 (2013).
28 Cancino-Rodezno, A. et al. The mitogen-activated protein kinase p38 is involved in insect defense against Cry toxins from Bacillus thuringiensis. Insect biochemistry and molecular biology 40, 58-63, doi:10.1016/j.ibmb.2009.12.010 (2010).
29 Murphy, K. A., Tabuloc, C. A., Cervantes, K. R. & Chiu, J. C. Ingestion of genetically modified yeast symbiont reduces fitness of an insect pest via RNA interference. Scientific Reports 6, 22587, doi:10.1038/srep22587 (2016).
30 Coleman, A. D., Wouters, R. H., Mugford, S. T. & Hogenhout, S. A. Persistence and transgenerational effect of plant-mediated RNAi in aphids. J Exp Bot 66, 541-548, doi:10.1093/jxb/eru450 (2015).
31 Davidson, N. M. & Oshlack, A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome biology 15, 410, doi:10.1186/s13059-014-0410-6 (2014).
32 Pham, D. Q. D. & Winzerling, J. J. Insect ferritins: Typical or atypical? Biochimica et Biophysica Acta (BBA) - General Subjects 1800, 824-833, doi:10.1016/j.bbagen.2010.03.004 (2010).


33 Dadd, R. H., Krieger, D. L. & Mittler, T. E. Studies on the artificial feeding of the aphid Myzus persicae (Sulzer) - IV. Requirements for water-soluble vitamins and ascorbic acid. Journal of Insect Physiology 13, 249-272, doi:10.1016/0022-1910(67)90152-7 (1967).
34 Wynant, N., Santos, D., Van Wielendaele, P. & Vanden Broeck, J. Scavenger receptor-mediated endocytosis facilitates RNA interference in the desert locust, Schistocerca gregaria. Insect molecular biology 23, 320-329, doi:10.1111/imb.12083 (2014).
35 Christiaens, O., Swevers, L. & Smagghe, G. DsRNA degradation in the pea aphid (Acyrthosiphon pisum) associated with lack of response in RNAi feeding and injection assay. Peptides 53, 307-314, doi:10.1016/j.peptides.2013.12.014 (2014).
36 Arimatsu, Y., Kotani, E., Sugimura, Y. & Furusawa, T. Molecular characterization of a cDNA encoding extracellular dsRNase and its expression in the silkworm, Bombyx mori. Insect biochemistry and molecular biology 37, 176-183, doi:10.1016/j.ibmb.2006.11.004 (2007).
37 Tang, X. & Zhou, B. Ferritin is the key to dietary iron absorption and tissue iron detoxification in Drosophila melanogaster. The FASEB Journal 27, 288-298, doi:10.1096/fj.12-213595 (2013).
38 Wynant, N. et al. Identification, functional characterization and phylogenetic analysis of double stranded RNA degrading enzymes present in the gut of the desert locust, Schistocerca gregaria. Insect biochemistry and molecular biology 46, 1-8, doi:10.1016/j.ibmb.2013.12.008 (2014).
39 Gillet, F.-X. et al. Investigating engineered ribonucleoprotein particles to improve oral RNAi delivery in crop insect pests. Frontiers in Physiology 8, doi:10.3389/fphys.2017.00256 (2017).
40 Sugahara, R., Tanaka, S., Jouraku, A. & Shiotsuki, T. Geographic variation in RNAi sensitivity in the migratory locust. Gene 605, 5-11, doi:10.1016/j.gene.2016.12.028 (2017).
41 Swevers, L., Vanden Broeck, J. & Smagghe, G. The possible impact of persistent virus infection on the function of the RNAi machinery in insects: a hypothesis. Frontiers in Physiology 4, doi:10.3389/fphys.2013.00319 (2013).
42 Prosser, W. A. & Douglas, A. E. A test of the hypotheses that nitrogen is upgraded and recycled in an aphid (Acyrthosiphon pisum) symbiosis. Journal of Insect Physiology 38, 93-99, doi:10.1016/0022-1910(92)90037-E (1992).
43 Kunkel, H. in Aphids As Virus Vectors (ed Karl Maramorosch) 311-338 (Academic Press, 1977).
44 Leather, S. R. & Dixon, A. F. G. Aphid growth and reproductive rates. Entomologia experimentalis et applicata 35, 137-140, doi:10.1111/j.1570-7458.1984.tb03373.x (1984).
45 Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25, 402-408, doi:10.1006/meth.2001.1262 (2001).
46 Mutti, N. S., Park, Y., Reese, J. C. & Reeck, G. R. RNAi knockdown of a salivary transcript leading to lethality in the pea aphid, Acyrthosiphon pisum. Journal of Insect Science 6, 1-7 (2006).
47 Zhang, Y., Fan, J., Sun, J. R. & Chen, J. L. Cloning and RNA interference analysis of the salivary protein C002 gene in Schizaphis graminum. J Integr Agr 14, 698-705, doi:10.1016/S2095-3119(14)60822-4 (2015).


48 Clough, S. J. & Bent, A. F. Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. The Plant Journal 16, 735-743, doi:10.1046/j.1365-313x.1998.00343.x (1998).
49 Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular biology and evolution 30, 772-780, doi:10.1093/molbev/mst010 (2013).
50 Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLOS ONE 5, e9490, doi:10.1371/journal.pone.0009490 (2010).


Acknowledgements

We thank Dr. Alexandre Fournier-Level for his advice, especially with the ImageJ macro script. We thank Crystal Jones for technical assistance with the initial aphid experiments. We are grateful to Paul Umina for his general collegiality and for providing us with the aphid strains. We also thank Dr. Govind Gujar, Dr. Vinay Kalia, Meenu Singla, Dr. Sujatha Sunil and Dr. Bernie Carroll for their discussions and visits. This research was supported by the Australia India Strategic Research Fund, Grand Challenge Scheme, Grant Number GCF010009.

Author Contribution Statement

A.G. performed the bulk of the experimental work and statistical analyses and prepared the figures. R.T.G. assisted with experimental implementation and technical design. O.E. conceived experiments and advised on aphid biology and experimental design. R.T.G. prepared dsRNA for the initial analysis and O.E.'s group performed the bioassays. D.R. advised on the experimental design, managed the larger ‘Grand Challenge’ project and provided detailed feedback on the manuscript drafts. J.G. supervised the creation of transgenic plants and contributed to the manuscript. C.R. supervised all the work. A.G. and C.R. wrote the manuscript.

Additional information

Accession codes: RNA-Seq data are lodged in NCBI’s SRA database with accession numbers XXXXXXX

Competing Financial Interests: The authors declare that they have no competing financial interests.


Figure legends

Figure 1: Different dsRNAs fed to M. persicae in artificial diet only marginally affect aphid weight.

Two experiments (a and b) were conducted at different times and targeted the following genes: (a) Rps13 (Student's t-test, p=0.051), Rps5a (p=0.1), Snf7 (p=0.23) and Katanin60 (p=0.03). Error bars show the standard error of the mean for 10 replicates. (b) Vpd2 (p=0.15) and Rp19a (p=0.02). Asterisks (*) indicate a significant change in M. persicae weight.

Figure 2: Mp_vATPase-like_dsRNA fed to M. persicae in the artificial diet does not affect size or cause a knockdown of the target, even with transfection agents.

(a) Aphid size does not differ between those fed on a diet containing Mp_vATPase-like_dsRNA and those fed GFP_dsRNA (Student's t-test, p=0.24), and use of a transfection agent does not change this (Happyfect: p=0.38, Lipofectamine: p=0.94). The error bars show the standard error of the mean based on n=3 replicates of 10 insects. (b) Quantitative real-time PCR shows that the Mp_vATPase-like transcript level is not significantly different whether the aphids are fed Mp_vATPase-like_dsRNA or GFP_dsRNA. The fold change is not significantly different regardless of the transfection reagent used (ANOVA, p=0.96, n=3). Total number of insects present in each replicate: GFP_dsRNA, 4, 10, 8; Mp_vATPase, 4, 7, 8.

Figure 3: Arabidopsis plants expressing anti-aphid dsRNAs have a minor impact on aphid fecundity even in multi-generation experiments. M. persicae were reared on four Arabidopsis thaliana genotypes: Mp_BestBet_dsRNA, Mp_C002_dsRNA, GFP_dsRNA and empty vector.

(a) The fecundity of aphids reared on these plants was measured by counting the number of aphids. No significant difference between plants was observed in the first generation. (b) The number of second-generation aphids on the plants. One significant pairwise difference was observed (ANOVA, p=0.025, n=3). (c) The transcript level of the MpC002 gene measured in aphids from the first and second generations using digital PCR (n=3). Error bars are the standard error of the mean. (d) The transcript levels of the MpACE, MpSnf7 and MpvATPase_like genes measured in aphids from the second generation using digital PCR (n=3). Asterisks (*) indicate a significant change in M. persicae fecundity.


Figure 4: Ferritin satisfies the criteria for a good RNAi target gene, yet Mp_FeHC_dsRNA fed to M. persicae in the artificial diet does not affect target gene activity or aphid number.

(a) For each insect species, a histogram bar around the perimeter of the species cladogram (drawn from cytochrome oxidase I sequences) shows the number of 21-mers matching the 495 nt MP_FeHC sequence. (b) Dietary iron deficiency reduces aphid size (Student's t-test, n=3, p=0.0001). Asterisks (*) indicate a significant change in M. persicae size. (c) Mp_FeHC_dsRNA does not significantly affect the number of newly born aphids after 11 days relative to treatment with Dm_vATPase_dsRNA from Drosophila melanogaster. The error bars represent the standard error of the mean.

Figure 5: dsRNase transcripts are abundant in aphid midguts, and dsRNase activity is high in midgut extracts.

(a) A phylogenetic analysis showing that the M. persicae midgut contains transcripts orthologous to the Bombyx mori dsRNase. Numbers represent bootstrap values. (b) Digital PCR showing that commercially synthesized Mp_C002_dsRNA levels in artificial diet remain high after four days even if aphids are allowed to feed on the diet (ANOVA, n=3, p=0.41). (c) Digital PCR showing that commercially synthesized Mp_C002_dsRNA is rapidly degraded if incubated with aphid midguts (ANOVA, p=0.0001, n=3). The control contained dsRNA without midgut and was sampled after 1 h. Asterisks (*) indicate a significant change in Mp_C002_dsRNA concentration.
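The 21-mer matching summarised in the Figure 4a legend can be illustrated with a short sliding-window count. This is a sketch of the general idea only, not the tool used in the thesis: the function names are invented here, only exact matches on the given strand are counted (reverse complements are ignored), and the sequences in the usage lines are toy strings.

```python
def kmers(sequence, k=21):
    """Return the set of all k-mers present in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def count_shared_kmers(dsrna, transcript, k=21):
    """Count the k-mer windows of dsrna that occur exactly in transcript."""
    targets = kmers(transcript, k)
    dsrna = dsrna.upper()
    return sum(1 for i in range(len(dsrna) - k + 1) if dsrna[i:i + k] in targets)


if __name__ == "__main__":
    # Toy sequences: a 40 nt repeat compared with itself and with an unrelated string.
    toy = "ACGT" * 10
    print(count_shared_kmers(toy, toy))
    print(count_shared_kmers(toy, "C" * 40))
```

A 495 nt dsRNA contains 475 overlapping 21-mer windows, so that is the maximum count such a comparison can return for one transcript.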


Figures

Figure 1:

Figure 2:

Figure 3:

Figure 4:

Figure 5:

Table 1: The RNAi machinery of M. persicae (MG: midgut, WB: whole body, RPKM: Reads Per Kilobase of transcript per Million mapped reads, Std Err: standard error, FC: fold change).

Gene name | Top BLAST hit from AphidBase transcript data | Similar transcript | Log2 FC in MG | RPKM and Std Err in MG | RPKM and Std Err in WB
Dcr-1 | MYZPE13164_G006_v1.0_000029270.4 | Cluster-35820.0 | -0.02 | 4.7±0.4 | 4.8±0.1
 | MYZPE13164_G006_v1.0_000182910.3 | Cluster-40234.1 | 0.27 | 7.1±0 | 5.9±0.2
 | MYZPE13164_G006_v1.0_000182910.2 | Cluster-40234.2 | 0.47 | 8.2±0 | 5.9±0.1
Dcr-2 | MYZPE13164_G006_v1.0_000182910.3 | Cluster-40234.1 | 0.28 | 7.1±0 | 5.9±0.2
 | MYZPE13164_G006_v1.0_000182910.2 | Cluster-40234.2 | 0.47 | 8.2±0 | 5.9±0.1
Sid-1 | MYZPE13164_G006_v1.0_000149940.1 | Cluster-34552.0 | 0.98 | 16.8±1.4 | 8.5±0.3
Ago-2 | MYZPE13164_G006_v1.0_000102290.1 | Cluster-24454.1 | -0.16 | 31.1±2.9 | 31.2±1.7
 | MYZPE13164_G006_v1.0_000150740.4 | Cluster-24298.0 | 0.66 | 142.4±2.7 | 89.8±0.6
Ago-3 | MYZPE13164_G006_v1.0_000119300.6 | Cluster-26358.0 | 0.2 | 22.9±1.2 | 19.8±0.8
R2D2 | MYZPE13164_G006_v1.0_000117430.1 | Cluster-36917.0 | 0.4 | 18.8±1 | 14.1±0.6
Aubergine | MYZPE13164_G006_v1.0_000039280.2 | Cluster-23016.2 | -1.77 | 6.3±0.4 | 21.7±2.7
 | MYZPE13164_G006_v1.0_000084140.1 | Cluster-28609.0 | -5.18 | 0.9±0.1 | 32.3±3.8
 | MYZPE13164_G006_v1.0_000119300.6 | Cluster-26358.0 | 0.2 | 22.9±1.2 | 19.8±0.8
Pasha | MYZPE13164_G006_v1.0_000186990.1 | Cluster-34631.0 | 0.03 | 8.9±0.1 | 7±0.1
Drosha | MYZPE13164_G006_v1.0_000018350.1 | Cluster-30412.0 | 0.2 | 3.2±0.1 | 2.8±0.1
Loquacious | MYZPE13164_G006_v1.0_000068140.1 | Cluster-37550.1 | 1.6 | 85±7.3 | 28.1±0.7
Scavenger Receptor | MYZPE13164_G006_v1.0_000124360.4 | Cluster-40212.0 | -1.86 | 2.5±0.1 | 8.9±0.8
 | MYZPE13164_G006_v1.0_000164000.1 | Cluster-18505.1 | -3.91 | 3.2±0.3 | 48.1±7.7
 | MYZPE13164_G006_v1.0_000072270.5 | Cluster-35050.0 | -6.16 | 2.6±2.5 | 6±0.1
 | MYZPE13164_G006_v1.0_000067150.1 | Cluster-25721.0 | -6.83 | 1.2±1.2 | 3.1±0.1
 | MYZPE13164_G006_v1.0_000203080.3 | Cluster-36169.0 | -3.24 | 5.4±3.5 | 18.4±1.3

Table 2: Expression level of dsRNases in the midgut (MG) and whole body (WB) of M. persicae (RPKM: Reads Per Kilobase of transcript per Million mapped reads, Std Err: standard error, FC: fold change).

Gene name | Top BLAST hit from AphidBase transcript data | Similar transcript | Log2 fold change in gut | RPKM and Std Err in MG | RPKM and Std Err in WB
dsRNase | MYZPE13164_G006_v1.0_000023850.1 | Cluster-21088.1 | 4.13 | 526.4±65.6 | 30±1.2
 | MYZPE13164_G006_v1.0_000023850.1 | Cluster-21088.2 | 4.1 | 523.5±62.7 | 30.5±1.8
 | MYZPE13164_G006_v1.0_000023850.1 | Cluster-21088.3 | 4.1 | 595.1±68.8 | 34.6±1.5
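The RPKM and log2 fold-change columns of Tables 1 and 2 follow from standard definitions, sketched below. The read count, transcript length and library size in the usage lines are hypothetical, and the published values may have been computed from replicate-level data or with additional normalisation, so small differences from a ratio of the tabulated means are to be expected.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log2_fold_change(midgut_rpkm, whole_body_rpkm):
    """Log2 ratio of midgut (MG) to whole-body (WB) expression."""
    return math.log2(midgut_rpkm / whole_body_rpkm)


if __name__ == "__main__":
    # Hypothetical transcript: 500 reads, 1000 bp long, 10 million mapped reads.
    print(rpkm(500, 1000, 10_000_000))
    # Mean RPKM values of the first dsRNase row of Table 2 (MG 526.4, WB 30).
    print(round(log2_fold_change(526.4, 30.0), 2))
```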


Supporting Information –

Supplementary Figure 1 – The aphid genome encodes vATPase and vATPase-like genes. The phylogenetic tree suggests these two genes diverged before hemipteran species radiated. The numbers in small font shown below the branches represent the proportion of bootstrap replicates that support that branch and the numbers in large font above the branches indicate branch lengths. The branch length of the MP_vATPase-like sequence (circled) is long compared to the real vATPase sequences from a diverse range of insects.

Supplementary Figure 2: Mp_C002_dsRNA fed to Myzus persicae in the artificial diet does not knock down the target or affect aphid fecundity, even when the experiment spans two generations. Commercially synthesized Mp_C002_dsRNA was fed to 5 replicates of 10 aphids for 11 days and the number of progeny was recorded. A dsRNA sequence derived from Drosophila melanogaster vATPase was used as a negative control. For each replicate, 10 of the progeny were then transferred to fresh media, and the number of progeny they produced was recorded (Student's t-test; first generation, P=0.37; second generation, P=0.77). The error bars represent the standard error of the mean.

Supplementary Figure 3 – Plant transformation vector maps. The vector on the left is designed to express dsMpC002 and the vector on the right is designed to express the multigenic dsBestBet (C002, vATPAse-like, Ace, snf7).

Supplementary figure 4: The phylogeny of ferritin heavy chain genes.

Supplementary Table 1: The transcript abundance of genes studied in this manuscript.

(Midgut: MG, Whole Body: WB, RPKM: Reads Per Kilobase of transcript per Million mapped reads, Std Err: Standard Error, FC: Fold Change)

Supplementary data 1: All the dsRNA sequences and primers used in the study.


Supplementary data 2: Custom commands used in bioinformatics analysis pipeline

Supplementary figure 5: Photographs showing the effect of different concentrations of iron on Myzus persicae size.

Supplementary data 6: Link for aphid size detection script at GitHub


3.3 Supplementary material

Supplementary Figure 1:

Supplementary Figure 2:


Supplementary Fig. 3:


Supplementary Figure 4:


Supplementary Table 1:

Gene name | Top BLAST hit from AphidBase transcript data | Similar cluster | RPKM and Std Err in MG | RPKM and Std Err in WB
MpACE | MYZPE13164_G006_v1.0_000103290.1 | Cluster-5961.1 | 1.8±0.1 | 2.7±0.1
MpC002 | MYZPE13164_G006_v1.0_000086200.2 | Cluster-32898.1 | 2±0.2 | 154.4±8.9
MpSnf7 | MYZPE13164_G006_v1.0_000183180.2 | Cluster-23766.0 | 32.7±0.5 | 16.2±0.3
MpvATPase_like | MYZPE13164_G006_v1.0_000172560.3 | Cluster-28471.1 | 9.8±0.2 | 5.4±0.1
MpvATPase | MYZPE13164_G006_v1.0_000095950.1 | Cluster-21274.0 | 944±39.6 | 178.7±2.6
MpFerritin | MYZPE13164_G006_v1.0_000072220.2 | Cluster-26442.0 | 1842±285.5 | 255.5±4.7
MpGAPDH | MYZPE13164_G006_v1.0_000000720.1 | Cluster-35712.0 | 2413.6±71 | 1161.8±60.4
MpTubulin | MYZPE13164_G006_v1.0_000052180.2 | Cluster-37187.0 | 345.8±12.5 | 821.7±29.7
MpRpL7 | MYZPE13164_G006_v1.0_000098680.1 | Cluster-27421.0 | 917.5±38.1 | 635.2±13.4


Supplementary Data 1

Artificially synthesized dsRNA sequences

>Mp_C002_dsRNA

5’TGAACGATAATCAGGGAGAAGAGAACGATAATCAGGGAGAAGAGAACGATAA

TCAGGGAGAAGAGAACGATAATCAGGGAGAAGAGAAGGAAGAAGTTTCCGAAC

CAGAGATGGAGCACCATCAGTGCGAAGAATACAAATCGAAGATCTGGAACGATG

CATTTAGCAACCCGAAGGCTATGAACCTGATGAAACTGACGTTTAATACAGCTA

AGGAATTGGGCTCCAACGAAGTGTGCTCGGACACGACCCGGGCCTTATTTAACTT

CGTCGATGTGATGGCCACCAGCCCGTACGCCCACTTCTCGCTAGGTATGTTTAAC

AAGATGGTGGCGTTTATTTTGAGGGAGGTGGACACGACATCGGACAAATTTAAA

GAGACGAAGCAGGTGGTCGACCGTATCTCGAAAACTCCAGAGATCCGTGACTAT

ATCAGGAACTCGGCCGCCAAGACCGTCGACTTGCTCAAGGAACCCAAGATTAGA

GCACGACTGTT 3’

>Mp_FeHC_dsRNA

5’GGTCCAAATTCCAGACGATTGGATCACCATGGTTGACCCATGCACGAAAAAAATGAAAGAGCAGG

TCCAGGAAGAACTTACTGCAGCAATGACATATTTTGCAATGGGAGCACATTTTTCTAAAGACACAGT

GAATCGTCCAGGATTCGCAAAAATCTTCTTTGATAGTGCCAGCGAAGAACGTGACCATGCTATTAAA

ATTATTGGATATCTGTTGATGAGAGGAGGTTTGACCAAAGATATCAGTCAATTAATTCGTGACCCTC

AACCTTTGTCTGAAGCATGGGCTGATGGTCTAAGTGCATTGAAAGATGCTTTAAAATTGGAAGCTCA

TGTCACTCGTAAGATAAGAGATATTGCCACCACATGTGAGGAACCTGGACGTGATGGACAAGATTTC

AACGACTATCATTTAGTTGATTGGTTAACTGGCGATTTCTTGACTGAACAATATGAAGGTCAGCGTG

ACTTAGCT 3’

>Dm_vATPase_dsRNA


5’AAGGCCCAGATCAATCAGAACGTCGAGCTGTTCATCGACGAGAAAGACTTCCT CTCTGCTGATACCTGCGGTGGTGTTGAGCTGCTGGCCCTCAACGGACGCATCAAG GTGCCCAATACGCTGGAGTCCAGATTAGACCTCATTTCGCAGCAGCTGGTGCCCG AGATTCGTAACGCACTTTTCGG 3’

>Mp_C002_dsRNA expressed in Arabidopsis

5’-

GGGGTACCTGCAAGGTTCAGACTTCCGAACAGGACGATGATCAGGAAGGATATTACGATGATGAG

GGAGGAGTGAACGATAATCAGGGAGAAGAGAACGATAATCAGGGAGAAGAGAACGATAATCAGG

GAGAAGAGAACGATAATCAGGGAGAAGAGAAGGAAGAAGTTTCCGAACCAGAGATGGAGCACCA

TCAGTGCGAAGAATACAAATCGAAGATCTGGAACGATGCATTTAGCAACCCGAAGGCTATGAACCT

GATGAAACTGACGTTTAATACAGCTAAGGAATTGGGCTCCAACGAAGTGTGCTCGGACACGACCCG

GGCCTTATTTAACTTCGTCGATGTGATGGCCACCAGCCCGTACGCCCACTTCTCGCTAGGTATGTTTA

ACAAGATGGTGGCGTTTATTTTGAGGGAGGTGGACACGACATCGGACAAATTTAAAGAGACGAAGC

AGGTGGTCGACCGTATCTCGAAAACTCCAGAGATCCGTGACTATATCAGGAACTCGGCCGCCAAGA

CCGTCGACTTGCTCAAGGAACCCAAGATTAGAGCACGACTGTTCAGAGTGATGAAAGCCTTCGAGA

GTCTGATAAAACCAAACGAAAACGAAGCATTAATCAAACAGAAGATTAAGGGGTTAACCAATGCTC

CCGTCAAGTTAGCCAAGGGTGCCATGAAAACGGTTGGACGTTTCTTTAGACATTTTTAACGGAATTC-

3’

>Mp_Bestbet_dsRNA expressed in Arabidopsis

5’- ACGACGGGAACTACAAGACACGTGCTGAAGTCAAGTTTGAGGGAGACACCCTCG TCAACAGGATCGAGCTTAAGGGAATCGATTTCAAGGAGGACGGAAACATCCTCT AGACTCGAGAGGAATTGGGCTCCAACGAAAGTGTGCTCGGACACGACCCGGGCC TTATTTAACTTCGTCGATGTGATGGCCACCAGCCCGTACGCCCACTTCTCGCTAG GTATGTTTAACAAGATGGTGGCGTTTATTTTGAGACGGGCAAAGGTCGAAGGTCA ATGGTTGAAAATACGCTGGTAGTGCGATTGCTCCATCTCACCCAACAAGCTATAC


CGTTGATATGCACTGGACTGTTTGGACTGAATCCCACTAGAACTTACTCAAGACA TAGATAAGATAGCAACCCGATTGCCACCATGTCGTGCATGAGAGCTGTTGACGC AAGTACCATATCCAAAAAGCAGTGGAACAGCTATTCCGGTATTTTGGGTTTCCCG TCTGCGCCCACCGTGGAGGCGTTCTCCTTCCGGAACACCCGCTGGACATGTTGAA ATGGTACAACTAATAAACGAGCTGCATTGCAAGCATTGAAGCGTAAGAAACGGT ACGAACAACAATTAGCCCAAATTGATGGTTCCATGTTAACTATTGAACAACAGC GGGAGGCATTAGAAGGTGCCAACACAA-3’

>Mp_vATPase_like_dsRNA in vitro transcribed

5’-

TTTAGCCAACACGGGAATGAACATCAAAATAAACATTGACCAAGCAACTAAGTT

ACCGACTCAAGAAATAGGAGGCGTCGTGGTCACGGGCAAAGGTCGAAGGTCAAT

GGTTGAAAATACGCTGGTAGTGCGATTGCTCCATCTCACCCAACAAGCTATACCG

TTGATATGCACTGGACTGTTTG-3’


Primers used in this Study

Aphid qPCR primers

Primer | Forward | Reverse
qMpGAPDH | 5'-GTCTTCCGACTTCATTGGTGATA-3' | 5'-CCACAACACGGTTGGAGTAA-3'
qMpRpL7 | 5'-GGACAACGCATTCCAATCAC-3' | 5'-CTTGAATCGACGATCAGCTCTA-3'
qMpRpLS3 | 5'-CCATGGGACCAAACAGGTAAA-3' | 5'-GGAACTGCTACTTCGGGTTTAG-3'
qMpTubulin | 5'-CGGTTTCCCGGTCAGTTAAA-3' | 5'-CTAGCTGTAAGTGGAGCGAATC-3'
qMpActin | 5'-CCGAAAGAGGTTACAGCTTCA-3' | 5'-TCGAAGTCCAAAGCGACATAG-3'
qMpvATPase_like | 5'-CTACAGGCGATGTATCAAATCTT-3' | 5'-TGATGTTCATTCCCGTGTTG-3'
qMpC002 | 5'-GCTCAAGGAACCCAAGATTAGA-3' | 5'-TAACTTGACGGGAGCATTGG-3'
qMpFerritin | 5'-CTGGACGTGATGGACAAGATT-3' | 5'-AAGTCACGCTGACCTTCATATT-3'

Arabidopsis primers for internal control

Primer | Forward | Reverse
qCyclophilin_AT | 5'-CACTGGACCAGGTGTACTTTC-3' | 5'-ACCACATGCCTTCCATCTAAC-3'
qTCTP_AT | 5'-ACACCCAAGCTCAGCGAAGAA-3' | 5'-CATGCATACCCTCCCCAACAA-3'
qTubulin_AT | 5'-CATTTGCTTCGGTACACTCCA-3' | 5'-CCAGGGAACCTAAGACAGCA-3'
qActin_AT | 5'-TCTTCCGCTCTTTCTTTCCA-3' | 5'-TCCTTCTGGTTCATCCCAAC-3'

Digital PCR primers and probes

Primer or probe | Forward (or probe sequence) | Reverse
MpRpL7_digi | 5'-AATGAACTTCCTTTGGCCTTTC-3' | 5'-GTTAATGGTATCTTCACGGTTTCC-3'
MpRpL7_digi_Probe | 5'-GATTGGTCTTCTTACGCCAGCCTCC-3' |
MpC002_digi | 5'-CCATCAGTGCGAAGAATACAAATC-3' | 5'-GTTGGAGCCCAATTCCTTAGC-3'
MpC002_digi_Probe | 5'-TCAGGTTCATAGCCTTCGGGTTGC-3' |
MpvATPase_like_digi | 5'-CTACAGGCGATGTATCAAATCTT-3' | 5'-TGATGTTCATTCCCGTGTTG-3'
MpvATPase_like_probe | 5'-TACCAGACGACGTGGAGTACGTGA-3' |
MpACE_digi | 5'-ATGGAAGCGTTCACCAGATT-3' | 5'-TCCGTAACCTACGTGCAATTC-3'
MpACE_digi_prob | 5'-CTGTCTCAGATGACATAGATTGGCCACT-3' |
MpSnf7_digi | 5'-GGAACTGATGTAGACGAGGATG-3' | 5'-GCTCAAATGTTGGAACTGAAGG-3'
MpSnf7_digi_prob | 5'-AGGCAAGACACCTGTCCA-3' |
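A primer pair from the list above can be checked against a template with a simple in-silico PCR. The sketch below is illustrative only: it requires exact matches, the function names are invented for this example, and the template is a short excerpt copied from the ">Mp_C002_dsRNA expressed in Arabidopsis" sequence in Supplementary Data 1 rather than the full transcript.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the exact-match amplicon for a primer pair, or None if absent."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Excerpt of the plant-expressed Mp_C002 sequence (Supplementary Data 1).
TEMPLATE_EXCERPT = (
    "CCGTCGACTTGCTCAAGGAACCCAAGATTAGAGCACGACTGTTCAGAGTGATGAAAGCCTTCGAGA"
    "GTCTGATAAAACCAAACGAAAACGAAGCATTAATCAAACAGAAGATTAAGGGGTTAACCAATGCTC"
    "CCGTCAAGTTAGCCAAGGGTGCCATG"
)
QMPC002_FORWARD = "GCTCAAGGAACCCAAGATTAGA"
QMPC002_REVERSE = "TAACTTGACGGGAGCATTGG"

if __name__ == "__main__":
    product = amplicon(TEMPLATE_EXCERPT, QMPC002_FORWARD, QMPC002_REVERSE)
    print(None if product is None else len(product))
```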

Supplementary Figure 5

Primer design for dsMP_C002 dsRNA amplification and RT-PCR analysis.


Custom Bioinformatics Commands –

Trimmomatic –

java -jar trimmomatic-0.33.jar PE -threads 14
fastq_paired_R1.fastq fastq_unpaired_R1.fastq fastq_paired_R2.fastq fastq_unpaired_R2.fastq
ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 MINLEN:40

STAR –

STAR --runThreadN 16 --runMode alignReads --sjdbGTFfile
--sjdbOverhang 99 --outSAMstrandField intronMotif --twopassMode Basic --quantMode GeneCounts --readFilesIn
--outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.2 --outReadsUnmapped Fastx --limitBAMsortRAM 8729257684 --genomeDir . --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within --outFilterIntronMotifs RemoveNoncanonical > star.log
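With --quantMode GeneCounts, STAR writes a tab-separated ReadsPerGene.out.tab file in which the first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) and the remaining rows give, per gene, the unstranded count followed by the two strand-specific counts. A minimal sketch of reading such a file is shown below; the gene identifiers in the usage example are hypothetical, and the choice of count column depends on the library strandedness, which is not stated here.

```python
def parse_reads_per_gene(text, column=1):
    """Parse STAR ReadsPerGene.out.tab content into a {gene_id: count} dict.

    column 1 = unstranded counts, 2 = read strand matches the transcript,
    3 = read strand is opposite to the transcript.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        gene_id = fields[0]
        if gene_id.startswith("N_"):
            continue  # skip the summary counters at the top of the file
        counts[gene_id] = int(fields[column])
    return counts


if __name__ == "__main__":
    # Hypothetical file content with made-up gene identifiers.
    example = (
        "N_unmapped\t100\t100\t100\n"
        "N_multimapping\t50\t50\t50\n"
        "N_noFeature\t10\t200\t30\n"
        "N_ambiguous\t5\t1\t2\n"
        "geneA\t120\t3\t117\n"
        "geneB\t0\t0\t0\n"
    )
    print(parse_reads_per_gene(example))
```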


Supplementary Figure 6

Effect of Iron on aphid size –

Supplementary Data 6

Link - https://github.com/amoltej/Aphid_Ruler


Chapter IV

Glucosinolate detoxification in generalist and specialist aphid species


4.1 Introduction

As described in Chapter II, one of the three aphid species that are common pests of brassica plants, Myzus persicae, is a host-plant generalist, while the other two, Lipaphis erysimi and Brevicoryne brassicae, are specialists. To elucidate the genetic basis of this difference, this chapter uses comparative transcriptomics to identify key differences in detoxification-related genes among these three species. On the one hand, we might expect the generalist species to have a more diverse set of detoxification genes in order to cope with a broader range of plant secondary compounds; on the other, the specialist species may have evolved specializations that enable them to detoxify or avoid Brassicaceae secondary metabolites, in particular the glucosinolates.

4.1.1 Insect species differ in how they deal with glucosinolates

Over the course of evolution, several herbivorous insects have developed resistance to glucosinolate toxicity, employing different mechanisms (Kazana et al. 2007, Opitz et al. 2011, Beran et al. 2014, Edger et al. 2015). These include various enzymatic detoxification reactions, sequestration and excretion, which are reviewed here.

Nitrile Specifier Proteins

Perhaps one of the most efficient mechanisms is demonstrated by Pierinae caterpillars, which feed by biting and chewing plant leaf tissue. A consequence of chewing is that plant myrosinase enzymes come into contact with glucosinolates from neighbouring tissues and produce highly toxic compounds such as isothiocyanates. Pierinae butterflies have a special ability to detoxify glucosinolates by converting them into less toxic compounds: they use Nitrile Specifier Proteins (NSPs) that convert glucosinolates into relatively benign nitrile compounds (Wittstock et al. 2004, Edger et al. 2015). An NSP requires Fe²⁺ for activity and reacts with the sulfur group in the thioglycosidic bond. The release of this sulfur prevents the formation of isothiocyanates, and less toxic nitriles are formed instead (Zhang et al. 2017). The NSP-mediated glucosinolate detoxification system is thought to have evolved within 10 million years after the evolution of the plant glucosinolate defence system (Wheat et al. 2007, Edger et al. 2015). This mechanism is seen in lepidopteran insects but not in other insects such as hemipterans. It has been proposed that this is attributable to the destructive feeding mode of lepidopteran insects (compared with hemipterans), which exposes them to higher levels of the toxic chemicals, so that this radical counter-measure was selected for in a relatively short evolutionary time (Edger et al. 2015).

Desulfonation

Another lepidopteran, Plutella xylostella, detoxifies glucosinolates by desulfonation using a sulfatase enzyme (Ratzka et al. 2002). Sulfatases remove the sulphate group (SO₄²⁻) from glucosinolate molecules; the resulting molecule can no longer bind to myrosinase, and hence its conversion to toxic isothiocyanate compounds is blocked (Figure 4.1.1, Figure 4.1.2).

Figure 4.1.1 Process of desulfonation. The glucosinolate present in the gut is desulfonated by a sulfatase enzyme. The resulting desulfoglucosinolate cannot react with myrosinase and hence does not produce toxic products (adapted from Ratzka et al. 2002). Glu: glucose molecule.

Figure 4.1.2 Mechanism of glucosinolate detoxification in Plutella xylostella, Bemisia tabaci and the larva of the sawfly Athalia rosae, showing upregulation of the sulfatase enzyme in the midgut (MG), which removes the sulfate group from the glucosinolate molecule and stops the synthesis of toxic products such as isothiocyanates.

Generalist hemipterans such as Bemisia tabaci also use desulfonation to detoxify glucosinolates (Malka et al. 2016). An LC-MS analysis of B. tabaci honeydew indicates the presence of several different desulfonated glucosinolates, including those with 4-methylsulfinylbutyl, 3-methylsulfinylpropyl, 4-methoxyindol-3-ylmethyl, 2-hydroxy-3-butenyl and 2-propenyl side chains. The corresponding intact glucosinolates were also present in the samples when the insects fed on Brussels sprout, WT A. thaliana or an artificial diet containing glucosinolates, but not when they fed on an artificial diet containing no glucosinolates (Figure 4.1.2).

Athalia rosae (Hymenoptera: Tenthredinidae) can also detoxify glucosinolates by a desulfonation process in the hemolymph (Opitz et al. 2011). In A. rosae, benzylglucosinolate is sequestered into the hemolymph before it is detoxified and excreted. The benzylglucosinolate is converted in the hemolymph into desulfobenzylglucosinolate-3-sulphate, which can accumulate in the excreta over time (Figure 4.1.2).

Glutathione S-transferases and more General detoxification pathways

Schweizer et al. (2017) performed a comparative transcriptome study of a specialist lepidopteran, Pieris brassicae, and a generalist, Heliothis virescens, to ascertain how they differ in their response to glucosinolates. Both species were reared on two Arabidopsis thaliana lines: one in which the glucosinolate pathway had been genetically knocked out, and the wild type. Several detoxification-related genes, including CYP450s, ABC transporters, GSTs, sulfatases and CCEs (carboxyl/choline esterases), were upregulated in H. virescens, whereas P. brassicae showed no change in the expression of these genes (Figure 4.1.3). Thus, the generalist and the specialist species show very different responses.

Figure 4.1.3 Mechanism of glucosinolate detoxification in Pieris brassicae or H. virescens, showing the upregulation of the CYP P450s, GSTs, CCEs and ABC transporters in lepidopteran insects capable of feeding on brassica family plants.

Unlike lepidopterans, few dipterans feed on living plants. However, there are exceptions, such as the leaf and stem miners of the genus Scaptomyza in the family Drosophilidae. Most members of this family are unlikely to encounter glucosinolate-containing food because they feed mostly on decaying material. Some Scaptomyza species do, however, and some of these are even considered specialists on Brassicaceae hosts. Whiteman et al. (2012) showed that Scaptomyza flava feeding on wild-type (WT) Arabidopsis thaliana and on glucosinolate-knockout A. thaliana lines showed contrasting larval growth and contrasting induction of stress-related genes. Furthermore, changes in DNA sequence can reveal possible positive selection within a lineage. For example, when the ratio of the nonsynonymous (dN) to the synonymous (dS) substitution rate is greater than one, a gene is usually considered to be under positive selection. Whiteman et al. (2012) found that the glucosinolate-upregulated genes show this signal of positive selection (dN/dS > 1). These results suggest an adaptive response to the presence of glucosinolates in the diet.
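The dN/dS criterion described above can be made concrete with a small sketch. The function names and the substitution rates below are hypothetical and chosen purely for illustration; they are not values from Whiteman et al. (2012), and a real analysis would estimate dN and dS from codon alignments with dedicated software.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Return omega = dN/dS for one gene (hypothetical helper)."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def classify_selection(dn: float, ds: float) -> str:
    """Label a gene by comparing dN/dS with the neutral expectation of 1."""
    omega = dn_ds_ratio(dn, ds)
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral"


# Made-up substitution rates, for illustration only.
example_genes = {"gene_a": (0.50, 0.25), "gene_b": (0.05, 0.50)}
labels = {name: classify_selection(dn, ds)
          for name, (dn, ds) in example_genes.items()}
```

In this sketch a gene is labelled from a single point estimate; in practice the ratio would normally be tested against the neutral expectation rather than simply compared with one.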

However, perhaps the most telling sign of adaptation among the Scaptomyza was the finding that the brassica specialist Scaptomyza nigrita has a duplication of Glutathione S-transferase D1 relative to its more generalist sibling species S. flava (Gloss et al. 2014). One of the daughter genes of the duplication (GST-D1a) was then shown to encode a protein that efficiently conjugates glutathione to isothiocyanates.

Glutathione S-transferases (GSTs) are part of what is referred to as the phase II detoxification process, in which molecular adducts such as sugars or glutathione are added to a molecule that is often the oxidation product of a phase I enzyme (typically a cytochrome P450; Williams 1959). GST enzymes catalyse the formation of glutathione (GSH) conjugates. Isothiocyanates (ITCs) can pass passively from the gut to the hemolymph; conjugates with glutathione are then formed without requiring P450s (in contrast to the phase I/phase II dogma), and these are excreted from the system (Jeschke et al. 2016). An example is observed in Spodoptera littoralis, where ITC-GSH conjugates are found among the excreted products (Schramm et al. 2012). GST epsilon 1 from S. littoralis is upregulated when the insects are fed a diet containing indole-3-carbinol and allyl-ITC, suggesting that it may have a role in this detoxification mechanism (Zou et al. 2016).

A less frequently considered detoxification gene family is the quinone reductases (QRs), also known as NADPH dehydrogenases, which are considered phase II detoxification enzymes. QR activity has been reported in mammals, plants, bacteria (Wosilait and Nason 1954) and insects (Yu 1987). Insect QRs are also known to be induced by allelochemicals present in the diet (Yu 1987, Barch and Rundhaugen 1994, Perez et al. 2010). Hence, understanding the QR expression profile in generalist and specialist aphid species may clarify how aphids detoxify or process glucosinolates.

4.1.2 Sequestration and the Mustard bomb of B. brassicae

In general, the two-component defence system of plants is less effective against sucking pests such as aphids than against chewing insects such as caterpillars. Aphids do not rupture plant cells while feeding and hence avoid mixing glucosinolates and myrosinases, which are stored in different cells (Pentzold et al. 2014, Züst and Agrawal 2016). However, glucosinolates are still present in the phloem sap, so piercing-sucking insects may still have to deal with glucosinolates and/or their derivatives. Specialist aphid species such as B. brassicae and L. erysimi can store glucosinolates from the plant within their bodies (Pratt et al. 2008).

Furthermore, B. brassicae has been shown to have its own myrosinase enzyme (more on the myrosinase gene family in Chapter V), which is present in the head and thorax region and in the non-flight muscles of the aphid body but not in the abdominal muscles. The glucosinolates are stored in cells adjacent to the myrosinase storage cells (Bridges et al. 2002, Kazana et al. 2007), and these contain up to 16-fold higher concentrations than the host plant (Francis et al. 2001).

When predators such as ladybird beetles attack B. brassicae, the dying aphid becomes a vessel for the reaction between glucosinolates and the endogenous myrosinase, which releases toxic chemicals such as isothiocyanates (Figure 4.1.4) that can stop the growth of predators and eventually kill them (Francis et al. 2001, Pratt et al. 2008). Reports also show that during this reaction the dying aphid releases allyl isothiocyanate along with the aphid pheromone β-farnesene. β-farnesene functions as an alarm pheromone, released by the dying aphid to warn other aphids of the threat (Gibson and Pickett 1983). These two chemical behaviours of B. brassicae illustrate how an individual aphid can respond altruistically towards its colony.

Figure 4.1.4 Interaction of glucosinolate and myrosinase enzyme in B. brassicae

Excretion and the honeydew of Myzus persicae

By contrast, generalist aphids such as M. persicae take a more conservative approach: they excrete glucosinolates in the honeydew and do not sequester them in their bodies. Hence, M. persicae feeding on glucosinolate-containing plants does not affect the growth of its predators (Francis et al. 2001, Pratt et al. 2008). HPLC analysis of the honeydew and whole bodies of M. persicae and B. brassicae shows that M. persicae excretes intact glucosinolate molecules in its honeydew, whereas B. brassicae stores a large amount of glucosinolates in its body that can later be used against predator attack (Francis et al. 2001).

More detailed analysis shows that M. persicae excretes aliphatic glucosinolates directly into the honeydew, whereas indolic glucosinolates are detoxified first and then released from the body (Kim and Jander 2007, Kim et al. 2008). As M. persicae honeydew contains GSH conjugates, GST enzymes are implicated in this detoxification (Ramsey et al. 2010).

Aim and hypothesis

The aim of this chapter is to examine the extent to which comparative transcriptomics can support the literature on how generalist and specialist aphid species have evolved in response to the glucosinolates produced by brassica plants. In particular, I will test the following specific hypotheses:

Hypothesis 1: Myzus persicae has a more diverse suite of genes in the canonical detoxification gene families than the specialist species B. brassicae and L. erysimi. This is the hypothesis proposed by Ramsey et al. (2010), who found partial support for it by comparing detoxification gene families in the Acyrthosiphon pisum genome with Expressed Sequence Tags from Myzus persicae. However, it is timely to revisit this in the light of the M. persicae genome sequence and the RNA-Seq data from two more specialist species, B. brassicae and L. erysimi.

Hypothesis 2: Midgut (MG) tissues of aphids have a greater diversity of genes belonging to gene families encoding detoxification enzymes (e.g. P450s, GSTs) than other tissues, such as the specialised bacteriocytes, whose primary function is a house-keeping one. While it is well accepted that insect MGs typically harbour detoxification enzymes, this may not be the case in aphids, and there is a formal possibility that the bacteriocytes have a role in detoxification as well as nutrition.

Hypothesis 3: Considering the differences in glucosinolate handling between specialist and generalist insects, I hypothesize that generalists and specialists use different mechanisms of glucosinolate detoxification.

4.2 Material and Methods

4.2.1 Insect rearing and maintenance

As described in the Material and Methods of Chapter II.

4.2.2 Gene family sequence retrieval

Transcripts resembling members of the different detoxification gene families were retrieved from the Corset clusters. The detoxification-related genes were searched for using the Orthonome pipeline (Rane et al. 2017), which uses a database of well-annotated gene structures from different aphid species. The pipeline has five steps. In the first step, it retains the longest isoform of each gene, and a sequence similarity matrix is then created using BLASTP (Mi et al. 2013) and a Smith-Waterman alignment algorithm. The second step identifies ‘in-paralogue’¹ sequences across the different species; these are discarded from the analysis because they may distort the ancestral phylogenetic relationships between species. In the third step, gene sets are created that show the highest similarity with the five sequences from the reference database. The five selected sequences must show 1) a low gap penalty (-1), 2) an alignment with >50% match (Smith-Waterman), 3) a BLASTP bit score ≥30 and 4) an E-value ≤10e-7. Selecting five related sequences reduces the computing power required in the next step and does not affect orthologue discovery, because most of the in-paralogues are removed in the second step. Orthonome then uses the MSOAR algorithm (Shi G and MC 2011) for orthonome prediction based on modified Smith-Waterman scores. In the last step, the output of the analysis categorises the gene list into 1:1 orthologues (orthologues present in all the species), in-paralogues and de novo genes.

¹ In-paralogues are genes derived from two ancestral orthologues with the same component that differentiates them from their orthologous counterparts.
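The selection criteria of the third step can be illustrated with a minimal sketch. This is not code from the Orthonome pipeline: the `Hit` record, the function name, the ranking by bit score and the example hits are all assumptions made for illustration, and only the thresholds (>50% match, bit score ≥30, E-value ≤10e-7, five sequences retained) are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One reference-database hit for a query transcript (hypothetical record)."""
    subject_id: str
    match_fraction: float  # fraction of aligned positions that match (0 to 1)
    bit_score: float
    evalue: float


def select_reference_hits(hits, max_hits=5, min_match=0.50,
                          min_bit_score=30.0, max_evalue=10e-7):
    """Keep hits that pass the thresholds, then retain the best-scoring few.

    max_evalue is written as in the text (10e-7, i.e. 1e-6 in Python).
    Ranking by bit score is an assumption of this sketch.
    """
    passing = [h for h in hits
               if h.match_fraction > min_match
               and h.bit_score >= min_bit_score
               and h.evalue <= max_evalue]
    passing.sort(key=lambda h: h.bit_score, reverse=True)
    return passing[:max_hits]


# Made-up hits for illustration only.
example_hits = [
    Hit("ref1", 0.90, 400.0, 1e-100),
    Hit("ref2", 0.80, 300.0, 1e-80),
    Hit("ref3", 0.45, 250.0, 1e-60),   # fails the >50% match criterion
    Hit("ref4", 0.70, 200.0, 1e-50),
    Hit("ref5", 0.60, 25.0, 1e-3),     # fails bit score and E-value criteria
    Hit("ref6", 0.65, 150.0, 1e-30),
    Hit("ref7", 0.55, 90.0, 1e-15),
    Hit("ref8", 0.52, 60.0, 1e-9),
]
selected = select_reference_hits(example_hits)
selected_ids = [h.subject_id for h in selected]
```

Capping the list at five sequences mirrors the trade-off noted above: fewer candidates per gene reduce the cost of the subsequent orthologue-assignment step.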

Figure 4.2.1 Illustration of in-paralogues. Image from https://genomevolution.org/wiki/index.php/File:Otu.png

To identify genomic GST sequences in A. pisum and M. persicae, BLAST searches of the latest genome assemblies (Acyr2.0 and MPER_G0061.0) were performed using NCBI's Genome Data Viewer (https://www.ncbi.nlm.nih.gov/genome/gdv/?org=acyrthosiphon-pisum&group=hemiptera). The contigs were downloaded, and the GST annotations from Annotation release 102 of the A. pisum genome and Annotation release 100 of the M. persicae genome were acquired and manually inspected using the software program Artemis (Rutherford et al. 2000).

4.2.3 Sequence analysis

The Corset cluster sequences showing similarity to a specific detoxification gene family were aligned using MAFFT. The sequence alignment was curated by removing all truncated and low-quality sequences. Sequence motifs represented in all the available sequences from the three species were selected for tree building using the maximum-likelihood (ML) method with nearest-neighbour interchanges (NNI) in FastTreeMP (Price et al. 2010). The corresponding log fold change (LFC) values for all transcripts were retrieved from the EdgeR analyses and plotted on the tree using the iTOL online tree-drawing software (https://itol.embl.de) (Letunic and Bork 2007).

4.3 Results

4.3.1 Detoxification Multigene Family gene sets

The Orthonome pipeline was used to search a de novo assembled contig library of transcripts (M. persicae: 47096, B. brassicae: 24674, L. erysimi: 26056) derived from RNA-Seq datasets of the three tissue samples from the three focal aphid species (M. persicae, L. erysimi, B. brassicae; Chapter 2). As the hypotheses centre on gene families, the first step was to establish the gene sets detected in each gene family. At the time of the initial analyses, genome sequence data for Myzus persicae were just becoming available and no genomic data were available for the other two species. However, the genome of Acyrthosiphon pisum was available and provided a useful reference, although even the gene annotations from that species needed scrutiny. After the gene sets had been identified, the transcript abundance of each gene in each tissue was assessed.

4.3.1.1 Glutathione S-transferases (GSTs)

Particular attention is given to the glutathione S-transferases because of their specific role in glucosinolate detoxification in other insect species and the presence of glutathione conjugates in M. persicae honeydew. GST sequences were retrieved from the literature and the genome databases and used in BLAST searches of the A. pisum and M. persicae genomes.

Several issues were immediately evident. One was that the previous datasets (Ramsey et al. 2010 and Shi et al. 2012) included microsomal GSTs, whereas the analyses here were limited to the cytosolic GST gene family, for two reasons. Firstly, the microsomal gene families have not been associated with detoxification. Secondly, the microsomal GSTs are not homologous to the cytosolic GSTs (e.g. the microsomal GSTs have multiple transmembrane domains and form trimers), and therefore including them in phylogenetic analyses violates the standard assumption of descent from a common ancestor.

Figure 4.3.1 BLAST hits for the A. pisum GSTB (XP_008188180.1) sequence, showing maximum similarity to sequences from Gram-negative bacteria (Family: Enterobacteriaceae) and suggesting possible bacterial sequence contamination in the A. pisum genome assembly.

Another issue was that many of the GenBank records of the GSTs analysed in these previous publications have changed as better curation has been performed. For example, of the 18 sequence accession numbers listed in the supplementary table of Ramsey et al. (2010), 11 have been removed, 5 replaced and 1 updated, and only 1 remains unchanged. A major reason for removal was poor gene annotation, often deriving from poor genome assemblies. Another possible reason is illustrated in Figure 4.3.1, where a GST originally attributed to the A. pisum genome (XP_008188180.1) is likely encoded by bacteria, possibly the plant pathogen Pantoea sp. For these reasons, the GST-like sequences encoded in the more recent annotations of the A. pisum (annotation release 102) and M. persicae (annotation release 100) genomes were extracted. Several iterations of alignment, phylogeny reconstruction and gene annotation adjustment were then performed before settling on a final GST gene list for the genomes of these two aphid species.

A maximum likelihood tree of the GST complement of these two species is shown in Figure 4.3.2. The GSTs were classified into subfamilies according to the D. melanogaster cytosolic GST types. Neither aphid species had any GSTs belonging to the highly conserved zeta GST subfamily. Whereas A. pisum had two GST-omegas, one GST-theta, five GST-sigmas and eleven GST-deltas, Myzus persicae had two GST-omegas, one GST-theta, eight GST-sigmas and three GST-deltas. The subfamily classifications were independently supported by the intron:exon gene structure, because the different subfamilies had quite distinct intron placements and intron phases. Some of these sequences do not appear to be complete GSTs, and often it was not clear whether they resulted from incomplete sequence coverage or were in fact partial, possibly pseudogenic copies. If we assume the former, then A. pisum has more GSTs than M. persicae.

Furthermore, if the phylogenetic tree in Figure 4.3.2 is taken at face value, then gene gain and loss can be deduced. M. persicae shows evidence of at least two GST-delta losses (there are no orthologs of ‘ApGst-1-like scaffold 646b’ or ‘ApGstD5’). There could be more losses of GST-deltas in M. persicae, depending on whether or not Mp_NW019103970a diverged from a group of pea aphid GSTs (from scaffolds 18 and 647) at a speciation event. If Mp_NW019103970a is interpreted as an ortholog (i.e. diverged at a speciation event), then there has been a rapid radiation in the pea aphid lineage spawning a set of highly divergent genes. However, the extent of divergence is so large that the alternative hypothesis, that M. persicae has lost these genes, seems more likely. Because these genes are clustered in two locations, more than one of the GSTs may have been lost simultaneously in the lineage of the sequenced M. persicae strain.

In contrast, M. persicae has more GST-sigmas (8) than A. pisum (5), which can be attributed to gene loss in A. pisum (e.g. either the ancestor of Mp_NW_019103900a and b, or each individually) and gene gain in M. persicae (Mp_NW_019100615a and b). GST-theta and GST-omega each show one likely loss event, in A. pisum and M. persicae respectively (Table 4.3.1).

Table 4.3.1 Gene loss and gain events in the GST gene family within M. persicae and A. pisum.

              M. persicae          A. pisum
              Loss      Gain       Loss      Gain
GST-Delta     2(+7?)    -          -         (7?)
GST-Sigma     -         2(+1?)     2(+1?)    -
GST-Omega     1         -          -
GST-Theta     -         -          1         -

Figure 4.3.2 Maximum-likelihood tree showing manually annotated GST complements of M. persicae and A. pisum. D. melanogaster GST sequences were used for classification purpose.

The tree also contains M. persicae GST related transcripts from the ‘corset’ assembly. Tree was rooted with D. melanogaster GST-zeta.

Figure 4.3.3 Maximum-likelihood unrooted tree showing manually annotated GST complements of M. persicae and GST transcript sequences retrieved from corset analysis of RNAseq data.

Glutathione S-transferase sequences were also extracted from the Orthonome analysis of the M. persicae, L. erysimi and B. brassicae transcriptomes, and their inferred amino acid sequences were aligned to those obtained from the genome analyses. Figure 4.3.3 shows a tree with the M. persicae genome-derived GSTs and the M. persicae cDNA-derived GSTs. There is high concordance between the two datasets, with only two annotated M. persicae sequences not expressed in the transcriptome dataset (NW_019103900_d and NW_019103970_b).

Finally, a tree with GSTs from all four species (A. pisum, M. persicae, B. brassicae and L. erysimi) was constructed (Figure 4.3.4). This revealed that the two brassica specialists were much more like M. persicae than A. pisum, with more sigma GSTs and fewer delta GSTs (Figure 4.3.4, Table 4.3.2). The transcript data from B. brassicae and L. erysimi were not included in the analysis of gene gain and loss, as there are no genomic sequence data available for these species and conclusions based solely on transcriptome data could give a misleading picture of gene loss and gain in aphids. The data presented here can be considered reliable because they are validated by both genomic and transcriptome datasets.

Table 4.3.2 The number of GST genes in aphid species classified by subfamily.

GST type    M. persicae    B. brassicae    L. erysimi    A. pisum
Sigma       8              8               7             5
Theta       2              3               2             1
Delta       2              2               2             11
Omega       1              2               2             2
Epsilon     0              0               0             0
Zeta        0              0               0             0
Other       0              0               0             0

Figure 4.3.4 Maximum likelihood tree showing the relationships among GST-like transcripts generated from the tissue transcriptome datasets of M. persicae, B. brassicae and L. erysimi. A. pisum sequences were generated by manual annotation of genome scaffolds. D. melanogaster sequences were used as a reference for GST classification. The red square highlights the GST-epsilon clade and illustrates that epsilon GSTs are not present in aphids.

4.3.2 Transcript abundance of GSTs within tissues of different species.

The expression values for all the GST related transcripts were extracted from the standard EdgeR output file (Table 4.3.3) showing log fold change (LFC) values for each transcript in MGs and bacteriocytes and expressed relative to the whole-body samples.

The overwhelming majority of GST genes (36/43) were transcriptionally enriched in the MG relative to the whole aphid (Figure 3.3.4). Most of these (27/36) were also enriched in the bacteriocytes (Bcyt) relative to the whole body. Eight of the nine that were enriched in the MG but not in the bacteriocytes belonged to the GST-sigma clade: four GST-sigma transcripts from M. persicae, two from B. brassicae and two from L. erysimi were upregulated in the MG but downregulated in the Bcyt.
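The kind of tally described above can be sketched as follows, using a handful of rows copied from Table 4.3.3. The helper names are hypothetical, the species are inferred from the transcript-name prefixes, and treating the sign of the LFC as enrichment or under-representation (with no significance threshold) is a simplifying assumption of this sketch.

```python
from collections import Counter

# (transcript, MG LFC, Bcyt LFC): a subset of rows from Table 4.3.3.
rows = [
    ("myzus_corset_22952", 3.10, -4.29),
    ("myzus_corset_28396", 3.58, -4.56),
    ("brevi_corset_22265", 7.40, -0.76),
    ("lip_corset_19369", 7.54, -3.28),
    ("myzus_corset_23016", -7.10, -4.38),
    ("brevi_corset_12837", 3.04, 0.60),
    ("lip_corset_11001", 1.32, 0.58),
]

# Species inferred from the transcript-name prefix (an assumption).
SPECIES = {"myzus": "M. persicae", "brevi": "B. brassicae", "lip": "L. erysimi"}


def expression_pattern(mg_lfc: float, bcyt_lfc: float) -> str:
    """Classify a transcript by the sign of its LFC in each tissue."""
    if mg_lfc > 0 and bcyt_lfc > 0:
        return "MG and Bcyt"
    if mg_lfc > 0:
        return "MG only"
    if bcyt_lfc > 0:
        return "Bcyt only"
    return "neither"


pattern_counts = Counter(expression_pattern(mg, bc) for _, mg, bc in rows)
mg_only_by_species = Counter(
    SPECIES[name.split("_")[0]]
    for name, mg, bc in rows
    if expression_pattern(mg, bc) == "MG only"
)
```

Applied to the full table rather than this subset, the same tally would give the counts of MG-enriched and bacteriocyte-enriched transcripts per species.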

If the GSTs are partitioned into those that are enriched in the MG and those that are under-represented in the MG, for the generalist (over-represented = 10, under-represented = 5) and the combined specialist species (26 and 2), a significant interaction is revealed (Chi-square = 4.9, P = 0.027; Fisher's exact test, P = 0.040). This appears to be driven partly by a delta GST (myzus_corset_23016: -7.1 LFC) and an omega GST (myzus_corset_28024: -1.66 LFC) that are depauperate in both tissues in M. persicae, whereas the phylogenetically closest genes in the specialist species are enriched. A significant interaction is also observed if the GSTs are partitioned by enrichment or under-expression in bacteriocytes for the generalist and specialist species (Chi-square = 4.5, P = 0.033; Fisher's exact test, P = 0.046).
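The midgut comparison above is a test on a 2x2 table (generalist: 10 enriched, 5 under-represented; specialists: 26 and 2). The sketch below recomputes it with the Python standard library only. The function names are hypothetical, and using the uncorrected Pearson statistic and the common "sum of tables no more probable than the observed one" definition of the two-sided Fisher test are assumptions of this sketch; with those choices the results agree with the values reported above to rounding (about 4.9, 0.027 and 0.040).

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P: sum of tables no more probable than observed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Counts from the text: rows are generalist / specialists,
# columns are MG-enriched / MG-under-represented.
chi2, chi2_p = chi_square_2x2(10, 5, 26, 2)
fisher_p = fisher_exact_two_sided(10, 5, 26, 2)
```

With expected counts below five in two cells of this table, the exact test is the more conservative of the two, which is consistent with its slightly larger P value.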

There are several other interspecies differences worth noting. Both GST-deltas from M. persicae are downregulated in the gut (myzus_corset_23016: -7.1 LFC, myzus_corset_35564: -3.81 LFC), yet their orthologous transcripts from B. brassicae and L. erysimi are upregulated in the gut. The GST-sigmas showed variable extents of under-representation in the Bcyt tissue. The GST-thetas showed similar expression trends in both MG and Bcyt in all three species, with the exception of one of the two transcripts from M. persicae, which was downregulated.

Figure 4.3.5 Glutathione S-transferase (GST) cladogram showing the relationship of expressed GST clusters from M. persicae, B. brassicae and L. erysimi to the annotated GST types of A. pisum and D. melanogaster. The bar graphs on the right (panels labelled Mid-Gut and Bacteriocyte) show the expression level of GST transcripts in MG compared to whole-body (WB) tissue. Orange: M. persicae; green: B. brassicae; blue: L. erysimi. The red box marks clusters that are downregulated in the M. persicae MG but upregulated in B. brassicae and L. erysimi. The purple box marks clusters that are upregulated only in the M. persicae MG but downregulated in the B. brassicae and L. erysimi MG.

Table 4.3.3 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with GST.

GST-Omega MG Bcyt GST-Sigma 1 MG Bcyt lip_corset_10040 0.81 0.38 myzus_corset_22952 3.10 -4.29 lip_corset_3348 0.40 0.02 myzus_corset_28396 3.58 -4.56 brevi_corset_9099 1.72 1.07 brevi_corset_22265 7.40 -0.76 brevi_corset_21982 1.39 0.70 brevi_corset_22274 4.13 -1.10 myzus_corset_28024 -1.66 -0.59 lip_corset_19369 7.54 -3.28 GST-theta MG Bcyt myzus_corset_22104 -1.29 1.40 lip_corset_3374 0.78 0.33 GST-Sigma3 MG Bcyt brevi_corset_5466 2.28 3.51 brevi_corset_12304 6.86 2.97 brevi_corset_22430 1.92 2.60 brevi_corset_13255 7.02 3.28 myzus_corset_14112 -4.88 -3.22 myzus_corset_32915 0.57 0.03 lip_corset_10641 5.75 1.43 myzus_corset_32919 1.27 1.36 lip_corset_16271 1.59 -0.41 GST-Sigma4 MG Bcyt brevi_corset_16898 1.79 1.45 lip_corset_7339 5.06 1.97 GST-Sigma 5 MG Bcyt brevi_corset_13066 5.82 3.44 myzus_corset_4170 2.25 -3.59 brevi_corset_13256 6.34 3.59 myzus_corset_4171 3.14 -3.49 lip_corset_11956 5.00 1.19 lip_corset_17336 -3.95 -1.70 myzus_corset_14119 1.26 1.00 brevi_corset_5020 -3.76 -2.89 GST-Delta MG Bcyt GST-sigma MG Bcyt myzus_corset_23016 -7.10 -4.38 myzus_corset_19458 3.26 -4.17 myzus_corset_35564 -3.81 0.69 myzus_corset_19459 2.62 1.52 lip_corset_11001 1.32 0.58 myzus_corset_19461 1.38 2.63 lip_corset_15313 1.54 0.73 lip_corset_11987 1.27 1.42 brevi_corset_12837 3.04 0.60 lip_corset_11997 1.27 1.21 brevi_corset_12834 2.76 0.44 brevi_corset_13677 4.05 2.86

4.3.2.1 The P450 gene set

For the other gene families, I went straight to the orthonome derived gene sets, rather than annotate the genomic families. This undoubtedly under-represents the actual numbers of genes, because not all the members of the families may be expressed. Furthermore, I was conservative and excluded partial sequences of which there were many.

Previously characterised representative P450 sequences from D. melanogaster (Good et al. 2014) were used in a phylogenetic analysis to classify the aphid P450 transcripts. Transcripts showing similarity to Cyp4 (Cyp4a + Cyp4g), Cyp315a1, Cyp49a1, Cyp301a1, Cyp306a1, Cyp18a1, Cyp305a1, Cyp303a1 and Cyp6 were found in all three experimental aphid species (Figure 4.3.6). Table 4.3.4 shows the different types of P450-like transcripts retrieved from the tissue transcriptomes. Interestingly, more Cyp6-like genes are transcribed in M. persicae (23) than in B. brassicae (18) or L. erysimi (17). This is consistent with the hypothesis that the generalist M. persicae is equipped with a large number of detoxification genes that are constitutively expressed. Furthermore, B. brassicae has fewer Cyp4-like transcripts (12) than M. persicae (17) and L. erysimi (14).

Table 4.3.4 The number of Cyp P450 genes in aphid species classified by subfamily.

Subfamily                M. persicae    B. brassicae    L. erysimi
Cyp4 (Cyp4a + Cyp4g)     17             12              14
Cyp315a1                 1              1               1
Cyp49a                   1              1               1
Cyp301a                  1              1               1
Cyp18a1                  1              1               1
Cyp306a1                 1              1               1
Cyp305a1                 1              1               1
Cyp303a1                 2              2               2
Cyp9                     1              1               1
Cyp6                     23             18              17

Figure 4.3.6 Cladogram showing Cyp P450 related transcripts classification based on the well characterised P450 genes from Drosophila melanogaster. Red- Cyp6; green-Cyp315; Brandeis blue-Cyp49; Bright green-Cyp301; Blue-Cyp306; Fluorescent blue-Cyp4g; purple-Cyp305a; dark olive green-Cyp303a; dark salmon-Cyp4aa; Dark orange-Cyp18a

P450s expression in the MG

Thirty-nine P450 transcripts were enriched in the MG (18 from M. persicae, 13 from B. brassicae and 8 from L. erysimi) and thirty-two were enriched in the bacteriocytes (11 from M. persicae, 10 from B. brassicae and 11 from L. erysimi). The ratio of MG-enriched to MG-depauperate P450s did not differ significantly between the generalist species and the specialist species (P>0.05). Nor was there a significant interaction between bacteriocyte expression of P450s and generalist versus specialist species.

However, some specific P450s displayed clear species-specific differences in MG expression. For example, the Cyp4-like clade showed a clear difference between the generalist and the specialists: the transcript was downregulated only in M. persicae (myzus_corset_35883: -2.41 LFC) but upregulated in both specialists, B. brassicae (brevi_corset_8127: 3.42 LFC, brevi_corset_8129: 1.67 LFC) and L. erysimi (lip_corset_1154: 0.50 LFC) (Table 4.3.5). The Cyp305a1 clade was downregulated in the L. erysimi MG (lip_corset_2374: -2.94 LFC) but upregulated in the MG of M. persicae (myzus_corset_23061: 4.3 LFC) and B. brassicae (brevi_corset_22417: 5.83 LFC).

P450s expression in the Bcyt

In the Cyp4-like clade, one of the B. brassicae transcripts (brevi_corset_8129: -1.24 LFC) was downregulated, as was the M. persicae transcript (myzus_corset_35883: -2.69 LFC), while the other B. brassicae transcript (brevi_corset_8127: 1.10 LFC) and the L. erysimi transcript (lip_corset_1154: 3.12 LFC) were upregulated.

Comparison of P450 expression in the MG and Bcyt tissues

The most striking difference between MG and Bcyt expression of P450s is in a CYP6-like clade (Figure 4.3.7). Most of the transcripts from this clade that were upregulated in the MG showed the opposite pattern in the Bcyt (Figure 4.3.7). Conversely, clades such as the CYP4-like transcripts that are upregulated in the Bcyt of the experimental aphid species are downregulated in the MG. Overall, fewer CYP transcripts are enriched in the Bcyt than in the MG in M. persicae (MG: 17, Bcyt: 11). Of the 11 transcripts upregulated in the Bcyt, only five showed considerable upregulation, one each from clades CYP6, CYP9, CYP305a1, CYP306a1 and CYP4. By contrast, the numbers of enriched CYPs in B. brassicae (MG: 11, Bcyt: 10) and L. erysimi (MG: 9, Bcyt: 12) differ little between the two tissues.

Table 4.3.5 Log fold change (LFC) values for all the clusters from the transcriptome data of

M. persicae, B. brassicae and L. erysimi showing similarity with P450s. #N/A – transcript not differentially expressed. cyp4aa1 MG Bcyt Cyp303a1 MG Bcyt myzus_corset_29026 -4.36 0.42 myzus_corset_36441 -1.96 -1.94 lip_corset_21011 -1.88 0.97 lip_corset_4019 -1.03 -2.13 brevi_corset_10405 -1.42 1.04 brevi_corset_4506 -2.82 -3.17 brevi_corset_2882 #N/A #N/A Cyp9b2 MG Bcyt myzus_corset_778 #N/A #N/A brevi_corset_15263 2.51 4.24 lip_corset_18966 #N/A #N/A myzus_corset_19929 1.18 3.76 Cyp4g15 MG Bcyt lip_corset_18786 1.01 2.52 brevi_corset_13097 -2.46 2.23 Cyp6w1 MG Bcyt lip_corset_11102 -3.01 1.39 myzus_corset_18908 -5.99 -1.73 myzus_corset_17235 -5.09 0.31 lip_corset_15458 -3.54 0.57 brevi_corset_20385 #N/A #N/A brevi_corset_21606 -3.94 0.37 Cyp4C3 MG Bcyt brevi_corset_22984 -9.00 -5.85 brevi_corset_8127 3.42 1.10 lip_corset_19828 -5.02 0.75 brevi_corset_8129 1.67 -1.24 lip_corset_19832 -4.04 1.05 lip_corset_1154 0.50 3.12 lip_corset_23787 #N/A #N/A myzus_corset_35883 -2.41 -2.69 lip_corset_25283 #N/A #N/A Cyp6a20 MG Bcyt lip_corset_19460 #N/A #N/A brevi_corset_10036 -3.56 -0.87 myzus_corset_21936 -2.27 6.76 brevi_corset_10477 4.19 2.11 myzus_corset_8654 -6.60 -6.76 brevi_corset_10955 2.96 1.19 myzus_corset_16486 -7.80 -6.16 brevi_corset_16289 -8.64 -2.81 myzus_corset_8656 #N/A 0.34 brevi_corset_20332 -2.87 -4.39 myzus_corset_8664 -7.93 -7.42 brevi_corset_20733 8.32 -1.79 myzus_corset_8666 -5.31 0.50 brevi_corset_20735 8.93 -1.93 myzus_corset_35989 -8.95 -8.93 brevi_corset_20762 2.70 -0.92 myzus_corset_46885 -7.04 -7.54 brevi_corset_21482 -7.10 -5.14 myzus_corset_5073 -8.75 -6.30 brevi_corset_4379 -7.72 -4.46 brevi_corset_4493 8.94 -2.26 brevi_corset_3028 -3.00 -0.04 brevi_corset_5697 5.48 2.67 brevi_corset_3015 #N/A #N/A brevi_corset_690 -2.69 0.18 brevi_corset_18874 -0.32 -3.25 brevi_corset_8352 -3.72 -1.35 brevi_corset_18872 0.45 -0.90 brevi_corset_8432 -3.26 -4.95 brevi_corset_18869 0.79 -1.33 brevi_corset_9300 -3.14 -8.34 lip_corset_22002 #N/A #N/A 
lip_corset_11276 -3.58 -6.49 lip_corset_24507 #N/A #N/A lip_corset_11278 #N/A #N/A lip_corset_9878 -4.98 -2.32 lip_corset_12053 5.16 -1.93

155 | P a g e lip_corset_9923 -4.45 -2.07 lip_corset_14964 #N/A #N/A lip_corset_9934 -3.44 0.41 lip_corset_19942 8.28 -3.30 myzus_corset_22928 2.98 -0.62 lip_corset_20968 5.33 -3.34 myzus_corset_20172 -6.66 -5.51 lip_corset_20971 4.20 -3.41 myzus_corset_12328 1.09 -3.22 lip_corset_22263 #N/A #N/A myzus_corset_36689 -0.06 -0.95 lip_corset_2345 -5.02 3.97 Cyp315a1 MG Bcyt lip_corset_23457 #N/A #N/A myzus_corset_10824 0.98 0.43 lip_corset_23469 #N/A #N/A lip_corset_20730 0.61 0.38 lip_corset_25489 #N/A #N/A brevi_corset_15028 -0.75 -0.22 lip_corset_3954 5.87 1.15 Cyp49a1 MG Bcyt lip_corset_3956 4.77 -0.35 myzus_corset_13314 -4.65 -4.55 lip_corset_5059 -3.63 0.81 lip_corset_13122 -4.11 -2.97 myzus_corset_13456 1.59 0.03 brevi_corset_6775 -1.73 -3.13 myzus_corset_15988 3.94 -4.88 Cyp301a1 MG Bcyt myzus_corset_15991 3.44 -4.30 myzus_corset_32734 -4.30 -4.46 myzus_corset_17651 -3.71 -3.89 lip_corset_6813 -3.85 -2.61 myzus_corset_19544 -0.18 0.56 brevi_corset_6827 -4.80 -3.36 myzus_corset_24145 -6.99 -6.14 Cyp18a1 MG Bcyt myzus_corset_25221 -3.28 -7.49 lip_corset_21209 -5.20 -2.72 myzus_corset_25597 2.96 -1.57 myzus_corset_15808 -6.25 -4.84 myzus_corset_26346 2.95 -5.68 brevi_corset_21762 -5.94 -3.83 myzus_corset_34915 3.76 -7.14 Cyp306a1 MG Bcyt myzus_corset_37153 2.59 -9.04 lip_corset_2713 #N/A #N/A myzus_corset_37154 2.64 -7.30 brevi_corset_895 -2.44 -1.46 myzus_corset_37155 3.19 -7.04 myzus_corset_36768 -0.02 1.18 myzus_corset_37156 3.05 -7.40 Cyp305a1 MG Bcyt myzus_corset_39561 -3.15 -3.31 lip_corset_2374 -2.94 3.75 myzus_corset_39562 -3.96 -4.74 brevi_corset_22417 5.83 1.98 myzus_corset_43650 3.24 -2.95 myzus_corset_23061 4.30 1.63 myzus_corset_5099 3.46 2.48 myzus_corset_6987 -0.80 -0.82 myzus_corset_31287 -4.75 -3.47 myzus_corset_6987.1 -0.80 -0.82 lip_corset_23189 #N/A #N/A myzus_corset_6989 0.18 -0.42 brevi_corset_2307 -6.92 -2.74


Figure 4.3.7 Cytochrome P450 (P450) dendrogram showing the relationship of expressed P450 clusters from M. persicae, B. brassicae and L. erysimi with the annotated P450 types of D. melanogaster. The bar graph on the right shows the expression level of P450 transcripts in MG compared to WB tissue. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi. The red box indicates clusters that show opposite expression patterns in the MG and Bcyt tissues of M. persicae, B. brassicae and L. erysimi.

4.3.2.2 Sulfatases

Compared to the gene families described above, the sulfatases are encoded by a much smaller gene family, although they hold a significant position in the glucosinolate detoxification process.

The Orthonome analysis found three sulfatase-like transcripts in each specialist aphid, whereas four were detected in the generalist aphid species. P. xylostella glucosinolate sulfatase sequences were used as an outgroup.

Two distinct patterns pertaining to the sulfatases were exhibited in the transcriptome dataset. Firstly, no sulfatase in any species was enriched in the bacteriocytes. Secondly, one clade (containing the transcripts myzus_corset_33610 (LFC: 2.87), brevi_corset_4022 (LFC: 5.40) and lip_corset_963 (LFC: 5.04)) showed upregulation in the MG in all three species (Table 4.3.6). This suggests that it is the best candidate for a role in a glucosinolate detoxification mechanism.


Table 4.3.6 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with sulfatases

Sulfatases Gut Bcyt

brevi_corset_4022 5.40 -3.68

lip_corset_963 5.04 -2.97

myzus_corset_33610 2.87 -3.51

lip_corset_15916 0.63 -0.52

myzus_corset_28183 -2.75 -1.06

brevi_corset_24357 -2.75 -4.77

myzus_corset_13209 -3.40 -5.28

myzus_corset_40464 -5.22 -9.81

lip_corset_25917 -5.22 -4.07

brevi_corset_1787 -5.32 -6.25

Figure 4.3.8 Phylogenetic tree of sulfatase transcripts showing their differential expression level in MG compared to WB in three aphid species. Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi


4.3.2.3 Quinone Reductase

Quinone reductase is another phase II detoxification mechanism candidate. Each of the three species' transcriptome datasets contains one full-length transcript showing similarity to the quinone reductase of Riptortus pedestris. Interestingly, all the transcripts showed upregulation in the MG tissue, and all except brevi_corset_20030 (Bcyt LFC: 2.31) showed downregulation in the Bcyt tissue, compared to the WB sample (Table 4.3.7). The L. erysimi transcript lip_corset_3170 is incomplete and could be an isoform of the quinone reductase, as it shows some non-consensus amino acid replacements towards the end of the transcript.

Table 4.3.7 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with quinone reductase

Quinone Reductase Gut Bcyt

myzus_corset_34057 2.31 -0.32

lip_corset_10778 5.89 -0.39

lip_corset_3170 5.56 -0.04

brevi_corset_20030 5.77 2.31
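As a quick consistency check, the values in Table 4.3.7 can be screened for the "up in MG, down in Bcyt" pattern. The following Python sketch is illustrative only and was not part of the original analysis. The dictionary name, the helper function and the use of LFC = 0 as the boundary between up- and downregulation are assumptions made for the example.

```python
# LFC values relative to whole body (WB), copied from Table 4.3.7 as (MG, Bcyt).
quinone_reductase_lfc = {
    "myzus_corset_34057": (2.31, -0.32),
    "lip_corset_10778": (5.89, -0.39),
    "lip_corset_3170": (5.56, -0.04),
    "brevi_corset_20030": (5.77, 2.31),
}


def mg_up_bcyt_down(mg_lfc, bcyt_lfc):
    """Return True when a transcript is up in MG and down in Bcyt.

    Assumption: LFC > 0 means upregulated and LFC < 0 means downregulated.
    """
    return mg_lfc > 0 and bcyt_lfc < 0


# Map each transcript to whether it follows the pattern.
pattern = {
    name: mg_up_bcyt_down(mg, bcyt)
    for name, (mg, bcyt) in quinone_reductase_lfc.items()
}

for name, follows in pattern.items():
    print(name, follows)
```

Under these assumptions, three of the four transcripts follow the pattern, while brevi_corset_20030 has a positive LFC in both tissues.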

Figure 4.3.9 Phylogenetic tree of quinone reductase transcripts showing their differential expression level in MG compared to WB in three aphid species (panel labels: Mid-Gut, Bacteriocyte). Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi


4.3.2.4 ABC transporters

ABCC-type transporters act synergistically with the glutathione S-transferase (GST) gene family (Sau et al. 2010, Leslie 2012, Liu et al. 2012) and the P450 gene family (Bariami et al. 2012) in detoxification and resistance development. The ABCC types are involved in the transport of different cellular components, including processed xenobiotics (Sau et al. 2010, Leslie 2012, Liu et al. 2012). Hence it is important to examine the expression pattern of the ABC transporter gene family in the generalist and specialist aphids.

Figure 4.3.10 shows the classification of ABC transporter-like transcripts from the different aphid species. They were classified based on the well-annotated ABC transporter gene families of D. melanogaster. Overall, ABC transporters are more numerous in the specialists, B. brassicae and L. erysimi. Larger numbers of ABCB-, ABCC- and ABCG-like transcripts are expressed in B. brassicae and L. erysimi than in M. persicae (Table 4.3.8). It is interesting to note that ABCB and ABCC transporters are known for their association with multi-drug resistance mechanisms. ABCG-like transcripts in particular have expanded in the brassica specialists (26 and 27) compared to the generalist M. persicae (18), and this subfamily of ABC transporters is known for its role in lipid transport.


Table 4.3.8 The number of ABC transporter genes in aphid species classified by subfamily.

M. persicae B. brassicae L. erysimi

ABCA 8 16 9

ABCB 2 7 7

ABCC 10 15 14

ABCD 2 2 2

ABCE 1 2 1

ABCF 3 4 4

ABCG 18 26 27

ABC-H 1 1 1
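The subfamily counts in Table 4.3.8 can be totalled per species and screened for subfamilies that are larger in both specialists than in the generalist. The Python sketch below is illustrative and not part of the original analysis. The variable names are assumptions, and the numbers are copied from Table 4.3.8.

```python
# Counts per subfamily, in the column order of Table 4.3.8:
# (M. persicae, B. brassicae, L. erysimi)
species = ("M. persicae", "B. brassicae", "L. erysimi")
abc_counts = {
    "ABCA": (8, 16, 9),
    "ABCB": (2, 7, 7),
    "ABCC": (10, 15, 14),
    "ABCD": (2, 2, 2),
    "ABCE": (1, 2, 1),
    "ABCF": (3, 4, 4),
    "ABCG": (18, 26, 27),
    "ABC-H": (1, 1, 1),
}

# Total number of ABC transporter-like transcripts per species.
totals = {
    name: sum(row[i] for row in abc_counts.values())
    for i, name in enumerate(species)
}

# Subfamilies where both specialists have more transcripts than the generalist.
larger_in_both_specialists = [
    subfamily
    for subfamily, (generalist, brevi, lip) in abc_counts.items()
    if brevi > generalist and lip > generalist
]

print(totals)
print(larger_in_both_specialists)
```

With the values from Table 4.3.8 this gives 45, 73 and 65 ABC transporter-like transcripts for M. persicae, B. brassicae and L. erysimi respectively, and flags ABCA, ABCB, ABCC, ABCF and ABCG as larger in both specialists.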

Table 4.3.9 The total number of ABC transporters showing enrichment in the MG and Bcyt tissues of the aphid species

Species Name MG Bcyt

M. persicae 19 23

B. brassicae 31 37

L. erysimi 30 32


Figure 4.3.10 Cladogram showing the different categories of ABC transporter-like transcripts from M. persicae, B. brassicae and L. erysimi. The categories were created based on the well-annotated ABC transporter sequences from D. melanogaster. Blue-ABCA; cyan-ABCB; fluorescent green-ABCC; fluorescent yellow-ABCD; green-ABCE; copper-ABCF; red-ABCG; yellow-ABCH


ABC transporter expression in the Gut

The expression pattern for ABC transporters in the MG tissue is complex in all three species. Most of the ABC transporter subfamilies show mixed expression patterns with some transcripts upregulated in the MG relative to the whole body while others are downregulated.

The ABC-H transcripts were all downregulated in the MG tissue, whereas ABCE was not differentially expressed in any of the three species. The ABCD subfamily showed downregulation in the M. persicae MG tissue but was not differentially expressed in B. brassicae and L. erysimi compared to WB, except for one transcript (brevi_corset_10327: -1.71 LFC). All the ABCC-type transporters were upregulated in all three species except two from B. brassicae (brevi_corset_10558: -4.66 LFC, brevi_corset_10574: -2.97 LFC) and one from L. erysimi (lip_corset_20000: -3.60 LFC) (Table 4.3.10). A total of 16 transcripts from M. persicae showed upregulation in the MG tissue, of which 9 were from the ABCC clade, whereas 28 transcripts from B. brassicae and 32 transcripts from L. erysimi across all the ABC subfamilies showed upregulation (Table 4.3.9).

ABC transporter expression in the Bcyt

In Bcyt tissues, ABCB-type transcripts from M. persicae showed downregulation while most of the B. brassicae and L. erysimi transcripts showed upregulation. Half of the ABCC-type M. persicae transcripts showed downregulation, whereas most of the B. brassicae and L. erysimi transcripts showed upregulation (Table 4.3.10).

Comparison of ABC transporters expression in MG and Bcyt

One subclade from the ABCA subfamily (myzus_corset_39649, brevi_corset_8235, lip_corset_21328) is overrepresented in the MG but underrepresented in the bacteriocytes.

Most of the ABCD- and ABCG-like transcripts are underrepresented in the MG but overrepresented in the Bcyt tissues (Table 4.3.10).


Table 4.3.10 Log fold change (LFC) values for all the clusters from the transcriptome data of M. persicae, B. brassicae and L. erysimi showing similarity with ABC transporters

ABCA MG Bcyt ABCE MG Bcyt myzus_corset_39649 3.30 -0.47 lip_corset_19017 -0.31 -0.86 lip_corset_21328 4.60 -4.02 myzus_corset_26496 0.09 -0.11 brevi_corset_8235 2.26 -5.75 brevi_corset_5410 0.03 0.01 myzus_corset_21961 1.93 2.70 brevi_corset_7487 -0.27 0.71 lip_corset_8038 2.51 3.55 ABCF MG Bcyt lip_corset_8042 2.38 3.28 lip_corset_13652 0.00 0.10 brevi_corset_6200 1.92 2.20 brevi_corset_23514 -0.64 -1.45 myzus_corset_22857 1.06 2.91 myzus_corset_36850 0.33 -0.12 lip_corset_11083 0.96 1.75 brevi_corset_23339 -0.72 -1.63 brevi_corset_7804 2.01 3.80 myzus_corset_30907 0.20 -0.08 lip_corset_2585 -0.95 -0.76 lip_corset_12359 -0.36 -1.22 lip_corset_1792 -0.09 1.67 myzus_corset_12394 0.24 0.43 brevi_corset_10877 0.82 2.27 lip_corset_19038 0.34 0.07 brevi_corset_20725 0.36 0.08 lip_corset_24648 -5.05 -3.79 lip_corset_8502 -2.29 7.16 lip_corset_17259 0.35 -0.15 brevi_corset_11655 -3.32 6.11 brevi_corset_2986 -8.04 -8.04 ABCG MG Bcyt myzus_corset_46010 -1.92 -5.31 myzus_corset_22316 -1.89 1.25 lip_corset_10607 0.93 1.78 lip_corset_8020 -0.25 1.37 brevi_corset_1313 -4.18 -5.12 brevi_corset_5043 -1.38 0.75 lip_corset_21420 5.33 -0.09 brevi_corset_11426 5.26 -1.11 lip_corset_10362 -1.02 0.08 myzus_corset_35312 -3.90 -7.16 myzus_corset_32023 -1.22 1.64 lip_corset_3246 -3.45 -4.58 lip_corset_4266 -4.22 -1.59 brevi_corset_18172 -4.86 -4.74 brevi_corset_7448 -4.59 -2.11 lip_corset_19615 -7.16 -4.49 lip_corset_13854 -0.07 -1.40 brevi_corset_19233 -6.69 -3.91 brevi_corset_6848 -6.18 -1.62 myzus_corset_28972 -5.99 -4.72 myzus_corset_33567 -0.67 -0.57 lip_corset_17475 -0.01 -0.45 lip_corset_19534 -0.52 0.58 myzus_corset_33808 -0.65 3.21 brevi_corset_18457 0.74 -0.25 lip_corset_7089 -2.69 2.31 myzus_corset_16996 -4.38 1.45 lip_corset_4786 -3.20 -0.85 lip_corset_20150 -3.98 -0.86 myzus_corset_30668 -3.78 -0.20 brevi_corset_23062 -2.92 1.83 lip_corset_4789 -2.74 -0.54 ABCB MG Bcyt myzus_corset_17363 -0.69 1.64 lip_corset_10504 0.71 1.86 lip_corset_16309 -0.41 2.42 brevi_corset_10083 0.77 
0.73 brevi_corset_19282 -0.07 1.70 lip_corset_249 1.62 0.61 myzus_corset_36062 -0.60 3.13

165 | P a g e brevi_corset_7620 2.00 1.92 brevi_corset_17128 0.89 4.04 lip_corset_12584 1.21 -0.24 lip_corset_10508 0.96 3.53 lip_corset_1603 0.67 -0.82 brevi_corset_11597 -0.31 2.72 brevi_corset_4365 1.50 1.49 brevi_corset_11819 0.77 4.09 myzus_corset_13192 -3.05 -0.74 myzus_corset_37711 0.38 3.82 lip_corset_18625 -3.64 -1.38 brevi_corset_13794 -3.05 -2.10 myzus_corset_35346 -3.49 -0.39 lip_corset_9951 4.97 0.00 lip_corset_3423 -3.66 -0.10 brevi_corset_13885 7.26 4.69 brevi_corset_11482 -0.60 1.20 myzus_corset_29772 2.84 -4.63 brevi_corset_21773 -3.29 3.07 brevi_corset_13884 6.17 3.72 myzus_corset_35335 -1.22 0.28 lip_corset_5452 4.94 -0.08 lip_corset_2719 -1.00 0.89 brevi_corset_24343 -2.41 6.75 lip_corset_2721 -1.00 0.64 ABCC MG Bcyt myzus_corset_22456 -2.38 0.31 myzus_corset_37995 1.17 3.08 lip_corset_3834 -1.75 -0.13 myzus_corset_37996 1.79 3.97 brevi_corset_4557 -1.35 -0.27 myzus_corset_40640 3.65 -4.63 myzus_corset_21904 -0.20 0.27 myzus_corset_40638 3.17 2.90 lip_corset_17072 -0.61 0.63 myzus_corset_40646 3.08 2.33 brevi_corset_13810 -2.52 -1.10 myzus_corset_20440 3.17 -4.39 myzus_corset_14752 -2.94 -2.10 myzus_corset_40643 2.39 -5.37 lip_corset_4071 -2.42 -1.66 myzus_corset_40647 3.90 -7.60 myzus_corset_26163 -1.21 -2.45 myzus_corset_27820 0.41 1.06 lip_corset_5352 -1.93 1.43 myzus_corset_8553 -2.07 2.37 brevi_corset_5562 -8.42 -3.60 lip_corset_3218 0.83 0.19 lip_corset_7942 0.89 1.40 myzus_corset_25819 -5.04 -2.99 lip_corset_9628 0.76 1.61 lip_corset_6797 -5.35 -3.13 lip_corset_21177 0.41 -0.23 brevi_corset_22873 -4.37 -4.34 lip_corset_7666 -0.51 2.22 myzus_corset_31056 -4.87 -3.21 lip_corset_5355 2.48 2.88 lip_corset_2464 -5.15 -2.02 lip_corset_12377 2.61 2.83 lip_corset_2460 -5.36 -0.85 lip_corset_7676 2.98 3.43 brevi_corset_19333 -4.21 -3.75 lip_corset_4404 6.61 -0.41 brevi_corset_8952 6.85 0.00 lip_corset_9724 -1.02 0.41 lip_corset_5290 -3.16 -2.45 lip_corset_6536 3.84 -1.50 brevi_corset_19477 -3.13 -3.42 lip_corset_11293 6.13 -0.02 lip_corset_10649 
3.13 -0.27 lip_corset_8212 2.04 5.66 brevi_corset_17950 3.34 0.62 lip_corset_9620 4.01 3.85 myzus_corset_38401 -3.42 -5.06 lip_corset_20000 -3.60 -1.01 lip_corset_16005 5.10 -0.72 brevi_corset_13707 0.41 1.60 brevi_corset_2017 0.40 6.75 brevi_corset_12845 0.29 2.01 lip_corset_22474 -0.12 -0.01 brevi_corset_15443 4.66 3.11 brevi_corset_10417 5.31 3.02 brevi_corset_16696 5.57 3.64

166 | P a g e brevi_corset_12223 -0.60 3.68 myzus_corset_17364 -4.44 4.99 brevi_corset_10558 -4.66 1.99 brevi_corset_13235 0.07 1.77 brevi_corset_10574 -2.97 1.74 lip_corset_3441 -0.20 3.98 brevi_corset_10568 0.21 0.05 lip_corset_9750 -1.12 5.01 brevi_corset_7088 5.53 -1.29 brevi_corset_11540 4.13 6.66 myzus_corset_24631 0.57 0.74 brevi_corset_11543 5.20 6.17 brevi_corset_11652 -0.05 -0.12 brevi_corset_11546 4.02 4.71 lip_corset_12325 -1.62 3.93 brevi_corset_11541 7.14 1.22 brevi_corset_11654 0.91 0.65 brevi_corset_10467 1.63 1.88 ABCD MG Bcyt myzus_corset_20826 2.04 -0.30 myzus_corset_36029 -0.75 0.20 lip_corset_18248 1.69 0.06 lip_corset_6674 0.41 1.27 brevi_corset_6663 0.56 0.65 brevi_corset_7210 -0.17 0.75 ABC-H MG Bcyt myzus_corset_20157 -0.91 0.96 myzus_corset_34423 -3.04 -5.50 lip_corset_8098 0.41 1.08 lip_corset_24830 -4.54 -5.63 brevi_corset_10327 -1.71 -0.30 brevi_corset_7547 -2.12 -5.33


Figure 4.3.11 Phylogenetic tree of ABC transporter transcripts showing their differential expression level in MG compared to WB in three aphid species (panel labels: Mid-Gut, Bacteriocyte). Orange – M. persicae, Green – B. brassicae and Blue – L. erysimi


4.3.3 Differences in the number of transcripts with similar type of expression pattern in different tissues of different aphid species

The MG tissue of aphids is constantly exposed, through the diet, to a stream of secondary metabolites that could be toxic. Generalist and specialist aphid species may have different mechanisms to neutralise toxic compounds. Here I tested whether certain gene families had a greater abundance in the MG than in the Bcyt or WB. This will also help identify specific transcripts from these gene families that act as housekeeping genes for MG and Bcyt tissues.

For each gene family, the transcripts that showed upregulation in the MG or Bcyt tissue were counted. Then the transcripts that showed upregulation in the MG tissue but downregulation in the Bcyt tissue were counted, representing transcripts exclusively upregulated in the MG tissue. Similarly, transcripts with Bcyt-specific upregulation were counted.
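The counting procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the code used for the thesis. The function name, the species-prefix mapping and the use of LFC = 0 as the cut-off for up- and downregulation are assumptions, and the input values are the sulfatase LFCs from Table 4.3.6.

```python
from collections import Counter

# Transcript-name prefixes used in the tables, mapped to species.
SPECIES_BY_PREFIX = {
    "myzus": "M. persicae",
    "brevi": "B. brassicae",
    "lip": "L. erysimi",
}

# (MG LFC, Bcyt LFC) relative to WB, copied from Table 4.3.6.
sulfatase_lfc = {
    "brevi_corset_4022": (5.40, -3.68),
    "lip_corset_963": (5.04, -2.97),
    "myzus_corset_33610": (2.87, -3.51),
    "lip_corset_15916": (0.63, -0.52),
    "myzus_corset_28183": (-2.75, -1.06),
    "brevi_corset_24357": (-2.75, -4.77),
    "myzus_corset_13209": (-3.40, -5.28),
    "myzus_corset_40464": (-5.22, -9.81),
    "lip_corset_25917": (-5.22, -4.07),
    "brevi_corset_1787": (-5.32, -6.25),
}


def count_upregulated(lfc_table, cutoff=0.0):
    """Count transcripts up in each tissue, and those up in one tissue only.

    Assumption: LFC > cutoff is 'up' and LFC < -cutoff is 'down'.
    """
    counts = Counter()
    for name, (mg, bcyt) in lfc_table.items():
        species = SPECIES_BY_PREFIX[name.split("_")[0]]
        if mg > cutoff:
            counts[(species, "MG")] += 1
            if bcyt < -cutoff:  # up in MG, down in Bcyt
                counts[(species, "MG only")] += 1
        if bcyt > cutoff:
            counts[(species, "Bcyt")] += 1
            if mg < -cutoff:  # up in Bcyt, down in MG
                counts[(species, "Bcyt only")] += 1
    return counts


counts = count_upregulated(sulfatase_lfc)
for species in SPECIES_BY_PREFIX.values():
    print(
        species,
        counts[(species, "MG")],
        counts[(species, "MG only")],
        counts[(species, "Bcyt")],
        counts[(species, "Bcyt only")],
    )
```

Under these assumptions the sketch returns one MG-upregulated (and MG-exclusive) sulfatase for M. persicae and B. brassicae, two for L. erysimi, and none upregulated in the Bcyt tissue, which agrees with the sulfatase row of Table 4.3.11.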

Overall, the total number of transcripts belonging to the GSTs and the Cyp P450s showing upregulation in the MG tissue is slightly higher than the number of transcripts upregulated in the Bcyt tissue in all three species, with one exception (Cyp P450s in L. erysimi). The ABC transporter-like transcripts are most numerous in the Bcyt tissue.

When comparing transcripts that show upregulation in only one of the tissues, the MG dominates for GSTs and Cyp P450s, whereas ABC transporters show comparable or slightly higher numbers in the Bcyt tissue. Most of the sulfatases and quinone reductases show upregulation in the MG tissue only.


Table 4.3.11 Number of transcripts from each tissue of each species showing higher transcript abundance compared to WB samples. Numbers in brackets represent transcripts showing upregulation in that tissue but downregulation in the other tissue of the same species.

M. persicae B. brassicae L. erysimi

MG Bcyt MG Bcyt MG Bcyt

GSTs 10(5) 6(2) 14(2) 12(0) 12(2) 9(0)

Cyp P450s 18(12) 12(2) 12(5) 10(2) 9(5) 13(7)

ABC transporters 19(8) 23(8) 31(6) 37(9) 30(6) 32(5)

Sulfatases 1(1) 0(0) 1(1) 0(0) 2(2) 0(0)

Quinone reductases 1(1) 0(0) 1(1) 1(0) 2(2) 0(0)
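The comparison between MG and Bcyt totals in Table 4.3.11 can be made explicit with a short Python sketch. It is illustrative only. The variable names are assumptions, and the numbers are the totals outside the brackets in Table 4.3.11 for the three largest gene families.

```python
# Number of upregulated transcripts as (MG, Bcyt), copied from Table 4.3.11.
upregulated = {
    "GSTs": {
        "M. persicae": (10, 6),
        "B. brassicae": (14, 12),
        "L. erysimi": (12, 9),
    },
    "Cyp P450s": {
        "M. persicae": (18, 12),
        "B. brassicae": (12, 10),
        "L. erysimi": (9, 13),
    },
    "ABC transporters": {
        "M. persicae": (19, 23),
        "B. brassicae": (31, 37),
        "L. erysimi": (30, 32),
    },
}

# For each family and species, is the MG count higher than the Bcyt count?
mg_higher = {
    family: {sp: mg > bcyt for sp, (mg, bcyt) in by_species.items()}
    for family, by_species in upregulated.items()
}

for family, by_species in mg_higher.items():
    print(family, by_species)
```

With these values the MG count exceeds the Bcyt count for GSTs in all three species and for Cyp P450s in M. persicae and B. brassicae, but not for Cyp P450s in L. erysimi or for ABC transporters in any species.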


4.4 Discussion

Detoxification mechanisms in any insect form a complex and dynamic system. Several factors, such as the nature of a compound, its concentration and the combination of compounds present, affect the response that is elicited. This chapter explores the possibility that different detoxification mechanisms are triggered in the generalist relative to the specialist aphids upon feeding on Brassicaceae plants, which produce glucosinolates as their primary defence compounds against herbivores. Although the proposed hypotheses for this study focus on universal patterns of detoxification-related genes, my analysis of each of these gene families provides insights into gene family composition and evolution, and these are therefore discussed first in the context of the literature. At the end of this discussion section I explain the outcomes with respect to the proposed hypotheses.

4.4.1 GST

GST mediates the binding of glutathione (GSH) to several molecules including processed products of plant secondary metabolites. This binding of GSH inactivates or detoxifies these compounds that are hazardous for the system (Ranson and Hemingway 2005).

Hence, the presence of glucosinolates in the diet hints at the induction of GST enzyme mediated mechanisms in the MG tissues.

Among the several different types of GSTs, the delta and epsilon types are known to be exclusive to insects (Ranson and Hemingway 2005) and both are known for their involvement in the detoxification of xenobiotics in insects (Ranson et al. 2002). A previous study reported one predicted GST epsilon and 16 GST deltas from A. pisum (Shi et al. 2012). However, neither the initial official gene annotation for A. pisum, which reported the presence of only three types of GSTs (GST-delta, GST-sigma, GST-theta) (IAGC 2010), nor the set of GSTs retrieved from the current genome annotation includes any epsilon GSTs (rather, it has sigma, theta, delta and omega types). The GST phylogenetic analysis performed in this study also confirms the absence of GST-epsilon in aphids (Figure 4.3.4).

In contrast to the A. pisum genome, there are far fewer GST-deltas in the three brassica-feeding species. Only two transcripts showing similarity with the GST-delta type were observed in my transcriptomes of each species. The GSTs of the A. pisum genome were manually annotated and the GST delta radiation in that species is confirmed. This suggests either species-specific gene gain events in the A. pisum lineage or gene loss events in other aphid species (Figure 4.3.2). These GST-delta-like transcripts in A. pisum could have a role in the detoxification of the large number of alkaloids present in plants of the Fabaceae family.

The induction of GST activity in M. persicae upon feeding on brassica family plants has already been shown (Francis et al. 2005). However, the GST activity reported in that study is total activity within M. persicae, and the study does not attribute it to any specific GST class.

The pattern of GST-omega and delta expression shows clear downregulation in the generalist M. persicae and upregulation in the specialists, B. brassicae and L. erysimi. This suggests that certain generalised defence mechanisms governed by GST delta are not active in the generalist but are active in the specialist species. This result is congruent with the fact that the delta and omega GSTs have similar substrate preferences (4-nitrophenethyl bromide) (Yamamoto and Yamada 2016).

The sigma clade of GSTs is the largest clade among the brassica-feeding aphids (but not the pea-feeding aphid). Furthermore, most of them are upregulated in the MG tissue and have low expression in the bacteriocytes. In other species, sigma GSTs are known for their role against reactive oxygen species (ROS) (Yang et al. 2012). Given that glucosinolates can form ROS, and that previous studies show that the injury caused by aphid feeding and the components of aphid saliva promote the production of ROS in the phloem tissue (Moran et al. 2002, Zhu-Salzman et al. 2004, Divol et al. 2005), these sigma GSTs could have played an important role in aphid specialization on brassica plants.

No GSTs from the zeta class were found among the aphids, which was at first glance unusual because they are highly conserved between plants and animals. GST zetas are known as maleylacetoacetate isomerases and catalyse an essential step in amino acid catabolism. They reduce phenylalanine toxicity in fungi and humans (Fernández-Cañón and Peñalva 1998). However, the aphid diet lacks phenylalanine; aphids cannot synthesise it because they lack the necessary machinery, and they rely on obtaining phenylalanine from their obligate symbiont B. aphidicola. Hence, the absence of the GST-zeta class suggests that the phenylalanine provided by Buchnera rarely reaches toxic levels, presumably because it is efficiently assimilated by the aphid. Finally, no epsilon-type GSTs are present in the aphids.

4.4.2 P450s

The induction of P450s has been found to be associated with high tolerance to plant secondary metabolites in several insects. Helicoverpa zea is highly tolerant to xanthotoxins whereas Manduca sexta is tolerant to nicotine, and in both cases tolerance was found to involve the induction of P450 enzyme expression (Snyder and Glendinning 1996, Li et al. 2000, Sasabe et al. 2004).

P450s belonging to the Cyp6 clade (subclades B, AB, AE, DA), in particular, have been associated with detoxification in Lepidoptera (Wen et al. 2003, Li et al. 2004, Mao et al. 2006, Wen et al. 2006, Mao et al. 2007, Niu et al. 2011, Crava et al. 2016), Hemiptera (Peng et al. 2016), Coleoptera (Blomquist et al. 2010), Diptera (Andersen et al. 1997) and Hymenoptera (Mao et al. 2009). Aphis gossypii shows upregulation of cyp6DA2 in response to nicotine in the diet. The present study found 23 transcripts in M. persicae showing similarity with the cyp6a type, 13 of which were upregulated in the MG tissue. In B. brassicae only 7 of 16 such transcripts were upregulated in the MG, while in L. erysimi only 6 of 15 were. The MG expression of these P450s supports the hypothesis that cyp6a might be involved in detoxification, possibly of glucosinolates, but biochemical or biological assays would be required to confirm their role.
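The proportions implied by these counts can be made explicit with a short calculation. The Python sketch below is illustrative only. The variable names are assumptions, and the counts are those quoted in the paragraph above.

```python
# cyp6a-like transcripts as (upregulated in MG, total), from the text above.
cyp6a_like = {
    "M. persicae": (13, 23),
    "B. brassicae": (7, 16),
    "L. erysimi": (6, 15),
}

# Fraction of cyp6a-like transcripts that are upregulated in the MG.
fraction_up = {sp: up / total for sp, (up, total) in cyp6a_like.items()}

for sp, frac in fraction_up.items():
    print(f"{sp}: {frac:.0%} of cyp6a-like transcripts upregulated in MG")
```

This gives roughly 57% for M. persicae, 44% for B. brassicae and 40% for L. erysimi.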

4.4.3 ABC transporters

There are several different types of ABC transporters, and they have diverse roles in the trafficking of molecules across cell membranes. ABCs can play a role in the transport of sugars, polysaccharides, lipids, amino acids, peptides, toxic metabolites, metals, inorganic ions and drugs (Dermauw and Van Leeuwen 2014). Several PSMs or toxic compounds ingested through the diet need to be exported from the system, and ABC transporters play an important role in this process (Sorensen and Dearing 2006). In general, my dataset shows that ABCB- and ABCC-type transporters are particularly upregulated in the MG tissue of all three experimental aphid species. ABCB and ABCC transporters are also known as multidrug resistance proteins (MDR) and multidrug resistance-associated proteins (MRP), respectively (Dermauw and Van Leeuwen 2014). ABCB transporters have been implicated in insecticide transport and resistance in D. melanogaster (Chahine and O’Donnell 2009). The permeability glycoproteins (P-gp), a type of ABCB transporter, help to regulate the absorption or secretion of xenobiotic compounds in different tissues. They also help to distribute these chemicals in order to reduce their toxicity at any one location (Dietrich et al. 2003, Chan et al. 2004). This suggests that ABCB transporters could have a role in glucosinolate detoxification. ABCB transporters are usually present in large amounts in the top cell layer of barrier membrane structures, and they have a broad range of substrates, e.g. organic, lipophilic, polar or nonpolar compounds (Sharom 1997, Schinkel and Jonker 2003).

P-gp has been found to play a role in the removal of xenobiotics against the concentration gradient by active efflux (Sharom 1997, Dietrich et al. 2003). This type of mechanism has also been found in the tobacco hornworm, a specialist insect that can feed on nicotine-containing plants. A protein showing similarity with P-gp has been found in the Malpighian tubules and blood-brain barrier tissues, suggesting that this protein acts as a nicotine pump in the tobacco hornworm (Murray et al. 1994, Gaertner et al. 1998). In the current dataset, only two ABCB-like transcripts were found in the M. persicae transcriptome, whereas there are seven from each specialist aphid species, B. brassicae and L. erysimi. Only one transcript from L. erysimi (lip_corset_18625) and two from B. brassicae (brevi_corset_13794, brevi_corset_24343) showed lower abundance in the MG tissue, whereas only one of the two transcripts from M. persicae showed enrichment in the MG tissue (myzus_corset_29772). In the Bcyt tissue, ABCB-like transcripts are mostly enriched in B. brassicae, show a mixed expression pattern in L. erysimi and show less enrichment for both transcripts in M. persicae. This suggests that the upregulation of ABCB-type transporters in the MG tissue could help remove excess glucosinolate ingested in the diet.

ABCC transporters (MRP) are known for their role in the removal of different drugs from the system (Deeley et al. 2006). ABCC-type transporters help to export glutathione conjugates from cells (Hayes et al. 2005). The transcriptome data show downregulation (-2.07 LFC) of one M. persicae transcript (myzus_corset_8553) that has high similarity with an ABCC transporter. This can be attributed to the downregulation of GST delta in M. persicae. The immediate orthologs of this ABCC transporter from B. brassicae and L. erysimi show upregulation, with one minor exception.

4.4.4 Sulfatases

Glucosinolate molecules contain a sulfate group that allows them to bind to the myrosinase enzyme to form aglucon. Aglucon is then converted into toxic products. The defensive strategy that some lepidopterans use is to remove the sulfate group from the glucosinolate with sulfatase enzymes. The presence of sulfatases in the aphid transcriptome dataset raises the possibility of a sulfatase-mediated defence mechanism in aphids. There are four transcripts from M. persicae and three transcripts each from B. brassicae and L. erysimi that are similar to arylsulfatases from A. pisum. Only one type of sulfatase was found upregulated in the MG tissues of all three species, whereas all the transcripts showed downregulation in the Bcyt cells. This suggests that the sulfatases are very active in the hemolymph and hints at their involvement in the detoxification of glucosinolates that have been absorbed into the hemolymph. The sulfatase transcripts that are downregulated in both tissues may be highly expressed in the hemolymph, where they could also act on glucosinolate molecules absorbed via the gut.

4.4.5 Quinone reductase (QR)

Several plant secondary metabolites are known to induce QR expression (Barch and Rundhaugen 1994, Perez et al. 2010). The upregulation of all the transcripts from M. persicae (3), B. brassicae (1) and L. erysimi (3) suggests that QRs could be involved in the glucosinolate detoxification process.

Overall the detoxification of the glucosinolates seems to be an impressively dynamic process in aphids. This chapter has shed some light on some of the aspects of the glucosinolate detoxification mechanism that might be active in aphids. This chapter also suggested different possible ways that a generalist and specialist aphid could handle glucosinolate toxicity based on comparative transcriptomic analysis between tissues and between different aphid species.

To understand the relevance of these data in a broader sense, the findings were analysed in light of the following proposed hypotheses.

Hypothesis 1 – The generalist has more detox genes than the specialist

M. persicae can feed on host plants from different plant families. The secondary metabolite profiles of these families, which play a role in plant defence, range from similar to completely different. Hence, it was hypothesized that M. persicae may have a larger and more diverse set of genes from known detoxification gene families than the specialist aphid species, B. brassicae and L. erysimi, which can feed only on host plants from the Brassicaceae family. While brassica-feeding aphids have many more sigma GSTs, and the data are consistent with them acting in a detoxification mechanism, there is no difference between the generalist and specialist species (Table 4.3.2). Similar numbers of transcripts are present in all three aphid species. In contrast, the Cyp P450 dataset shows a higher number of Cyp4- and Cyp6-like transcripts in M. persicae (Cyp4 – 17, Cyp6 – 23) than in B. brassicae (Cyp4 – 12, Cyp6 – 18) and L. erysimi (Cyp4 – 14, Cyp6 – 17) (Table 4.3.4). The transporters that may have a role in the transport of PSMs or detoxified products show a different pattern, with higher numbers of ABC transporter-like transcripts present in the specialists, B. brassicae (ABCB – 7, ABCC – 15) and L. erysimi (ABCB – 7, ABCC – 14). This number is smaller in the generalist M. persicae (ABCB – 2, ABCC – 10) (Table 4.3.8). Overall, the results suggest that the hypothesis holds for the Cyp P450 gene family, with a more diverse suite of genes present in the generalist M. persicae, but not for GSTs and ABC transporters.
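The comparison behind Hypothesis 1 can be written down as a simple check of whether the generalist has more transcripts than both specialists in each family. The Python sketch below is illustrative only. The names are assumptions, and the counts are those quoted above from Tables 4.3.4 and 4.3.8.

```python
GENERALIST = "M. persicae"

# Transcript counts per gene family and species, as quoted in the text.
family_counts = {
    "Cyp4": {"M. persicae": 17, "B. brassicae": 12, "L. erysimi": 14},
    "Cyp6": {"M. persicae": 23, "B. brassicae": 18, "L. erysimi": 17},
    "ABCB": {"M. persicae": 2, "B. brassicae": 7, "L. erysimi": 7},
    "ABCC": {"M. persicae": 10, "B. brassicae": 15, "L. erysimi": 14},
}


def generalist_has_most(counts):
    """True if the generalist count exceeds the count of every specialist."""
    return all(
        counts[GENERALIST] > n
        for species, n in counts.items()
        if species != GENERALIST
    )


# Does each family support Hypothesis 1 on transcript number alone?
support = {
    family: generalist_has_most(counts)
    for family, counts in family_counts.items()
}
print(support)
```

On these counts the check returns True for Cyp4 and Cyp6 and False for ABCB and ABCC. It compares transcript numbers only and says nothing about sequence diversity or expression level.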

Hypothesis 2 –The midgut has more detox genes than the bacteriocytes

Hypothesis 1 is strengthened if genes from each of the detoxification gene families (GSTs, Cyp P450s and ABC transporters) show enrichment of transcripts in MG tissue, as this tissue is exposed to all the toxic compounds present in the ingested food. Hence, the second hypothesis was proposed to test whether more transcripts are enriched in MG tissue than in Bcyt tissue. In M. persicae, five GST-like transcripts were enriched exclusively in the MG tissue, whereas only two were enriched in the Bcyt tissue. The specialist aphids (B. brassicae and L. erysimi) showed a similar pattern, but with only two GST-like transcripts enriched in the MG tissue of each species and none in the Bcyt tissue. This suggests that the generalist aphid has extra GST transcripts expressed in the MG tissue, some of which may play a role in the detoxification of a variety of toxic compounds from different host plants (Table 4.3.11, numbers in parentheses). Cyp P450s also showed MG-specific enrichment, with twelve transcripts enriched in the MG tissue of M. persicae and only two in the Bcyt tissue. Among the specialist aphid species, a similar pattern is observed in B. brassicae (MG – 5, Bcyt – 2) but not in L. erysimi (MG – 5, Bcyt – 7) (Table 4.3.11, numbers in parentheses). On the other hand, ABC transporter-like transcripts do not show an accumulation of MG-specific expression. Hence, the MG tissue does have a greater diversity of genes encoding detoxification enzymes (GSTs and P450s).
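The tissue comparison can be checked in the same way. This is a hedged sketch, assuming only the enrichment counts quoted in this section (Table 4.3.11); the dictionary layout and the function name `exceptions_to_mg_bias` are hypothetical and were not used in the original analysis.

```python
# Numbers of transcripts enriched in midgut (MG) versus bacteriocyte (Bcyt)
# tissue, as quoted in the text (Table 4.3.11). Values are (MG, Bcyt).
# The layout and the function name are illustrative assumptions.
ENRICHED = {
    ("GST", "M. persicae"): (5, 2),
    ("GST", "B. brassicae"): (2, 0),
    ("GST", "L. erysimi"): (2, 0),
    ("Cyp P450", "M. persicae"): (12, 2),
    ("Cyp P450", "B. brassicae"): (5, 2),
    ("Cyp P450", "L. erysimi"): (5, 7),
}


def exceptions_to_mg_bias(enriched):
    """List (family, species) pairs where MG does not exceed Bcyt."""
    return [key for key, (mg, bcyt) in enriched.items() if mg <= bcyt]


if __name__ == "__main__":
    for family, species in exceptions_to_mg_bias(ENRICHED):
        print(f"Exception to MG enrichment: {family} in {species}")
```

With these numbers the only exception is Cyp P450 in L. erysimi.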

The specialist aphid species sequester glucosinolates in their bodies, whereas the generalist aphid species cannot. The mechanism of glucosinolate sequestration is unknown. The gene families investigated in this thesis are not known to be involved in such a mechanism, but they have roles in the detoxification of several different plant secondary metabolites and xenobiotic compounds. Overall, the number of GST-like and Cyp P450-like transcripts enriched in the MG tissue of all three species is higher than the number enriched in the Bcyt tissue (exception: Cyp P450 in L. erysimi, Table 4.3.11). The total number of ABC transporter-like transcripts showing enrichment is higher in Bcyt tissue than in MG tissue. This can be attributed to the high volume of metabolite exchange between Bcyt and MG and between Bcyt and hemolymph, where ABC transporters have a role in the transport of sugars, amino acids, lipids, lipopolysaccharides and peptides (Higgins 1992).

This suggests that the specialist aphid species express a higher number of transport-related transcripts than the generalist M. persicae, although these are not specific to glucosinolate transport. The detoxification-related transcripts showed MG-specific expression in all the species, with minor exceptions. However, transcript diversity was higher in the generalist aphid species, suggesting that the specialists might have lost genes that were present in the common ancestor of generalist and specialist aphid species, whilst the generalist has retained them in order to feed on different host plants.

Hypothesis 3: Generalists and specialists have different mechanisms of glucosinolate detoxification.

The detoxification-related gene families of the three aphid species were compared following the experimental design suggested by Ali and Agrawal (2012). This comparison refers to a specific generalist/specialist comparison and may not be applicable to all generalist or specialist aphid species. As already explained, the specialist aphids store aliphatic glucosinolates, whereas the generalist is not capable of storing any type of glucosinolate.

Based on the enrichment patterns of the different detoxification-related gene families, I developed two models that explain the possible mechanisms of glucosinolate detoxification in generalist and specialist aphid species. These models do not represent a comprehensive mechanism of glucosinolate detoxification, but they should help to guide future research in this area.

Possible mechanism of glucosinolate detoxification in specialist aphid species

These models are based on two main observations: 1) A previous study showed that the honeydew of the generalist, M. persicae, contains intact (unprocessed) aliphatic glucosinolates, whereas the honeydew of the specialist, B. brassicae, contains detoxified products derived from aliphatic glucosinolates (Kim and Jander 2007, Kim et al. 2008). 2) Brassicaceae plants produce indolic glucosinolates upon aphid herbivory (Kim and Jander 2007). This suggests that these two insects have different mechanisms for dealing with the toxicity of indolic and aliphatic glucosinolates.


The detoxification-related gene families show complementarity towards each other. Hence, by comparing the expression patterns of all the detoxification genes analysed from generalist and specialist aphid species, I identified two gene pairs that could have complementary roles in the detoxification of a specific type of glucosinolate:

1) GST-D and ABCC, which are known for their co-expression and synergistic relationships (Sau et al. 2010, Leslie 2012, Liu et al. 2012). The GST-D and ABCC gene pair is hypothesised to have a role in aliphatic glucosinolate detoxification. Transcripts related to these genes showed enrichment in the MG tissue of the specialist aphid species but not in the generalist, M. persicae. This is consistent with the fact that only the specialist aphid species process and excrete aliphatic glucosinolates. Among all the GST types, GST delta was selected because it shows enrichment in the MG of only the specialist aphid species, whereas it is consistently downregulated in the generalist MG tissue (Table 4.3.3, Figure 4.3.5). The association of ABCC transporters with the transport of sulfate- and glutathione-conjugated organic anions also supports the pairing of GST-D and ABCC (Dermauw and Van Leeuwen 2014) (Figure 4.4.1, Figure 4.4.2).

2) The second gene pair selected contained Cyp6 and ABCB. Cyp6 is well known for its involvement in detoxification (Ranasinghe et al. 1998, Grubor and Heckel 2007). ABCB transporters have a role in the transport of insecticides in different arthropods (Buss and Callaghan 2008, Dermauw and Van Leeuwen 2014). This gene pair is hypothesized to have a role in indolic glucosinolate detoxification. As mentioned above, indolic glucosinolates are part of a common defence mechanism against aphids in brassica plants. Both genes are upregulated in the generalist and the specialist aphid species, suggesting a possibly similar mechanism for dealing with indolic glucosinolates (Figure 4.4.1, Figure 4.4.2).


Both of these proposals are also supported by the observation that knockdown of Cyp6g1 in D. melanogaster leads to upregulation of ABCC transporters (Shah et al. 2012).

Figure 4.4.1 Proposed model for the molecular mechanism of glucosinolate detoxification in specialist aphid species. GST-D and ABCC are predicted to be involved in the detoxification of aliphatic glucosinolates, whereas Cyp6 and ABCB genes are hypothesized to be involved in indolic glucosinolate detoxification.


Figure 4.4.2 Proposed model for the molecular mechanism of glucosinolate detoxification in generalist aphid species. GST-D and ABCC showed downregulation (grey color) in the MG tissue, suggesting a reason for the excretion of unprocessed aliphatic glucosinolates in the honeydew. Cyp6 and ABCB genes are hypothesized to be involved in indolic glucosinolate detoxification.

In conclusion, this chapter shows differences in the expression patterns of the different detoxification-related gene families in specialist and generalist aphid species. It also gives a comprehensive list of GST family genes and their presence in aphids in relation to previously known GST genes within the insects. The chapter also sets a platform for, and develops a partial model of, the possible mechanism of aliphatic and indolic glucosinolate detoxification in aphids. It is also important to note that the number of species used in this study is limited (one generalist and two specialists). Therefore, the outcome of this study is limited to the experimental organisms and may not be applicable to other aphid or insect species. The detoxification model developed in this chapter still needs experimental support. This could be obtained through protein-protein interaction studies as well as enzyme-substrate assays. Profiling the breakdown products of the glucosinolate molecule could also provide important information about the validity of the model.

My next chapter focuses on one of the main enzymes, myrosinase (ß-thioglucosidase), which acts on glucosinolates and has a role in the special mechanism in aphids known as the mustard bomb. This will also shed some light on the differences between generalist and specialist aphids and their different ways of handling aliphatic and indolic glucosinolates.


4.5 References

Andersen, J. F., et al. (1997). "Substrate specificity for the epoxidation of terpenoids and active site topology of house fly cytochrome P450 6A1." Chemical research in toxicology 10(2): 156-164.

Barch, D. H. and L. M. Rundhaugen (1994). "Ellagic acid induces NAD(P)H:quinone reductase through activation of the antioxidant responsive element of the rat NAD(P)H:quinone reductase gene." Carcinogenesis 15(9): 2065-2068.

Bariami, V., et al. (2012). "Gene Amplification, ABC Transporters and Cytochrome P450s: Unraveling the Molecular Basis of Pyrethroid Resistance in the Dengue Vector, Aedes aegypti." PLOS Neglected Tropical Diseases 6(6): e1692.

Beran, F., et al. (2014). "Phyllotreta striolata flea beetles use host plant defense compounds to create their own glucosinolate-myrosinase system." Proceedings of the National Academy of Sciences 111(20): 7349-7354.

Blomquist, G. J., et al. (2010). "Pheromone production in bark beetles." Insect Biochem Mol Biol 40(10): 699-712.

Bridges, M., et al. (2002). "Spatial organization of the glucosinolate–myrosinase system in brassica specialist aphids is similar to that of the host plant." Proceedings of the Royal Society of London. Series B: Biological Sciences 269(1487): 187-191.

Buss, D. S. and A. Callaghan (2008). "Interaction of pesticides with p-glycoprotein and other ABC proteins: A survey of the possible importance to insecticide, herbicide and fungicide resistance." Pestic Biochem Physiol 90.

Chahine, S. and M. J. O’Donnell (2009). "Physiological and molecular characterization of methotrexate transport by Malpighian tubules of adult Drosophila melanogaster." Journal of Insect Physiology 55(10): 927-935.

Chan, L. M. S., et al. (2004). "The ABCs of drug transport in intestine and liver: efflux proteins limiting drug absorption and bioavailability." European Journal of Pharmaceutical Sciences 21(1): 25-51.

Crava, C. M., et al. (2016). "Transcriptome profiling reveals differential gene expression of detoxification enzymes in a hemimetabolous tobacco pest after feeding on jasmonate-silenced Nicotiana attenuata plants." BMC Genomics 17(1): 1005.

Deeley, R. G., et al. (2006). "Transmembrane Transport of Endo- and Xenobiotics by Mammalian ATP-Binding Cassette Multidrug Resistance Proteins." Physiological Reviews 86(3): 849.


Dermauw, W. and T. Van Leeuwen (2014). "The ABC gene family in arthropods: Comparative genomics and role in insecticide transport and resistance." Insect Biochem Mol Biol 45(Supplement C): 89-110.

Dietrich, C. G., et al. (2003). "ABC of oral bioavailability: transporters as gatekeepers in the gut." Gut 52(12): 1788.

Divol, F., et al. (2005). "Systemic response to aphid infestation by Myzus persicae in the phloem of Apium graveolens." Plant molecular biology 57(4): 517.

Edger, P. P., et al. (2015). "The butterfly plant arms-race escalated by gene and genome duplications." Proceedings of the National Academy of Sciences 112(27): 8362-8366.

Fernández-Cañón, J. M. and M. A. Peñalva (1998). "Characterization of a Fungal Maleylacetoacetate Isomerase Gene and Identification of Its Human Homologue." Journal of Biological Chemistry 273(1): 329-337.

Francis, F., et al. (2001). "Effects of Allelochemicals from First (Brassicaceae) and Second (Myzus persicae and Brevicoryne brassicae) Trophic Levels on Adalia bipunctata." Journal of Chemical Ecology 27(2): 243-256.

Francis, F., et al. (2005). "Glutathione S-transferases in the adaptation to plant secondary metabolites in the Myzus persicae aphid." Arch Insect Biochem Physiol 58(3): 166-174.

Gaertner, L. S., et al. (1998). "Transepithelial transport of nicotine and vinblastine in isolated malpighian tubules of the tobacco hornworm (Manduca sexta) suggests a P-glycoprotein-like mechanism." The Journal of Experimental Biology 201(18): 2637.

Gibson, R. W. and J. A. Pickett (1983). "Wild potato repels aphids by release of aphid alarm pheromone." Nature 302(5909): 608-609.

Gloss, A. D., et al. (2014). "Evolution in an Ancient Detoxification Pathway Is Coupled with a Transition to Herbivory in the Drosophilidae." Mol Biol Evol 31(9): 2441-2456.

Good, R. T., et al. (2014). "The Molecular Evolution of Cytochrome P450 Genes within and between Drosophila Species." Genome Biol Evol 6(5): 1118-1134.

Grubor, V. D. and D. G. Heckel (2007). "Evaluation of the role of CYP6B cytochrome P450s in pyrethroid resistant Australian Helicoverpa armigera." Insect Mol Biol 16(1): 15-23.

Hayes, J. D., et al. (2005). "Glutathione Transferases." Annual Review of Pharmacology and Toxicology 45(1): 51-88.

Higgins, C. F. (1992). "ABC transporters: from microorganisms to man." Annu Rev Cell Biol 8.


International Aphid Genomics Consortium (IAGC) (2010). "Genome sequence of the pea aphid Acyrthosiphon pisum." PLoS Biol 8(2): e1000313.

Jeschke, V., et al. (2016). Chapter Eight - Insect Detoxification of Glucosinolates and Their Hydrolysis Products. Advances in Botanical Research. S. Kopriva, Academic Press. 80: 199-245.

Kazana, E., et al. (2007). "The cabbage aphid: a walking mustard oil bomb." Proceedings of the Royal Society B: Biological Sciences 274(1623): 2271-2277.

Kim, J. H. and G. Jander (2007). "Myzus persicae (green peach aphid) feeding on Arabidopsis induces the formation of a deterrent indole glucosinolate." The Plant Journal 49(6): 1008-1019.

Kim, J. H., et al. (2008). "Identification of indole glucosinolate breakdown products with antifeedant effects on Myzus persicae (green peach aphid)." The Plant Journal 54(6): 1015-1026.

Leslie, E. M. (2012). "Arsenic–glutathione conjugate transport by the human multidrug resistance proteins (MRPs/ABCCs)." Journal of Inorganic Biochemistry 108(Supplement C): 141-149.

Letunic, I. and P. Bork (2007). "Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation." Bioinformatics 23(1): 127-128.

Li, X., et al. (2004). "Structural and functional divergence of insect CYP6B proteins: from specialist to generalist cytochrome P450." Proc Natl Acad Sci U S A 101(9): 2939-2944.

Li, X., et al. (2000). "Cross-Resistance to α-Cypermethrin After Xanthotoxin Ingestion in Helicoverpa zea (Lepidoptera: Noctuidae)." Journal of Economic Entomology 93(1): 18-25.

Liu, M., et al. (2012). "Metaproteogenomic analysis of a community of sponge symbionts." ISME J 6(8): 1515-1525.

Malka, O., et al. (2016). "Glucosinolate Desulfation by the Phloem-Feeding Insect Bemisia tabaci." Journal of Chemical Ecology 42(3): 230-235.

Mao, W., et al. (2006). "Remarkable substrate‐specificity of CYP6AB3 in Depressaria pastinacella, a highly specialized caterpillar." Insect Mol Biol 15(2): 169-179.

Mao, W., et al. (2009). "Quercetin-metabolizing CYP6AS enzymes of the pollinator Apis mellifera (Hymenoptera: Apidae)." Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology 154(4): 427-434.

Mao, Y. B., et al. (2007). "Silencing a cotton bollworm P450 monooxygenase gene by plant-mediated RNAi impairs larval tolerance of gossypol." Nat Biotechnol 25(11): 1307-1313.


Mi, H., et al. (2013). "PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees." Nucleic Acids Res 41.

Moran, P. J., et al. (2002). "Gene expression profiling of Arabidopsis thaliana in compatible plant‐aphid interactions." Arch Insect Biochem Physiol 51(4): 182-203.

Murray, C. L., et al. (1994). "A putative nicotine pump at the metabolic blood–brain barrier of the tobacco hornworm." J Neurobiol 25.

Niu, G., et al. (2011). "A substrate-specific cytochrome P450 monooxygenase, CYP6AB11, from the polyphagous navel orangeworm (Amyelois transitella)." Insect Biochem Mol Biol 41(4): 244-253.

Opitz, S. E. W., et al. (2011). "Desulfation Followed by Sulfation: Metabolism of Benzylglucosinolate in Athalia rosae (Hymenoptera: Tenthredinidae)." ChemBioChem 12(8): 1252-1257.

Peng, T., et al. (2016). "Cytochrome P450 CYP6DA2 regulated by cap ‘n’collar isoform C (CncC) is associated with gossypol tolerance in Aphis gossypii Glover." Insect Mol Biol 25(4): 450-459.

Pentzold, S., et al. (2014). "How insects overcome two-component plant chemical defence: plant β-glucosidases as the main target for herbivore adaptation." Biological Reviews 89(3): 531-551.

Perez, J. L., et al. (2010). "In vivo induction of phase II detoxifying enzymes, glutathione transferase and quinone reductase by citrus triterpenoids." BMC Complementary and Alternative Medicine 10: 51-51.

Pratt, C., et al. (2008). "Accumulation of Glucosinolates by the Cabbage Aphid Brevicoryne brassicae as a Defense Against Two Coccinellid Species." Journal of Chemical Ecology 34(3): 323-329.

Price, M. N., et al. (2010). "FastTree 2 – Approximately maximum-likelihood trees for large alignments." PLoS ONE 5(3): e9490.

Ramsey, J. S., et al. (2010). "Comparative analysis of detoxification enzymes in Acyrthosiphon pisum and Myzus persicae." Insect Mol Biol 19 Suppl 2: 155-164.

Ranasinghe, C., et al. (1998). "Over-expression of cytochrome P450 CYP6B7 mRNA and pyrethroid resistance in Australian populations of Helicoverpa armigera (Hübner)." Pesticide Science 54(3): 195-202.

Rane, R. V., et al. (2017). "Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes." BMC Genomics 18(1): 673.


Ranson, H., et al. (2002). "Evolution of Supergene Families Associated with Insecticide Resistance." Science 298(5591): 179-181.

Ranson, H. and J. Hemingway (2005). 5.11 - Glutathione Transferases. Comprehensive Molecular Insect Science. L. I. Gilbert. Amsterdam, Elsevier: 383-402.

Ratzka, A., et al. (2002). "Disarming the mustard oil bomb." Proceedings of the National Academy of Sciences 99(17): 11223-11228.

Rutherford, K., et al. (2000). "Artemis: sequence visualization and annotation." Bioinformatics (Oxford, England) 16(10): 944-945.

Sasabe, M., et al. (2004). "Molecular analysis of CYP321A1, a novel cytochrome P450 involved in metabolism of plant allelochemicals (furanocoumarins) and insecticides (cypermethrin) in Helicoverpa zea." Gene 338(2): 163-175.

Sau, A., et al. (2010). "Glutathione transferases and development of new principles to overcome drug resistance." Archives of Biochemistry and Biophysics 500(2): 116-122.

Schinkel, A. H. and J. W. Jonker (2003). "Mammalian drug efflux transporters of the ATP binding cassette (ABC) family: an overview." Advanced Drug Delivery Reviews 55(1): 3-29.

Schramm, K., et al. (2012). "Metabolism of glucosinolate-derived isothiocyanates to glutathione conjugates in generalist lepidopteran herbivores." Insect Biochem Mol Biol 42(3): 174-182.

Schweizer, F., et al. (2017). "Arabidopsis glucosinolates trigger a contrasting transcriptomic response in a generalist and a specialist herbivore." Insect Biochem Mol Biol 85: 21-31.

Shah, S., et al. (2012). "Insecticide detoxification indicator strains as tools for enhancing chemical discovery screens." Pest Management Science 68(1): 38-48.

Sharom, F. J. (1997). "The P-Glycoprotein Efflux Pump: How Does it Transport Drugs?" The Journal of Membrane Biology 160(3): 161-175.

Shi, G., M.-C. Peng and T. Jiang (2011). "MultiMSOAR 2.0: an accurate tool to identify ortholog groups among multiple genomes." PLoS ONE 6(6): e20892.

Shi, H., et al. (2012). "Glutathione S-transferase (GST) genes in the red flour beetle, Tribolium castaneum, and comparative analysis with five additional insects." Genomics 100(5): 327-335.

Snyder, M. J. and J. I. Glendinning (1996). "Causal connection between detoxification enzyme activity and consumption of a toxic plant compound." Journal of Comparative Physiology A 179(2): 255-261.


Sorensen, J. S. and M. D. Dearing (2006). "Efflux Transporters as a Novel Herbivore Countermechanism to Plant Chemical Defenses." Journal of Chemical Ecology 32(6): 1181.

Wen, Z., et al. (2003). "Metabolism of linear and angular furanocoumarins by Papilio polyxenes CYP6B1 co-expressed with NADPH cytochrome P450 reductase." Insect Biochem Mol Biol 33(9): 937-947.

Wen, Z., et al. (2006). "CYP6B1 and CYP6B3 of the black swallowtail (Papilio polyxenes): adaptive evolution through subfunctionalization." Mol Biol Evol 23(12): 2434-2443.

Wheat, C. W., et al. (2007). "The genetic basis of a plant–insect coevolutionary key innovation." Proceedings of the National Academy of Sciences 104(51): 20427-20431.

Whiteman, N. K., et al. (2012). "Genes Involved in the Evolution of Herbivory by a Leaf-Mining, Drosophilid Fly." Genome Biol Evol 4(9): 900-916.

Williams, R. T. (1959). Detoxification Mechanisms. New York, John Wiley & Sons Inc.

Wittstock, U., et al. (2004). "Successful herbivore attack due to metabolic diversion of a plant chemical defense." Proc Natl Acad Sci U S A 101(14): 4859-4864.

Wosilait, W. D. and A. Nason (1954). "Pyridine nucleotide-menadione reductase from Escherichia coli." Journal of Biological Chemistry 208(2): 785-798.

Yamamoto, K. and N. Yamada (2016). "Identification of a diazinon-metabolizing glutathione S-transferase in the silkworm, Bombyx mori." Scientific Reports 6: 30073.

Yang, J., et al. (2012). "A sigma-class glutathione S-transferase from Solen grandis that responded to microorganism glycan and organic contaminants." Fish & Shellfish Immunology 32(6): 1198-1204.

Yu, S. J. (1987). "Quinone reductase of phytophagous insects and its induction by allelochemicals." Comparative Biochemistry and Physiology Part B: Comparative Biochemistry 87(3): 621-624.

Zhang, W., et al. (2017). "Crystal structure of the nitrile-specifier protein NSP1 from Arabidopsis thaliana." Biochemical and Biophysical Research Communications 488(1): 147-152.

Zhu-Salzman, K., et al. (2004). "Transcriptional regulation of sorghum defense determinants against a phloem-feeding aphid." Plant Physiology 134(1): 420-431.

Zou, X., et al. (2016). "Glutathione S-transferase SlGSTE1 in Spodoptera litura may be associated with feeding adaptation of host plants." Insect Biochem Mol Biol 70(Supplement C): 32-43.


Züst, T. and A. A. Agrawal (2016). "Mechanisms and evolution of plant resistance to aphids." Nature Plants 2: 15206.


Chapter V

Role of the myrosinase in glucosinolate detoxification and aphid evolution


5.1 Introduction

Bussy (1840) first found that the glucosinolates (GLS) from Brassica nigra (the black mustard plant) were hydrolysed by a protein. This protein was later found to be a ß-glucosidase enzyme (also classified as glycoside hydrolase family 1). ß-glucosidases can act on a variety of different chemicals, including alkaloid glucosides, benzoxazinoid glucosides, cyanogenic glucosides, iridoid glucosides, salicinoids and, of particular interest here, the GLS (Pentzold et al. 2014). The major difference between GLS and these other plant-derived chemicals is the presence of a sulfate (SO3⁻) group. Therefore, the specific enzymes that act on the GLS are named ß-thioglucosidases. They are also commonly known as myrosinases (Myr), and the Enzyme Commission has classified the reaction they perform as EC:3.2.1.147.

5.1.1 Myrosinase in plants

Myrosinases are one half of a two-component plant defence system known as the ‘Mustard bomb’ (Kazana et al. 2007). They act on glucosinolate molecules, releasing D-glucose and a side chain with a sulfate group (aglycone). The aglycone molecule then converts into unstable and toxic defence compounds like isothiocyanates. However, this only happens when the two components come together, which typically occurs when a herbivore breaks the cells in the process of eating the plant tissue.

In plants, the Myr enzyme and GLS molecules are stored in separate types of cells to avoid unnecessary activation of the GLS molecules and self-toxicity within plants (Heinricher 1884, Höglund et al. 1992, Kelly et al. 1998). Myr protein molecules are stored in special cells called myrosin cells (Werker and Vaughan 1977, Bones and Iversen 1985). Myrosin cells are randomly distributed throughout different organs and developmental stages of the plant, including seeds, seedlings, leaves, roots and stems. There is substantial experimental evidence for the localized expression of myr transcripts (Lenman et al. 1993, Falk et al. 1995) as well as the Myr protein molecule (Werker and Vaughan 1977, Bones and Iversen 1985) within myrosin cells.

Plant Myr genes are categorised into three types, MA, MB and MC, based on the different clades they form in phylogenetic analysis (Rask et al. 2000). The MB and MC type Myr proteins form a complex with an associated protein called the myrosinase binding protein (MBP) (Lenman et al. 1993, Falk et al. 1995). MBP expression increases upon wounding or jasmonate treatment (Bodnaryk 1992, Doughty et al. 1995, Taipalensuu et al. 1997). Several reports show the presence of specific types of myrosinases in specific types of cells, while several other studies detect all types of myr transcripts in the same type of cells (Lenman et al. 1993, Xue et al. 1993, Falk et al. 1995). This knowledge of plant myrosinases can help in understanding how myrosinases behave in insects, particularly in aphids.

5.1.2 Myrosinase in aphids

Myrosinase genes have been predicted in sequence datasets of different insect species including hemipterans, dipterans, lepidopterans, coleopterans and hymenopterans. A body of literature has established that, just as plants use the two-component myrosinase-GLS system against herbivorous insects, some aphids use it to protect their colonies from predator attack. These aphids acquire GLS through their food and store them in cells; when a predator attacks, these compounds react with the Myr enzyme, which is stored in separate cells. This reaction produces toxic compounds that harm the predator, sparing other aphids. The aphids that exhibit this altruistic mechanism are B. brassicae and L. erysimi (Pontoppidan et al. 2001, Bridges et al. 2002, Husebye et al. 2005, Kazana et al. 2007), two of the aphids for which I have generated transcriptome data. Bridges et al. (2002) showed that myrosinase is present within B. brassicae muscle tissue, whereas in L. erysimi its location is still not completely clear, although it appears to be present in the same type of cells. The location of glucosinolate storage within the aphid body is still unknown.

5.1.3 Crystal structure of the aphid myrosinase

The structure of the Myr protein from B. brassicae was studied by Husebye et al. (2005) using X-ray crystallography and displayed a close resemblance to other ß-glucosidases. The Myr enzyme is a dimer. Husebye et al. (2005) also suggest very high sequence similarity between aphid Myr and plant Myr, with particularly high conservation of amino acids at the glucose binding site. The activity of the Myr protein was found to be highest at pH 4. The enzymatic efficiency of the Myr protein was also determined using sinigrin (a natural glucosinolate molecule) as a substrate, giving a value of 23.4 µmol/min/mg. The Myr protein has an active site that is surrounded by a loop structure, a distinguishing feature of aphid myrosinases and other glucosidases.

Figure 5.1.1 Crystal structure of a myrosinase monomer from B. brassicae (PDB – 1WCG) orientated to show the location of the active site cleft. A) A ribbon diagram showing the architecture of a protein monomer with alpha helices and beta sheets, B) A space-filling model showing a map of the hydrophobic (red) and hydrophilic (blue) surfaces of the Myr protein.


5.1.4 Interaction of myrosinase and glucosinolate

The natural substrates of the Myr enzymes from plants and aphids are similar. Hence, the enzymes have high levels of amino acid conservation (Husebye et al. 2005). The Myr active site has two parts: 1) the glucose binding site (GBS) and 2) the aglycon binding site (ABS). While the GBS is highly conserved between aphids and plants, the ABS is highly divergent. This suggests that variation within the ABS helps in the interaction with GLS molecules, whose side chains are classified as aliphatic, aromatic or indolic (Figure 5.1.2).

Figure 5.1.2 Representative chemical structures of three different types of glucosinolate: A) aliphatic glucosinolate, B) indolic glucosinolate, C) aromatic glucosinolate.

There are only two amino acid changes within the GBS of aphid Myr compared with plant Myr. The first, at position 167 of B. brassicae Myr, involves replacement of glutamine by glutamic acid (Q 167 E), while the second, at position 424, changes phenylalanine to tryptophan (W 424 F). Both changes are generally considered conservative substitutions that may not significantly alter the substrate binding properties of the GBS. The fact that the sequences are so well conserved between species can be attributed to the glucose moiety common to all GLS substrates. The highly diverged plant ABS differs from the aphid amino acid sequence at positions 170 (S 170 A), 180 (D 180 Y) and 346 (D 346 Y). These residues are predicted to play a role in binding the sulfate group of the GLS molecule.
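One common way to make "conservative substitution" concrete is to look up each residue pair in a substitution matrix. The sketch below is an illustration only and was not part of the original analysis: it assumes BLOSUM62 as the scoring scheme and hard-codes just the four residue pairs mentioned above. Because the matrix is symmetric, the result does not depend on the direction in which each substitution is written. The scores describe general exchangeability in protein alignments, not the effect of a change on sulfate or glucose binding.

```python
# BLOSUM62 scores for the residue pairs named in the text. Only the pairs
# needed here are listed; using BLOSUM62 at all is an assumption of this sketch.
BLOSUM62_SUBSET = {
    frozenset("QE"): 2,
    frozenset("FW"): 1,
    frozenset("SA"): 1,
    frozenset("DY"): -3,
}

# Substitutions as written in the text: (site, residue, position, residue).
SUBSTITUTIONS = [
    ("GBS", "Q", 167, "E"),
    ("GBS", "W", 424, "F"),
    ("ABS", "S", 170, "A"),
    ("ABS", "D", 180, "Y"),
    ("ABS", "D", 346, "Y"),
]


def score(a: str, b: str) -> int:
    """Look up the BLOSUM62 score for an unordered residue pair."""
    return BLOSUM62_SUBSET[frozenset((a, b))]


if __name__ == "__main__":
    for site, a, position, b in SUBSTITUTIONS:
        print(f"{site} {a}{position}{b}: BLOSUM62 = {score(a, b)}")
```

Under this scheme both GBS changes receive positive scores, whereas the two Asp/Tyr differences in the ABS receive a negative score.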


While myrosinases help brassica specialists defend themselves against predators, they also have a role in other aphid species. Kim et al. (2008) fed aphids on the A. thaliana mutant line (atr1D) that can accumulate indolic glucosinolates at levels 6- to 22-fold higher than wild type. Analysis of the honeydew of aphids fed on atr1D plants showed more intact indolic glucosinolate molecules, as well as breakdown products of indolic glucosinolates, than aliphatic glucosinolates, whereas the glucosinolates in the honeydew of M. persicae feeding on wild-type A. thaliana were 95% aliphatic and 5% indolic (Kim and Jander 2007). Therefore, in the generalist aphid species M. persicae, the role of myrosinases is thought to be different because, instead of sequestering GLS, M. persicae excretes them mostly without processing (Kim and Jander 2007, Kim et al. 2008). Considering the diversity within these aphid species, the following hypotheses were tested using the tissue transcriptome data of generalist and specialist aphid species.

5.1.5 Aim and hypotheses

Given this prior information about the myr gene within insects and plants, I set the following objectives for this study.

Hypothesis 1 – Using the newly available genome and my transcriptome data sets, I set out to characterize the myrosinases of aphids. I hypothesized that more than one myr isoform is present in both the generalist (M. persicae) and the specialist aphid species (B. brassicae and L. erysimi).

Hypothesis 2 – The myr isoform involved in the mustard bomb theory would be expected to show tissue-specific expression in specialist aphid species, B. brassicae and L. erysimi. In the context of my transcriptome data, I could address whether expression differed between the midgut, the bacteriocytes and the whole-body samples.


Hypothesis 3 – The specialist aphid species, B. brassicae and L. erysimi will show a positive selection signal within myr gene sequences. The expectation was that some but not all myrosinases would exhibit amino acid changes in the active site occurring at elevated rates relative to other parts of the protein.

Hypothesis 4 – Different isoforms of the Myr enzymes will show substrate specificity for aliphatic and indolic glucosinolate molecules. To address this hypothesis, I was able to combine homology modelling and in silico docking to examine the divergence patterns among aphid myrosinases.

5.2 Material and Methods

5.2.1 Myrosinase gene search

The myr gene search analysis was performed on the tissue transcriptome dataset described earlier (see Chapter II, Section 2.3.3.1). Functional annotation was assigned to all the clusters produced by Corset analysis with the BLASTX program, using A. pisum annotations as a reference. Five different A. pisum myrosinase gene isoforms were found in the NCBI annotation database and were used as query sequences. All the clusters showing significant similarity (E-value < 0.00001) to the A. pisum myrosinase gene (myr) were selected for further investigation. The myr genes from the A. pisum annotation were used as a reference to annotate myrosinases in the M. persicae genome. A total of four M. persicae myr genes were manually annotated using Artemis software (Rutherford et al. 2000). Transcripts showing similarity to annotated genes were retrieved from the transcriptome dataset. Protein sequences of all myr-like sequences were aligned using MAFFT software. Phylogenetic trees were built using FastTreeMP software (custom settings: -pseudo -spr 4 -mlacc 2 -slownni) with the maximum-likelihood NNI method (Price et al. 2010). The myr gene sequence from Raphanus sativus (radish) was used as an outgroup for the analysis. The NCBI accession numbers for myrosinases from other species are given in Table 5.2.1.
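The similarity filter described above can be illustrated with a minimal sketch. It assumes that the BLAST results are available in the standard 12-column tabular layout (-outfmt 6) and that the first column holds the Corset cluster ID; the cluster names and hit values below are invented purely for illustration and are not taken from the thesis data.

```python
# Minimal sketch: keep clusters whose best hit to a myrosinase sequence passes an
# E-value cutoff. Column order follows BLAST tabular output (-outfmt 6), where
# column 1 is the query ID and column 11 is the E-value.
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: best (lowest) E-value} for hits below the cutoff."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < cutoff and evalue < best.get(query_id, float("inf")):
            best[query_id] = evalue
    return best


# Hypothetical rows: cluster IDs and statistics are invented for illustration.
example = [
    "Cluster-1.0\tXP_008185073.1\t78.5\t480\t100\t3\t1\t1440\t1\t480\t1e-150\t520",
    "Cluster-2.0\tXP_001952406.2\t31.0\t90\t60\t2\t10\t280\t5\t94\t0.002\t40",
    "Cluster-1.0\tXP_001946340.2\t60.2\t470\t180\t5\t1\t1410\t1\t470\t3e-90\t410",
]
hits = significant_hits(example)
print(hits)
```

Only the first cluster passes in this toy input, because the second cluster's hit has an E-value above the cutoff.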

Table 5.2.1 All the Myrosinase-like genes used in myrosinase tree formation with NCBI accession numbers.

NCBI Accession    Species Name             NCBI Accession    Species Name
AAL25999.1        Brevicoryne brassicae    AJE75665.1        Chrysomela lapponica
XP_018465960.1    Raphanus sativus         XP_019875746.1    Aethina tumida
XP_008185073.1    Acyrthosiphon pisum      AHZ59662.1        Phyllotreta striolata
XP_001952406.2    Acyrthosiphon pisum      XP_021200470.1    Helicoverpa armigera
XP_008187721.2    Acyrthosiphon pisum      SGZ49394.1        Zygaena filipendulae
XP_001946340.2    Acyrthosiphon pisum      KPI97521.1        Papilio xuthus
XP_001946297.2    Acyrthosiphon pisum      XP_004932336.1    Bombyx mori
XP_015431062.1    Dufourea novaeangliae    AFS49707.1        Musca domestica
NP_001136332.1    Nasonia vitripennis      XP_013110639.1    Stomoxys calcitrans
XP_014209304.1    Copidosoma floridanum    XP_013110640.1    Stomoxys calcitrans
XP_011863827.1    emeryi                   XP_011181774.1    Zeugodacus cucurbitae
XP_008544592.1    Microplitis demolitor    XP_001850321.1    Culex quinquefasciatus
XP_019880470.1    Aethina tumida           XP_557100.2       Anopheles gambiae

5.2.2 Myrosinase isoform search

The crystal structure of the published myrosinase from B. brassicae (Bbre_myr) has been well studied (Husebye et al. 2005). Initially, the expected Bbre_myr gene sequence (from B. brassicae) was not found in the analysis. Therefore, the following custom pipeline was developed to perform a myr gene focused analysis so that all myr genes expressed in the datasets were detected. This involved the following steps:

1) A genome database was created with the published myrosinase gene sequence of B. brassicae (NCBI Accession - AF203780.1) and four manually annotated myr genes from the M. persicae genome.

2) All the fastq reads from the datasets of all three species were mapped separately to the newly created genome database using the Burrows-Wheeler Aligner (BWA) program (Li and Durbin 2009).

3) The BAM files with mapping data were then processed using Trinity Genome Guided analysis (Haas et al. 2013) to retrieve different isoforms of myr genes represented within the mapped dataset.

4) The transcripts retrieved from Trinity analysis were then manually screened for high quality full length transcripts. The protocol is shown in Figure 5.2.1.

[Flowchart. Reference genome file: B. brassicae myr gene (AF203780.1) and 4 manually annotated M. persicae myr genes. Mapping: fastq files mapped using Bowtie2, performed for each species separately. Generating myr isoforms: Trinity Genome guided analysis, performed for each species separately.]

Figure 5.2.1 Flowchart explaining the retrieval of myr gene. Pipeline was performed individually on each species dataset
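The screening for full-length transcripts in step 4 was done manually. As a rough illustration of what such a check involves, the sketch below looks for the longest complete open reading frame (ATG to stop codon) on the forward strand. The function names and the minimum-length parameter are hypothetical, and the forward-strand-only search is a simplifying assumption, not the procedure used in the thesis.

```python
# Minimal sketch: find the longest complete ORF (start codon to stop codon)
# in the three forward reading frames of a transcript.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq):
    """Return the longest ATG...stop ORF on the forward strand, or '' if none."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this open stretch
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def is_full_length(seq, min_codons):
    """True if the transcript holds a complete ORF of at least min_codons codons."""
    return len(longest_complete_orf(seq)) // 3 >= min_codons


print(longest_complete_orf("CCATGAAATTTGGGTAACC"))
```

A transcript that begins with ATG but never reaches an in-frame stop codon is treated as partial and returns an empty string.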

5.2.3 Myrosinase isoform expression analysis

The reference sequence file prepared for Corset analysis in Chapter I was modified as follows and used as a reference for differential expression analysis:

1) All the transcripts (complete and partial) showing similarity with the myr gene were manually removed from the reference sequence file.

2) The myr transcripts showing a complete ORF (Figure 5.2.1) were manually added to the reference sequence file.

3) A separate reference sequence data file was generated for each of the three species.

4) The full pipeline is graphically represented in Figure 5.2.2.
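The reference modification in steps 1 to 3 was carried out by hand. A minimal sketch of the same operation is given below, assuming the reference for one species is held as a dictionary of transcript IDs and sequences. All function names, transcript IDs, and sequences here are hypothetical.

```python
# Minimal sketch: remove myr-like entries from a per-species reference and
# append the curated, complete-ORF myr transcripts in their place.
def rebuild_reference(reference, myr_like_ids, curated_myr):
    """Return a new reference dict for one species.

    reference    : {transcript_id: sequence}
    myr_like_ids : IDs of complete or partial myr-like transcripts to remove
    curated_myr  : {transcript_id: sequence} of transcripts with complete ORFs
    """
    kept = {tid: seq for tid, seq in reference.items() if tid not in myr_like_ids}
    overlap = set(kept) & set(curated_myr)
    if overlap:
        raise ValueError(f"duplicate transcript IDs: {sorted(overlap)}")
    kept.update(curated_myr)
    return kept


def to_fasta(records, width=60):
    """Format {id: sequence} as FASTA text with fixed-width sequence lines."""
    lines = []
    for tid, seq in records.items():
        lines.append(f">{tid}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical toy reference for a single species.
reference = {"t1": "ATGC", "myr_partial": "ATG", "t2": "GGCC"}
new_reference = rebuild_reference(reference, {"myr_partial"}, {"Bbra_1a": "ATGAAATAA"})
print(to_fasta(new_reference))
```

Running the function once per species yields the three separate reference files described in step 3.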

Figure 5.2.2 Bioinformatics pipeline used for the myr gene targeted analysis

5.2.4 Positive selection study

The nucleotide sequences of all the myr transcripts were aligned as translated amino acids using MAFFT software (Katoh and Standley 2013) with default settings. A phylogenetic tree was built using FastTreeMP software (custom settings: -pseudo -spr 4 -mlacc 2 -slownni) with the maximum-likelihood NNI method (Price et al. 2010). The positive selection analysis was performed using the HyPhy software package (Pond et al. 2005) on http://test.datamonkey.org, which provides the following methods. BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) and aBSREL (An Adaptive Branch Site REL test for episodic diversification) analyses were performed for the detection of overall positive selection in the aphid related dataset. The MEME (Mixed Effects Model of Evolution) analysis was performed to detect specific sites that are under positive selection within aphid related clades. AliView was used to visualize alignments (Larsson 2014).
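Codon-based selection tests need a nucleotide alignment in which gaps fall between codons. One common way to obtain this from a protein alignment is to thread each coding sequence back onto its aligned protein row. The sketch below illustrates that idea only; it is not the tool used in the thesis, and the function name is hypothetical.

```python
# Minimal sketch: convert one gapped protein alignment row into the matching
# codon alignment row, using the ungapped coding sequence of the same transcript.
def protein_to_codon_alignment(aligned_protein, cds):
    """Insert '---' for every protein gap and one codon for every residue."""
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the aligned protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


print(protein_to_codon_alignment("M-KV", "ATGAAAGTT"))
```

The sketch does not verify that each codon actually translates to the aligned residue; a production workflow would add that check.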

5.2.5 Myrosinase structure and docking study

The SWISS-MODEL server (Schwede et al. 2003) was used to build the homology models for Myr isoform sequences generated from the transcriptome analysis. UCSF Chimera software (Pettersen et al. 2004) was used to study the 3D structure and alignment of protein models. Protein-ligand docking was performed using Autodock Vina as implemented in UCSF Chimera (Trott and Olson 2010). Autodock Vina was selected because it was reported to have the best docking scoring power compared to twelve other docking programs (Wang et al. 2016).

Docking method validation

The crystal structure (PDB – 1W9B) of myrosinase from Sinapis alba shows the binding pose induced by the artificial compound carba-glucotropaeolin (CGTi) that mimics glucosinolate structure. The CGTi molecule was then docked independently with the 1W9B myrosinase protein as its receptor. The docked poses were then compared with the published binding pose to confirm that Autodock Vina based docking can reproduce the binding pose shown in 1W9B. Details of the process are as follows:

First validation

1) The Sinapis alba Myr protein crystal structure (PDB ID: 1W9B) was retrieved from the Brookhaven Protein Data Bank and prepared for docking calculations by removing all the bound ligands. The protein was prepared by a standard energy minimisation protocol and by adding hydrogens to the structure.

2) The artificial ligand carba-glucotropaeolin (CGTi) was built in UCSF Chimera software (Pettersen et al. 2004) using the canonical SMILES retrieved from the PubChem database (PubChem ID – 9600408). The built molecule was also prepared for docking by performing standard hydrogen addition and energy minimisation protocols.

3) Both structures were imported into the UCSF Chimera software for further molecular docking processes.

4) Docking was performed with all the options (add hydrogen, merge charges and remove nonpolar hydrogens, merge charges and remove lone pair, ignore waters, ignore chains of non-standard residues, ignore all non-standard residues) set to true within the Autodock Vina (Trott and Olson 2010) option window and with a defined docking area.

5) The ten best docking poses were generated by Autodock Vina.

6) The original structure of the 1W9B-CGTi bound state was then compared with these ten docked poses using the match function from UCSF Chimera to find the expected binding pose.

7) Method validation was based on a) at least one pose (out of 10) showing similarity with the published 1W9B-CGTi pose and b) the affinity score associated with the selected pose being stronger (more negative) than -6 kcal/mol.

8) The docking poses fulfilling the above-mentioned criteria were selected for further assessment.
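The two validation criteria in step 7 can be expressed as a simple filter over the docked poses. The sketch below is illustrative only. It reads the affinity criterion as "more negative than -6 kcal/mol", and it uses an RMSD threshold of 2.0 as a stand-in for "showing similarity"; the text does not state a numerical similarity threshold, so that value is an assumption, as are the class and function names and the example poses.

```python
# Minimal sketch: pick the docked pose that best matches a reference pose,
# subject to an affinity cutoff and an assumed RMSD similarity threshold.
from dataclasses import dataclass

AFFINITY_CUTOFF = -6.0  # kcal/mol, from the validation criteria in the text
RMSD_CUTOFF = 2.0       # ASSUMPTION: similarity threshold, not given in the text


@dataclass
class Pose:
    rank: int
    affinity: float           # kcal/mol; more negative means stronger predicted binding
    rmsd_to_reference: float  # RMSD between docked ligand and published ligand pose


def select_validated_pose(poses, affinity_cutoff=AFFINITY_CUTOFF, rmsd_cutoff=RMSD_CUTOFF):
    """Return the pose closest to the reference that passes both criteria, else None."""
    candidates = [
        p for p in poses
        if p.affinity <= affinity_cutoff and p.rmsd_to_reference <= rmsd_cutoff
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.rmsd_to_reference)


# Hypothetical poses, invented for illustration.
poses = [
    Pose(rank=1, affinity=-8.8, rmsd_to_reference=0.4),
    Pose(rank=2, affinity=-8.1, rmsd_to_reference=5.2),
    Pose(rank=3, affinity=-5.5, rmsd_to_reference=0.3),
]
best = select_validated_pose(poses)
print(best)
```

In this toy input the third pose is closest to the reference but fails the affinity criterion, so the first pose is selected.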

Second validation

1) The homology model generated for aphid Myr protein (showing 100% similarity with the Bbre_myr) was used to dock the artificial substrate CGTi using the same parameters as mentioned above.

2) The docking poses were overlaid on the original 1W9B-CGTi structure.

3) The docking position showing highest similarity with 1W9B-CGTi was used for assessment.

Docking experiments

1) The aphid myrosinase protein molecules built using SWISS-MODEL from clade 1 and clade 2 were used for the docking study.

2) First the reference docking state was generated for each protein molecule.

3) The docking was performed using the CGTi ligand with all the test Myr protein models.

4) The reference (1W9B-CGTi) molecule was then overlaid on the docking positions and the pose showing the highest similarity to the reference molecule was selected. The selected pose was then saved in .pdb format and used as a reference for further study.

5) Two glucosinolate molecules that are usually present at the highest concentrations in Raphanus sativus, a) one with an indolic side chain (glucobrassocanapin, PubChem ID – 5485207) and b) one with an aliphatic side chain (glucoraphanin, PubChem ID – 15559531), were selected as ligands for the study.

6) Glucobrassocanapin (GBMi) and glucoraphanin (GRMi) were docked using the same method as used for validation of the docking protocol to find out which was the preferred substrate.

7) Each pose was screened by overlaying it on to the respective reference pose generated in step 4.

8) The docking pose showing highest similarity with its respective reference pose was selected for assessment.

9) The substrate preference was determined based on the affinity score (cutoff: -6.0 kcal/mol) for the selected docking pose and the orientation of the glucose and sulfate group present in each molecule.
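Affinity scores such as those used in step 9 can also be read directly from a docking output file. The sketch below assumes the poses were written in AutoDock Vina's PDBQT output format, where each model carries a "REMARK VINA RESULT:" line whose first number is the affinity in kcal/mol. Whether the Chimera interface was used to export such files in this study is not stated, so this is an assumption, and the example text is invented.

```python
# Minimal sketch: collect per-pose affinities from AutoDock Vina PDBQT output.
def parse_vina_affinities(pdbqt_text):
    """Return the affinity (kcal/mol) of each docked pose, in file order."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # The first value after the colon is the affinity; the next two
            # are RMSD bounds relative to the best pose.
            affinities.append(float(line.split(":", 1)[1].split()[0]))
    return affinities


# Hypothetical, truncated output for two poses.
example_output = """MODEL 1
REMARK VINA RESULT:      -8.8      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:      -7.9      1.532      2.104
ENDMDL
"""
scores = parse_vina_affinities(example_output)
passing = [s for s in scores if s <= -6.0]  # cutoff from step 9
print(scores, passing)
```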

5.2.6 Identification of the amino acid residues responsible for the substrate specificity

To identify amino acid residues that are involved in substrate specificity within Myr isoforms, individual sequences were mutated in silico and protein homology models were generated using the SWISS-MODEL server. Similar docking steps were followed, and the affinity scores and docking positions of GRMi and GBMi were compared between the original and mutated proteins.
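An in silico point mutation amounts to replacing one residue in the protein sequence before it is submitted for homology modelling. The sketch below applies substitutions written in the usual wild-type/position/mutant form (for example A170E) and checks that the stated wild-type residue is actually present. The function name and the toy sequence are hypothetical.

```python
# Minimal sketch: apply point substitutions such as "A170E" to a protein sequence.
import re


def apply_substitutions(sequence, substitutions):
    """Return the mutated sequence; positions are 1-based."""
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])\s*(\d+)\s*([A-Z])", sub.strip())
        if match is None:
            raise ValueError(f"cannot parse substitution: {sub!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


# Toy five-residue sequence, invented for illustration.
print(apply_substitutions("MKTAY", ["A4E", "Y 5 W"]))
```

The wild-type check guards against numbering mismatches, for example when positions are counted on a gapped alignment instead of the ungapped sequence.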

5.3 Results

5.3.1 Preliminary myrosinase gene search in the transcriptome data

A preliminary analysis of the tissue transcriptome data of all three aphid species identified several transcripts in each. A phylogenetic tree that included other insect myrosinases and the Myr gene sequence (XM_018610458.1) from Raphanus sativus showed that the aphid sequences fell into two major clades and that each of these clades can be further divided into two minor clades. These clades were temporarily numbered 1 to 4 (Mp-myr1 – clade 1, Mp-myr2 – clade 2, Mp-myr3 – clade 3, Mp-myr4 – clade 4). The initial sequence alignment did not show the presence of a transcript that closely matched the transcript of the well-studied mustard-bomb myrosinase of B. brassicae, for which crystal structure analysis had been performed (called ‘Bbre_myr’ hereafter, Figure 5.3.1). Therefore, I returned to the raw read data and mapped them onto the sequence accession of the Bbre_myr transcript. This did indeed recover the Bbre_myr transcript from the B. brassicae whole body reads (Figure 5.3.2) as well as a similar sequence from the L. erysimi whole body reads (Figure 5.3.3), although no sequence with high similarity was found in the M. persicae reads (Figure 5.3.4). The initial absence may have been due to Corset’s data processing algorithm, which removes all the non-differentially expressed transcripts.

For each species, the transcripts from all the tissue samples and from the initial and secondary analyses were considered together, along with similar aphid sequences from the transcriptome databases (including the M. persicae genome). Redundant reads were removed, and a final gene set was established. This included five M. persicae sequences, five L. erysimi sequences, and four B. brassicae sequences.

A phylogenetic analysis of these final gene sets reveals multiple gene gain events but provides a focus on particular sequences (Figure 5.3.5). A fifth, divergent clade was found that contained a representative from M. persicae and A. pisum. There are two divergent clade 3 sequences in A. pisum that may be divergent alleles or sequence anomalies. Similarly, there seemed to be extra clade 4 sequences in the L. erysimi gene set. However, two factors focus the attention onto the distinctly different clade 1 and 2 sequences. Firstly, some of the clade 4 sequences have been annotated as ‘lactase-phlorizin like’, suggesting that they are glycoside hydrolases that are better characterized as β-galactosidases (EC 3.2.1.23 or EC 3.2.1.108) rather than myrosinases (EC 3.2.1.147). Secondly, the characterized myrosinase from B. brassicae fits into clade 1.

The duplication that formed the clade 1 and clade 2 enzymes occurred before the divergence of the brassica-feeding aphids, indeed before the Macrosiphini and Aphidini tribes diverged. However, the tree suggests another duplication occurred in the clade 1 myrosinases after B. brassicae diverged from M. persicae. The mustard bomb myrosinase of B. brassicae (Bbre_myr) is a daughter gene of this duplication.

Figure 5.3.1 Preliminary phylogenetic tree showing different myrosinases retrieved from the differential gene expression study

Figure 5.3.2 Mapping of B. brassicae reads against the well characterized aphid mustard bomb myrosinase. The mapping of all the reads from WB, visualized with the IGV software, is shown in the top panel. The MG reads are in the middle panel and the Bcyt reads in the bottom panel. Each read is represented as a thick grey pointed line and any nucleotide differences compared to the reference are represented as coloured lines on the read. Read coverage is also graphed for every sample with the label “coverage”. The WB sample fastq reads show the presence of reads similar to the Bbre_myr gene (reference), whereas the reads from the MG and Bcyt samples show completely diverged myr isoforms.

Figure 5.3.3 Mapping of L. erysimi reads against the well characterized aphid mustard bomb myrosinase. The mapping of all the reads from WB, visualized with the IGV software, is shown in the top panel. The MG reads are in the middle panel and the Bcyt reads in the bottom panel. Each read is represented as a thick grey pointed line and any nucleotide differences compared to the reference are represented as coloured lines on the read. Read coverage is also graphed for every sample with the label “coverage”. The WB sample fastq reads show the presence of reads similar to the Bbre_myr gene (reference), with species specific changes and full coverage, whereas the reads from the MG and Bcyt samples show completely diverged myr isoforms.

Figure 5.3.4 Mapping of M. persicae reads against the well characterized aphid mustard bomb myrosinase. The mapping of all the reads from WB, visualized with the IGV software, is shown in the top panel. The MG reads are in the middle panel and the Bcyt reads in the bottom panel. Each read is represented as a thick grey pointed line and any nucleotide differences with the reference are represented as coloured lines on the read. Read coverage is also graphed for every sample with the label “coverage”. The MG, Bcyt and WB samples show no read coverage, suggesting that this gene is not expressed in any tissue sample.

Figure 5.3.5 Phylogenetic tree of glycoside hydrolase family sequences showing distribution of all the myrosinase like isoforms found in B. brassicae, L. erysimi and M. persicae. Five clades were formed with well characterised Bbre_myr-like sequence forming clade 1 with similar sequences from other aphid species.

5.3.2 Expression analysis of different isoforms of myr genes

All the transcripts retrieved from different tissue samples were merged based on similarity and renamed for convenience. The expression levels of the different myr-like transcripts in the MG, Bcyt and WB tissues were examined for all three species (Table 5.3.1, Figure 5.3.6).

Transcripts encoding clade 3 and clade 4 enzymes have low expression in the midgut compared to the rest of the body. They are enriched in the bacteriocytes of M. persicae but not of the specialist species. The clade 1 and 2 enzymes are generally enriched in the midgut except for the mustard bomb myrosinases of specialist aphid species, which are expressed at low levels in the midgut and the bacteriocyte relative to the whole body.

Table 5.3.1 Differentially expressed transcripts from M. persicae, B. brassicae and L. erysimi in the MG and Bcyt tissues. Values are the LFC (Log Fold Change) of transcripts in each tissue as compared to the WB tissue sample. All LFC values show significant changes (p<0.05).

Transcript name    MG (LFC)    Bcyt (LFC)    Clade    Tissue specificity
Mper_1             1.42        0.54          I        Mid-Gut
Bbra_1b            1.6         0.08          I        Mid-Gut
Lery_1             -3.73       -0.21         I        Whole body
Bbra_1a            -4.06       -5.36         I        Whole body
Mper_2             3.24        -0.52         II       Mid-Gut
Lery_2             5.83        0.07          II       Mid-Gut
Bbra_2             6.23        2.69          II       Mid-Gut
Mper_3             -3.21       0.7           III      Whole body
Lery_3             -3.34       -0.82         III      Whole body
Mper_4             -2.96       1.03          IV       Whole body
Lery_4             -4.67       -1.49         IV       Whole body
Bbra_4             -2.88       0.044         IV       Whole body
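To make the LFC values easier to interpret, they can be converted to fold changes. The sketch below assumes the values are log2 fold changes, which is the usual convention in differential expression tools but is not stated explicitly in the table caption. The midgut values are copied from Table 5.3.1; the function name is hypothetical.

```python
# Minimal sketch: convert log fold changes to fold changes and list transcripts
# with higher expression in the midgut (MG) than in the whole body (WB).
MG_LFC = {
    "Mper_1": 1.42, "Bbra_1b": 1.6, "Lery_1": -3.73, "Bbra_1a": -4.06,
    "Mper_2": 3.24, "Lery_2": 5.83, "Bbra_2": 6.23,
    "Mper_3": -3.21, "Lery_3": -3.34,
    "Mper_4": -2.96, "Lery_4": -4.67, "Bbra_4": -2.88,
}


def fold_change(lfc, base=2.0):
    """Fold change relative to WB; base 2 is an ASSUMPTION about the LFC scale."""
    return base ** lfc


mg_enriched = sorted(name for name, lfc in MG_LFC.items() if lfc > 0)
print(mg_enriched)
print(round(fold_change(MG_LFC["Bbra_2"]), 1))
```

Under the log2 assumption, an LFC of 6.23 corresponds to roughly a 75-fold higher expression in the midgut than in the whole body.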

Figure 5.3.6 Phylogenetic tree with differentially expressed myrosinase-like sequence isoforms of M. persicae, B. brassicae and L. erysimi with myrosinase and glycoside hydrolase family sequences from different insects showing five different clades. Two myr-like isoforms found in B. brassicae are located in clade 1.

5.3.3 Tests for positive selection

The phylogenetic tree shows that the two clade I enzymes from B. brassicae differ in their terminal branch lengths, with Bbre_myr showing the longer branch. This suggests that positive selection may have favoured amino acid changes in that lineage, perhaps fine tuning that enzyme for neofunctionalization as a mustard bomb enzyme. This contention is supported by a one-dimensional relative rate test that quantifies the amount of amino acid substitution down each of the lineages. If the M. persicae clade I enzyme is used as an outgroup, then 72 changes can be mapped to the Bbre_myr enzyme's lineage, but only 28 are observed in the other lineage (Tajima’s relative rate test, Chi-squared 19.39, P=0.00001; Tajima 1993).
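In its one-dimensional form, Tajima's relative rate test compares the numbers of lineage-specific changes, m1 and m2, with the statistic (m1 - m2)^2 / (m1 + m2), which is evaluated against a chi-squared distribution with one degree of freedom. The sketch below applies this textbook formula to the counts given above; it is not the software used in the thesis, and the function name is hypothetical.

```python
# Minimal sketch of the one-dimensional Tajima relative rate test.
import math


def tajima_relative_rate(m1, m2):
    """Return (chi-squared statistic, p-value) for lineage-specific change counts."""
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = tajima_relative_rate(72, 28)
print(chi2, p_value)
```

With counts of 72 and 28 the formula gives a statistic of 19.36, close to the reported value, and a p-value of the same order as the one reported.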

To explore this observation further, I applied a more sophisticated analysis that examined changes at the codon level. Three different tests were performed to check the positive selection signal using the online HyPhy software package server – 1) aBSREL 2) BUSTED and 3) MEME.

BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) –

BUSTED analysis was performed with clade 1 and clade 2 sequences only. Overall, BUSTED analysis found evidence of gene-wide episodic diversifying selection (p = 0.006, ω = 1.87) in the dataset when all the sequences were selected as foreground. This suggests that over the course of aphid evolution and species diversification positive selection took place in the myr gene family.

aBSREL (Adaptive Branch Site REL test for episodic diversification) –

aBSREL analysis was also performed with clade 1 and clade 2 sequences only. As expected, evidence for episodic diversification was found in 2 out of 31 branches, comprising 1 node and 1 branch. The H. amygdali_partial branch was found to be under positive selection (clade 1, p = 0.005). Node 7, which belongs to clade 1 (Bbra_1a and Lery_1), was found to be under positive selection (p = 0.015).

MEME (Mixed Effects Model of Evolution)

Two different types of MEME analysis were performed. In the first analysis, only sequences from clade 1 and clade 2 were used (17 sequences). MEME analysis found evidence for episodic positive/diversifying selection. Eleven sites across the length of the sequence alignment show positive selection at a threshold of p < 0.1. These sites are dispersed across the length of the alignment. Interestingly, valine at position 228 (position based on the Bbre_myr sequence without gaps), which is known as one of the aglycone binding sites, was found to be under positive selection (Figure 5.3.7). In the second analysis, only the eleven clade 1 sequences were selected to reduce the noise from the data, and a total of 12 sites under positive selection were found. Although no substrate binding site showed a positive selection signal in this analysis, the aglycone binding sites were surrounded by sites showing positive selection signals (Figure 5.3.8).
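Site-level results are reported per alignment column, whereas residue positions such as 228 are given relative to the ungapped Bbre_myr sequence. The sketch below shows one way to filter sites at p < 0.1 and translate alignment columns into ungapped reference positions. The function names, the toy alignment row, and the p-values are hypothetical.

```python
# Minimal sketch: filter sites by p-value and map alignment columns to positions
# in the ungapped reference sequence.
def column_to_reference_position(aligned_reference, column):
    """Map a 1-based alignment column to a 1-based ungapped position (None at a gap)."""
    if not 1 <= column <= len(aligned_reference):
        raise ValueError(f"column {column} is outside the alignment")
    if aligned_reference[column - 1] == "-":
        return None
    return sum(1 for ch in aligned_reference[:column] if ch != "-")


def significant_sites(site_p_values, aligned_reference, threshold=0.1):
    """Return {alignment_column: reference_position} for sites with p < threshold."""
    return {
        column: column_to_reference_position(aligned_reference, column)
        for column, p in sorted(site_p_values.items())
        if p < threshold
    }


# Toy aligned reference row and invented per-column p-values.
sites = significant_sites({2: 0.05, 3: 0.01, 5: 0.2, 6: 0.09}, "MK--VLA")
print(sites)
```

Columns that fall in a gap of the reference map to None, which flags sites that have no counterpart in the reference protein.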

Figure 5.3.7 Myrosinase alignment of clade 1 and clade 2. Residues with a black background show the position of the glucose binding site, residues with a blue background show the position of the aglycone binding site, and residues with a green background show sites found under positive selection based on MEME analysis. Residues highlighted with a red star show positions altered in the in silico mutation experiment. Red highlighted amino acids represent conserved residues across all species examined.

Figure 5.3.8 Myrosinase alignment of clade 1. Residues with a black background show the position of the glucose binding site, residues with a blue background show the position of the aglycone binding site, and residues with a green background show sites found under positive selection based on MEME analysis. Residues highlighted with a red star show positions altered in the in silico mutation experiment. Red highlighted amino acids represent conserved residues across all species examined.

5.3.4 3D myrosinase structure analysis

To examine the structural significance of the divergent sites in the enzymes, a 3D structural model was built using the SWISS-MODEL server. There are several changes within the Mper_1 sequence relative to Bbra_1a and Lery_1 that are linked to the mustard bomb theory. The changes at the predicted aglycone binding sites (I169M, A170E, F324Y, Y346W and I347L) may hint at the main difference between an enzyme that is capable of mustard bomb activity and one that is not. The two isoforms of clade 1 Myr from B. brassicae differ at A170E, F324Y, Y346W and I347L, the same substitutions at the aglycone binding site as seen in the M. persicae sequence. However, the glucose binding site was found to be conserved in all species and all isoforms of clade 1. The sequence alignment shows a conserved tryptophan residue at position 424 in clades 1, 2 and 4, which is replaced with phenylalanine in clade 3. This residue is also different in plant myrosinases and is predicted to be involved in glucose binding. In aphid myrosinases, Lys173 is conserved throughout clade 1, which is in accordance with the findings of Husebye et al. (2005). Ala170 is conserved only in B. brassicae and L. erysimi, while other aphid Myr sequences show a change to Glu170.

The SWISS-MODEL generated homology models of all sequences from clades 1-5 of B. brassicae, L. erysimi and M. persicae were studied for structural similarity. The clade 1 and 2 models showed the greatest similarity with the mustard bomb associated Myr crystal structure (Root Mean Square Deviation (RMSD) values: ~0.07 in clade 1, ~0.1 in clade 2, ~0.7 in clades 3 and 4, and 2.7 in clade 5). I therefore focused on the clade 1 and clade 2 structures for further investigation, as they show the highest similarity to the published B. brassicae Myr crystal structure. The level of divergence between clades 1 and 2 and clades 3 and 4 is shown in Figure 5.3.9.
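For reference, the RMSD between two structures is the square root of the mean squared distance between matched atoms. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and listed in the same atom order; the structural superposition itself was handled by the modelling software and is not reproduced here. The function name and coordinates are hypothetical.

```python
# Minimal sketch: RMSD between two pre-superimposed sets of matched 3D coordinates.
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates: the second atom is displaced by 2 units along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```

Identical coordinate sets give an RMSD of zero, and larger values indicate greater structural divergence.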

Figure 5.3.9 Graph showing structural differences between myr clades 1, 2, 3 and 4. Based on RMSD values, clades 1 and 2 are profoundly diverged from clades 3 and 4.

5.3.5 Substrate docking into the protein model

A previous study co-crystalized Sinapis alba myrosinase (1W9B) with an artificial substrate that mimics glucosinolate (CGTi). This structure was used for docking method standardisation because it shows the closest binding pose of glucosinolate like molecules.

Initially I used the docking algorithm AutoDock Vina on this plant myrosinase to show that the in silico model of CGTi generated a docking pose that matched the published binding pose. The details of the protocol are as follows:

Validity of docking method

The docking of the artificial substrate CGTi was performed using S. alba Myr (1W9B) as a receptor. This generated a docking pose that overlapped almost perfectly with the original published docking pose of 1W9B-CGTi. The overlaying method generated the expected perfect matching of the two protein molecules with an RMSD value of 0.0 (Figure 5.3.10). The affinity score for the docked receptor pose was -8.8 kcal/mol, with 4 hydrogen bonds formed. These results suggest the validity and reproducibility of the docking method used in this study.

Second docking confirmation

The docking of the artificial substrate CGTi was performed using B. brassicae Myr (1WCG) as a receptor. The overlaying of one of the docked poses and the published 1W9B-CGTi (S. alba) binding pose showed a close resemblance with an overall RMSD value of 1.26. The difference observed in the docking pose can be attributed to the other amino acid changes between the two proteins that might influence the docking protocol. Importantly, the docking pose showed the same orientation of glucose and aglycone residues as with the CGTi molecule (Figure 5.3.11). The affinity score for the docked receptor pose was -8.5 kcal/mol with 10 hydrogen bonds formed, which supports the validity of the docking method.

Figure 5.3.10 The overlaid image of the docking experiment performed using the artificial substrate, CGTi and the protein 1W9B of S. alba and the published docking state of the 1W9B protein from the PDB database. The near perfect superimposition of two substrates, (PDB structure in blue, and docked pose in red) shows the validity of docking method used in this study.

Figure 5.3.11 The overlaid image of the docking experiment performed using the artificial substrate, CGTi, and the protein brevi_wb_2399 of B. brassicae, and the published docking state of the 1W9B protein from the PDB database. The close resemblance of the two superimposed substrates (PDB structure in blue, docked pose in red) suggests the validity of the docking method used in this study.

5.3.6 Substrate preference

The substrate preference study was performed on clade 1 and clade 2 myrosinases from each species. The second isoform of Myr from B. brassicae (Bbra_1b) was not used in this study as it showed similar changes to those present in the Mper_1 sequence. The reference docking position was built using the artificial substrate CGTi for each Myr molecule taking S. alba Myr (1W9B) as a reference. The selection of the correct pose of the docked substrate was performed by observing the orientation of the glucose molecule towards the glucose binding site and the orientation of the side chain along with the sulfate group towards the aglycon binding site.

The two substrates, glucobrassocanapin (GBMi) and glucoraphanin (GRMi), were docked with 6 Myr protein structures. In each acronym, the first two letters denote the glucosinolate molecule and the last two letters indicate that the structure was processed with a standard energy minimization protocol. The docking study clearly shows the higher affinity of all the Myr protein molecules towards the indolic glucosinolate GBMi as compared to the aliphatic glucosinolate GRMi (Table 5.3.2). However, when docking positions were analysed carefully with respect to the reference docking position and the predicted glucose and aglycone binding sites, the Myr proteins from clade 1 showed a preference for aliphatic glucosinolate (GRMi) in the specialist aphid species, B. brassicae and L. erysimi. In contrast, the M. persicae Myr can bind to both substrates.

Table 5.3.2 The affinity score generated between ligand and Myr protein from B. brassicae, L. erysimi and M. persicae. Affinity score in kcal/mol

Ligand    Receptor    Affinity score (kcal/mol)
GRMi      Bbra_1a     -7.7
GBMi      Bbra_1a     -8.8
GRMi      Bbra_2      -7.0
GBMi      Bbra_2      -8.7
GRMi      Mper_1      -7.4
GBMi      Mper_1      -9.3
GRMi      Mper_2      -7.4
GBMi      Mper_2      -8.8
GRMi      Lery_1      -7.2
GBMi      Lery_1      -8.6
GRMi      Lery_2      -8.2
GBMi      Lery_2      -9.3
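The affinity scores in Table 5.3.2 can be compared per receptor with a few lines of code. The sketch below simply subtracts the GRMi score from the GBMi score using the values in the table. The variable and function names are hypothetical, and the comparison covers the score only, not the pose orientation that was also used to judge substrate preference.

```python
# Minimal sketch: per-receptor difference in affinity score between the two ligands.
# Values (kcal/mol) are copied from Table 5.3.2.
AFFINITY = {
    "Bbra_1a": {"GRMi": -7.7, "GBMi": -8.8},
    "Bbra_2": {"GRMi": -7.0, "GBMi": -8.7},
    "Mper_1": {"GRMi": -7.4, "GBMi": -9.3},
    "Mper_2": {"GRMi": -7.4, "GBMi": -8.8},
    "Lery_1": {"GRMi": -7.2, "GBMi": -8.6},
    "Lery_2": {"GRMi": -8.2, "GBMi": -9.3},
}


def affinity_gap(scores):
    """GBMi minus GRMi; a negative value means GBMi has the stronger score."""
    return round(scores["GBMi"] - scores["GRMi"], 1)


gaps = {receptor: affinity_gap(scores) for receptor, scores in AFFINITY.items()}
for receptor, gap in gaps.items():
    print(f"{receptor}: {gap:+.1f} kcal/mol")
```

Every gap is negative, which reflects the higher score-based affinity for GBMi across all six receptors before pose orientation is taken into account.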

B. brassicae Myr docking

Docking of the aliphatic glucosinolate, GRMi, with Bbra_1a generated the ten best possible poses. One of the selected poses (as per the rules set out in the methods section) shows the glucose moiety of both the reference and the test substrate (GRMi) oriented similarly towards the glucose binding sites of the Bbre_myr enzyme, providing confidence in the docking results (Figure 5.3.12 (a)), whereas the docking of the GBMi substrate shows the indolic ring structure aligning with the glucose structure of the reference molecule (Figure 5.3.12 (b)).

The Myr protein molecule from clade 2 (Bbra_2) was used for docking with GRMi and GBMi. The docking generated at least one pose that aligns with the reference substrate pose, suggesting that, unlike Bbra_1a, Bbra_2 can bind both substrates (Figure 5.3.13 (a, b)).

Figure 5.3.12 The docking of GRMi and GBMi with Bbra_1a. a) The alignment faces the glucose molecules (yellow arrow) of the reference substrate, CGTi (green), and of GRMi (red) towards the glucose binding site (black residues), while the aglycone of the glucosinolate molecule points towards the aglycone binding site (blue residues). b) The alignment faces the glucose moiety of CGTi (green) towards the glucose binding site (black residues), whereas the glucose molecule (yellow arrow) of GBMi (red) does not align with the glucose molecule of the reference molecule, CGTi.

Figure 5.3.13 The docking of GRMi and GBMi on Bbra_2. a) Orientation of the glucose and aglycone parts of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue). b) Alignment of the glucose moiety of the reference substrate, CGTi (green), facing towards the glucose binding site (black coloured residues), and the glucose of GBMi (red) aligning to the glucose group of the reference molecule, CGTi. The yellow arrow shows the position of the glucose molecule of CGTi and GBMi.

L. erysimi Myr docking

Protein sequences of L. erysimi from clade 1 (Lery_1) were used to build the Myr protein. The docking performed with GRMi produced at least one pose that matches the reference substrate and binding sites of the protein (Figure 5.3.14 (a)), whereas GBMi docking produced a result similar to that of Bbra_1a docking with GBMi, with no pose generated that aligned with the glucose molecule of the reference molecule CGTi (Figure 5.3.14 (b)).

The protein structure from clade 2 (Lery_2) was also tested for its interaction with GRMi and GBMi. The GRMi binding poses did not reproduce exactly overlapping glucose or aglycone molecules. However, these molecules are positioned in the middle of glucose and aglycone binding sites making it difficult to draw a firm conclusion (Figure 5.3.15 (a)). In contrast, the docking of GBMi shows the opposite orientation of glucose as compared to the reference molecule, CGTi. The position of the glucose molecule of CGTi overlaps with the indolic ring of GBMi (Figure 5.3.15 (b)).


Figure 5.3.14 The docking of GRMi and GBMi on Lery_1. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue). b) Showing alignment of the glucose moiety of the reference substrate, CGTi (green), facing towards the glucose binding site (black coloured residues) and glucose of GBMi (red) aligning to the glucose group of the reference molecule, CGTi. Yellow arrow showing position of the glucose molecule of GBMi.


Figure 5.3.15 The docking of GRMi and GBMi on Lery_2. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) positioned between the glucose binding site (black) and the aglycone binding site (blue). b) Showing alignment of the glucose moiety of the reference substrate, CGTi (green), facing towards the glucose binding site (black coloured residues), but glucose of GBMi (red) pointing towards the aglycone binding site. Yellow arrow showing position of the glucose molecule of GBMi.


M. persicae Myr docking

Mper_1 protein docking with GRMi and GBMi generated almost identical poses for both substrates (Figure 5.3.16). In both cases the glucose molecule and the side chain of the GLS molecule face their expected binding sites. The presence of the indolic ring in the glucosinolate does not affect the binding pose of GBMi relative to the target protein as it did for Bbra_1a and Lery_1. Similar docking positions were found with Mper_2 using both the GRMi and GBMi substrates, suggesting possible activity of the protein on them (Figure 5.3.17).


Figure 5.3.16 The docking of GRMi and GBMi on Mper_1. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue). b) Showing the glucose molecule of GBMi (red) aligning to the glucose binding site (shown in black). Yellow arrow showing position of the glucose molecule of GRMi and GBMi.


Figure 5.3.17 The docking of GRMi and GBMi on Mper_2. a) Showing orientation of the glucose (indicated by yellow arrow) and aglycone part of GRMi (red) pointing towards their respective binding sites (glucose binding site – black, aglycone binding site – blue). b) Showing the glucose molecule of GBMi (red) aligning to the glucose binding site (shown in black). Yellow arrow showing position of the glucose molecule of GRMi and GBMi.


5.3.7 Identification of the amino acid residues involved in substrate specificity

To explore the importance of particular amino acids, an in-silico mutation analysis was performed on Bbra_1a (a protein similar to Bbre_myr). Four residues from the aglycone site (positions 169, 170, 324 and 346) were selected for this study based on non-conservative changes between Mper_1 and Bbra_1a/Lery_1. Additional reasons for these choices were: 1) position 346 has several divergent sites preceding it; 2) the sites preceding position 324 showed differences in Mper_1, perhaps hinting at potential pre-adaptation. These residues were changed manually in the Bbra_1a sequence to the residues from the Mper_1 sequence: I169M, A170E, F324Y and Y346W (Figure 5.3.7, Figure 5.3.8). The mutated protein sequence was then modelled using the SWISS-MODEL server. A standardised docking procedure was applied to generate a reference docking pose with the CGTi ligand, followed by docking of GRMi and GBMi. GRMi docking showed (as expected) the correct orientation of the glucose molecule and sulfate group (Figure 5.3.18). Unlike the original Bbra_1a, the mutated Bbra_1a showed a valid docking pose with the GBMi ligand, with an affinity score of -8.5 kcal/mol.
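The substitution step described above can be illustrated with a short Python sketch. It is not the procedure used in this thesis, where the residues were edited by hand. The function name `apply_substitutions` and the ten-residue toy sequence are hypothetical, and the positions in the example are toy positions rather than positions 169, 170, 324 and 346 of Bbra_1a. The sketch assumes 1-based numbering on the ungapped protein sequence and checks that the expected wild-type residue is present before replacing it.

```python
import re


def apply_substitutions(sequence, substitutions):
    """Apply point substitutions such as 'I169M' to a protein sequence.

    Positions are 1-based and refer to the ungapped sequence (an assumption).
    """
    residues = list(sequence)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"Cannot parse substitution: {sub}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"Position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            # Guards against numbering errors before anything is changed.
            raise ValueError(
                f"Expected {old} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical toy sequence, NOT the Bbra_1a sequence.
toy_sequence = "MKIAGFYLLT"
mutant = apply_substitutions(toy_sequence, ["I3M", "A4E", "F6Y", "Y7W"])
print(mutant)
```

Checking the wild-type residue first guards against off-by-one numbering errors, which are easy to make when positions are read from an alignment rather than from the ungapped sequence.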


Figure 5.3.18 The docking of GBMi on Bbra_1a and the in-silico mutated version of Bbra_1a. a) The docking pose of the GBMi ligand (red) with the original Bbra_1a, in which the glucose of GBMi (indicated by yellow arrow) is not oriented towards its expected glucose binding site, whereas the glucose molecule of the reference is oriented towards the glucose binding site (glucose binding site – black, aglycone binding site – blue). b) The docking pose of GBMi with the in-silico mutated Bbra_1a, showing the glucose molecule of GBMi (red) aligning to the glucose binding site (shown in black). Yellow arrow showing position of the glucose molecule of the reference (green) and GBMi (red).


5.4 Discussion

Glucosinolates are major defence compounds synthesized by plants of the Brassicaceae family for protection against herbivorous insects as well as fungal pathogens. Brassicaceae plants mainly produce glucosinolates with aliphatic and indolic side chains. The aliphatic glucosinolates make up >90% of glucosinolates within plant tissues while the indolic glucosinolates typically make up less than 10% (Velasco et al. 2008). The defence response of the plant is different for different types of insect. It has been reported that M. persicae feeding can induce the synthesis of indolic glucosinolates (Kim and Jander 2007), whereas aliphatic glucosinolates are more effective against lepidopteran insects (Beekwilder et al. 2008).

Myrosinase enzymes present in the plant are involved in the activation of glucosinolate molecules to form toxic compounds such as isothiocyanates and nitriles. Myrosinase enzymes are also present in a range of different insects and may act on the glucosinolate molecules consumed by the insect.

Hypothesis 1 – Several different isoforms are present in the generalist and specialist aphid species

Myrosinase enzymes play a key role in the biology of the specialist aphid species B. brassicae and L. erysimi. The mustard bomb myrosinase of B. brassicae was first purified and characterised by Jones et al. (2001). Other studies reported that this enzyme is not present in the generalist aphid, M. persicae (Jones et al. 2001, Bridges et al. 2002). The research presented in this thesis shows that there are several myrosinase genes in B. brassicae and L. erysimi as well as in M. persicae. In plants, there are three established types of myrosinase, MA, MB and MC (Xue et al. 1995), although the sequence analysis performed by Rask et al. (2000) also hints at the presence of a fourth myr gene in plants. The manual annotation of myr genes presented in this thesis shows five myr-like genes in the M. persicae genome. Although these transcripts or their orthologs are sometimes annotated as myrosinases in the NCBI database, they may not play the same role. Also, some transcripts from clade 4 have been assigned different functional annotations (e.g. lactase-phlorizin like). The structural differences detected in this study perhaps hint that they are from the same gene family, i.e. the glycoside hydrolase family. The phylogenetic study suggests that the myr and myr-like proteins from the M. persicae genome and the other aphid species transcriptomes form five separate clades defined by sequence variation at particular residues. This suggests that these genes may have different roles in aphid biology. There are five B. brassicae myr-like transcripts forming only three clades; the clade 3 myr-like transcripts are missing from the B. brassicae transcriptome. The presence of two B. brassicae myr-like transcripts in clade 1 shows that it differs from the other two aphid species in this study (Figure 5.3.5).

Hypothesis 2 – Tissue specific expression of myr isoform in specialists B. brassicae and L. erysimi

The muscle cells and thorax region of B. brassicae and L. erysimi have been shown to contain the myrosinase enzyme (Bridges et al. 2002). The clade 1 myr transcripts (Bbra_1a and Lery_1), involved in the “mustard bomb”, were found to be highly expressed only in the WB samples of the specialist aphids, in agreement with these earlier published results. In contrast, the other clade 1 myrosinases and the clade 2 myrosinases show high expression in midgut tissues. This finding raises the question of why the midgut myrosinases do not activate glucosinolates and produce toxins in the midgut. As shown above and discussed below, the answer may lie in the relative catalytic capability of the different enzymes.

Hypothesis 3 – Specialist aphid species will show a positive selection signal in the myr gene

The evolutionary study performed by Jones et al. (2002) shows that the myr sequence from B. brassicae groups with animal glucosidases and lactase phlorizin hydrolase. This study also suggests that the myr genes of plants and aphids have evolved independently. However, the evolution of myr genes within aphid species has not yet been studied. This chapter tries to understand the evolution of aphid myrosinases using positive selection tests such as BUSTED, aBSREL and MEME. The positive selection signal found by all the tests suggests that the myr gene may have contributed to aphid host selection and thereby to aphid evolution. Clade 1, the most closely related to the Bbre_myr gene, shows positive selection in H. amygdali and in node 7, which contains only the Bbra_1a and Lery_1 sequences. The MEME analysis found 11 sites (p-value < 0.1) under positive selection when clades 1 and 2 were used for analysis, and one site under selection was located in the aglycone binding region. The analysis performed with clade 1 sequences alone showed similar results, suggesting that perhaps positive selection in B. brassicae and L. erysimi led to the evolution of modified myr enzymes that can show mustard bomb activity in aphids. This situation was then built upon by the tissue-specific expression of these transcripts in specialist aphid species to avoid self-toxicity and instead use this machinery as a defence strategy against predators.

Hypothesis 4 – Myr enzyme will show substrate specificity for aliphatic and indolic type glucosinolates in specialist aphid species

The feeding of M. persicae as well as B. brassicae on brassica plants induces the production of indolic glucosinolates (Kim and Jander 2007, Khan et al. 2011). The Bbre_Myr enzyme acts on aliphatic glucosinolate molecules to start the process of toxic chemical production. The glucosinolate degradation study using the Myr enzyme from B. brassicae shows differential efficiency towards 8 different glucosinolate molecules. The highest specific activity was found on glucoerucin, sinigrin, gluconapin and sinalbin. In that study all the glucosinolate molecules belong to the aliphatic class except sinalbin, which is an aromatic glucosinolate (Francis et al. 2002). Reports on glucosinolates present in the honeydew of M. persicae showed the presence of indolic glucosinolates at significantly lower levels than the aliphatic glucosinolates (Kim and Jander 2007, Kim et al. 2008). Kos et al. (2012) performed a feeding experiment with B. brassicae on a transgenic Arabidopsis thaliana line (Col-0-MYB28) that produces significantly less aliphatic glucosinolates. Compared to aphids fed on wild-type A. thaliana (Col-0), the total aliphatic glucosinolate present in the insect body decreased from 138.40±58.95 µmol/gm (Col-0) to 62.36±28.58 µmol/gm (Col-0-MYB28), whereas the indolic glucosinolate quantity did not change (10.54±2.14 µmol/gm (Col-0), 12.16±2.60 µmol/gm (Col-0-MYB28)). This suggests that B. brassicae mainly sequesters aliphatic glucosinolates. Biochemical analysis of the whole body of the insect also showed a significant amount of glucosinolates in B. brassicae compared to M. persicae when fed on different brassica host plants (Francis et al. 2001) and on artificial diet containing glucosinolate (Pratt et al. 2008). Also, the Myr enzyme present in the muscle and thoracic tissues of B. brassicae is known to act on stored glucosinolate upon predator attack (Francis et al. 2001, Kazana et al. 2007). This Myr enzyme is not present in the generalist M. persicae (Francis et al. 2001, Pratt et al. 2008). Considering all the above-mentioned facts about glucosinolates in generalist (M. persicae) and specialist (B. brassicae) species, the hypothesis of Myr specificity to aliphatic glucosinolates in specialist aphid species was put forward.
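The size of the changes reported by Kos et al. (2012) can be checked with simple arithmetic. The sketch below, with a hypothetical helper `percent_change`, uses only the mean values quoted above and ignores the reported standard deviations, so it is an illustration rather than a statistical test.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Mean glucosinolate content in B. brassicae (µmol/gm), as quoted in the text:
# aphids fed on Col-0 versus aphids fed on Col-0-MYB28.
aliphatic_change = percent_change(138.40, 62.36)
indolic_change = percent_change(10.54, 12.16)

print(f"aliphatic: {aliphatic_change:.1f}%")
print(f"indolic:   {indolic_change:.1f}%")
```

On these means, the aliphatic glucosinolate content falls by about 55%, while the indolic content differs by about 15%, a difference of 1.62 µmol/gm that is smaller than either reported standard deviation.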

In the present study, homology modelling shows that the clade 1 and clade 2 transcripts show the highest similarity to the published crystal structure of the Myr enzyme of B. brassicae (Bbra_myr). The molecular docking performed using homology models of M. persicae, B. brassicae and L. erysimi Myr sequences shows clear differences in the binding poses of indolic glucosinolates between specialists and generalists. The glucose molecules of indolic glucosinolates do not show obvious interactions with the glucose binding site in the specialists but show the expected pose in the generalist M. persicae. In contrast, the clade 2 Myr enzymes from B. brassicae and M. persicae show valid binding poses with both aliphatic and indolic glucosinolates, whereas the clade 2 Myr from L. erysimi does not, perhaps due to the lower activity of the Myr enzyme of L. erysimi compared to that of B. brassicae (MacGibbon and Beuzenberg 1978).

Although molecular docking supports the hypothesized results, docking is a purely computational prediction and does not necessarily represent the true interaction. Hence there are a few questions which need to be addressed to support these findings.

1) Why does the indolic ring move into the expected glucose ring position when docked with the Myr clade 1 enzyme of specialist but not generalist aphid species?

To address this question, the clade 1 Myr enzyme from B. brassicae was subjected to an in-silico mutation study. The four amino acid replacements performed within the active site clearly showed the basis for the selective affinity of specialist Myr enzymes for aliphatic glucosinolates. All the amino acid changes were non-conservative. Out of four changes, three were from non-polar amino acids to either negatively charged or polar amino acids, suggesting significant changes in the active site.

2) Why is the affinity score of the indolic glucosinolate docking pose more favourable (more negative) than that of the aliphatic glucosinolate docking pose?

The calculated affinity score is based on a computational program (AutoDock Vina) and the theoretical operations performed to simulate bond formation (Trott and Olson 2010). Hence it is highly dependent on the characteristics of the interacting molecules. There are five aromatic amino acids present in the Myr glucose binding site. After the molecular docking experiment with the B. brassicae Myr structure, it was found that the distance between the indolic ring of the GLS molecule and the rings of the aromatic amino acids ranges from a minimum of 3.3Å to a maximum of 6.4Å. Also, the indole ring of the GLS and the aromatic rings of the protein show only one bond with an angle of 43.36°, whereas the other four bond angles range from 95.4° to 175°. Anjana et al. (2012) showed that the number of interactions between tryptophan and other aromatic amino acids changes with distance and angle. Angles between 90° and 120° showed the highest number of interactions, as did distances of 6.0Å to 6.5Å (Anjana et al. 2012). The high number of interactions between aromatic amino acids suggests a possible reason for the lowest (most favourable) affinity score appearing in indolic GLS-Myr enzyme docking rather than aliphatic GLS-Myr docking.
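Distances and angles between aromatic rings of the kind discussed above can be computed from atomic coordinates. The following Python sketch shows one common convention, assumed here for illustration only: the distance is measured between ring centroids and the angle is the unsigned angle (0° to 90°) between the ring planes. This convention may differ from the one used for the measurements in this chapter and by Anjana et al. (2012), where angles above 90° are reported. The function names and the idealised hexagon coordinates are hypothetical and are not taken from the docked structures.

```python
import math


def centroid(points):
    """Geometric centre of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def ring_normal(points):
    """Unit normal of a near-planar ring (Newell's method).

    Atoms must be listed in order around the ring.
    """
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(points, points[1:] + points[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def ring_geometry(ring_a, ring_b):
    """Centroid distance and unsigned interplanar angle in degrees."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    # The sign of a plane normal is arbitrary, so the absolute dot product is used.
    dot = abs(sum(a * b for a, b in zip(na, nb)))
    angle = math.degrees(math.acos(min(1.0, dot)))
    return distance, angle


def hexagon(center, u, v, radius=1.39):
    """Idealised six-membered ring in the plane spanned by orthonormal u and v."""
    ring = []
    for k in range(6):
        c, s = math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)
        ring.append(tuple(center[i] + radius * (c * u[i] + s * v[i]) for i in range(3)))
    return ring


# Toy example: one ring in the xy-plane, a second ring in the xz-plane
# centred 5 Å above it (a perpendicular, T-shaped arrangement).
ring_1 = hexagon((0.0, 0.0, 0.0), (1, 0, 0), (0, 1, 0))
ring_2 = hexagon((0.0, 0.0, 5.0), (1, 0, 0), (0, 0, 1))
distance, angle = ring_geometry(ring_1, ring_2)
print(f"centroid distance = {distance:.2f} Å, interplanar angle = {angle:.1f}°")
```

For real structures the ring atoms would be read from the docked complex; only the geometry calculation is sketched here.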

3) Despite having an aromatic ring which is similar in structure to the indole ring, why does the reference molecule not show an abnormal binding pose in the docking experiments?

Tyrosine and tryptophan are more reactive than phenylalanine (Sælensminde et al. 2008). The aromatic amino acids have a tendency to fall into more hydrophobic regions (Burley and Petsko 1986, Anjana et al. 2012). Surface hydrophobicity analysis of the Myr enzyme shows that the active site in M. persicae Myr is more hydrophobic than the active sites of the B. brassicae and L. erysimi enzymes (Figure 5.4.1), suggesting a possible reason for the selective displacement of the benzene ring present in the reference molecule (CGTi) towards the aglycone binding site.

The hypotheses mentioned in this chapter can be tested experimentally, for example by expressing the myr proteins of different aphid species in vitro and testing their affinity towards different types of glucosinolate molecules. It is also possible to study the breakdown of glucosinolate molecules by the Myr protein using different types of chromatography coupled with mass spectrometry.

To conclude, the Myr enzyme present in aphids shows dynamism in terms of its activity and its expression in different species. The species-specific changes in the Myr enzyme can explain the differences between the generalist and brassica specialist aphids.


Figure 5.4.1 Hydrophobic surface map of the myrosinase enzyme from A) Brevicoryne brassicae and B) Myzus persicae. The active site of M. persicae contains a larger highly hydrophobic region than the active site of B. brassicae. The area highlighted with a black rectangle shows the location of the active site. Red shows highly hydrophobic regions, white shows neutral regions and blue shows hydrophilic regions.


5.5 References

Anjana, R., et al. (2012). "Aromatic-aromatic interactions in structures of proteins and protein-DNA complexes: a study based on orientation and distance." Bioinformation 8(24): 1220-1224.

Beekwilder, J., et al. (2008). "The Impact of the Absence of Aliphatic Glucosinolates on Insect Herbivory in Arabidopsis." PLoS ONE 3(4): e2068.

Bodnaryk, R. P. (1992). "Effects of wounding on glucosinolates in the cotyledons of oilseed rape and mustard." Phytochemistry 31(8): 2671-2677.

Bones, A. and T.-H. Iversen (1985). "Myrosin cells and myrosinase." Israel journal of botany 34(2-4): 351-376.

Bridges, M., et al. (2002). "Spatial organization of the glucosinolate–myrosinase system in brassica specialist aphids is similar to that of the host plant." Proceedings of the Royal Society of London. Series B: Biological Sciences 269(1487): 187-191.

Burley, S. K. and G. A. Petsko (1986). "Amino-aromatic interactions in proteins." FEBS letters 203(2): 139-143.

Bussy, A. (1840). "Sur la formation de l'huile essentielle de moutarde." J Pharm 27: 464-471.

Doughty, K. J., et al. (1995). "Selective induction of glucosinolates in oilseed rape leaves by methyl jasmonate." Phytochemistry 38(2): 347-350.

Falk, A., et al. (1995). "Characterization of a new myrosinase in Brassica napus." Plant molecular biology 27(5): 863-874.

Francis, F., et al. (2001). "Effects of Allelochemicals from First (Brassicaceae) and Second (Myzus persicae and Brevicoryne brassicae) Trophic Levels on Adalia bipunctata." Journal of Chemical Ecology 27(2): 243-256.

Francis, F., et al. (2002). "Characterisation of aphid myrosinase and degradation studies of glucosinolates." Arch Insect Biochem Physiol 50(4): 173-182.

Haas, B. J., et al. (2013). "De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis." Nat Protoc 8.

Heinricher, E. (1884). "Ueber Eiweissstoffe fuhrende Idioblasten bei einigen Cruciferen." Ber. dt. bot. Ges.(2): 463-466.

Höglund, A.-S., et al. (1992). "Myrosinase is localized to the interior of myrosin grains and is not associated to the surrounding tonoplast membrane." Plant Science 85(2): 165-170.


Husebye, H., et al. (2005). "Crystal structure at 1.1Å resolution of an insect myrosinase from Brevicoryne brassicae shows its close relationship to β-glucosidases." Insect Biochem Mol Biol 35(12): 1311-1320.

Jones, A. M. E., et al. (2001). "Purification and characterisation of a non-plant myrosinase from the cabbage aphid Brevicoryne brassicae (L.)." Insect Biochem Mol Biol 31(1): 1-5.

Jones, A. M. E., et al. (2002). "Characterization and evolution of a myrosinase from the cabbage aphid Brevicoryne brassicae." Insect Biochem Mol Biol 32(3): 275-284.

Katoh, K. and D. M. Standley (2013). "MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability." Mol Biol Evol 30(4): 772-780.

Kazana, E., et al. (2007). "The cabbage aphid: a walking mustard oil bomb." Proceedings of the Royal Society B: Biological Sciences 274(1623): 2271-2277.

Kelly, P. J., et al. (1998). "Sub-cellular immunolocalization of the glucosinolate sinigrin in seedlings of Brassica juncea." Planta 206(3): 370-377.

Khan, M. A. M., et al. (2011). "Water stress alters aphid-induced glucosinolate response in Brassica oleracea var. italica differently." Chemoecology 21(4): 235-242.

Kim, J. H. and G. Jander (2007). "Myzus persicae (green peach aphid) feeding on Arabidopsis induces the formation of a deterrent indole glucosinolate." The Plant Journal 49(6): 1008-1019.

Kim, J. H., et al. (2008). "Identification of indole glucosinolate breakdown products with antifeedant effects on Myzus persicae (green peach aphid)." The Plant Journal 54(6): 1015-1026.

Kos, M., et al. (2012). "Herbivore-Mediated Effects of Glucosinolates on Different Natural Enemies of a Specialist Aphid." Journal of Chemical Ecology 38(1): 100-115.

Larsson, A. (2014). "AliView: a fast and lightweight alignment viewer and editor for large datasets." Bioinformatics 30(22): 3276-3278.

Lenman, M., et al. (1993). "Differential Expression of Myrosinase Gene Families." Plant Physiology 103(3): 703-711.

Li, H. and R. Durbin (2009). "Fast and accurate short read alignment with Burrows–Wheeler transform." Bioinformatics 25(14): 1754-1760.

MacGibbon, D. and E. Beuzenberg (1978). "Location of glucosinolase in Brevicoryne brassicae and Lipaphis erysimi (Aphididae)." New Zealand journal of science.


Pentzold, S., et al. (2014). "How insects overcome two-component plant chemical defence: plant β-glucosidases as the main target for herbivore adaptation." Biological Reviews 89(3): 531-551.

Pettersen, E. F., et al. (2004). "UCSF Chimera—A visualization system for exploratory research and analysis." Journal of Computational Chemistry 25(13): 1605-1612.

Pond, S. L. K., et al. (2005). "HyPhy: hypothesis testing using phylogenies." Bioinformatics 21(5): 676-679.

Pontoppidan, B., et al. (2001). "Purification and characterization of myrosinase from the cabbage aphid (Brevicoryne brassicae), a brassica herbivore." European Journal of Biochemistry 268(4): 1041-1048.

Pratt, C., et al. (2008). "Accumulation of Glucosinolates by the Cabbage Aphid Brevicoryne brassicae as a Defense Against Two Coccinellid Species." Journal of Chemical Ecology 34(3): 323-329.

Price, M. N., et al. (2010). "FastTree 2 – Approximately maximum-likelihood trees for large alignments." PLoS ONE 5(3): e9490.

Rask, L., et al. (2000). "Myrosinase: gene family evolution and herbivore defense in Brassicaceae." Plant molecular biology 42(1): 93-114.

Rutherford, K., et al. (2000). "Artemis: sequence visualization and annotation." Bioinformatics (Oxford, England) 16(10): 944-945.

Sælensminde, G., et al. (2008). "Amino acid contacts in proteins adapted to different temperatures: hydrophobic interactions and surface charges play a key role." Extremophiles 13(1): 11.

Schwede, T., et al. (2003). "SWISS-MODEL: an automated protein homology-modeling server." Nucleic Acids Research 31(13): 3381-3385.

Taipalensuu, J., et al. (1997). "The Myrosinase-Binding Protein from Brassica Napus Seeds Possesses lectin Activity and has a Highly Similar Vegetatively Expressed Wound-Inducible Counterpart." European Journal of Biochemistry 250(3): 680-688.

Tajima, F. (1993). "Simple methods for testing the molecular evolutionary clock hypothesis." Genetics 135(2): 599-607.

Trott, O. and A. J. Olson (2010). "AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading." Journal of Computational Chemistry 31(2): 455-461.


Velasco, P., et al. (2008). "Comparison of Glucosinolate Profiles in Leaf and Seed Tissues of Different Brassica napus Crops." Journal of the American Society for Horticultural Science 133(4): 551-558.

Werker, E. and J. Vaughan (1977). "Ontogeny and distribution of myrosin cells in the shoot of Sinapis alba L.: a light and electron microscope study." Israel journal of botany.

Xue, J., et al. (1995). "The myrosinase gene family in Arabidopsis thaliana: gene organization, expression and evolution." Plant molecular biology 27(5): 911-922.

Xue, J., et al. (1993). "Temporal, cell-specific, and tissue-preferential expression of myrosinase genes during embryo and seedling development in Sinapis alba." Planta 191(1): 95-101.


Chapter VI

Discussion and future prospectus


6.1 General Discussion and future prospectus

Aphids are incredibly complex organisms to understand. When such organisms become a problem for the economy, it is important that application-oriented research and fundamental research go hand-in-hand. This thesis tried to touch on both fundamental and application-oriented aspects of aphids. The three aphid species selected in this study are of considerable economic importance. They also differ from each other in their morphology and behaviour. Hence comparison between these species provides important specific insights and, more generally, a broader picture of aphid biology.

This thesis attempted to explore the use of RNAi technology for aphid control. My extensive studies on the possibilities of dietary RNAi application provided new insights and raised concerns about the application of this technology to aphid control. I attempted to replicate published experiments with previously used, as well as new, RNAi target genes, and yet did not find compelling effects of orally delivered RNAi. This is despite rigorously using various in vitro and in planta delivery routes. The mode of dsRNA delivery is therefore one of the concerns to which researchers need to give immediate attention. Perhaps more efficient and targeted ways of dsRNA delivery are possible using new nanoparticle technologies. Nanobiotechnology is still at its juvenile stage and much needs to be done to achieve thorough integration with biological systems of interest.

The current study reports the potential role dsRNases have in the failure of RNAi in aphids. Previous studies on several insects including aphids showed that dsRNases could have a role in the failure of dsRNA-mediated silencing (Arimatsu et al. 2007, Christiaens and Smagghe 2014, Christiaens et al. 2014, Wynant et al. 2014). There are several studies reporting increased efficiency of dsRNA/siRNA delivery using specially developed carriers. Yang et al. (2017) report that the engineered viral protein P19 increased the uptake as well as the silencing of a target gene in a human cell line. Such novel engineered virus particles with potential in agricultural pest control are discussed in detail by Kolliopoulou et al. (2017). This approach also has some limitations, such as limited knowledge about the process of viral infection, virus infection of insects with a persistent life cycle, and viral suppressors that could affect the efficiency of RNAi. A solution for such problems is to build recombinant viruses, but this needs thorough research on the specific virus-host interactions, which is still lacking for aphids and their viruses. The development of virus-like particles could be an alternative to virus-mediated silencing but also needs investigation in greater depth. dsRNA-mediated silencing showed a significant effect on the survival and fitness of Aedes aegypti when coupled with chitosan nanoparticles (Ramesh Kumar et al. 2016). Along with this, there are reports showing the effectiveness of transfection reagents in the delivery of dsRNA (Whyard et al. 2009, Cancino-Rodezno et al. 2010, Singh et al. 2013, Murphy et al. 2016), but this approach may not always be successful, as reported in this thesis as well as by Terenius et al. (2011).

Perhaps one of the most surprising results of this thesis was that the transcriptome data from the samples of different aphid strains fed on different host plants showed a stronger strain effect than host plant effect. This result suggests that aphid strains of a single species react differently to the same conditions at the transcriptome level. The analysis presented here was preliminary but its central finding is important and needs substantial scrutiny by considering different aphid species from different regions of the world. Such analysis could strengthen the idea that aphid life cycle variation can produce ‘clonal lines’ that can actually be quite diverged. Such experiments could help to develop a robust and universal strategy for aphid control. The generalised outcome of the study was surprising but needs more experimental evidence.

Aphid species react differently to their surroundings, including to their diet components. The types of reactions can vary from host plant choice and life-cycle transitions to physiological, morphological or behavioural alterations. In this thesis, I tried to understand the transcriptional response of the different organs of the different aphid species to glucosinolates. The findings of this study showed that generalist and specialist aphid species react differently to glucosinolate toxicity at the transcriptome level and have developed their own strategies to deal with dietary stress.

Most of the work presented in this thesis was performed using radish (R. sativus) as a host for all the aphid species. Its most abundant glucosinolate is glucoraphasatin, an aliphatic glucosinolate (Hanlon and Barnes 2011), whereas the tryptophan-derived indolic glucosinolates are much less abundant. Further reports confirm the presence of aliphatic and indolic glucosinolates in R. sativus leaves or seedlings (Sang et al. 1984, Gu et al. 2015). Hence feeding on R. sativus will expose aphids to different levels of aliphatic and indolic glucosinolates.

Previous studies show that M. persicae feeding on A. thaliana induces the production of indolic glucosinolates. One of the compounds induced by aphid saliva is indol-3-ylmethylglucosinolate (I3M) (De Vos and Jander 2009). I3M can be converted into 4-methoxyindol-3-ylmethylglucosinolate (4MI3M), which is known for its strong deterrent effect on M. persicae feeding. The concentration of 4MI3M also increases upon B. brassicae feeding (Kim and Jander 2007, Khan et al. 2011). This is also true when broccoli plants (Brassica oleracea) were used as a host plant for both M. persicae and B. brassicae (Khan et al. 2011). This suggests that brassica plants might show a similar defence response against both generalist and specialist aphid species, and therefore both generalist and specialist aphids must deal with indolic glucosinolate toxicity. However, honeydew analysis of B. brassicae and M. persicae shows that the former prefers to store glucosinolates in its body, whereas M. persicae excretes them without processing.


The tissue transcriptome analysis performed on the MG and Bcyt tissue samples of all three aphid species characterized genes that were differentially expressed. By focussing on gene families that are likely to be involved in glucosinolate processing, I highlighted differences between generalist and specialist species that suggest a model for the mechanism by which generalist and specialist aphid species handle glucosinolates. The model suggests that the GST-delta and ABCC transporters may play a role in aliphatic glucosinolate detoxification in specialist species, whereas CYP6 and the ABCB transporter could have a role in the detoxification of indolic glucosinolates in both generalist and specialist aphid species. The model of differential glucosinolate processing developed in this study needs experimental confirmation, but at least now there are clear hypotheses that can provide the necessary directions for future studies. Ideally these studies would involve gene-specific manipulations using RNAi or alternative technologies such as CRISPR. Much work needs to be done to develop such research tools for aphids, but they should greatly help us understand aphid biology in greater depth. These tools will also allow us to understand the emerging insecticide resistance mechanisms in aphids.

The myrosinase enzyme also plays an important role in glucosinolate processing and has evolved into the specific “mustard bomb” adaptation in the specialist aphid species B. brassicae and L. erysimi. The 3D structure analysis of Myr and O-β-glucosidase shows that the aglycone site is conserved in Myr (a β-thioglucosidase) but not in other O-β-glucosidases (Burmeister et al. 1997, Rask et al. 2000). My structurally informed characterisation of the inter-species divergence of the Myr enzyme, combined with my molecular evolutionary analysis of positive selection, identified the key changes in the evolution of the mustard bomb trait. The study also showed evidence, based on in-silico analysis, that the Myr enzyme shows specific activity against aliphatic glucosinolates in the generalist M. persicae but does not bind them in the specialists B. brassicae and L. erysimi.

The above-mentioned findings point to some of the key differences between M. persicae, a generalist aphid species, and the specialists B. brassicae and L. erysimi. They also reveal similarities between the species, e.g. a potential common detoxification mechanism for indolic glucosinolates and a lesser role for bacteriocytes in glucosinolate detoxification. These findings set up a platform for future studies.

In conclusion, the current study has contributed significantly to setting up a platform for several studies focusing on aphid control and aphid biology. It also demonstrates that much still needs to be done to better understand this tiny but devastating, mysterious but well-known, cooperative yet disease-spreading organism.

6.2 References

AphidBase (2017). Bioinformatics Platform for Agroecosystem Arthropods. https://bipaa.genouest.org/is/

Aktar, M. W., et al. (2009). "Impact of pesticides use in agriculture: their benefits and hazards." Interdisciplinary Toxicology 2(1): 1-12.

Ali, J. G. and A. A. Agrawal (2012). "Specialist versus generalist insect herbivores and plant defense." Trends in Plant Science 17(5): 293-302.

Altincicek, B., et al. (2008). "Wounding-mediated gene expression and accelerated viviparous reproduction of the pea aphid, Acyrthosiphon pisum." Insect Mol Biol 17(6): 711-716.

Altschul, S. F., et al. (1990). "Basic local alignment search tool." Journal of Molecular Biology 215(3): 403-410.

Anathakrishnan, R., et al. (2014). "Comparative gut transcriptome analysis reveals differences between virulent and avirulent Russian wheat aphids, Diuraphis noxia." Arthropod-Plant Interactions 8(2): 79-88.

Andersen, J. F., et al. (1997). "Substrate specificity for the epoxidation of terpenoids and active site topology of house fly cytochrome P450 6A1." Chemical Research in Toxicology 10(2): 156-164.

Andrews, S. (2010). "FastQC: a quality control tool for high throughput sequence data." from http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

Anjana, R., et al. (2012). "Aromatic-aromatic interactions in structures of proteins and protein-DNA complexes: a study based on orientation and distance." Bioinformation 8(24): 1220-1224.

Anthon, E. W. (1955). "Evidence for Green Peach Aphid Resistance to Organo-Phosphorous Insecticides." Journal of Economic Entomology 48(1): 56-57.

Arimatsu, Y., et al. (2007). "Molecular characterization of a cDNA encoding extracellular dsRNase and its expression in the silkworm, Bombyx mori." Insect Biochem Mol Biol 37(2): 176-183.

Bak, S., et al. (2006). "Cyanogenic glycosides: a case study for evolution and application of cytochromes P450." Phytochemistry Reviews 5(2): 309-329.

Bandopadhyay, L., et al. (2013). "Identification of Genes Involved in Wild Crucifer Rorippa indica Resistance Response on Mustard Aphid Lipaphis erysimi Challenge." PLoS ONE 8(9): e73632.

Barch, D. H. and L. M. Rundhaugen (1994). "Ellagic acid induces NAD(P)H:quinone reductase through activation of the antioxidant responsive element of the rat NAD(P)H:quinone reductase gene." Carcinogenesis 15(9): 2065-2068.

Bariami, V., et al. (2012). "Gene Amplification, ABC Transporters and Cytochrome P450s: Unraveling the Molecular Basis of Pyrethroid Resistance in the Dengue Vector, Aedes aegypti." PLOS Neglected Tropical Diseases 6(6): e1692.

Bass, C., et al. (2014). "The evolution of insecticide resistance in the peach potato aphid, Myzus persicae." Insect Biochem Mol Biol 51(Supplement C): 41-51.

Baum, J. A., et al. (2007). "Control of coleopteran insect pests through RNA interference." Nat Biotechnol 25(11): 1322-1326.

Baumann, P. (2005). "Biology of Bacteriocyte-Associated Endosymbionts of Plant Sap-Sucking Insects." Annual Review of Microbiology 59: 155-189.

Beekwilder, J., et al. (2008). "The Impact of the Absence of Aliphatic Glucosinolates on Insect Herbivory in Arabidopsis." PLoS ONE 3(4): e2068.

Benn, M. (1977). "Glucosinolates." Pure and Applied Chemistry 49(2): 197-210.

Beran, F., et al. (2014). "Phyllotreta striolata flea beetles use host plant defense compounds to create their own glucosinolate-myrosinase system." Proceedings of the National Academy of Sciences 111(20): 7349-7354.

Berlandier, F., et al. (2010). "Aphid management in canola crops." Farm note, Department of Agriculture and Food, Govt of Western Australia.

Bernays, E. and M. Graham (1988). "On the Evolution of Host Specificity in Phytophagous Arthropods." Ecology 69(4): 886-892.

Bhatia, V., et al. (2012). "Host Generated siRNAs Attenuate Expression of Serine Protease Gene in Myzus persicae." PLoS ONE 7(10): e46343.

Bilgi, V., et al. (2017). "Using Vital Dyes to Trace Uptake of dsRNA by Green Peach Aphid Allows Effective Assessment of Target Gene Knockdown." Int J Mol Sci 18(1): 80.

Blomquist, G. J., et al. (2010). "Pheromone production in bark beetles." Insect Biochem Mol Biol 40(10): 699-712.

Bodnaryk, R. P. (1992). "Effects of wounding on glucosinolates in the cotyledons of oilseed rape and mustard." Phytochemistry 31(8): 2671-2677.

Bolger, A. M., et al. (2014). "Trimmomatic: a flexible trimmer for Illumina sequence data." Bioinformatics 30(15): 2114-2120.

Bones, A. and T.-H. Iversen (1985). "Myrosin cells and myrosinase." Israel journal of botany 34(2-4): 351-376.

Bones, A. M. and J. T. Rossiter (1996). "The myrosinase-glucosinolate system, its organisation and biochemistry." Physiologia Plantarum 97(1): 194-208.

Bracho, A. M., et al. (1995). "Discovery and molecular characterization of a plasmid localized in Buchnera sp. bacterial endosymbiont of the aphid Rhopalosiphum padi." Journal of Molecular Evolution 41(1): 67-73.

Bridges, M., et al. (2002). "Spatial organization of the glucosinolate–myrosinase system in brassica specialist aphids is similar to that of the host plant." Proceedings of the Royal Society of London. Series B: Biological Sciences 269(1487): 187-191.

Broadbent, L. (1949). "Factors affecting the activity of alatae of the aphids Myzus persicae (Sulzer) and Brevicoryne brassicae (L.)." Annals of applied Biology 36(1): 40-62.

Brough, C. N. and A. F. Dixon (1990). "Ultrastructural features of egg development in oviparae of the vetch aphid Megoura viciae Buckton." Tissue and Cell 22(1): 13.

Bumgarner, R. (2013). "DNA microarrays: Types, Applications and their future." Current Protocols in Molecular Biology 22: Unit-22.21.

Burke, G. R. and N. A. Moran (2011). "Responses of the pea aphid transcriptome to infection by facultative symbionts." Insect Mol Biol 20(3): 357-365.

Burley, S. K. and G. A. Petsko (1986). "Amino-aromatic interactions in proteins." FEBS letters 203(2): 139-143.

Burmeister, W. P., et al. (1997). "The crystal structures of Sinapis alba myrosinase and a covalent glycosyl-enzyme intermediate provide insights into the substrate recognition and active-site machinery of an S-glycosidase." Structure 5(5): 663-676.

Buss, D. S. and A. Callaghan (2008). "Interaction of pesticides with p-glycoprotein and other ABC proteins: A survey of the possible importance to insecticide, herbicide and fungicide resistance." Pestic Biochem Physiol 90.

Bussy, A. (1840). "Sur la formation de l'huile essentielle de moutarde." J Pharm 27: 464-471.

Cancino-Rodezno, A., et al. (2010). "The mitogen-activated protein kinase p38 is involved in insect defense against Cry toxins from Bacillus thuringiensis." Insect Biochem Mol Biol 40(1): 58-63.

Chahine, S. and M. J. O’Donnell (2009). "Physiological and molecular characterization of methotrexate transport by Malpighian tubules of adult Drosophila melanogaster." Journal of Insect Physiology 55(10): 927-935.

Chan, L. M. S., et al. (2004). "The ABCs of drug transport in intestine and liver: efflux proteins limiting drug absorption and bioavailability." European Journal of Pharmaceutical Sciences 21(1): 25-51.

Chapple, C. E. and R. Guigó (2008). "Relaxation of selective constraints causes independent selenoprotein extinction in insect genomes." PLoS ONE 3(8): e2968.

Chen, G. and G. Wu (2005). "Resistance to seven insecticides and analysis of enzymatic characteristics in Lipaphis erysimi (Homoptera: Aphididae) in Fuzhou, China." Journal of Fujian Agriculture and Forestry University (Natural Science Edition) 34(2): 204-207.

Christiaens, O. and G. Smagghe (2014). "The challenge of RNAi-mediated control of hemipterans." Current Opinion in Insect Science(0).

Christiaens, O., et al. (2014). "DsRNA degradation in the pea aphid (Acyrthosiphon pisum) associated with lack of response in RNAi feeding and injection assay." Peptides 53: 307-314.

Claudianos, C., et al. (2006). "A deficit of detoxification enzymes: pesticide sensitivity and environmental response in the honeybee." Insect Mol Biol 15(5): 615-636.

Clough, S. J. and A. F. Bent (1998). "Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana." The Plant Journal 16(6): 735-743.

Coleman, A. D., et al. (2015). "Persistence and transgenerational effect of plant-mediated RNAi in aphids." J Exp Bot 66(2): 541-548.

Collett, M. G., et al. (2014). "Could Nitrile Derivatives of Turnip (Brassica rapa) Glucosinolates Be Hepato- or Cholangiotoxic in Cattle?" Journal of agricultural and food chemistry 62(30): 7370-7375.

Crava, C. M., et al. (2016). "Transcriptome profiling reveals differential gene expression of detoxification enzymes in a hemimetabolous tobacco pest after feeding on jasmonate-silenced Nicotiana attenuata plants." BMC Genomics 17(1): 1005.

Cristofoletti, P. T., et al. (2003). "Midgut adaptation and digestive enzyme distribution in a phloem feeding insect, the pea aphid Acyrthosiphon pisum." Journal of Insect Physiology 49(1): 11-24.

Dadd, R. H., et al. (1967). "Studies on the artificial feeding of the aphid Myzus persicae (Sulzer)—IV. Requirements for water-soluble vitamins and ascorbic acid." Journal of Insect Physiology 13(2): 249-272.

Daiber, C. and S. Schöll (1959). "Further notes on the overwintering of the green peach aphid, Myzus persicae (Sulzer), in South Africa." J Entomol Soc South Afr 22: 494-520.

Davidson, N. M. and A. Oshlack (2014). "Corset: enabling differential gene expression analysis for de novo assembled transcriptomes." Genome Biol 15(7): 410.

De Vos, M. and G. Jander (2009). "Myzus persicae (green peach aphid) salivary components induce defence responses in Arabidopsis thaliana." Plant, Cell & Environment 32(11): 1548-1560.

Deeley, R. G., et al. (2006). "Transmembrane Transport of Endo- and Xenobiotics by Mammalian ATP-Binding Cassette Multidrug Resistance Proteins." Physiological Reviews 86(3): 849.

Dermauw, W. and T. Van Leeuwen (2014). "The ABC gene family in arthropods: Comparative genomics and role in insecticide transport and resistance." Insect Biochem Mol Biol 45(Supplement C): 89-110.

Dietrich, C. G., et al. (2003). "ABC of oral bioavailability: transporters as gatekeepers in the gut." Gut 52(12): 1788.

Divol, F., et al. (2005). "Systemic response to aphid infestation by Myzus persicae in the phloem of Apium graveolens." Plant molecular biology 57(4): 517.

Dobin, A., et al. (2013). "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29(1): 15-21.

Dohlen, C. D. V. and N. A. Moran (2000). "Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation." Biological Journal of the Linnean Society 71(4): 689-717.

Doughty, K. J., et al. (1995). "Selective induction of glucosinolates in oilseed rape leaves by methyl jasmonate." Phytochemistry 38(2): 347-350.

Douglas, A. E. (2006). "Phloem-sap feeding by animals: problems and solutions." Journal of Experimental Botany 57(4): 747-754.

Downing, N. (1978). "Measurements of the osmotic concentrations of stylet sap, haemolymph and honeydew from an aphid under osmotic stress." Journal of Experimental Biology 77(1): 247-250.

Edger, P. P., et al. (2015). "The butterfly plant arms-race escalated by gene and genome duplications." Proceedings of the National Academy of Sciences 112(27): 8362-8366.

Edwards, O. R., et al. (2008). "Insecticide resistance and implications for future aphid management in Australian grains and pastures: a review." Australian Journal of Experimental Agriculture 48(12): 1523-1530.

Emden, H. F. V., et al. (1969). "The Ecology of Myzus persicae." Annual Review of Entomology 14(1): 197-270.

Engel, A. and H. Stahlberg (2002). Aquaglyceroporins: Channel proteins with a conserved core, multiple functions, and variable surfaces. International Review of Cytology, Academic Press. 215: 75-104.

Ettlinger, M. G. and A. Kjaer (1968). "Sulfur compounds in plants." Recent Advances in Phytochemistry 1: 59-144.

Eyres, I., et al. (2016). "Differential gene expression according to race and host plant in the pea aphid." Mol Ecol. 25.

Fahey, J. W., et al. (2001). "The chemical diversity and distribution of glucosinolates and isothiocyanates among plants." Phytochemistry 56(1): 5-51.

Falk, A., et al. (1995). "Characterization of a new myrosinase in Brassica napus." Plant molecular biology 27(5): 863-874.

Fenwick, G. R., et al. (1983). "Glucosinolates and their breakdown products in food and food plants." C R C Critical Reviews in Food Science and Nutrition 18(2): 123-201.

Fernández-Cañón, J. M. and M. A. Peñalva (1998). "Characterization of a Fungal Maleylacetoacetate Isomerase Gene and Identification of Its Human Homologue." Journal of Biological Chemistry 273(1): 329-337.

Fire, A., et al. (1998). "Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans." Nature 391(6669): 806-811.

Fisher, J. W., et al. (2000). Long-distance transport. Biochemistry and Molecular Biology of Plants. American Society of Plant Physiologists, Citeseer.

Francis, F., et al. (2001). "Effects of Allelochemicals from First (Brassicaceae) and Second (Myzus persicae and Brevicoryne brassicae) Trophic Levels on Adalia bipunctata." Journal of Chemical Ecology 27(2): 243-256.

Francis, F., et al. (2002). "Characterisation of aphid myrosinase and degradation studies of glucosinolates." Arch Insect Biochem Physiol 50(4): 173-182.

Francis, F., et al. (2005). "Glutathione S-transferases in the adaptation to plant secondary metabolites in the Myzus persicae aphid." Arch Insect Biochem Physiol 58(3): 166-174.

Fukumorita, T. and M. Chino (1982). "Sugar, Amino Acid and Inorganic Contents in Rice Phloem Sap." Plant and Cell Physiology 23(2): 273-283.

Gaertner, L. S., et al. (1998). "Transepithelial transport of nicotine and vinblastine in isolated malpighian tubules of the tobacco hornworm (Manduca sexta) suggests a P-glycoprotein-like mechanism." The Journal of Experimental Biology 201(18): 2637.

Gerardo, N. M., et al. (2010). "Immunity and other defenses in pea aphids, Acyrthosiphon pisum." Genome Biol 11(2): R21.

Gibson, R. W. and J. A. Pickett (1983). "Wild potato repels aphids by release of aphid alarm pheromone." Nature 302(5909): 608-609.

Gillet, F.-X., et al. (2017). "Investigating engineered ribonucleoprotein particles to improve oral RNAi delivery in crop insect pests." Frontiers in Physiology 8(256).

Gloss, A. D., et al. (2014). "Evolution in an Ancient Detoxification Pathway Is Coupled with a Transition to Herbivory in the Drosophilidae." Mol Biol Evol 31(9): 2441-2456.

Good, R. T., et al. (2014). "The Molecular Evolution of Cytochrome P450 Genes within and between Drosophila Species." Genome Biol Evol 6(5): 1118-1134.

Good, R. T., et al. (2016). "OfftargetFinder: a web tool for species-specific RNAi design." Bioinformatics 32(8): 1232-1234.

Gordon, K. H. J. and P. M. Waterhouse (2007). "RNAi for insect-proof plants." Nat Biotech 25(11): 1231-1232.

Grabherr, M. G., et al. (2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome." Nat Biotech 29(7): 644-652.

Grubor, V. D. and D. G. Heckel (2007). "Evaluation of the role of CYP6B cytochrome P450s in pyrethroid resistant Australian Helicoverpa armigera." Insect Mol Biol 16(1): 15-23.

Gu, E.-H., et al. (2015). "Increase in aliphatic glucosinolates synthesis during early seedling growth and insect herbivory in radish (Raphanus sativus L.) plant." Horticulture, Environment, and Biotechnology 56(2): 255-262.

Guignard, L. (1890). "Recherches sur la localisation des principes actifs des Crucifères." J. Bot. 4(22): 385-395.

Guo, H., et al. (2014). "Plant-Generated Artificial Small RNAs Mediated Aphid Resistance." PLoS ONE 9(5): e97410.

Haas, B. J., et al. (2013). "De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis." Nat Protoc 8.

Hagenbucher, S., et al. (2013). "Pest trade-offs in technology: reduced damage by caterpillars in Bt cotton benefits aphids." Proceedings of the Royal Society B: Biological Sciences 280(1758).

Hales, D. F. and T. E. Mittler (1988). "Male production by aphids prenatally treated with precocene: Prevention by short-term kinoprene treatment." Arch Insect Biochem Physiol 7(1): 29-36.

Halkier, B. A. and J. Gershenzon (2006). "Biology and biochemistry of glucosinolates." Annual Review of Plant Biology 57(1): 303-333.

Hanlon, P. R. and D. M. Barnes (2011). "Phytochemical Composition and Biological Activity of 8 Varieties of Radish (Raphanus sativus L.) Sprouts and Mature Taproots." Journal of Food Science 76(1): C185-C192.

Hansen, A. K. and N. A. Moran (2011). "Aphid genome expression reveals host-symbiont cooperation in the production of amino acids." Proc Natl Acad Sci U S A 108(7): 2849-2854.

Harnischfeger, G. (1974). "Studies on photosynthetic pigments using fluorescence at liquid nitrogen temperature: Evidence for light induced pigment alignment in the photosynthetic apparatus." Berichte der Deutschen Botanischen Gesellschaft 87(3): 483-491.

Hayes, J. D., et al. (2005). "Glutathione Transferases." Annual Review of Pharmacology and Toxicology 45(1): 51-88.

Heie, O., et al. (1987). "Paleontology and phylogeny." Aphids: Their Biology, Natural Enemies and Control, Vol. 2a: 367-391.

Heie, O. and B. Petersen (1961). "Investigations on Myzus persicae Sulz." Aphis fabae: 7-52.

Heinricher, E. (1884). "Ueber Eiweissstoffe führende Idioblasten bei einigen Cruciferen." Ber. dt. bot. Ges. 2: 463-466.

HGSC, The Honeybee Genome Sequencing Consortium (2006). "Insights into social insects from the genome of the honeybee Apis mellifera." Nature 443: 931.

Higgins, C. F. (1992). "ABC transporters: from microorganisms to man." Annu Rev Cell Biol 8.

Hille Ris Lambers, D. (1946). "The hibernation of Myzus persicae Sulzer and some related species, including a new one." Bull Entomol Res 37: 197-199.

Höglund, A.-S., et al. (1992). "Myrosinase is localized to the interior of myrosin grains and is not associated to the surrounding tonoplast membrane." Plant Science 85(2): 165-170.

Holman, J. (2009). Host Plant Catalog of Aphids, Springer Netherlands.

Husebye, H., et al. (2005). "Crystal structure at 1.1Å resolution of an insect myrosinase from Brevicoryne brassicae shows its close relationship to β-glucosidases." Insect Biochem Mol Biol 35(12): 1311-1320.

Huvenne, H. and G. Smagghe (2010). "Mechanisms of dsRNA uptake in insects and potential of RNAi for pest control: A review." Journal of Insect Physiology 56(3): 227-235.

IAGC, The International Aphid Genomics Consortium (2010). "Genome sequence of the pea aphid Acyrthosiphon pisum." PLoS Biol 8(2): e1000313.

Jaubert-Possamai, S., et al. (2007). "Gene knockdown by RNAi in the pea aphid Acyrthosiphon pisum." BMC Biotechnol 7: 63.

Jeschke, V., et al. (2016). Chapter Eight - Insect Detoxification of Glucosinolates and Their Hydrolysis Products. Advances in Botanical Research. S. Kopriva, Academic Press. 80: 199-245.

Ji, R., et al. (2016). "Transcriptome Analysis of Green Peach Aphid (Myzus persicae): Insight into Developmental Regulation and Inter-Species Divergence." Front Plant Sci 7(1562).

Joga, M. R., et al. (2016). "RNAi efficiency, systemic properties, and novel delivery methods for pest insect control: What we know so far." Frontiers in Physiology 7: 553.

Jones, A. M. E., et al. (2001). "Purification and characterisation of a non-plant myrosinase from the cabbage aphid Brevicoryne brassicae (L.)." Insect Biochem Mol Biol 31(1): 1-5.

Jones, A. M. E., et al. (2002). "Characterization and evolution of a myrosinase from the cabbage aphid Brevicoryne brassicae." Insect Biochem Mol Biol 32(3): 275-284.

Jones, P., et al. (2014). "InterProScan 5: genome-scale protein function classification." Bioinformatics 30(9): 1236-1240.

Kalyaanamoorthy, S., et al. (2017). "ModelFinder: fast model selection for accurate phylogenetic estimates." Nat Meth 14(6): 587-589.

Katoh, K. and D. M. Standley (2013). "MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability." Mol Biol Evol 30(4): 772-780.

Kazana, E., et al. (2007). "The cabbage aphid: a walking mustard oil bomb." Proceedings of the Royal Society B: Biological Sciences 274(1623): 2271-2277.

Kelly, P. J., et al. (1998). "Sub-cellular immunolocalization of the glucosinolate sinigrin in seedlings of Brassica juncea." Planta 206(3): 370-377.

Kennedy, J. S. and H. L. G. Stroyan (1959). "Biology of Aphids." Annual Review of Entomology 4(1): 139-160.

Khan, M. A. M., et al. (2011). "Water stress alters aphid-induced glucosinolate response in Brassica oleracea var. italica differently." Chemoecology 21(4): 235-242.

Kim, H., et al. (2011). "Macroevolutionary Patterns in the Aphidini Aphids (Hemiptera: Aphididae): Diversification, Host Association, and Biogeographic Origins." PLoS ONE 6(9): e24749.

Kim, J. H. and G. Jander (2007). "Myzus persicae (green peach aphid) feeding on Arabidopsis induces the formation of a deterrent indole glucosinolate." The Plant Journal 49(6): 1008-1019.

Kim, J. H., et al. (2008). "Identification of indole glucosinolate breakdown products with antifeedant effects on Myzus persicae (green peach aphid)." The Plant Journal 54(6): 1015-1026.

Koga, R., et al. (2012). "Cellular mechanism for selective vertical transmission of an obligate insect symbiont at the bacteriocyte-embryo interface." Proc Natl Acad Sci U S A 109(20): E1230-1237.

Kolliopoulou, A., et al. (2017). "Viral Delivery of dsRNA for Control of Insect Agricultural Pests and Vectors of Human Disease: Prospects and Challenges." Frontiers in Physiology 8(399).

Kos, M., et al. (2012). "Herbivore-Mediated Effects of Glucosinolates on Different Natural Enemies of a Specialist Aphid." Journal of Chemical Ecology 38(1): 100-115.

Kubori, T., et al. (1998). "Supramolecular structure of the Salmonella typhimurium type III protein secretion system." Science 280(5363): 602-605.

Kumar, S., et al. (2016). "MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets." Mol Biol Evol 33(7): 1870-1874.

Kunkel, H. (1977). Chapter 13 - Membrane feeding system in aphid research. Aphids As Virus Vectors. K. F. Harris and K. Maramorosch, Academic Press: 311-338.

Lai, C. Y., et al. (1994). "Amplification of trpEG: adaptation of Buchnera aphidicola to an endosymbiotic association with aphids." Proceedings of the National Academy of Sciences 91(9): 3819-3823.

Larsson, A. (2014). "AliView: a fast and lightweight alignment viewer and editor for large datasets." Bioinformatics 30(22): 3276-3278.

Law, C. W., et al. (2014). "voom: precision weights unlock linear model analysis tools for RNA-seq read counts." Genome Biol 15(2): R29.

Leather, S. R. and A. F. G. Dixon (1984). "Aphid growth and reproductive rates." Entomologia Experimentalis et Applicata 35(2): 137-140.

Lenman, M., et al. (1993). "Differential Expression of Myrosinase Gene Families." Plant Physiology 103(3): 703-711.

Leslie, E. M. (2012). "Arsenic–glutathione conjugate transport by the human multidrug resistance proteins (MRPs/ABCCs)." Journal of Inorganic Biochemistry 108(Supplement C): 141-149.

Letunic, I. and P. Bork (2007). "Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation." Bioinformatics 23(1): 127-128.

Li, H., et al. (2016). "Systemic RNAi in western corn rootworm, Diabrotica virgifera virgifera, does not involve transitive pathways." Insect Science: n/a-n/a.

Li, H. and R. Durbin (2009). "Fast and accurate short read alignment with Burrows–Wheeler transform." Bioinformatics 25(14): 1754-1760.

Li, J., et al. (2013). "Advances in the use of the RNA interference technique in Hemiptera." Insect Science 20(1): 31-39.

Li, X., et al. (2004). "Structural and functional divergence of insect CYP6B proteins: from specialist to generalist cytochrome P450." Proc Natl Acad Sci U S A 101(9): 2939-2944.

Li, X., et al. (2000). "Cross-Resistance to α-Cypermethrin After Xanthotoxin Ingestion in Helicoverpa zea (Lepidoptera: Noctuidae)." Journal of Economic Entomology 93(1): 18-25.

Li, Z.-Q., et al. (2013). "Ecological Adaption Analysis of the Cotton Aphid (Aphis gossypii) in Different Phenotypes by Transcriptome Comparison." PLoS ONE 8(12): e83180.

Liu, M., et al. (2012). "Metaproteogenomic analysis of a community of sponge symbionts." ISME J 6(8): 1515-1525.

Liu, S., et al. (2012). "Deep Sequencing of the Transcriptomes of Soybean Aphid and Associated Endosymbionts." PLoS ONE 7(9): e45161.

Livak, K. J. and T. D. Schmittgen (2001). "Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method." Methods 25(4): 402-408.

MacGibbon, D. and E. Beuzenberg (1978). "Location of glucosinolase in Brevicoryne brassicae and Lipaphis erysimi (Aphididae)." New Zealand journal of science.

Malka, O., et al. (2016). "Glucosinolate Desulfation by the Phloem-Feeding Insect Bemisia tabaci." Journal of Chemical Ecology 42(3): 230-235.

Mao, J. and F. Zeng (2012). "Feeding-Based RNA Interference of a Gap Gene Is Lethal to the Pea Aphid, Acyrthosiphon pisum." PLoS ONE 7(11): e48718.

Mao, W., et al. (2006). "Remarkable substrate‐specificity of CYP6AB3 in Depressaria pastinacella, a highly specialized caterpillar." Insect Mol Biol 15(2): 169-179.

Mao, W., et al. (2009). "Quercetin-metabolizing CYP6AS enzymes of the pollinator Apis mellifera (Hymenoptera: Apidae)." Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology 154(4): 427-434.

Mao, Y.-B., et al. (2007). "Silencing a cotton bollworm P450 monooxygenase gene by plant-mediated RNAi impairs larval tolerance of gossypol." Nat Biotech 25(11): 1307-1313.

Martinez-Torres, D., et al. (2001). "Molecular systematics of aphids and their primary endosymbionts." Mol Phylogenet Evol 20(3): 437-449.

Mathers, T. C., et al. (2017). "Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species." Genome Biol 18(1): 27.

Mi, H., et al. (2013). "PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees." Nucleic Acids Res 41.

Misof, B., et al. (2014). "Phylogenomics resolves the timing and pattern of insect evolution." Science 346(6210): 763-767.

Mithen, R., et al. (2010). "Glucosinolate biochemical diversity and innovation in the Brassicales." Phytochemistry 71(17): 2074-2086.

Moran, N. A. (1996). "Accelerated evolution and Muller's ratchet in endosymbiotic bacteria." Proceedings of the National Academy of Sciences 93: 2873-2878.

Moran, P. J., et al. (2002). "Gene expression profiling of Arabidopsis thaliana in compatible plant‐aphid interactions." Arch Insect Biochem Physiol 51(4): 182-203.

Morant, A. V., et al. (2008). "β-Glucosidases as detonators of plant chemical defense." Phytochemistry 69(9): 1795-1813.

Morozova, O., et al. (2009). "Applications of new sequencing technologies for transcriptome analysis." Annual review of genomics and human genetics 10: 135-151.

Mulot, M., et al. (2016). "Comparative Analysis of RNAi-Based Methods to Down-Regulate Expression of Two Genes Expressed at Different Levels in Myzus persicae." Viruses 8(11): 316.

Murphy, K. A., et al. (2016). "Ingestion of genetically modified yeast symbiont reduces fitness of an insect pest via RNA interference." Scientific Reports 6: 22587.

Murray, C. L., et al. (1994). "A putative nicotine pump at the metabolic blood–brain barrier of the tobacco hornworm." J Neurobiol 25.

Mutti, N. S., et al. (2008). "A protein from the salivary glands of the pea aphid, Acyrthosiphon pisum, is essential in feeding on a host plant." Proceedings of the National Academy of Sciences 105(29): 9965-9969.

Mutti, N. S., et al. (2006). "RNAi knockdown of a salivary transcript leading to lethality in the pea aphid, Acyrthosiphon pisum." J Insect Sci 6: 1-7.

Nakabachi, A., et al. (2005). "Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera." Proc Natl Acad Sci U S A 102(15): 5477-5482.

Nan, S., et al. (2016). "All 37 Mitochondrial Genes of Aphid Aphis craccivora Obtained from Transcriptome Sequencing: Implications for the Evolution of Aphids." PLoS ONE 11(6).

Nelson, N. J. (2001). "Microarrays have arrived: gene expression tool matures." Journal of the National Cancer Institute 93(7): 492-494.

Nguyen, L.-T., et al. (2015). "IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies." Mol Biol Evol 32(1): 268-274.

Nikoh, N. and A. Nakabachi (2009). "Aphids acquired symbiotic genes via lateral gene transfer." BMC Biol 7: 12.

Niu, G., et al. (2011). "A substrate-specific cytochrome P450 monooxygenase, CYP6AB11, from the polyphagous navel orangeworm (Amyelois transitella)." Insect Biochem Mol Biol 41(4): 244-253.

Novakova, E., et al. (2013). "Reconstructing the phylogeny of aphids (Hemiptera: Aphididae) using DNA of the obligate symbiont Buchnera aphidicola." Mol Phylogenet Evol 68(1): 42-54.

O'Brien, P. J. and J. B. Graves (1992). "Insecticide resistance and reproductive biology of Aphis gossypii Glover." Southwestern Entomologist 17: 115-122.

Opitz, S. E. W., et al. (2011). "Desulfation Followed by Sulfation: Metabolism of Benzylglucosinolate in Athalia rosae (Hymenoptera: Tenthredinidae)." ChemBioChem 12(8): 1252-1257.

Ortiz-Rivas, B. and D. Martinez-Torres (2010). "Combination of molecular data support the existence of three main lineages in the phylogeny of aphids (Hemiptera: Aphididae) and the basal position of the subfamily Lachninae." Mol Phylogenet Evol 55(1): 305-317.

Pearce, S. L., et al. (2017). "Genomic innovations, transcriptional plasticity and gene loss underlying the evolution and divergence of two highly polyphagous and invasive Helicoverpa pest species." BMC Biol 15(1): 63.

Peng, T., et al. (2016). "Cytochrome P450 CYP6DA2 regulated by cap 'n' collar isoform C (CncC) is associated with gossypol tolerance in Aphis gossypii Glover." Insect Mol Biol 25(4): 450-459.

Pentzold, S., et al. (2014). "How insects overcome two-component plant chemical defence: plant β-glucosidases as the main target for herbivore adaptation." Biological Reviews 89(3): 531-551.

Perez, J. L., et al. (2010). "In vivo induction of phase II detoxifying enzymes, glutathione transferase and quinone reductase by citrus triterpenoids." BMC Complementary and Alternative Medicine 10: 51.

Pettersen, E. F., et al. (2004). "UCSF Chimera—A visualization system for exploratory research and analysis." Journal of Computational Chemistry 25(13): 1605-1612.

Pham, D. Q. D. and J. J. Winzerling (2010). "Insect ferritins: Typical or atypical?" Biochimica et Biophysica Acta (BBA) - General Subjects 1800(8): 824-833.

Pitino, M., et al. (2011). "Silencing of aphid genes by dsRNA feeding from plants." PLoS ONE 6(10): e25709.

Pond, S. L. K., et al. (2005). "HyPhy: hypothesis testing using phylogenies." Bioinformatics 21(5): 676-679.

Ponder, K. L., et al. (2000). "Difficulties in location and acceptance of phloem sap combined with reduced concentration of phloem amino acids explain lowered performance of the aphid Rhopalosiphum padi on nitrogen deficient barley (Hordeum vulgare) seedlings." Entomologia Experimentalis et Applicata 97(2): 203-210.

Pontoppidan, B., et al. (2001). "Purification and characterization of myrosinase from the cabbage aphid (Brevicoryne brassicae), a brassica herbivore." European Journal of Biochemistry 268(4): 1041-1048.

Pozhitkov, A. E., et al. (2007). "Oligonucleotide microarrays: widely applied—poorly understood." Briefings in Functional Genomics and Proteomics 6(2): 141-148.

Pratt, C., et al. (2008). "Accumulation of Glucosinolates by the Cabbage Aphid Brevicoryne brassicae as a Defense Against Two Coccinellid Species." Journal of Chemical Ecology 34(3): 323-329.

Price, D. R. G., et al. (2007). "Molecular characterisation of a candidate gut sucrase in the pea aphid, Acyrthosiphon pisum." Insect Biochem Mol Biol 37(4): 307-317.

Price, M. N., et al. (2010). "FastTree 2 – Approximately maximum-likelihood trees for large alignments." PLoS ONE 5(3): e9490.

Prosser, W. A. and A. E. Douglas (1992). "A test of the hypotheses that nitrogen is upgraded and recycled in an aphid (Acyrthosiphon pisum) symbiosis." Journal of Insect Physiology 38(2): 93-99.

Ramesh Kumar, D., et al. (2016). "Delivery of chitosan/dsRNA nanoparticles for silencing of wing development vestigial (vg) gene in Aedes aegypti mosquitoes." International Journal of Biological Macromolecules 86: 89-95.

Ramsey, J. S., et al. (2010). "Comparative analysis of detoxification enzymes in Acyrthosiphon pisum and Myzus persicae." Insect Mol Biol 19 Suppl 2: 155-164.

Ranasinghe, C., et al. (1998). "Over-expression of cytochrome P450 CYP6B7 mRNA and pyrethroid resistance in Australian populations of Helicoverpa armigera (Hübner)." Pesticide Science 54(3): 195-202.

Rane, R. V., et al. (2017). "Orthonome – a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes." BMC Genomics 18(1): 673.

Ranson, H., et al. (2002). "Evolution of Supergene Families Associated with Insecticide Resistance." Science 298(5591): 179-181.

Ranson, H. and J. Hemingway (2005). Glutathione Transferases. Comprehensive Molecular Insect Science. L. I. Gilbert. Amsterdam, Elsevier. 5: 383-402.

Rask, L., et al. (2000). "Myrosinase: gene family evolution and herbivore defense in Brassicaceae." Plant Molecular Biology 42(1): 93-114.

Ratzka, A., et al. (2002). "Disarming the mustard oil bomb." Proceedings of the National Academy of Sciences 99(17): 11223-11228.

Robinson, M., et al. (2010). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data." Bioinformatics 26.

Ronnebeck, W. (1950). "On the spring development of the green Peach Aphid (Myzus persicae Sulzer) on the primary host with respect to its importance as a virus vector in the Potato field." Zeitschrift fur Pflanzenkrankheiten, Pflanzenpathologie und Pflanzenschutz 57(9-10): 351-357.

Ronnebeck, W. (1952). "(German title.) Experiment on the reduction of virus infection of potato plants." Nachrichtenblatt des Deutschen Pflanzenschutzdienstes 4: 189-190.

Rutherford, K., et al. (2000). "Artemis: sequence visualization and annotation." Bioinformatics (Oxford, England) 16(10): 944-945.

Sadeghi, A., et al. (2009). "Evaluation of the Susceptibility of the Pea Aphid, Acyrthosiphon pisum, to a Selection of Novel Biorational Insecticides using an Artificial Diet." Journal of Insect Science 9(65): 1-8.

Sælensminde, G., et al. (2008). "Amino acid contacts in proteins adapted to different temperatures: hydrophobic interactions and surface charges play a key role." Extremophiles 13(1): 11.

Sang, J. P., et al. (1984). "Glucosinolate profiles in the seed, root and leaf tissue of cabbage, mustard, rapeseed, radish and swede." Canadian Journal of Plant Science 64(1): 77-93.

Sapountzis, P., et al. (2014). "New insight into the RNA interference response against cathepsin-L gene in the pea aphid, Acyrthosiphon pisum: Molting or gut phenotypes specifically induced by injection or feeding treatments." Insect Biochem Mol Biol 51: 20-32.

Sasabe, M., et al. (2004). "Molecular analysis of CYP321A1, a novel cytochrome P450 involved in metabolism of plant allelochemicals (furanocoumarins) and insecticides (cypermethrin) in Helicoverpa zea." Gene 338(2): 163-175.

Sau, A., et al. (2010). "Glutathione transferases and development of new principles to overcome drug resistance." Archives of Biochemistry and Biophysics 500(2): 116-122.

Scheirs, J., et al. (2000). "Optimization of Adult Performance Determines Host Choice in a Grass Miner." Proceedings: Biological Sciences 267(1457): 2065-2069.

Schena, M., et al. (1995). "Quantitative monitoring of gene expression patterns with a complementary DNA microarray." Science 270(5235): 467-470.

Schinkel, A. H. and J. W. Jonker (2003). "Mammalian drug efflux transporters of the ATP binding cassette (ABC) family: an overview." Advanced Drug Delivery Reviews 55(1): 3-29.

Schoonhoven, L. M., et al. (2005). Insect-plant biology. New York, Oxford University Press.

Schramm, K., et al. (2012). "Metabolism of glucosinolate-derived isothiocyanates to glutathione conjugates in generalist lepidopteran herbivores." Insect Biochem Mol Biol 42(3): 174-182.

Schwarz, K. and M. Dayhoff (1979). Matrices for detecting distant relationships. Atlas of protein sequences. M. Dayhoff, National Biomedical Research Foundation. 5.

Schwede, T., et al. (2003). "SWISS-MODEL: an automated protein homology-modeling server." Nucleic Acids Research 31(13): 3381-3385.

Schweizer, F., et al. (2017). "Arabidopsis glucosinolates trigger a contrasting transcriptomic response in a generalist and a specialist herbivore." Insect Biochem Mol Biol 85: 21-31.

Scott, J. G., et al. (2013). "Towards the elements of successful insect RNAi." Journal of Insect Physiology 59(12): 1212-1221.

Shah, S., et al. (2012). "Insecticide detoxification indicator strains as tools for enhancing chemical discovery screens." Pest Management Science 68(1): 38-48.

Shakesby, A. J., et al. (2009). "A water-specific aquaporin involved in aphid osmoregulation." Insect Biochem Mol Biol 39(1): 1-10.

Shang, F., et al. (2016). "Differential expression of genes in the alate and apterous morphs of the brown citrus aphid, Toxoptera citricida." Scientific Reports 6: 32099.

Sharom, F. J. (1997). "The P-Glycoprotein Efflux Pump: How Does it Transport Drugs?" The Journal of Membrane Biology 160(3): 161-175.

Shi, G., et al. (2011). "MultiMSOAR 2.0: an accurate tool to identify ortholog groups among multiple genomes." PLoS ONE 6(6): e20892.

Shi, H., et al. (2012). "Glutathione S-transferase (GST) genes in the red flour beetle, Tribolium castaneum, and comparative analysis with five additional insects." Genomics 100(5): 327-335.

Shigenobu, S., et al. (2000). "Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS." Nature 407: 8.

Shukla, J. N., et al. (2016). "Reduced stability and intracellular transport of dsRNA contribute to poor RNAi response in lepidopteran insects." RNA Biology 13(7): 656-669.

Singh, A. D., et al. (2013). "Oral delivery of double-stranded RNA in larvae of the yellow fever mosquito, Aedes aegypti: Implications for pest mosquito control." Journal of Insect Science 13(1): 69.

Snyder, M. J. and J. I. Glendinning (1996). "Causal connection between detoxification enzyme activity and consumption of a toxic plant compound." Journal of Comparative Physiology A 179(2): 255-261.

Sorensen, J. S. and M. D. Dearing (2006). "Efflux Transporters as a Novel Herbivore Countermechanism to Plant Chemical Defenses." Journal of Chemical Ecology 32(6): 1181.

Sugahara, R., et al. (2017). "Geographic variation in RNAi sensitivity in the migratory locust." Gene 605: 5-11.

Suyama, M., et al. (2006). "PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments." Nucleic Acids Res. 34.

Swevers, L., et al. (2013). "The possible impact of persistent virus infection on the function of the RNAi machinery in insects: a hypothesis." Frontiers in Physiology 4(319).

Tagu, D., et al. (2010). "The anatomy of an aphid genome: from sequence to biology." C R Biol 333(6-7): 464-473.

Taipalensuu, J., et al. (1997). "The Myrosinase-Binding Protein from Brassica Napus Seeds Possesses lectin Activity and has a Highly Similar Vegetatively Expressed Wound-Inducible Counterpart." European Journal of Biochemistry 250(3): 680-688.

Tajima, F. (1993). "Simple methods for testing the molecular evolutionary clock hypothesis." Genetics 135(2): 599-607.

Tamura, K., et al. (2012). "Estimating divergence times in large molecular phylogenies." Proc Natl Acad Sci U S A. 109.

Tang, S., et al. (2015). "Identification of protein coding regions in RNA transcripts." Nucleic Acids Research 43(12): e78.

Tang, X. and B. Zhou (2013). "Ferritin is the key to dietary iron absorption and tissue iron detoxification in Drosophila melanogaster." The FASEB Journal 27(1): 288-298.

Terenius, O., et al. (2011). "RNA interference in Lepidoptera: an overview of successful and unsuccessful studies and implications for experimental design." J Insect Physiol 57(2): 231-245.

Thomas Huxley, H. (1858). "On the agamic reproduction and morphology of Aphis – Part I." Transactions of the Linnean Society of London 22: 193-219.

Thorpe, P., et al. (2016). "Comparative transcriptomics and proteomics of three different aphid species identifies core and diverse effector sets." BMC Genomics. 17.

Tookey, H. L. (1973). "Crambe Thioglucoside Glucohydrolase (EC 3.2.3.1): Separation of a Protein Required for Epithiobutane Formation." Canadian Journal of Biochemistry 51(12): 1654-1660.

Trott, O. and A. J. Olson (2010). "AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading." Journal of Computational Chemistry 31(2): 455-461.

Tzin, V., et al. (2015). "RNA interference against gut osmoregulatory genes in phloem-feeding insects." Journal of Insect Physiology 79: 105-112.

Valenzuela, I. and A. A. Hoffmann (2013). Scoping study for further R&D on managing aphids and virus transmission and economic impact of IPM in grain production zones. Barton, ACT, Australia, Grains Research Development Corporation.

Valenzuela, I. and A. A. Hoffmann (2015). "Effects of aphid feeding and associated virus injury on grain crops in Australia." Austral Entomology 54(3): 292-305.

van Ham, R. C., et al. (2003). "Reductive genome evolution in Buchnera aphidicola." Proc Natl Acad Sci U S A 100(2): 581-586.

Velasco, P., et al. (2008). "Comparison of Glucosinolate Profiles in Leaf and Seed Tissues of Different Brassica napus Crops." Journal of the American Society for Horticultural Science 133(4): 551-558.

Vogel, K. J. and N. A. Moran (2013). "Functional and Evolutionary Analysis of the Genome of an Obligate Fungal Symbiont." Genome Biol Evol 5(5): 891-904.

von Bubnoff, A. (2008). "Next-Generation Sequencing: The Race Is On." Cell 132(5): 721-723.

Von Dohlen, C. (2000). "Molecular data support a rapid radiation of aphids in the Cretaceous and multiple origins of host alternation." Biological Journal of the Linnean Society 71(4): 689-717.

von Dohlen, C. D., et al. (2006). "A test of morphological hypotheses for tribal and subtribal relationships of Aphidinae (Insecta: Hemiptera: Aphididae) using DNA sequences." Mol Phylogenet Evol 38(2): 316-329.

Wang, Z., et al. (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10(1): 57-63.

Wang, Z., et al. (2016). Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: prediction accuracy of sampling power and scoring power.

Ward, K. (1934). "The green peach aphid (Myzus persicae Sulzer) in relation to the peach in Victoria and the measures investigated for its control." J. Agric., Victoria 32: 97-104.

Watt, S. (2016). Green peach aphid confirmed resistant to another insecticide.

Wen, Z., et al. (2003). "Metabolism of linear and angular furanocoumarins by Papilio polyxenes CYP6B1 co-expressed with NADPH cytochrome P450 reductase." Insect Biochem Mol Biol 33(9): 937-947.

Wen, Z., et al. (2006). "CYP6B1 and CYP6B3 of the black swallowtail (Papilio polyxenes): adaptive evolution through subfunctionalization." Mol Biol Evol 23(12): 2434-2443.

Werker, E. and J. Vaughan (1977). "Ontogeny and distribution of myrosin cells in the shoot of Sinapis alba L.: a light and electron microscope study." Israel Journal of Botany.

Wheat, C. W., et al. (2007). "The genetic basis of a plant–insect coevolutionary key innovation." Proceedings of the National Academy of Sciences 104(51): 20427-20431.

Whiteman, N. K., et al. (2012). "Genes Involved in the Evolution of Herbivory by a Leaf-Mining, Drosophilid Fly." Genome Biol Evol 4(9): 900-916.

Whyard, S., et al. (2009). "Ingested double-stranded RNAs can act as species-specific insecticides." Insect Biochem Mol Biol 39(11): 824-832.

Wilkinson, T., et al. (1997). "Honeydew sugars and osmoregulation in the pea aphid Acyrthosiphon pisum." Journal of Experimental Biology 200(15): 2137-2143.

Wilkinson, T. L. and A. E. Douglas (1998). "Host cell allometry and regulation of the symbiosis between pea aphids, Acyrthosiphon pisum, and bacteria, Buchnera." Journal of Insect Physiology 44(7-8): 7.

Williams, R. T. (1959). Detoxification Mechanisms. New York, John Wiley & Sons, Inc.

Wittstock, U., et al. (2004). "Successful herbivore attack due to metabolic diversion of a plant chemical defense." Proc Natl Acad Sci U S A 101(14): 4859-4864.

Wojciechowski, W. (1992). Studies on the systematic system of aphids (Homoptera, Aphidinea), Katowice.

Wosilait, W. D. and A. Nason (1954). "Pyridine nucleotide-menadione reductase from Escherichia coli." Journal of Biological Chemistry 208(2): 785-798.

Wynant, N., et al. (2014). "Scavenger receptor-mediated endocytosis facilitates RNA interference in the desert locust, Schistocerca gregaria." Insect Mol Biol 23(3): 320-329.

Wynant, N., et al. (2014). "Identification, functional characterization and phylogenetic analysis of double stranded RNA degrading enzymes present in the gut of the desert locust, Schistocerca gregaria." Insect Biochem Mol Biol 46: 1-8.

Xue, J., et al. (1995). "The myrosinase gene family in Arabidopsis thaliana: gene organization, expression and evolution." Plant Molecular Biology 27(5): 911-922.

Xue, J., et al. (1993). "Temporal, cell-specific, and tissue-preferential expression of myrosinase genes during embryo and seedling development in Sinapis alba." Planta 191(1): 95-101.

Yamamoto, K. and N. Yamada (2016). "Identification of a diazinon-metabolizing glutathione S-transferase in the silkworm, Bombyx mori." Scientific Reports 6: 30073.

Yang, J., et al. (2012). "A sigma-class glutathione S-transferase from Solen grandis that responded to microorganism glycan and organic contaminants." Fish & Shellfish Immunology 32(6): 1198-1204.

Yang, N. J., et al. (2017). "Cytosolic delivery of siRNA by ultra-high affinity dsRNA binding proteins." Nucleic Acids Research 45(13): 7602-7614.

Young, G. M., et al. (1999). "A new pathway for the secretion of virulence factors by bacteria: The flagellar export apparatus functions as a protein-secretion system." Proceedings of the National Academy of Sciences 96: 6456-6461.

Yu, S. J. (1987). "Quinone reductase of phytophagous insects and its induction by allelochemicals." Comparative Biochemistry and Physiology Part B: Comparative Biochemistry 87(3): 621-624.

Zhang, M., et al. (2013). "Identifying potential RNAi targets in grain aphid (Sitobion avenae F.) based on transcriptome profiling of its alimentary canal after feeding on wheat plants." BMC Genomics 14(1): 560.

Zhang, W., et al. (2017). "Crystal structure of the nitrile-specifier protein NSP1 from Arabidopsis thaliana." Biochemical and Biophysical Research Communications 488(1): 147-152.

Zhang, Y., et al. (2015). "Cloning and RNA interference analysis of the salivary protein C002 gene in Schizaphis graminum." Journal of Integrative Agriculture 14(4): 698-705.

Zhao, J. H., et al. (2011). "Benefits of Bt cotton counterbalanced by secondary pests? Perceptions of ecological change in China." Environmental Monitoring and Assessment 173(1): 985-994.

Zhu-Salzman, K., et al. (2004). "Transcriptional regulation of sorghum defense determinants against a phloem-feeding aphid." Plant Physiology 134(1): 420-431.

Ziegler, H. (1975). Nature of transported substances. Transport in Plants I, Springer: 59-100.

Zou, X., et al. (2016). "Glutathione S-transferase SlGSTE1 in Spodoptera litura may be associated with feeding adaptation of host plants." Insect Biochem Mol Biol 70: 32-43.

Züst, T. and A. A. Agrawal (2016). "Mechanisms and evolution of plant resistance to aphids." Nature Plants 2: 15206.

प्रकृतिं स्वामवष्टभ्य विसृजामि पुनः पुनः | भूतग्राममिमं कृत्स्नमवशं प्रकृतेर्वशात् ||

– श्रीमद्भगवद्गीता

Curving back within myself I create again and again

– The Bhagavad Gita

Thank you!

Minerva Access is the Institutional Repository of The University of Melbourne

Author/s: Ghodke, Amol Bharat

Title: Understanding aphids: transcriptomics, molecular evolution and pest control

Date: 2018

Persistent Link: http://hdl.handle.net/11343/213475

File Description: Complete Thesis

Terms and Conditions: Copyright in works deposited in Minerva Access is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only download, print and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works.