Running title: Soldier salivary gland transcriptome

Title for author: K. Etebari et al.

Correspondence: Kayvan Etebari, School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia.

Email: [email protected]

Original article

Australian sugarcane soldier fly’s salivary gland transcriptome in response to starvation and feeding on sugarcane crops

Kayvan Etebari1, Karel R. Lindsay2, Andrew L. Ward3 and Michael J. Furlong1

1 School of Biological Sciences, The University of Queensland, Brisbane QLD, Australia

2 Sugar Research Australia, 26135 Peak Downs Highway, Te Kowai QLD 4740, Australia

3 Sugar Research Australia, 50 Meiers Rd, Indooroopilly QLD 4068

Abstract

The soldier fly is an endemic pest of sugarcane in Australia. Small numbers of larvae can cause significant damage to roots and reduce crop yields. Little is known about the composition and function of the soldier fly salivary gland, its secretions, and their roles in insect‒plant interactions. In this study, we performed transcriptome analysis of the salivary glands of starved and sugarcane root-fed soldier fly larvae. A total of 31 119 highly expressed assembled contigs were identified in the salivary glands and almost 50% of them showed high levels of similarity to known proteins in Nr databases. Of all the obtained contigs, only 9727 sequences contained an open reading frame of over 100 amino acids. Around 31% of contigs were predicted to encode secretory proteins, including some digestive and detoxifying enzymes and potential effectors. Some known salivary secreted peptides such as serine protease, cysteine proteinase inhibitors, antimicrobial peptides and venom proteins were among the top 100 highly expressed genes. Differential gene expression analysis revealed significant modulation of 850 transcripts in salivary glands upon exposure to plant roots or starvation stress. We identified some venom proteins which were significantly upregulated in the salivary glands of soldier fly larvae exposed to sugarcane roots. In other insects and nematodes, some of these proteins have been used to manipulate host plant defense systems and facilitate the invasion of the host plant. These findings provide further insight into the identification of potential effector proteins involved in soldier fly‒sugarcane interactions.

This is an Accepted Article that has been peer-reviewed and approved for publication in Insect Science but has yet to undergo copy-editing and proof correction. Please cite this article as doi: 10.1111/1744-7917.12676.

This article is protected by copyright. All rights reserved.

Keywords: , salivary gland, soldier fly, sugarcane pest, transcriptome

Introduction:

The soldier flies, Inopus rubriceps Macquart (Diptera: Stratiomyidae) and Inopus flavus (James), are endemic to Australia, where they are economically important insect pests of sugarcane (Saccharum officinarum L.) (Allsopp & Robertson, 1988). The distribution of I. rubriceps extends throughout eastern Queensland and New South Wales, and there are also established populations in New Zealand and California, USA (Robertson, 1986). The known distribution of I. flavus is more limited, covering local areas in eastern central Queensland (Hitchcock, 1976), and the damage they cause has increased recently. Soldier fly larvae are slow growing; most individuals develop through 8‒9 larval instars after hatching and reach the adult stage within one year. If larvae fail to pupate in the autumn (March to May in Queensland), they complete their development over two years after passing through an additional 3 or 4 instars (Hitchcock, 1976). The adult lifespan for both males and females is less than five days. In sugarcane crops, the larvae of both species cause economic damage by feeding on sugarcane roots, resulting in poor sugarcane yields, inhibition of bud development and germination, and reduced ratoons after harvest (Samson, 2001). Soldier fly pest management is difficult in sugarcane crops as insecticides are ineffective and no crop varieties are tolerant to feeding. The development of improved pest management strategies requires a better understanding of the relationship between soldier fly and its sugarcane host plants.

The underlying mechanisms responsible for the detrimental effects of soldier fly feeding on sugarcane remain very poorly understood. Hitchcock (1976) attributed poor germination and ratooning to the mechanical damage caused by larvae excavating cavities in roots. In barley, Fellowes (1975) demonstrated that increasing densities of soldier fly larvae feeding on the roots decreased shoot production and root mass, but considered the cause to be a loss of nutrients. The high levels of damage that the above-ground parts of sugarcane crops can suffer following apparently low levels of feeding damage to roots have led to speculation that soldier fly feeding introduces toxic chemicals or plant growth inhibitors into the roots, but up to now there has been no evidence to support such hypotheses (Samson, 2001).

Sugarcane soldier fly larvae have specialized mouthparts consisting of two hooked mandible-like structures that operate on the vertical plane and excavate asymmetrical cone-shaped cavities in roots (Fellowes, 1975; Hitchcock, 1976), where larvae bury their heads to feed on fluids. Interactions between plants and insects with piercing-sucking mouthparts are similar to those between plants and plant pathogens during the infection process, and the interplay between host plants and larvae corresponds to the plant–pathogen model proposed by Jones and Dangl (2006). This model has been examined in many other interactions between sap-sucking insects and their host plants (Stuart, 2015; Zhang et al., 2017).

Plant pathogens and insects with sucking mouthparts can deliver effectors into the host plant to manipulate plant immunity. Salivary glands release proteins that aid feeding, external digestion and the suppression of host plant defenses (Rivera-Vega et al., 2017). In the Diptera, human blood-feeding disease vectors such as Anopheles gambiae (Culicidae) emit salivary proteins that function as anti-collagens in their hosts and aid digestion (Arca et al., 2005). Agricultural pests including Mayetiola destructor (Cecidomyiidae) (Shukle et al., 2009) and Sitodiplosis mosellana (Cecidomyiidae) (Al-Jbory et al., 2018b) produce effector proteins that manipulate host growth and metabolism to form plant outgrowths (galls), which benefit the insect through enhanced nutrition and reduced plant defenses (Chen et al., 2010). The salivary glands in insects are also used for the transmission of plant pathogens including viruses, bacteria, mycoplasma-like bodies, and phytoplasmas, which severely impact agricultural industries (Sugio et al., 2015; Kaur et al., 2016). Soldier flies have a pair of salivary glands positioned ventro-laterally in the thorax. These are known to secrete the digestive enzyme invertase, which suggests that larvae are feeding on the phloem (Fellowes, 1975). However, there is no information on other proteins that may be produced by soldier fly salivary glands or on micro-organisms that might be associated with them and transmitted to host plants during feeding.

In this study, a transcriptomic approach was developed to characterize the composition of salivary glands in soldier fly larvae. This approach improved our understanding of the insect-plant interaction as it enabled the global gene expression profile in soldier fly salivary glands to be investigated. Comparison of the gene expression profiles of salivary glands from insects that had been exposed to host plants with those of salivary glands from insects that had been held in isolation from host plant roots produced a list of differentially expressed genes. We also screened all soldier fly transcripts to identify genes that potentially encode a secretory protein. Proteins with hypothetical secretory signal peptides and without any transmembrane domains that are overexpressed during root feeding are more likely to pass through the cellular membrane and show potential effector activity. Our results provide a greater understanding of the molecular mechanism behind soldier fly feeding on sugarcane roots and provide data for future identification of possible effector proteins in closely related insects.

Material and methods

Experiment set up

Soldier fly (Inopus flavus) larvae were collected from an infested sugarcane field near Hay Point, Queensland (21°18'5"S, 149°14'7"E). In February 2018, stools were dug from the ground and large larvae were manually collected from the roots and associated soil. Larvae were transferred to aerated 480 mL polypropylene containers with soil collected from the same sugarcane field, transported to the laboratory and used in experiments within 2 days.

To examine direct interactions between larvae and plant roots, individual sugarcane seedlings (cultivar Q208) were transplanted into six 50 mL Falcon tubes filled with 70% peat and 30% sand and grown in sunlight for three weeks to develop a root system. To avoid exposure of the roots to light, all Falcon tubes were wrapped with aluminum foil. Seedlings were irrigated with 5 mL water every second day. Determination of larval instar is difficult in this species. We used larvae that were ~10 mm long, which is typically the size of later (5th and 6th instar) larvae. Ten larvae were then introduced to the surface of the soil in each tube and allowed to move down towards the roots. The plants and larvae were kept in an incubator at 25°C with a 16L:8D light regime. Larval activity and feeding behaviour were monitored under a binocular microscope on a daily basis for two weeks before larvae were carefully removed. Feeding on plant roots was observed but no significant damage or symptoms were detected on the roots of experimental plants. One week after larvae were introduced to the sugarcane seedlings, six groups of 10 larvae (10‒15 mm long) were introduced into Petri dishes (90 mm diameter) lined with moistened filter paper. These insects had no access to sugarcane roots or any other source of food for a week and were incubated at 25°C with a 16L:8D light regime.

Salivary gland dissection and RNA extraction

The larval body surfaces were disinfected by soaking in 75% ethanol for 30 s and then rinsing in phosphate-buffered saline (PBS) before dissecting out the salivary glands. The salivary glands (SG) were extracted by pulling out the head capsule and removing all other tissues, such as fat body droplets. The SG tissues from 20 larvae (representing one biological replicate) were pooled and transferred to Qiazol lysis reagent for RNA extraction according to the manufacturer's instructions (QIAGEN; Cat No.: 79306). The RNA samples were treated with DNase I for 1 h at 37°C; their concentrations were then measured using a spectrophotometer and their integrity was checked on a 1% (w/v) agarose gel. After checking the RNA quality, total RNA samples from three biological replicates of root-exposed and control (starved) larvae were submitted to the Australian Genome Research Facility (AGRF) for next generation RNA deep sequencing (RNA-seq). The PCR-based cDNA libraries were prepared using the Illumina TruSeq cDNA library construction kit. cDNA from both sets of samples was sequenced using Illumina HiSeq 4000 paired read (75×75 bp) technologies with an average fragment size of 350 bp and insert size of 230 bp.

De novo assembly and data analysis

The CLC Genomics Workbench version 11.01 was used for bioinformatics analyses. All libraries were trimmed of any remaining vector or adapter sequences. Low quality reads (quality score below 0.05) and reads with more than two ambiguous nucleotides were discarded. As the genome sequence of this species is not available, we used a de novo assembly approach with k-mer size 45, bubble size 50 and minimum contig length 150 bp to process these data. The contigs were corrected by mapping all small reads against the assembled sequences (min. length fraction=0.9, maximum mismatches=2) and unmapped reads were retained for another assembly. The corrected and combined contigs arising from the two de novo assemblies were used as a reference set of transcripts for RNAseq analysis. Short-read sequence data from the SG of root-exposed and control (starved) larvae were separately mapped against the reference set of assembled transcripts using the CLC Genomics Workbench RNAseq function (min. length fraction=0.9, maximum mismatches=2, insertion cost=3, deletion cost=3) with the non-strand-specific option and a maximum of 10 hits per read.

Relative expression levels were output as RPKM (Reads Per Kilobase of exon model per Million mapped reads) values, which account for the relative size of the transcripts and use only the mapped-read datasets (i.e. non-mapped reads are excluded) to determine relative transcript abundance. In this way, the output for each dataset can be compared directly, as the number of mapped reads per dataset and the transcript size have already been taken into account. To explore the differential expression profile between the two groups of samples in CLC Genomics Workbench, each gene was modelled by a separate Generalized Linear Model (GLM). The GLM formalism allows curves to be fitted to expression values without assuming that the error on the values is normally distributed. The Wald test was used to compare each sample group against its control group and to test whether a given coefficient was non-zero. We considered genes with more than 2-fold change and a false discovery rate (FDR) of less than 0.05 to be significantly modulated. We used the SignalP 4.1 server (Petersen et al., 2011) to screen transcripts for putative secretory proteins based on the presence of an N-terminal signal peptide cleavage site and the absence of transmembrane domains in contig sequences.
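The RPKM normalisation described above can be illustrated with a minimal sketch. This is not the CLC Genomics Workbench implementation; the function name `rpkm` and the example numbers are hypothetical and serve only to show how transcript length and library size enter the calculation.

```python
def rpkm(reads_on_transcript, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Illustrative helper (hypothetical name): divides the read count by the
    transcript length in kilobases and by the library size in millions of
    mapped reads.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return reads_on_transcript * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000 bp contig in a library
# with 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

Because both the contig length and the number of mapped reads appear in the denominator, values computed this way can be compared between contigs of different lengths and between libraries of different sequencing depth.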

Gene Ontology (GO) analysis

All differentially expressed genes were uploaded to the Blast2GO server for functional annotation and GO analysis. We used BLAST, enzyme classification codes (EC), and InterProScan algorithms to reveal the GO terms of differentially expressed sequences. More abundant terms were computed for each category of molecular function, biological process and cellular component. Blast2GO has integrated the FatiGO package for statistical assessment, and this package uses Fisher's exact test.
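To make the statistical step concrete, the following is a minimal sketch of a Fisher's exact test for over-representation of a single GO term. It is an assumption-laden illustration, not the FatiGO or Blast2GO code: it computes only the one-sided (enrichment) p-value from the hypergeometric distribution, the function name `go_enrichment_p_value` is hypothetical, and the settings actually used by the package are not given in the text.

```python
from math import comb


def go_enrichment_p_value(hits_in_test_set, test_set_size,
                          hits_in_background, background_size):
    """One-sided Fisher's exact test (upper tail of the hypergeometric).

    Returns the probability of seeing at least `hits_in_test_set` sequences
    annotated with a GO term in a test set of `test_set_size` sequences drawn
    from a background of `background_size` sequences, of which
    `hits_in_background` carry the term.
    """
    total_ways = comb(background_size, test_set_size)
    max_hits = min(test_set_size, hits_in_background)
    tail = 0
    for i in range(hits_in_test_set, max_hits + 1):
        # comb() returns 0 when the lower argument exceeds the upper one,
        # so impossible tables contribute nothing to the sum.
        tail += comb(hits_in_background, i) * comb(
            background_size - hits_in_background, test_set_size - i)
    return tail / total_ways


# Hypothetical counts: 5 of 5 test sequences carry a term that
# 5 of 20 background sequences carry.
print(go_enrichment_p_value(5, 5, 5, 20))
```

In a real analysis one p-value would be computed per GO term and then corrected for multiple testing; that step is omitted here.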

Accession number

Deep sequencing data have been deposited in the National Center for Biotechnology Information's (NCBI's) Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE127658.


Results and Discussion

RNAseq data analysis

The Illumina HiSeq 4000 produced approximately 374.5 and 416.8 million paired-end reads from RNA extracted from the salivary glands of starved (control) larvae and larvae with access to sugarcane roots, respectively (Table 1). De novo assembly produced 34 684 contigs from all 6 libraries in the first assembly run, with an N50 of 2015 bp and a maximum length of 23 672 bp (Table 2). The reads from each library were mapped against the preliminary reference contigs and around 95% of reads mapped back to the reference. Unmapped reads were collected for another assembly, which produced 48 286 contigs with an N50 of 297 bp. The combined 82 970 contigs were used for downstream analysis. Of these, 74% were between 100‒499 bp and only 13% of the contigs were above 1000 bp. This high coverage enabled the majority of the expressed transcripts to be detected and a precise view of gene expression in the salivary gland tissue of soldier fly larvae to be generated.
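For readers unfamiliar with the N50 statistic reported above, a minimal sketch of its calculation follows. The function name `n50` and the example lengths are hypothetical; the assembler's own reporting may differ in how it handles ties or minimum contig lengths.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative length, counted from the
    longest contig downwards, first reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical assembly of five contigs (total 1500 bp, half = 750 bp):
# 500 + 400 = 900 >= 750, so the N50 is 400 bp.
print(n50([100, 200, 300, 400, 500]))
```

A low N50, such as that of the second assembly built from unmapped reads, indicates that most of the assembled sequence lies in short contigs.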

Gene expression profile of the soldier fly salivary gland

Contigs with more than 10 reads in at least one library were kept for further analysis, and 31 119 contigs were therefore classified as highly expressed transcripts. A similarity search revealed that almost 50% of these sequences did not produce any hits against the NCBI protein database (nr) in BLASTx and only 9727 sequences contained an open reading frame of over 100 amino acids. This suggests that they are either soldier fly specific sequences or the product of non-coding regions of the genome. Table S1 summarizes the top 200 highly expressed sequences after removing all contigs with similarity to ribosomal proteins and insect RNA viruses. Australian sheep blowfly (Lucilia cuprina) and Mediterranean fruit fly (Ceratitis capitata) were the top hit species for BLAST. A sequence similar to a spider venom protein NPTX_B154 (KX023305.1) is also among the top 15 highly expressed transcripts in all six libraries. This small venom protein has been identified in the giant golden orb weaver spider (Nephila pilipes) and shows similarity to a potential neurotoxin protein in scorpions (Luna-Ramirez et al., 2013; Diego-Garcia et al., 2014). We do not have access to the full length of this contig to assess its potential secretory function; however, it will be a good candidate for further investigation. Recently, a similar venom protein has been reported in the saliva of the midge (Mayetiola avenae, Diptera: Cecidomyiidae) as a non-secreted protein (Al-Jbory et al., 2018a). This soldier fly venom protein showed an 82% identity match with the homologous protein in N. pilipes; however, the activities of these molecules might be different in the two organisms.
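The expression filter used to define highly expressed contigs (more than 10 reads in at least one library) can be sketched as follows. The function name `filter_highly_expressed`, the contig identifiers and the read counts are hypothetical and purely illustrative; the study applied the filter within CLC Genomics Workbench.

```python
def filter_highly_expressed(read_counts, min_reads=10):
    """Keep contigs with more than `min_reads` reads in at least one library.

    `read_counts` maps a contig identifier to a list of read counts,
    one per sequenced library.
    """
    return {
        contig: counts
        for contig, counts in read_counts.items()
        if any(count > min_reads for count in counts)
    }


# Hypothetical counts for three contigs across six libraries.
counts = {
    "contig_a": [0, 2, 5, 1, 0, 3],        # never above 10 reads: dropped
    "contig_b": [0, 0, 0, 0, 0, 11],       # 11 reads in one library: kept
    "contig_c": [10, 10, 10, 10, 10, 10],  # exactly 10 is not "more than 10"
}
kept = filter_highly_expressed(counts)
print(sorted(kept))
```

Requiring the threshold in only one library, rather than in all of them, retains transcripts that are expressed under just one of the two treatments.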

Identification of soldier fly transcripts with potential secretory function

Most of the salivary gland proteins secreted by insects are small peptides (50–150 amino acid residues) and typically, these do not share sequence similarity with any known proteins in available databases (Matsumoto et al., 2014; Al-Jbory et al., 2018b). Thus, the low level of homology between detected sequences and previously reported sequences is not unexpected as the gene products are involved in specific host plant-insect interactions. Some of these sequences are ubiquitous and they can be found in different tissues but some are expected to be salivary gland tissue specific and crucial for effective and stable feeding on sugarcane host plants. Previous proteomics studies show around 60% of transcripts in Dipteran salivary glands encode secreted saliva proteins which can manipulate host plants and some of them work as effector proteins that promote insect virulence in compatible interactions (Chen et al., 2008). Many of these “effector proteins” have been identified in different insects and they act either to suppress or to trigger plant defense responses.

Although the contigs assembled by the RNAseq approach do not represent the full-length genes in most cases, the presence of an N-terminal signal peptide cleavage site and the absence of transmembrane domains in some of those sequences can provide information about their potential secretory ability. In this study, we screened transcripts based on their structural domains to identify putative secretory proteins. Around 31% of highly expressed contigs were predicted to encode secretory proteins, including some digestive and detoxifying enzymes and potential effectors. However, the lack of these signals does not exclude the remaining contigs from a potential secretory function, and they might still encode a secretory protein. Among the top 100 highly expressed transcripts there are some sequences that encode known salivary secreted peptides, viz. serine protease, cysteine proteinase inhibitors and venom proteins (Table S1).

A previous study showed that the inhibitory effect of soldier fly larval feeding on sugarcane ratooning was not rescued when larvae were removed from pots 10 weeks before harvest and suggested that this was evidence for the injection of some inhibitory substance by larvae (Samson, 2001). Salivary gland proteins with a secretory signal peptide cleavage site and no transmembrane domain are hypothetically directed towards the secretory pathways and show their impact in the host plant tissue (Zhang et al., 2017). It is possible that these proteins are injected into sugarcane root tissue when soldier fly larvae feed, but further investigations are required to support this and to demonstrate specific impacts on sugarcane plants. The identification of such proteins will also facilitate comparative analyses of insect effectors from closely related species.

Injection of some effectors into host tissues can facilitate pre-digestion of food before it is ingested. For example, hydrolases can be responsible for the pre-oral digestion of food (Miles & Taylor, 1994) and these enzymes have been reported as major components of the salivary gland proteome in many insects (Moreira et al., 2017; Al-Jbory et al., 2018b). The enzyme code distribution among contigs with enzyme activity shows that a substantial number of sequences have hydrolase activity; among these, enzymes acting on acid anhydrides (EC:3.6.), peptidases (EC:3.4.) and esterases (EC:3.1.) are the most abundant in the soldier fly salivary gland transcriptome (Figure 1). The second main group of identified enzymes is the transferases, the majority of which transfer phosphorous-containing groups (EC:2.7.). Glutathione S-transferases and esterases have been suggested as important enzymes for detoxification of plant secondary metabolites in insects and have been found in many insect salivary glands (Liu et al., 2016; Zhang et al., 2017).

Gene ontology and conserved domain analysis

Gene Ontology (GO) analysis showed that the majority of salivary gland transcripts participated in organic substance metabolic process (12.4%), primary metabolic process (11.6%), cellular metabolic process (11.8%) and nitrogen compound metabolic process (Fig. 2A). Organic cyclic and heterocyclic compound binding (16.78% and 16.77%) are the most abundant GO terms among level 3 molecular function (Fig. 2B). The distribution of cellular component GO terms shows that more than 50% of sequences are classified as intracellular related.

The top 50 identified protein domains in soldier fly salivary glands are summarized in Table 3. Soldier fly salivary gland transcripts are enriched with zinc finger domains. Zinc finger proteins (ZFP) were initially described as a DNA binding motif but a wider range of molecular functions has been discovered in recent years. It has been shown that, in plants, C2H2 zinc finger transcription factors are regulated by abiotic stress and also during insect herbivory (Lawrence et al., 2014). This group of proteins is one of the largest groups of nucleic acid binding factors in higher eukaryotes, and in silico screening of the Drosophila genome identified 326 putative ZFPs (Chung et al., 2002). Transcripts with zinc finger domains have recently been reported as non-secretory salivary gland proteins in wheat midge (Al-Jbory et al., 2018b).

Protein kinase is another abundant domain detected in soldier fly salivary glands. These proteins modify the structure of other proteins by phosphorylation and they are responsible for wound-induced jasmonic acid (JA) accumulation in plants in response to insect herbivory (Bonaventure et al., 2011). They have been reported in the saliva of a wide range of insects such as the black fly (Simulium vittatum), rice brown planthopper (Nilaparvata lugens), the green rice leafhopper (Nephotettix cincticeps) and white-backed planthopper (Sogatella furcifera) (Andersen et al., 2009; Matsumoto et al., 2014; Li et al., 2016; Liu et al., 2016). In the grain aphid (Sitobion avenae) salivary gland, the mitogen-activated protein kinase signaling pathway has been identified as one of the KEGG pathways with the highest number of unigenes (Zhang et al., 2017).

Differential gene expression profile and putative effector proteins in soldier fly salivary glands

Differential gene expression analysis revealed significant modulation (fold change >2; FDR P-value ≤ 0.05) of 850 transcripts in salivary glands when larvae were exposed to sugarcane roots or subjected to starvation stress (Fig. 3 and Table S2). Of these 850 transcripts, 490 showed some level of similarity with known genes in databases. Exposure to the plant root resulted in overexpression of 309 transcripts in the salivary glands of soldier fly larvae; most of these genes are involved in metabolic process, catalytic and hydrolase activity (Fig. 4). The presence of an N-terminal signal peptide cleavage site and the absence of transmembrane domains were detected in only 145 upregulated contigs. Starvation stress resulted in overexpression of some potential stress related genes (181 transcripts) in larvae and the suppression of some genes involved in metabolic pathways (Fig. 4).
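The significance criteria applied to the differential expression results can be sketched as a simple filter. This is an illustrative example only: the function name, the contig identifiers and all values are hypothetical, the FDR values are assumed to have been computed already by the analysis software, and the strict inequality for FDR follows the Material and methods wording ("less than 0.05"). A positive log2 fold change is assumed here to mean higher expression in root-exposed larvae.

```python
import math


def is_significantly_modulated(log2_fold_change, fdr,
                               min_fold_change=2.0, max_fdr=0.05):
    """True if the transcript changes more than `min_fold_change`-fold in
    either direction and its FDR-corrected p-value is below `max_fdr`.
    """
    # |log2 FC| > log2(2) = 1 corresponds to a >2-fold increase or decrease.
    exceeds_fold = abs(log2_fold_change) > math.log2(min_fold_change)
    return exceeds_fold and fdr < max_fdr


# Hypothetical results: (contig, log2 fold change, FDR).
results = [
    ("contig_1", 11.2, 0.001),   # strongly upregulated, significant
    ("contig_2", 0.8, 0.001),    # change too small
    ("contig_3", -3.0, 0.20),    # large change but FDR too high
    ("contig_4", -1.5, 0.01),    # downregulated, significant
]
selected = [name for name, lfc, fdr in results
            if is_significantly_modulated(lfc, fdr)]
print(selected)
```

Working on the log2 scale treats up- and downregulation symmetrically, which is why a single absolute-value test covers both directions.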

Here, we identified some venom proteins which were significantly upregulated (Log2 FC >11) in the salivary glands of soldier fly larvae exposed to sugarcane roots. We could not detect the N-terminal signal peptide cleavage site on some of these contigs, which could well be explained by their incomplete sequences. These proteins, including salivary secreted antigen 5-related protein (AG5), venom allergen 3-like proteins and venom carboxylesterase-6, have previously been reported from plant parasitic nematodes and herbivorous insects. Many plant parasitic nematodes secrete proteins similar to "venom allergen 3" which facilitate host plant invasion (Cooper & Eleftherianos, 2016); the molecules could well have a similar function in soldier fly larvae. Based on previous studies of effector proteins in other phytopathogenic organisms, Carolan et al. (2011) predicted an effector role for some proteins in the salivary secretome of the pea aphid (Acyrthosiphon pisum). These findings show significant similarities between the saliva of plant-feeding nematodes and sap-sucking insects, which may indicate the evolution of common solutions to the plant-parasitic lifestyle (Carolan et al., 2011).

Significant modifications in transcript levels of some digestive enzymes and proteins in the salivary glands of larvae that were exposed to sugarcane roots are evidence of active feeding and enhanced metabolism in soldier fly larvae. L-xylulose reductase, sugar transporter protein and facilitated trehalose transporter (Tret1) are all involved in carbohydrate metabolism and were drastically overexpressed in fed larvae (Log2 FC >4). Similarly, chymotrypsin was highly upregulated in soldier fly salivary glands in response to feeding on sugarcane roots (Log2 FC >7). Chymotrypsin is a serine protease involved in many different pathways, especially in blood-sucking arthropods (e.g. mosquitoes, assassin bugs (Triatoma) and the paralysis tick, Ixodes holocyclus) (Rodriguez-Valle et al., 2018), where it degrades elastin and other components of the extracellular matrix during feeding. In insect herbivores, chymotrypsins may be used to cleave internal peptide bonds in plant proteins.

The larger complement of insect salivary proteins is essential to combat plant secondary metabolites and a variety of defenses. Sarcocystatin-A is a cysteine proteinase inhibitor which was overexpressed in soldier fly larvae after exposure to plant roots (Log2 FC >10). Previous studies in the flesh fly, Sarcophaga peregrina, showed that Sarcocystatin-A is an important protein involved in metamorphosis and in protecting adult tissues from attack by humoral proteinases released from decomposing larval tissues (Suzuki & Natori, 1986). Synthesizing protective proteins, such as cysteine proteases, to deter insect feeding is a common plant defence mechanism against herbivorous insect attack, and protease inhibitors could neutralize these host plant compounds (Furstenberg-Hagg et al., 2013). It has been shown that cysteine protease has a negative impact on insect herbivores and that overexpression of this protein can trigger resistance to caterpillar feeding (Pechan et al., 2002).

Another gene with significant overexpression after exposure to food is similar to Myeloblastosis (MYB-like) proteins (Log2 FC >6). This protein is a member of a family of gene transcription factors but its function in the insect salivary gland is still unknown. MYB proteins are known to play an important role in response to herbivore attack and biotic stress in plants (Ambawat et al., 2013) and further investigations are required to identify their function in insect saliva.

Highly expressed antimicrobial peptides

There is increasing evidence that bacterial communities can colonize the salivary glands of herbivorous insects (Sugio et al., 2015) and therefore overexpression of antimicrobial peptides in this tissue is not unexpected. Metagenomic studies in recent years have revealed plant- or insect-associated microbes to be instrumental in plant–insect interactions. Bacteria may regulate plant primary and secondary metabolism and/or plant defence systems against insects for the benefit of either plants or insects (Sugio et al., 2015).

It has been shown that the salivary glands of Dipteran insects fed on diets with high sugar content contain glycosidases and antimicrobial polypeptides that help sugar digestion and may prevent microbial growth in the food (Andersen et al., 2009). We also detected a highly expressed antimicrobial peptide sequence similar to Cecropin in one of the libraries. These peptides showed significant homology to black soldier fly (Hermetia illucens) Cecropin-like peptide 2 and 3 (AHH35096 and AFD96564). Several antimicrobial peptides belonging to the cecropin family have been reported from the salivary glands of black fly (Simulium bannaense) and horsefly (Tabanus yao) (Wei et al., 2015). Cecropins destroy bacteria through the disruption of cell membrane integrity, and infection of black flies with bacteria caused an upregulation of the expression of Cecropin (Andersen et al., 2009).

Peritrophin, which is one of the components of the peritrophic matrix, is also overexpressed in the salivary glands of larvae exposed to food (Log2 FC >6). Its binding to Gram-negative bacteria and its strong binding activity to chitin indicate the involvement of this protein in the insect immune response. Although this protein is known to protect insects from invasion by microorganisms and to stimulate digestion of food in the midgut, recent studies have found Peritrophin in insect salivary glands (Baek & Lee, 2014; Huang et al., 2018). Attacin is a glycine-rich antimicrobial peptide which is differentially expressed in soldier fly salivary glands (Log2 FC >3). Expression of antimicrobial peptide genes with an anti-inflammatory function is more common in hematophagous arthropods but has also been reported from the salivary glands of herbivorous insects (Wei et al., 2015). Recently, it has been shown that feeding on different types of host plants can also induce overexpression of attacin in the salivary glands of tobacco hornworm (Manduca sexta) larvae (Koenig et al., 2015).


Conclusion

In this study, a transcriptomic approach was developed to characterize the composition of salivary glands in soldier fly larvae. This approach improved our understanding of the insect-plant interaction as it enabled us to produce the first gene expression profile of soldier fly salivary glands. Although we identified noticeable differential gene expression in the salivary glands of starved and fed soldier fly larvae, further comprehensive investigations are required to characterise the proteins that these genes code for, using chromatography and functional studies in sugarcane plants. There are many other sequences in the soldier fly transcriptome which as yet have completely unknown functions. These need to be identified and their role in the interaction between soldier fly and its sugarcane host plant investigated.

Acknowledgments

This project was supported by Sugar Research Australia funding (SRA-00504). We would also like to thank Manda Khudhir from Sugar Research Australia for assistance in collecting soldier fly larvae for experiments.

Author contributions:

Conceptualization: K.E., A.L.W. and M.J.F.; Investigation: K.E., K.R.L. and M.J.F.; Data curation: K.E.; Formal analysis: K.E.; Writing-original draft: K.E., K.R.L. and M.J.F.; Editing manuscript: A.L.W. and M.J.F.

Figure legend

Figure 1: Enzyme code distribution among all enzymes identified in the assembled soldier fly contigs. A substantial number of sequences have hydrolase or transferase activity. Enzymes acting on acid anhydrides (EC:3.6.), peptidases (EC:3.4.), esterases (EC:3.1.) and enzymes transferring phosphorus-containing groups (EC:2.7.) are the most abundant enzymes in the soldier fly salivary gland transcriptome.

Figure 2: GO term enrichment analysis of 31 119 highly expressed contigs. A) Biological process; B) Molecular function; C) Cellular component.

Figure 3: Volcano plot analysis. Red circles indicate differentially expressed transcripts in response to starvation stress (Fold change >2 and FDR < 0.05).

Figure 4: GO term enrichment analysis of 850 differentially expressed transcripts in response to starvation stress. A) Biological process; B) Molecular function; C) Cellular component.
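As a reading aid for Figure 3, the selection rule given in its legend can be written out explicitly. The following is a minimal sketch and not the analysis pipeline used in this study: the function name `classify_transcript` and the example values are hypothetical, and it assumes that the fold-change cut-off is applied symmetrically, so that a transcript is called when its absolute log2 fold change exceeds log2(2) = 1 and its FDR is below 0.05.

```python
import math


def classify_transcript(log2_fc, fdr, min_fold=2.0, max_fdr=0.05):
    """Classify one transcript using the Figure 3 thresholds.

    Assumption: the fold-change cut-off is symmetric, i.e. it is
    applied to the absolute value of the log2 fold change.
    """
    if fdr >= max_fdr or abs(log2_fc) <= math.log2(min_fold):
        return "not significant"
    # Direction is relative to whichever condition is used as the baseline.
    return "up" if log2_fc > 0 else "down"


def linear_fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


# Hypothetical example values, not taken from the study data.
examples = [(6.0, 0.001), (-1.5, 0.01), (0.5, 0.001), (3.0, 0.2)]
calls = [classify_transcript(fc, fdr) for fc, fdr in examples]
```

On this scale, a log2 FC of 6 corresponds to a 64-fold change in expression and a log2 FC of 3 to an 8-fold change.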

Table 1. Reads mapping parameters

| Sample | Raw reads | Read length | Trimmed reads | Trimmed (%) | Avg. trimmed read length | Total reads mapped to assembled reference | Mapped in pairs (%) | Mapped in broken pairs (%) | Unmapped (%) |
|---|---|---|---|---|---|---|---|---|---|
| Starved - Control R1 | 126 342 126 | 75 | 125 396 434 | 99.25 | 74.0 | 119 251 958 | 66.60 | 29.19 | 4.21 |
| Starved - Control R2 | 133 282 752 | 75 | 132 330 756 | 99.29 | 73.5 | 126 780 708 | 58.75 | 37.72 | 3.53 |
| Starved - Control R3 | 114 820 477 | 75 | 113 887 728 | 99.19 | 74.5 | 108 148 394 | 59.84 | 35.87 | 4.29 |
| Exposed to Root – R1 | 133 537 228 | 75 | 132 519 124 | 99.24 | 73.5 | 125 783 957 | 66.21 | 29.4 | 4.38 |
| Exposed to Root – R2 | 157 429 620 | 75 | 156 302 080 | 99.28 | 74.5 | 149 151 359 | 65.71 | 30.38 | 3.91 |
| Exposed to Root – R3 | 125 827 111 | 75 | 124 914 344 | 99.27 | 74.0 | 119 343 795 | 65.81 | 30.4 | 3.79 |
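The percentage columns in Table 1 follow directly from the read counts. As a minimal sketch (the helper name `percent` is hypothetical; the counts and percentages are those reported for the Starved - Control R1 library), the trimmed percentage and the three mapping classes can be checked as follows:

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Values reported for the Starved - Control R1 library in Table 1.
raw_reads = 126_342_126
trimmed_reads = 125_396_434
trimmed_pct = round(percent(trimmed_reads, raw_reads), 2)

# The three mapping classes should account for all reads.
mapping_classes = {"pairs": 66.60, "broken pairs": 29.19, "unmapped": 4.21}
total_pct = round(sum(mapping_classes.values()), 2)
```

The same check applies to the other five libraries, allowing for rounding in the reported percentages.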

Table 2. Contig measurements (sequence length) of de novo assembled libraries.

| Sample | N75* | N50** | N25* | Minimum | Maximum | Average | Count | Total |
|---|---|---|---|---|---|---|---|---|
| Starved - Control R1 | 623 | 1692 | 3365 | 150 | 52 968 | 802 | 33 863 | 27 170 625 |
| Starved - Control R2 | 609 | 1624 | 3178 | 150 | 48 683 | 787 | 32 496 | 25 575 245 |
| Starved - Control R3 | 480 | 1380 | 2769 | 150 | 21 538 | 686 | 37 333 | 25 592 973 |
| Exposed to Root – R1 | 591 | 1613 | 3202 | 150 | 28 839 | 778 | 34 390 | 26 744 369 |
| Exposed to Root – R2 | 570 | 1598 | 3257 | 150 | 44 059 | 759 | 36 346 | 27 598 843 |
| Exposed to Root – R3 | 592 | 1437 | 2550 | 150 | 10 133 | 755 | 29 226 | 22 067 971 |
| Combined libraries (Set 1) | 864 | 2015 | 3833 | 150 | 23 672 | 1045 | 34 684 | 36 245 005 |
| Unmapped to Set 1 (Set 2) | 221 | 297 | 439 | 100 | 3071 | 293 | 48 286 | 14 134 545 |

* N75 and N25 are the contig lengths for which the collection of all contigs of that length or longer contains at least 75% and 25%, respectively, of the sum of the lengths of all contigs. ** N50 is defined as the sequence length of the shortest contig at 50% of the total assembly length.
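The N-statistics defined in this footnote can be illustrated with a short calculation. This is a minimal sketch rather than the assembler's own implementation: the function name `n_statistic` is hypothetical, and the contig lengths are toy values invented for illustration.

```python
def n_statistic(contig_lengths, fraction):
    """Return the contig length L such that contigs of length >= L
    together contain at least `fraction` of the total assembly length."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly (hypothetical lengths); total length is 1500.
toy_contigs = [100, 200, 300, 400, 500]
n25 = n_statistic(toy_contigs, 0.25)
n50 = n_statistic(toy_contigs, 0.50)
n75 = n_statistic(toy_contigs, 0.75)
```

For the toy assembly, the two longest contigs (500 + 400 = 900) are the first to reach half of the total length, so N50 is 400. By construction N75 ≤ N50 ≤ N25, which matches the ordering of the values in Table 2.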

Table 3. The 50 most abundant InterProScan protein domains among the domains identified in soldier fly salivary glands.

| InterProScan ID | Domain name | # sequences | Domain abundance (%) |
|---|---|---|---|
| IPR013087 | Zinc finger C2H2-type | 461 | 3.5 |
| IPR000719 | Protein kinase domain | 248 | 1.9 |
| IPR017986 | WD40-repeat-containing domain | 183 | 1.4 |
| IPR020846 | Major facilitator superfamily domain | 156 | 1.2 |
| IPR000477 | Reverse transcriptase domain | 154 | 1.2 |
| IPR001584 | Integrase, catalytic core | 146 | 1.1 |
| IPR000504 | RNA recognition motif domain | 144 | 1.1 |
| IPR012934 | Zinc finger, AD-type | 123 | 0.9 |
| IPR001254 | Serine proteases, trypsin domain | 122 | 0.9 |
| IPR007110 | Immunoglobulin-like domain | 121 | 0.9 |
| IPR001841 | Zinc finger, RING-type | 105 | 0.8 |
| IPR020683 | Ankyrin repeat-containing domain | 91 | 0.7 |
| IPR002557 | Chitin binding domain | 88 | 0.7 |
| IPR001849 | Pleckstrin homology domain | 87 | 0.7 |
| IPR001650 | Helicase, C-terminal | 85 | 0.6 |
| IPR001452 | SH3 domain | 85 | 0.6 |
| IPR014001 | Helicase superfamily 1/2, ATP-binding domain | 85 | 0.6 |
| IPR013098 | Immunoglobulin I-set | 79 | 0.6 |
| IPR002048 | EF-hand domain | 74 | 0.6 |
| IPR003439 | ABC transporter-like | 73 | 0.6 |
| IPR001478 | PDZ domain | 73 | 0.6 |
| IPR000210 | BTB/POZ domain | 67 | 0.5 |
| IPR001878 | Zinc finger, CCHC-type | 66 | 0.5 |
| IPR000742 | EGF-like domain | 65 | 0.5 |
| IPR005225 | Small GTP-binding protein domain | 65 | 0.5 |
| IPR013026 | Tetratricopeptide repeat-containing domain | 64 | 0.5 |
| IPR003961 | Fibronectin type III | 64 | 0.5 |
| IPR011545 | DEAD/DEAH box helicase domain | 60 | 0.5 |
| IPR001245 | Serine-threonine/tyrosine-protein kinase, catalytic domain | 59 | 0.4 |
| IPR017452 | GPCR, rhodopsin-like, 7TM | 55 | 0.4 |
| IPR002018 | Carboxylesterase, type B | 53 | 0.4 |
| IPR000008 | C2 domain | 51 | 0.4 |
| IPR001356 | Homeobox domain | 49 | 0.4 |
| IPR011598 | Myc-type, basic helix-loop-helix (bHLH) domain | 42 | 0.3 |
| IPR006578 | MADF domain | 42 | 0.3 |
| IPR003959 | ATPase, AAA-type, core | 41 | 0.3 |
| IPR019787 | Zinc finger, PHD-finger | 41 | 0.3 |
| IPR011527 | ABC transporter type 1, transmembrane domain | 40 | 0.3 |
| IPR001781 | Zinc finger, LIM-type | 38 | 0.3 |
| IPR033121 | Peptidase family A1 domain | 38 | 0.3 |
| IPR029526 | PiggyBac transposable element-derived protein | 37 | 0.3 |
| IPR001251 | CRAL-TRIO lipid binding domain | 35 | 0.3 |
| IPR002126 | Cadherin | 34 | 0.3 |
| IPR001623 | DnaJ domain | 34 | 0.3 |
| IPR013057 | Amino acid transporter, transmembrane domain | 33 | 0.2 |
| IPR027806 | Harbinger transposase-derived nuclease domain | 33 | 0.2 |
| IPR000980 | SH2 domain | 33 | 0.2 |
| IPR000198 | Rho GTPase-activating protein domain | 33 | 0.2 |
| IPR014014 | RNA helicase, DEAD-box type, Q motif | 33 | 0.2 |
| IPR005135 | Endonuclease/exonuclease/phosphatase | 33 | 0.2 |

References

Al-Jbory, Z., El-Bouhssini, M. and Chen, M.S. (2018a) Conserved and unique putative effectors expressed in the salivary glands of three related gall midge species. Journal of Insect Science, 18.

Al-Jbory, Z., Anderson, K.M., Harris, M.O., Mittapalli, O., Whitworth, R.J. and Chen, M.S. (2018b) Transcriptomic analyses of secreted proteins from the salivary glands of wheat midge larvae. Journal of Insect Science, 18.

Allsopp, P.G. and Robertson, L.N. (1988) Biology, ecology and control of soldier flies Inopus spp. (Diptera, Stratiomyidae) ‒ A review. Australian Journal of Zoology, 36, 627‒648.

Ambawat, S., Sharma, P., Yadav, N.R. and Yadav, R.C. (2013) MYB transcription factor genes as regulators for plant responses: an overview. Physiology and Molecular Biology of Plants, 19, 307‒321.

Andersen, J.F., Pham, V.M., Meng, Z., Champagne, D.E. and Ribeiro, J.M.C. (2009) Insight into the sialome of the black fly, Simulium vittatum. Journal of Proteome Research, 8, 1474‒1488.

Arca, B., Lombardo, F., Valenzuela, J.G., Francischetti, I.M.B., Marinotti, O., Coluzzi, M. et al. (2005) An updated catalogue of salivary gland transcripts in the adult female mosquito, Anopheles gambiae. Journal of Experimental Biology, 208, 3971‒3986.

Baek, J.H. and Lee, S.H. (2014) Differential gene expression profiles in the salivary gland of Orius laevigatus. Journal of Asia-Pacific Entomology, 17, 729‒735.

Bonaventure, G., VanDoorn, A. and Baldwin, I.T. (2011) Herbivore-associated elicitors: FAC signaling and metabolism. Trends in Plant Science, 16, 294‒299.

Carolan, J.C., Caragea, D., Reardon, K.T., Mutti, N.S., Dittmer, N., Pappan, K. et al. (2011) Predicted effector molecules in the salivary secretome of the pea aphid (Acyrthosiphon pisum): A dual transcriptomic/proteomic approach. Journal of Proteome Research, 10, 1505‒1518.

Chen, M.S., Liu, X.M., Yang, Z.H., Zhao, H.X., Shukle, R.H., Stuart, J.J. et al. (2010) Unusual conservation among genes encoding small secreted salivary gland proteins from a gall midge. BMC Evolutionary Biology, 10.

Chen, M.S., Zhao, H.X., Zhu, Y.C., Scheffler, B., Liu, X.M., Liu, X. et al. (2008) Analysis of transcripts and proteins expressed in the salivary glands of Hessian fly (Mayetiola destructor) larvae. Journal of Insect Physiology, 54, 1‒16.

Chung, H.R., Schafer, U., Jackle, H. and Bohm, S. (2002) Genomic expansion and clustering of ZAD-containing C2H2 zinc-finger genes in Drosophila. EMBO Reports, 3, 1158‒1162.

Cooper, D. and Eleftherianos, I. (2016) Parasitic nematode immunomodulatory strategies: Recent advances and perspectives. Pathogens, 5, e5030058.

Diego-Garcia, E., Caliskan, F. and Tytgat, J. (2014) The Mediterranean scorpion Mesobuthus gibbosus (Scorpiones, Buthidae): transcriptome analysis and organization of the genome encoding chlorotoxin-like peptides. BMC Genomics, 15.

Fellowes, R.W. (1975) Some aspects of the feeding and digestive system of Inopus rubriceps (Diptera, Stratiomyidae). Thesis M. Sc., University of Auckland.

Furstenberg-Hagg, J., Zagrobelny, M. and Bak, S. (2013) Plant defense against insect herbivores. International Journal of Molecular Sciences, 14, 10242‒10297.

Hitchcock, B.E. (1976) Studies on the life-cycles of two species of soldier flies (Diptera, Stratiomyidae) which affect sugarcane in Queensland. Bulletin of Entomological Research, 65, 573‒578.

Huang, H.J., Lu, J.B., Li, Q., Bao, Y.Y. and Zhang, C.X. (2018) Combined transcriptomic/proteomic analysis of salivary gland and secreted saliva in three planthopper species. Journal of Proteomics, 172, 25‒35.

Jones, J.D.G. and Dangl, J.L. (2006) The plant immune system. Nature, 444, 323‒329.

Kaur, N., Hasegawa, D.K., Ling, K.S. and Wintermantel, W.M. (2016) Application of genomics for understanding plant virus‒insect vector interactions and insect vector control. Phytopathology, 106, 1213‒1222.

Koenig, C., Bretschneider, A., Heckel, D.G., Grosse-Wilde, E., Hansson, B.S. and Vogel, H. (2015) The plastic response of Manduca sexta to host and non-host plants. Insect Biochemistry and Molecular Biology, 63, 72‒85.

Lawrence, S.D., Novak, N.G., Jones, R.W., Farrar, R.R. and Blackburn, M.B. (2014) Herbivory responsive C2H2 zinc finger transcription factor protein StZFP2 from potato. Plant Physiology and Biochemistry, 80, 226‒233.

Li, Z., An, X.K., Liu, Y.D. and Hou, M.L. (2016) Transcriptomic and expression analysis of the salivary glands in white-backed planthoppers, Sogatella furcifera. PLoS ONE, 11, e0159393.

Liu, X.Q., Zhou, H.Y., Zhao, J., Hua, H.X. and He, Y.P. (2016) Identification of the secreted watery saliva proteins of the rice brown planthopper, Nilaparvata lugens (Stål) by transcriptome and Shotgun LC-MS/MS approach. Journal of Insect Physiology, 89, 60‒69.

Luna-Ramirez, K., Quintero-Hernandez, V., Vargas-Jaimes, L., Batista, C.V.F., Winkel, K.D. and Possani, L.D. (2013) Characterization of the venom from the Australian scorpion Urodacus yaschenkoi: Molecular mass analysis of components, cDNA sequences and peptides with antimicrobial activity. Toxicon, 63, 44‒54.

Matsumoto, Y., Suetsugu, Y., Nakamura, M. and Hattori, M. (2014) Transcriptome analysis of the salivary glands of Nephotettix cincticeps (Uhler). Journal of Insect Physiology, 71, 170‒176.

Miles, P.W. and Taylor, G.S. (1994) 'Osmotic pump' feeding by coreids. Entomologia Experimentalis et Applicata, 73, 163‒173.

Moreira, H.N.S., Barcelos, R.M., Vidigal, P.M.P., Klein, R.C., Montandon, C.E., Maciel, T.E.F. et al. (2017) A deep insight into the whole transcriptome of midguts, ovaries and salivary glands of the Amblyomma sculptum tick. Parasitology International, 66, 64‒73.

Pechan, T., Cohen, A., Williams, W.P. and Luthe, D.S. (2002) Insect feeding mobilizes a unique plant defense protease that disrupts the peritrophic matrix of caterpillars. Proceedings of the National Academy of Sciences USA, 99, 13319‒13323.

Petersen, T.N., Brunak, S., von Heijne, G. and Nielsen, H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8, 785‒786.

Rivera-Vega, L.J., Acevedo, F.E. and Felton, G.W. (2017) Genomics of Lepidoptera saliva reveals function in herbivory. Current Opinion in Insect Science, 19, 61‒69.

Robertson, L.N. (1986) Experimental studies of predation on grassland populations of Australian soldier fly, Inopus rubriceps (Macquart) (Diptera: Stratiomyidae). New Zealand Journal of Zoology, 13, 75‒81.

Rodriguez-Valle, M., Moolhuijzen, P., Barrero, R.A., Ong, C.T., Busch, G., Karbanowicz, T. et al. (2018) Transcriptome and toxin family analysis of the paralysis tick, Ixodes holocyclus. International Journal for Parasitology, 48, 71‒82.

Samson, P.R. (2001) Effect of feeding by larvae of Inopus rubriceps (Diptera: Stratiomyidae) on development and growth of sugarcane. Journal of Economic Entomology, 94, 1097‒1103.

Shukle, R.H., Mittapalli, O., Morton, P.K. and Chen, M.S. (2009) Characterization and expression analysis of a gene encoding a secreted lipase-like protein expressed in the salivary glands of the larval Hessian fly, Mayetiola destructor (Say). Journal of Insect Physiology, 55, 104‒111.

Stuart, J. (2015) Insect effectors and gene-for-gene interactions with host plants. Current Opinion in Insect Science, 9, 56‒61.

Sugio, A., Dubreuil, G., Giron, D. and Simon, J.C. (2015) Plant‒insect interactions under bacterial influence: ecological implications and underlying mechanisms. Journal of Experimental Botany, 66, 467‒478.

Suzuki, T. and Natori, S. (1986) Changes in the amount of sarcocystatin A, a new cysteine proteinase inhibitor, during the development of adult Sarcophaga peregrina. Insect Biochemistry, 16, 589‒595.

Wei, L., Huang, C.J., Yang, H.L., Li, M., Yang, J.J., Qiao, X. et al. (2015) A potent anti-inflammatory peptide from the salivary glands of horsefly. Parasites & Vectors, 8.

Zhang, Y., Fan, J., Sun, J.R., Francis, F. and Chen, J.L. (2017) Transcriptome analysis of the salivary glands of the grain aphid, Sitobion avenae. Scientific Reports, 7.

Manuscript received January 21, 2019

Final version received March 27, 2019

Accepted March 31, 2019
