<<

Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations

2015 Use of NGS for discovery of in the Ying Feng Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Bioinformatics Commons, and the Virology Commons

Recommended Citation Feng, Ying, "Use of NGS for discovery of viruses in the " (2015). Graduate Theses and Dissertations. 14334. https://lib.dr.iastate.edu/etd/14334

This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].

Use of NGS for discovery of viruses in the soybean aphid

by

Ying Feng

A thesis submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Major: Bioinformatics and Computational Biology

Program of Study Committee: Wyatt Allen Miller, Co-Major Professor Karin Dorman, Co-Major Professor Bryony C. Bonning

Iowa State University

Ames, Iowa

2015

Copyright © Ying Feng, 2015. All rights reserved.

ii

TABLE OF CONTENTS

Page

CHAPTER 1 GENERAL INTRODUCTION ...... 1

CHAPTER 2 NOVEL AND KNOWN FULL

DETECTION FROM DIFFERENT REGIONS COLLECTED SOYBEAN

APHIDS BY DEEP SEQUENCING...... 32

Abstract ……… ...... 33

Introduction ……….……………………………………………………………. 34

Materials and Methods ...... 35

Results…………...... 38

Discussion… ...... 43

References…...... 47

Tables and Figures…...... 52

CHAPTER 3 GENERAL CONCLUSIONS ...... 59

ACKNOWLEDGEMENTS ...... 60

1

CHAPTER 1. GENERAL INTRODUCTION

Soybean

The soybean aphid, Aphis glycines, has become one of the most economically important pest in North America [3]. This invasive species is native to Asia, and was first detected in North America in 2000 in Wisconsin. Subsequently, it has spread rapidly throughout the United States and southeastern Canada. By 2009, the aphid had infested 30 states and three provinces of Canada (Fig. 1) [4].

Fig. 1 Distribution of the soybean aphid in North America from 2000 to 2009. Red indicates the initial 10 states in which soybean aphid was detected. Yellow indicates the distribution of soybean aphid from 2001 to 2009 (from [4]).

2

Aphids are a serious problem for agriculture, especially in temperate regions.

Soybean aphid has a temperature-dependent growth rate [5,6]. The optimal development temperature of soybean aphid is 28 °C, and as temperatures approach 35°C the growth rate exhibits a rapid decline[5]. This decline partially explains why the range of the soybean aphids has not extended to the southern most region of North America.

Aphids damage their host in several ways. Firstly, they plunder the nutrients necessary for growth and reproduction for their own profit. Secondly, they inject their phytotoxic saliva during the feeding phase, causing early leaf senescence and reducing pod set, seed quality, and yield [7,8]. Thirdly, aphids are the main vectors, nearly 50% (275 out of 600)[9,10] of -borne viruses. Soybean aphid is also a competent vector of many plant viruses. Soybean aphids were partially responsible for widespread virus epidemics observed in snap bean [11] and is a new vector of Potato

Virus Y, leading to reduce of potato production [12]. Lastly, aphids produce , which promotes growth of sooty mould, which in turn hinders the photosynthetic activity of plants, further reducing yield [13].

Soybean is one of the most economically important crops in United State. The

USDA Crop Production reports total soybean production in 2010 at a record level of 32

M ha with a production value in excess of US$27 billion. In the US, the main soybean growing districts are the 12 north central states, which cover over 80% of the soybean grown [4]. In these states, soybean aphids represent the first insect pest to consistently cause significant yield losses over wide areas, with yield decreases as high as 40%

[14,15]. As a result, pest management practices have changed greatly, and the soybean producers must budget for scouting and insecticidal control of soybean aphid.

3

Currently, application of chemical is the main method for management of soybean aphids. For example, in Iowa, insecticides use went from zero application in 1996, to nearly 4 million acres treated in 2003. Even in the absence of harmful soybean aphid populations, farmers often use systemic insecticides or even aerial spraying to control the potential aphid outbreaks [16]. However, aphids can rapidly re- infest soybean fields after treatment, and readily develop resistance to insecticides [17].

Moreover, pesticides can harm beneficial non-target organisms [18]. For example, widely used neonicotinoid insecticides appear to be decimating the honeybee population, contributing to 66% annual losses of honeybees in Iowa in 2010-2011 [19]. Therefore, alternatives to chemical pesticides are necessary to control the soybean aphid.

Breeding resistant to aphid and/ or viruses is an alternative control method. Genetic resistance can confer an effective protection without additional costs or labor for farmers during the growing season, and is also safe for both the environment and human health [13]. Three mechanisms of resistance to aphids have been described:

(i) in antibiosis, survival and fecundity of the pest are affected by feeding on the plant;

(ii) in antixenosis, the plant is rapidly recognized as a poor host by the pest that moves away; (iii) in tolerance, the plant is infested by pest, but is less affected than susceptible ones [20]. Several soybean varieties have been identified as resistant to soybean aphid and present mechanisms of resistance of antibiosis, antixenosis, and tolerance. Rag1

(Resistance to Aphis glycines gene 1) [21-23] and Rag2 (Resistance to Aphis glycines gene 2) [24] were discovered as resistant genes in soybean to control aphid feeding.

RNA silencing has been recognized as a potential approach for management of aphid ( pisum) [25-28], grain aphid (Sitobion avenae) [29,30], and

4 green-peach aphid () [31-33]. Multiple approaches have been used for introduction of exogenous silencing RNA into aphids, for example, feeding on treated artificial diet or on transgenic plants that expresses silencing RNAs.

RNA viruses of Insect

With the availability of complete genomic sequences more and more insect viruses have been discovered. Over the last 90 years, 16 families of insect viruses have been isolated and described [34]. Classification has been based increasingly on the phylogenetic/evolutionary history of each of the main virus genes. Viruses generally have been classified into four groups based on their genome, dsDNA virus, ssDNA virus, dsRNA and ssRNA viruses. Some properties of the major RNA viruses are listed in Table

1.

Small RNA viruses

Small RNA viruses include viruses of < 40 nm in diameter with RNA genomes

[34,35]. The dicistroviruses, iflaviruses, caliciviruses, nodaviruses and tetraviruses all are small RNA viruses [35]. They have minimized the number of different components, e.g. genome segments, genes, and proteins, required for self-propagation. Essentially, small

RNA viruses are the product of two genes, one for replication enzyme enabling genome amplification, the other for a structural or capsid protein enabling horizontal transmission between hosts and cells. generally comprise one or two segments of a single stranded RNA, of less than 10 kb and encoding up to four (generally one or three) ORFs

[35].

5

Table 1. Properties of the RNA insect viruses (adapted from [34] )

Family Insect-infecting Genome Virion Shape and Virion diameter envelop Cypovirus dsRNA 10 Icosahedral 80 nm No segments Reoviridae Idnoreovirus dsRNA 10 Icosahedral 80 nm No segments Entomobirnavirus dsRNA 2 segments Icosahedral 60 nm No Vesiculovirus ssRNA 11 – 15 kb Bullet-shaped 45 – 100 Yes nm x 100 – 430 nm Cripavirus ssRNA 9 – 11kb Icosahedral 30 nm No Iflavirus ssRNA 8.5 – 9.5kb Icosahedral 30 nm No Lagovirus ssRNA 7.5 – 8.5kb Icosahedral 35 – 40 nm No Tetraviridae Betatetravirus ssRNA 1 – 2 Icosahedral 40 nm No segments 6.5 – 8 kb Alphanodavirus ssRNA 2 segments Icosahedral 32 – 33 nm No 4.5kb Metavirus ssRNA 4 – 10kb Irregular approximately Yes 100 nm Pseudoviru ssRNA 5 – 10kb Irregular; round to No ovoid 30 – 40 nm

SS hel 3A pro RdRp VP2 VP3 VP1 (a)$Dicistrovirus$ VP4$

2A$ 3C$ VP2 VP3 VP1 2B 2C hel 3A 3D RdRp (b)$$ VP4$ pro$ pro$

(c)$Iflavirus$ VCAP hel pro RdRp

(d)$Betateravirus$

C Mtr Hel RdRp C NβV$ A VCAP

P40 RdRp

PrV$$

P130 VCAP 6

RdRp TaV/EeV$$ VCAP

(e)$Omegatetravirus$ Replicase

C C Mtr Hel RdRp A

HasV$ Capsid

C C A P17 VCAP other

nt:$$$$$$$1000$$$$$$$$$2000$$$$$$$$$3000$$$$$$$$$4000$$$$$$$$5000$$$$$$$$$6000$$$$$$$$7000$$$$$$$$$8000$$$$$$$$$9000$

Fig. 2 Summary of genome organization of dicistorvirus, picornavirus, iflavirus, betateravirus, and omegatetravirus. Open reading frames are indicated by orange, green, and yellow boxes, with the functional proteins of the polyproteins indicated. The box circle denotes VPg, and red and light blue boxes denote IREs. Nonstructural proteins are indicated by orange box, and green boxes indicate structural proteins. Abbreviation: SS, silencing suppressor domain; hel, helicase domainspro, chymotrysin-like cysteine protease; RdRp, RNA-dependent RNA ploymerase; VP, virion protein; VCAP, the capsid protein precursor; Mtr, methyltransferase.

7

Dicistroviruses

The dicistroviruses are characterized by having a positive single-stranded sense

RNA genome of between approximately 9 and 11 kb encoding two ORFs, icosahedral particles and capsid comprised of three major proteins of approximately 30 kDa [35]. In many respects dicistroviruses have a lot of characteristics of the and iflaviruses (Fig. 2a-c) [34], but subtle differences in their genomic organization, and the positions of structural proteins are reversed. The best-known dicistrovirus is Cricket paralysis virus (CrPV). It has been isolated from insects from five different orders and is probably the insect virus species with the broadest natural host range [36]. Aphid-lethal paralysis virus (ALPV) is also known to negatively impact agriculture insect pests [37].

CrPV and ALPV are the viruses with acute disease symptoms [38], but C virus causes chronic disease and often no symptoms [39].

The genome organization of dicistroviruses is shown in Fig. 2a [35,40]. The genome has two major open reading frames (ORF). The 5’ end of the genome codes for the non-structural polyprotein and 3’ end of the genome encodes for the structural protein. The genome also has two internal ribosomal entry sites (IRES), one at the 5’ end of the genome and the other between ORF1 and ORF2 [38,40].

Tetraviruses

The tetraviruses are characterised by the T=4 symmetry particle architecture. The protein coat that surrounds the infective nucleic acid in virus particles is arranged to form an icosahedron with 12 pentameric and 30 hexameric capsomere subunits [35]. The family comprises two genera based on their particle morphology and genome

8 organization: Betatetravirus (type species Nudaurelia b virus) and Omegatetravirus (type species Nudaurelia o virus) [41]. The betatetraviruses have positive sense monopartite ssRNA genomes of approximately 6.5 kb, while members of Omegatetravirus have bipartite genomes of approximately 7.5 kb. For the betatetraviruses, genomic sequences are available for the Nudaurelia b virus (NβV) [42], Thosea asigna virus [43], and

Eusprosterna elaeasa virus [44], and providence virus (walter, 2008). The genome organizations of among these viruses are not consistent (Fig. 2d) [34,45]. A virus obtained from the cotton bollworm, Helicoverpa armigera, the Helicoverpa armigera stunt virus (HaSV) [46-48] and Dendrolimus punctatus virus [49] are the omegatetraviruses for which complete genomic sequences are available (Fig. 2e) [35,45].

Viruses of aphids

Several viruses that infect different host aphids have been discovered (Table 2).

Rhopalosiphum padi virus (RhPV) and Aphid-lethal paralysis virus (ALPV) are two dicistroviruses that infect aphids. RhPV was firstly isolated from the bird cherry-oat aphid (Rhopalosiphum padi) [50]. It circulates in the plant phloem and is transmitted horizontally from plants to uninfected aphids [51]. ALPV was first isolated form the bird cherry-oat aphid (R. padi) after observation of infected aphids moving away from the food source and death induced by paralysis [52]. Now several ALPV-like viruses have been identified in organisms other than aphids, such as from bat fecal samples and honeybees [52-57]. ALPV is transmitted vertically in the bird cherry-oat aphid and the

English grain aphid (Sitobion avenae). By nucleic acid in situ hybridization, viral RNA was shown in the gut, brain and embryonic tissues in the bird cherry-oat aphid [58].

9

Table 2 Summary of viruses that infect aphids

Virus Family Host aphid Genome Virion Ref. structure Acyrthosiphon Pea aphid ssDNA Icosahedral [59] pisum densovirus ()

Acyrthosiphon Unclassified Pea aphid + ssRNA Icosahedral [60] pisum virus (Acyrthosiphon pisum) 31nm

Aphid-lethal Dicistroviridae Bird-cherry oat aphid + ssRNA Icosahedral [52] paralysis virus (Rhopalosiphum padi) 27nm

Milkweed aphid ()

Pea aphid (Acyrthosiphon pisum)

Brevicoryne Iflaviridae Cabbage aphid + ssRNA Icosahedral [61] brassicae virus (Brevicoryne brassicae)

Dysaphis Parvoviridae Rosy apple aphid ssDNA Icosahedral [59] plantaginea (Dysaphis plantaginea) 22nm densovirus

Myzus persicae Parvoviridae Green peach aphid ssDNA Icosahedral [62] densovirus (Myzus persicae) 20nm

Rhopalosiphum Dicistroviridae Bird cherry-oat aphid + ssRNA Icosahedral [63] padi virus (Rhopalosiphum padi)

Rosy apple aphid Unclassified Rosy apple aphid + ssRNA Icosahedral [59] virus (Dysaphis plantaginea) 32nm

10

The Acyrthosiphon pisum virus (APV) is a positive strand RNA virus with a genome of approximately 10 kb. The virus has two major ORFs and the virus encodes four capsid proteins ranging from 23 kDa to 66 kDa. APV was abundant in the epithelium and lumen of the digestive tract of three day-old nymphs. Infection with the virus appears to inhibit the growth of aphids [60].

The Brevicoryne brassicae virus (BrBV), the first iflavirus described in aphids, was isolated from the cabbage aphid (Brevicoryne brassicae). The virus has a positive ssRNA genome of 10.1 kb with a 3’ poly (A) tail. The virus is closely related to other insect iflaviruses. BrBV is not likely to be transmitted via the plant because no virus transcripts were detected in plants infected with BrBV [61].

The Rosy apple aphid virus (RAAV), a positive strand RNA virus, was isolated from the rosy apple aphid (Dysaphis plantaginea). It has a genome organization and a high amino acid sequence identity to APV. RAAV was present in plant leaves previously exposed to RAAV- infected aphids and therefore is may be horizontally transmitted via the plant phloem [59].

The Dysaphis plantaginea densovirus (DplDNV) is a densovirus isolated from the rosy apple aphid (Dysaphis plantaginea). DplDNV is a single-stranded DNA virus with a genome of approximately 5 kb. It is transmitted horizontally via the leaf and vertically from adults to nymphs. DplDNV infection induces winged morphs of the host aphid thereby facilitating virus dispersal. Infection with DplDNV negatively affects aphid fecundity.

11

Another densovirus was identified from the green peach aphid (Myzus persicae), named Myzus persicae densovirus. The virus has a genome of approximately 5.7 kb with five ORFs. The virus infects the stomach of infected aphids and transmission occurs horizontally (through plant and honeydew) and vertically from mother to offspring [60].

An additional densovirus was also identified from expressed sequenced tag from the pea aphid, which is a putative Acyrthosiphon pisum densovirus. No further characterization of the virus has been published [59].

Plant viruses transmitted by aphids

According to current knowledge on viral transmission, nearly 6% of the plant viral species are transmitted directly through seeds [64-66]. However, plant viruses most often use a partner, known as a vector, for efficient transmission to new hosts. Insects are the most common of the vectors, and among these, the aphid has been described as an efficient vector for up to 50% of the insect-vectored plant viruses [9,67]. The reasons are:

(1) aphid is able to gain access to the plant cell cytoplasm by breaking through the cell wall and membrane using their feeding organs, and moves frequently among plants; (2) prior to feeding on the phloem of a host plant, aphids have the extraordinary capacity to sample cell content without killing the plants [68,69]; (3) During the steps of the feeding process, aphids produce different salivas with different compositions and functions [70];

(4) aphids can acquire viruses at all steps of the feeding process.

Generally, three vector-transmission modes have been described: non-persistent, persistent, and intermediate semi-persistent transmission [71]. In non-persistent transmission, the virus is acquired on an infected plant within only few a seconds or

12 minutes because it can hold an infectious form within the vector for a few minutes [72].

In contrast, in persistent transmission, there is no limit for acquisition and inoculation times, and latent peroid [72]. Intermediate semi-persistent viruses, with acquisition and inoculation periods of several minutes to hours and no requirement fro a latent period prior to inoculation [73]. Further, a new classification of viruses was proposed [74], where the term non-circulative grouped the earlier non- and semi-persistent mode, and the term circulative replaced persistent, and included both virus species that replicate in insect cells (propagative), and those that do not (non-propagative). Two distinct viral strategies were found in non-circulative transmission: the capsid strategy and the helper strategy [75]. In the capsid strategy, the virus interacts directly with the vector via its coat protein (CP), whereas in the helper strategy the virus-vector interaction is mediated by an additional virus encoded non-structural protein, generally designated as helper.

Table 3 summarized aphid-transmitted plant viruses including the information of virus family, genus and the transmission mode [71,76]. The genera Carlavirus,

Alfamovirus and Cucumovirus have been demonstrated having CP to be the only viral component determining virion retention within the aphid vector [77,78]. Viruses belonging to the genera and Caulimovirus were characterized producing a

“helper component” for their aphid transmission [75,79,80]. Cauliflower , the type of member of the genus Caulimovirus, has adopted a helper-dependent transmission strategy, but this strategy evolved with both nonpersistent and semipersistent transmission [81,82]. Parsnip yellow fleck virus (PYFV) also employs a helper strategy in its transmission. Unlike in the potyvirus and caulimoviruses, PFYV

Table 3. Aphid-transmitted plant viruses from different families with different mode of transmission

Virus Family Genus Noncirculative Circulative Nonpersistent Semipersistent Nonpropagative Propagative Carnation latent virus Carlavirus ✔ Alfafa mosaic virus Alfamovirus ✔ Cucumovirus ✔ Cauliflower mosaic virus Caulimovirus ✔ Beet yellows virus Closterovirus ✔ Broad bean wilt virus-1 Comoviridae Fabavirus ✔ Barley yellow dwarf virus Luteovirus ✔ Carrot mottle virus Umbravirus ✔ Luteoviridae ✔ Pea enation mosaic virus-1 Enamovirus 13

Potato leaf roll virus Polerovirus ✔ Babuvirus ✔ Maclura mosaic virus Macluravirus ✔ Potato virus Y Potyvirus ✔ Bean yellow mosaic virus Potyvirus ✔ Turnip mosaic virus Potyvirus ✔ Celery mosaic virus Potyvirus ✔ Lettuce necrotic yellows Cytorhabdovirus ✔ virus Rhabdoviridae Sonchus yellow net virus Nucleorhadovirus ✔ Black raspberry necrosis Unassigned ✔ virus Strawberry mottle virus Unassigned ✔ Anthriscus yellows virus Waikavirus ✔ Sequiviridae Parsnip yellow fleck virus Sequivirus ✔

14 itself does not encode the helper [83], but the nature of the helper factor is not known.

Depending on whether a virus replicates in the vectoring aphid, transmission is divided into ‘circulative, non-propagative’ or ‘circulative, propagative’. A three genera in the family Luteoviridae are circulative non-propagatively transmitted viruses including

Barley yellow dwarf virus (BYDV) and related luteoviruses [84]. Moreover, vector specificity is a prominent feature of luteovirus transmission.

The circulative propagative plant viruses are similar to the circulative non- propagative viruses in many aspects of their acquisition and path of movement in the aphid. The main difference is that the infected association remains permanent for the rest of the life of the insect, and virus will be transmitted to offspring. Only the family

Rhabdoviridae belongs to the circulative propagative aphid-transmitted . This family includes viruses that replicate in either mammals and insects or plant and insects

[85]. However, virus encoded determinants for transmission have not been identified in plant Rhabdoviridae. It is likely that G protein may play the role in the aphid gut endocytosis process, but is still hypothetical and is based on gene function conservation among members of the vertebrate rhabdovirus [86,87]. Clearly, even in the best- characterized examples, many details are still missing.

Next Generation Sequencing in virology

In general, we can check the size and growth of international nucleotide databases to realize the importance to sequencing to science. By August 2010 in the published release of GenBank, there were some 970 million and 43 million bases of viral and phage origin respectively, representing an annual growth of 20-24% [88], a growth rate highly

15 above average for the database as a whole. The information generated has wide-ranging impact on all areas of virology, from diagnosis to pathogenesis, and from vaccine design, bio-control to viral evolution and ecology.

The new methods of sequencing have now been developed, including next generation sequencing (NGS) which has entirely revolutionized our ability to sequence. It is now possible to generate many millions of bases of sequence in one day and this bring many new opportunities accessible to us. There are an increasing number of NGS technologies in the market, all using slightly different methodologies to achieve clonal amplification and sequencing. Table 4 lists the summary of current methods for NGS including methods in template preparation, adapter type, sequencing length and disadvantage and advantage for each method [1,2].

Deciding which technology is best will depend on the specific experiment being planned. Several important factors should be considered, including the size of the genome, its complexity, as well as the depth of coverage and accuracy required. There are some general principles to guide making decision. For those looking to assemble complex genomes de novo, longer read lengths may be appropriate. 454 sequencing

(Roche), although relatively expensive, still has a niche in amplicon sequencing because of its longer reads. Helicos is appropriate for direct RNA sequencing, and PacBio for very long reads. However, accuracy may become an important issue because both are single-molecule sequencers. In general, the Illumina and SOLiD platforms currently offer the best all-round value for money, accuracy and throughput for RNA-seq, and those projects requiring high depths of coverage [1,2].

Table 4. Summary of current NGS technology (adapted from [1,2])

Method Template Adapter Sequencing Avg. reads Pros Cons preparation type reaction length (bases)

Roche 454 Emulsion PCR Linear Synthesis 330 Longer reads improve High reagent cost; high mapping in repetitive error rates in homo- regions; fast run time polymer repeats

SOLiD Emulsion PCR Linear Ligation 50 Two-base encoding Long run times provides inherent error correction 16

Illumina Bridge PCR Linear Synthesis 100 Currently the most Low multiplexing widely used platform in capability of samples the field

Helicos No Poly(A) Synthesis 32 Non-bias representation High error rates

amplification of templates for compared with other genome and seq-based reversible terminator applications chemistries

PacBio Single molecule Hairpin Synthesis, 1000 Has the greatest Highest error rates real-time potential for reads compared with other exceeding 1kb NGS chemistries

17

Discovery of RNA virus in insects by NGS

The development of NGS technology allowed researcher to identify a known or discover a novel insect RNA virus. Several great review papers described recently the discovery of RNA viruses of insects and the optimization of NGS data analysis for virus discovery [89,90]. With adoption of NGS, a number of novel viruses have been discovered at an increasing rate in recent years. Liu et al. gave a summary of the recently discovered RNA viruses of insects. For the positive strand RNA genomes, iflavirus, dicistrovirus, negevirus and some unassigned viruses were discovered. Novel Iflaviruses are commonly found in hymenopteran, lepidopteran, and hemipteran insects [91-94]. A plant-infecting iflavirus-like virus, Tomato matilda virus, has been identified from asymptomatic tomato plants (Solanum lycopersicum) [90]. Dicistroviridae is a rapidly increasing family of picornavirus-like viruses [40]. A new dicistrovirus-like virus with partial sequences, Big Sioux River Virus, was discovered from the honeybee transcriptome [53]. Several ALPV-like viruses were isolated from honeybee [53,56] and western corn rootworm [95]. For dsRNA genomes, some partial fragments from birnavirus, totivirus and reovirus were identified from a D. melanogaster cell line [96].

Several negative ssRNA viral sequences have been identified from two stink bug species.

One is a nearly complete genome of a Deer trick -like virus from Nezara viridula, and the other is a plant rhabdovirus-like virus Euchistus hero, which also infects its insect vectors [85,97]. Such newly discovered viruses may have potential for use in insect pest management. Moreover, this advanced virus discovery suggests that insects have many diverse novel viruses, which can be applied to the use in insect cell lines and laboratory insect colonies for virus research.

18

Analysis of NGS data

There is no standard method for analysis of sequences generated by NGS [98]. In general, the initial raw reads are treated with specific programs provided by the manufacturers for base calling, removal of adaptor sequences, and removal of low quality reads [89,90]. There are a lot of different ways to analyze data to discover viral sequences. DNA/RNA sequence data can be used to conduct Basic Local Alignment

Search Tool (BLAST) searches against NCBI database by blastn, blastx, or tblastx, a viral database or a local database to identify the putative viral sequences before read assembly [53,99]. For read assembly, two ways are generally used: de novo assembly without a known genome sequences and reference mapping by aligning the reads to known viral genomes. The putative reads that hit viral sequences with given E-values are extracted and used for further assembly.

Alternatively, all the reads can be de novo assembled first, and the resulting assembled contigs used for BLAST analysis. Further, potential virus contigs can be isolated. The contigs with significant hits (e-value <1× 10-3) to viruses were separated from non-virus hits. These contigs are then reassembled to generate longer virus contigs.

The longer contigs are blasted against non-redundant (nr) databases or virus databases, then the contigs with hits to virus (e-value <1× 105) are identified [89]. The next step is that the viral contigs are aligned against the total contig set to search for contigs that overlap viral contigs. This step is important to discover the novel viral sequences. When the virus genome is nearly assembled, the gaps can be filled by Sanger sequencing or RT-

PCR and -PCR.

19

The objectives of my studies are to: (1) deep sequence the RNA of virus particle isolated from soybean aphids in Iowa, Michigan, Ohio and China, (2) assemble the reads of virus sequences from different samples by multiple methods, (3) identify the complete or nearly complete genome sequences by de novo assembly and reference mapping, (4) predict the genome organization and conduct the phylogenetic analysis for each virus.

These studies will provide the first information on the types of viruses that infect soybean aphid.

Dissertation organization

The thesis is organized in the format consisting of one journal-style manuscript accompanied by a general introduction and a general conclusion. The manuscript is formatted according to the targeted journal. The manuscript “Discovery of known and novel viral genomes in soybean aphids collected in different regions ” will be submitted for publication in the PLoS ONE. Ying Feng is the primary investigator for this work under the supervision of Drs. W. Allen Miller, Karin Doman and Bryony C. Bonny, and is the first author of the manuscript.

References

1. Radford AD, Chapman D, Dixon L, Chantrey J, Darby AC, et al. (2012) Application of

next-generation sequencing technologies in virology. J Gen Virol 93: 1853-1868.

20

2. Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:

31-46.

3. Heimpel GE, Frelich LE, Landis DA, Hopper KR, Hoelmer KA, et al. (2010)

European buckthorn and Asian soybean aphid as components of an extensive

invasional meltdown in North America. Biol Invasion 12.

4. Ragsdale DW, Landis DA, Brodeur J, Heimpel GE, Desneux N (2011) Ecology and

management of the soybean aphid in North America. Annu Rev Entomol 56: 375-

399.

5. McCornack BP, Ragsdale DW, Venette RC (2004) Demography of soybean aphid

(Homoptera: ) at summer temperatures. J Econ Entomol 97: 854-861.

6. McCornack BP, Carrillo MA, Venette RC, Ragsdale DW (2005) Physiological

constraints on the overwintering potential of the soybean aphid (Homoptera:

Aphididae). Environ Entomol 34.

7. Kennedy JS, Ibbotson A, Booth CO (1950) The distribution of aphid infestation in

relation to leaf age. Ann Appl Biol 37: 651-679.

8. Riedell WE, Catangui MA (2006) Greenhouse studies of soybean aphid ( :

Aphididae) effect on plant growth, seed yield, and composition. J Agric Uraban

Entomol 23: 225-235.

9. Nault LR (1997) Transmission of plant viruses: a new synthesis. Ann

Entomol Soc Am 90.

10. Katis NI, Tsitsipis JA, Stevens M, Powell G (2007) Transmission of Plant Viruses.

Aphids as Crop Pests, CABI, London: 353-377.

21

11. Nault BA, Shah DA, Straight KE, Bachmann AC, Sackett WM, et al. (2009)

Modeling temporal trends in aphid vector dispersal and cucumber mosaic virus

epidemics in snap bean. Environ Entomol 38: 1347-1359.

12. Davis JA, Radcliffe EB, Ragsdale DW (2005) Soybean aphid, aphis glycines

Matsumura, a new vector of Potato Virus Y in Potato. Amer J of Potato Res 82:

197-201.

13. Dedryver CA, Le Ralec A, Fabre F (2010) The conflicting relationships between

aphids and men: a review of aphid damage and control strategies. C R Biol 333:

539-553.

14. McCarville MT, Kanobe C, MacIntosh GC, O'Neal M (2011) What is the economic

threshold of soybean aphids (Hemiptera: Aphididae) in enemy-free space? J Econ

Entomol 104: 845-852.

15. Ragsdale DW, MacIntosh GC, Venette RC, Potter DB, Macrae IV (2007) Economic

threshold for soybean aphid (Hemiptera: Aphididae). F Econ Entomol 100: 1258-

1267.

16. Johnson KD, O'Neal ME, Ragsdale DW, DiFonzo CD, Swinton SM (2009)

Probability of economical management of soybean aphid ( Hemiptera: Aphididae)

in Norht America. J Econ Entomol 102: 2101-2108.

17. Devonshire AL (1989) Resistance of aphids to insecticides. In: Minks AK, Harrewijin

P, editor. Aphids Their biology, natural enemies and control Amsterdam:

Elsevier: 123-139.

22

18. Flickinger EL, Juenger G, Roffe TJ, Smith MR, Irwin RJ (1991) Poisoning of Canada

geese in Texas by parathion sprayed for control of Russian wheat aphid. J Wildl

Dis 27: 265-268.

19. Furfaro H (2012) Studies show links between pesticides, bee deaths. Ames Tribune

April 7 Ames: Stephens Media LLC.

20. Webster JA (1991) Developing aphid-resistant cultivars, in: K.R. Campbell, R.D.

Eikenbary (Eds) Aphid-plant interactions, Elsevier, New York, USA:

87-100.

21. Hill CB, Crull L, Herman TK, Voegtlin DJ, Hartman GL (2010) A new soybean

aphid (Hemiptera: Aphididae) biotype identified. J Econ Entomol 103: 509-515.

22. Hill CB, Li Y, Hartman GL (2006) Soybean aphid resistance in soybean Jackson is

controlled by a single dominant gene. Crop Sci 46: 1606-1608.

23. Hill CB, Li Y, Hartman GL (2006) A single dominant gene for resistance to the

soybean aphid in the soybean Dowling. Crop Sci 46: 1601-1605.

24. Mian MAR, Kang ST, Beli SE, Hammond RB (2008) Genetic linkage mapping of the

soybean aphid resistance gene in PI 243540. Theor appl Genet 117: 955-962.

25. Jaubert-Possamai S, Le Trionnaire G, Bonhomme J, Christophides GK, Rispe C, et al.

(2007) Gene knockdown by RNAi in the pea aphid Acyrthosiphon pisum. BMC

Biotechnol 7: 63.

26. Mutti NS, Park Y, Reese JC, Reeck GR (2006) RNAi knockdown of a salivary

transcript leading to lethality in the pea aphid, Acyrthosiphon pisum. J Insect Sci

6: 1-7.

23

27. Whyard S, Singh AD, Wong S (2009) Ingested double-stranded RNAs can act as

species-specific insecticides. Insect Biochem Mol Biol 39: 824-832.

28. Christiaens O, Swevers L, Smagghe G (2014) DsRNA degradation in the pea aphid

(Acyrthosiphon pisum) associated with lack of response in RNAi feeding and

injection assay. Peptides 53: 307-314.

29. Xu L, Duan X, Lv Y, Zhang X, Nie Z, et al. (2014) Silencing of an aphid

carboxylesterase gene by use of plant-mediated RNAi impairs Sitobion avenae

tolerance of Phoxim insecticides. Transgenic Res 23: 389-396.

30. Zhang M, Zhou Y, Wang H, Jones H, Gao Q, et al. (2013) Identifying potential RNAi

targets in grain aphid (Sitobion avenae F.) based on transcriptome profiling of its

alimentary canal after feeding on wheat plants. BMC Genomics 14: 560.

31. Pitino M, Coleman AD, Maffei ME, Ridout CJ, Hogenhout SA (2011) Silencing of

aphid genes by dsRNA feeding from plants. PLoS One 6: e25709.

32. Mao J, Zeng F (2014) Plant-mediated RNAi of a gap gene-enhanced tobacco

tolerance against the Myzus persicae. Transgenic Res 23: 145-152.

33. Bhatia V, Bhattacharya R, Uniyal PL, Singh R, Niranjan RS (2012) Host generated

siRNAs attenuate expression of serine protease gene in Myzus persicae. PLoS

One 7: e46343.

34. Possee RD, King LA (2014) Insect Viruses. eLS.

35. Gordon KH, Waterhouse PM (2006) Small RNA viruses of insects: expression in

plants and RNA silencing. Adv Virus Res 68: 459-502.

24

36. Wilson JE, Powell MJ, Hoover SE, Sarnow P (2000) Naturally occurring dicistronic

cricket paralysis virus RNA is regulated by two internal ribosome entry sites. Mol

Cell Biol 20: 4990-4999.

37. Van Munster M, Dullemans AM, Verbeek M, Van Den Heuvel JF, Clerivet A, et al.

(2002) Sequence analysis and genomic organization of Aphid lethal paralysis

virus: a new member of the family Dicistroviridae. J Gen Virol 83: 3131-3138.

38. Hertz MI, Thompson SR (2011) Mechanism of translation initiation by

Dicistroviridae IGR IRESs. Virology 411: 355-361.

39. Habayeb MS, Ekengren SK, Hultmark D (2006) Nora virus, a persistent virus in

Drosophila, defines a new picorna-like virus family. J Gen Virol 87: 3045-3051.

40. Bonning BC, Miller WA (2010) Dicistroviruses. Annu Rev Entomol 55: 129-150.

41. Gordon KH, Hanzlik TN (1998) Tetraviruses. In " The Insect Viruses" L.K. Miller

and L. K. Ball, eds. Plenum Press, London: 269-299.

42. Gordon KH, Williams MR, Hendry DA, Hanzlik TN (1999) Sequence of the genomic

RNA of nudaurelia beta virus (Tetraviridae) defines a novel virus genome

organization. Virology 258: 42-53.

43. Pringle FM, Gordon KH, Hanzlik TN, Kalmakoff J, Scotti PD, et al. (1999) A novel

capsid expression strategy for Thosea asigna virus (Tetraviridae). J Gen Virol 80

( Pt 7): 1855-1863.

44. Gorbalenya AE, Pringle FM, Zeddam JL, Luke BT, Cameron CE, et al. (2002) The

palm subdomain-based active site is internally permuted in viral RNA-dependent

RNA polymerases of an ancient lineage. J Mol Biol 324: 47-62.

25

45. Walter CT, Pringle FM, Nakayinga R, de Felipe P, Ryan MD, et al. (2010) Genome

organization and translation products of Providence virus: insight into a unique

tetravirus. J Gen Virol 91: 2826-2835.

46. Gordon KH, Johnson KN, Hanzlik TN (1995) The larger genomic RNA of

Helicoverpa armigera stunt tetravirus encodes the viral RNA polymerase and has

a novel 3'-terminal tRNA-like structure. Virology 208: 84-98.

47. Hanzlik TN, Dorrian SJ, Gordon KH, Christian PD (1993) A novel small RNA virus

isolated from the cotton bollworm, Helicoverpa armigera. J Gen Virol 74 ( Pt 9):

1805-1810.

48. Hanzlik TN, Dorrian SJ, Johnson KN, Brooks EM, Gordon KH (1995) Sequence of

RNA2 of the Helicoverpa armigera stunt virus (Tetraviridae) and bacterial

expression of its genes. J Gen Virol 76 ( Pt 4): 799-811.

49. Yi F, Zhang J, Yu H, Liu C, Wang J, et al. (2005) Isolation and identification of a

new tetravirus from Dendrolimus punctatus larvae collected from Yunnan

Province, China. J Gen Virol 86: 789-796.

50. Pal N, Boyapalle S, Beckett RJ, Miller WA, Bonning BC (2014) A baculovirus-

expressed dicistrovirus that is infectious to aphids. J Virol 88: 3610.

51. Gildow FE, D'Arcy CJ (1990) Cytopathology and experimental host range of

Rhopalosiphum padi virus, a small isomeric RNA virus infecting cereal grain

aphids. J Invertebr Pathol 55: 245-257.

52. Willianson C, Rybicki EP, Kasdorf GF, Von Wechmar MB (1988) Characterization

of a new picorna-like virus isolated from aphids. J Gen Virol 69: 787-795.

26

53. Runckel C, Flenniken ML, Engel JC, Ruby JG, Ganem D, et al. (2011) Temporal

analysis of the honey bee microbiome reveals four novel viruses and seasonal

prevalence of known viruses, Nosema, and Crithidia. PLoS One 6: e20656.

54. Dombrovsky A, Luria N (2013) The Nerium oleander aphid Aphis nerii is tolerant to

a local isolate of Aphid lethal paralysis virus (ALPV). Virus Genes 46: 354-361.

55. Ge X, Li Y, Yang X, Zhang H, Zhou P, et al. (2012) Metagenomic analysis of viruses

from bat fecal samples reveals many novel viruses in insectivorous bats in China.

J Virol 86: 4620-4630.

56. Granberg F, Vicente-Rubiano M, Rubio-Guerri C, Karlsson OE, Kukielka D, et al.

(2013) Metagenomic detection of viral pathogens in Spanish honeybees: co-

infection by Aphid Lethal Paralysis, Israel Acute Paralysis and Lake Sinai

Viruses. PLoS One 8: e57459.

57. Ravoet J, Maharramov J, Meeus I, De Smet L, Wenseleers T, et al. (2013)

Comprehensive bee pathogen screening in Belgium reveals Crithidia mellificae as

a new contributory factor to winter mortality. PLoS One 8: e72443.

58. Xie Q, Kang HY, Sparkes DL, Tao S, Hu ZQ, et al. (2013) Cytogenetic identification

of wheat-Psathyrostachys huashanica amphiploid x triticale progenies for English

grain aphid resistance. Sci agric 70: 161-166.

59. Ryabov EV, Keane G, Naish N, Evered C, Winstanley D (2009) Densovirus induces

winged morphs in asexual clones of the rosy apple aphid, Dysaphis plantaginea.

Proc Natl Acad Sci U S A 106: 8465-8470.

27

60. van den Heuvel JF, Hummelen H, Verbeek M, Dullemans AM, van der Wilk F

(1997) Characteristics of acyrthosiphon pisum virus, a newly identified virus

infecting the pea aphid. J Invertebr Pathol 70: 169-176.

61. Ryabov EV (2007) A novel virus isolated from the aphid Brevicoryne brassicae with

similarity to picorna-like viruses. J Gen Virol 88: 2590-2595.

62. van Munster M, Dullemans AM, Verbeek M, van den Heuvel JF, Reinbold C, et al.

(2003) Characterization of a new densovirus infecting the green peach aphid

Myzus persicae. J Invertebr Pathol 84: 6-14.

63. D'Arcy CJ, Burnett PA, Hewings AD, Goodman RM (1981) Purification and

characterization of a virus from the aphid Rhopalosiphum padi. Virology 112:

346-349.

64. Brunt AA (1996) Descriptions and lists from the VIDE database. Viruses of plants:

1484.

65. Matthews REF (1991) Plant Virology. Ac press, New-York: 897.

66. Astier S, Albouy J, Maury H (2001) Principes de virologie vegetale. Genome,

pouvoir pathogene, ecologie des virus. INRA ed.

67. Brunt AA, Crabree K, Dallwitz MJ, Gibbs AJ, Watson L, et al. (1996) Plant viruses

online: description and lists from the VIDE database. Version: 20 August 1996

http://biologyanueduau/Groups/MES/vide/.

68. Tjallingii WF, Hogen Esch T (1993) Fine structure of aphid stylet routes in plant

tissues in correlation with EPG signals. Physiol Entomol 18: 317-328.

69. Prado E, Tjallingii WF (1994) Aphid activities during sieve element punctures. Ento

Exp et App 72: 157-165.

28

70. Miles PW (1999) Aphid saliva. Biol Rev 74: 41-85.

71. Brault V, Uzest M, Monsion B, Jacquot E, Blanc S (2010) Aphids as transport

devices for plant viruses. C R Biol 333: 524-538.

72. Watson MA, Roberts FM (1939) A comparative study of the transmission of

hyocyamus virus 3, potato virus Y and cucumber virus by the vector Myzus

persicae (Sulz), M. circumflexus (Buckton) and Macrosiphum gei(Koch). Proc R

Soc London B 127: 543-576.

73. Sylvester ES (1956) Beet yellows virus transmission by the green peach aphid. J Econ

Entomol 49: 789-800.

74. Harris KF (1977) Aphids as virus vectors. Academic Press, New York: 166-208.

75. Pirone TP, Blanc S (1996) Helper-dependent vector transmission of plant viruses.

Annu Rev Phytopathol 34: 227-247.

76. Ng JC, Perry KL (2004) Transmission of plant viruses by aphid vectors. Mol Plant

Pathol 5: 505-511.

77. Megahed ES, Pirone TP (1966) Comparative transmission of Cucumber mosaic virus

acquired by aphids from plants or through a membrane. Phytopathol 56: 1420-

1421.

78. Ng JC, Tian T, Falk BW (2004) Quantitative parameters determining whitefly

(Bemisia tabaci) transmission of Lettuce infectious yellows virus and an

engineered defective RNA. J Gen Virol 85: 2697-2707.

79. Lung MCY, Pirone TP (1973) Studies on the reason for differential transmissibility of

Cauliflower mosaic virus isolates by aphids. Phytopathol 63: 910-914.

29

80. Lung MC, Pirone TP (1974) Acquisition factor required for aphid transmission of

purified cauliflower mosaic virus. Virology 60: 260-264.

81. Drucker M, Froissart R, Hebrard E, Uzest M, Ravallec M, et al. (2002) Intracellular

distribution of viral gene products regulates a complex mechanism of cauliflower

mosaic virus acquisition by its aphid vector. Proc Natl Acad Sci U S A 99: 2422-

2427.

82. Palacios I, Drucker M, Blanc S, Leite S, Moreno A, et al. (2002) Cauliflower mosaic

virus is preferentially acquired from the phloem by its aphid vectors. J Gen Virol

83: 3163-3171.

83. Elnagar S, Murant AF (1976) Role of helper virus, anthriscus yellows, in

transmission of parsnip yellow fleck virus by aphid Cavariella aegopodii. Ann

Appl Biol 84: 169-181.

84. Gray S, Gildow FE (2003) Luteovirus-aphid interactions. Annu Rev Phytopathol 41:

539-566.

85. Hogenhout SA, Redinbaugh MG, Ammar el D (2003) Plant and rhabdovirus

host range: a bug's view. Trends Microbiol 11: 264-271.

86. Walker PJ, Kongsuwan K (1999) Deduced structural model for animal rhabdovirus

glycoproteins. J Gen Virol 80 ( Pt 5): 1211-1220.

87. Goldberg KB, Modrell B, Hillman BI, Heaton LA, Choi TJ, et al. (1991) Structure of

the glycoprotein gene of sonchus yellow net virus, a plant rhabdovirus. Virology

185: 32-38.

88. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2011) GenBank.

Nucleic Acids Res 39: D32-37.

30

89. Liu S, Vijayendran D, Bonning BC (2011) Next generation sequencing technologies

for insect virus discovery. Viruses 3: 1849-1869.

90. Liu SJ, Chen YT, Bonning BC (2015) RNA virus discovery in insects. Curr Opin in

Insec Sci 8: 1-8.

91. Lipkin WI, Firth C (2013) Viral surveillance and discovery. Curr Opin Virol 3: 199-

204.

92. Carrillo-Tripp J, Krueger EN, Harrison RL, Toth AL, Miller WA, et al. (2014)

Lymantria dispar iflavirus 1 (LdIV1), a new model to study iflaviral persistence

in lepidopterans. J Gen Virol 95: 2285-2296.

93. Smith G, Macias-Munoz A, Briscoe AD (2014) Genome Sequence of a Novel

Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato. Genome

Announc 2.

94. Sparks ME, Gundersen-Rindal DE, Harrison RL (2013) Complete Genome Sequence

of a Novel Iflavirus from the Transcriptome of Halyomorpha halys, the Brown

Marmorated Stink Bug. Genome Announc 1.

95. Liu S, Vijayendran D, Carrillo-Tripp J, Miller WA, Bonning BC (2014) Analysis of

new aphid lethal paralysis virus (ALPV) isolates suggests evolution of two ALPV

species. J Gen Virol 95: 2809-2819.

96. Wu Q, Luo Y, Lu R, Lau N, Lai EC, et al. (2010) Virus discovery by deep

sequencing and assembly of virus-derived small silencing RNAs. Proc Natl Acad

Sci U S A 107: 1606-1611.

31

97. Mann KS, Dietzgen RG (2014) Plant rhabdoviruses: new insights and research needs

in the interplay of negative-strand RNA viruses with plant and insect hosts. Arch

Virol 159: 1889-1900.

98. Rusk N (2009) Focus on next-generation sequencing data analysis. Forward. Nat

Methods 6: S1.

99. Altschul SF, Gish W, Miller WM, Eyers EW, Lipman DJ (1990) Basic local

alignment search tool. J Mol Biol 215: 403-410.

32

CHAPTER 2

DISCOVERY OF KNOWN AND NOVEL VIRAL GENOMES IN SOYBEAN

APHIDS COLLECTED IN DIFFERENT REGIONS

Ying Feng1,2, Elizabeth N. Krueger2, Bryony C. Bonning3, W. Allen Miller*1,2

1Bioinformatics & Computational Biology Program

2Plant Pathology & Microbiology Department

3Entomology Department

Iowa State University, Ames, Iowa 50011

*Corresponding author: [email protected]

B. C. Bonning and W. A. Miller designed the experiments, E. N. Krueger carried out sample

collection and RNA isolation, Y. Feng analyzed the NGS data and wrote the paper, W. A. Miller

and B. C. Bonning revised the paper

33

Abstract

Soybean aphid (Aphis glycines) is one of the most economically important pest insects of soybean. In the US, soybean aphid represents the first insect pest of soybean to consistently cause significant yield losses over wide areas, with yield decreases as high as

40%. To develop a biological control method, it is critical to have information on pathogens of soybean aphid. We sought to identify viruses that infect soybean aphid using Next Generation (Illumina) Sequencing of partially purified viruses from soybean aphids collected in Iowa (IA), Michigan (MI), Ohio (OH), and China. For the data analysis, de novo assembly of sequences was conducted using CLC Genomics

Workbench. Deep sequencing analysis indicated that the soybean aphid viruses vary among the regions, but they are all positive stranded ssRNA viruses. Sequences corresponding to many viruses in the order were found. For example, aphid lethal paralysis-like dicistroviruses were found in all four regions. We also sequenced the near-complete genome of a new virus related to Big Sioux River Virus, a virus associated with honeybees. Sequences of a tetra-like virus, Aphis glycines virus 1

(AGV1, S. Liu and B. Bonning, unpublished) were abundant in all US samples, but partially sequence in the sample from China. Plant viral sequences were also found, including sequences with 93% identity to cotton leafroll dwarf polerovirus were found in the sample from China. A novel virus distantly related to the cileviruses of plant and insects was present in the OH samples. Sequences These results reveal the first viruses identified in soybean aphid and new plant viruses that may be vectored by soybean aphid.

34

Introduction

The soybean aphid (Aphis glycines) has been among the most economically important pest insects of temperate agriculture. It is an invasive pest that was first found in the United State in 2000, and quickly became the dominant insect pest in soybean[1].

The populations of soybean aphid increase rapidly and have heavy infections leading to significant yield loss. It has been estimated to cost up to US $3.6 billion annually in losses and control costs[2]. It is the main vector of most soybean viruses, such as mosaic virus (AMV)[3], (SMV)[4]. In addition to transmitting viruses to soybean, the soybean aphid damages plants directly [5]. Aphid produces honeydew, which makes plants sticky, difficult to handle, and fosters growth of sooty mold that further reduces yield. Some studies with tobacco[6], arrowleaf [7], and wheat[8,9] have found that plant viruses can negatively affect aphid performance and preferences[10]. However, there are no reports of viruses in the soybean aphid.

Therefore, we sought to determine whether aphid viruses exist that may be useful as biological control agents or as sources of aphicidal genes for use in plant resistance.

By allowing the study of multiple virus populations, deep sequencing has been the most cost and time efficient way to identify and classify novel and known viruses, viral diversity and evolution, and transmission. To investigate what, if any, viruses are present in soybean aphids, we sequenced total RNA from partially purified virus preparations from soybean aphids collected in Iowa, Michigan, Ohio and China. After de novo assembly and reference mapping assembly, we obtained the nearly complete genome sequences of (i) dicistroviruses identical to or similar to those known to infect other aphid species, (ii) a virus, of which fragments had been found previously associated with honey

35 bees, (iii) a plant polerovius, and (iv) a new -like virus of unknown host. Several of the sequences detected were reconstructed from host organisms highly divergent from those in which related viruses have been previously isolated or discovered. It is clear that viral transmission and maintenance cycles in nature are likely to be significantly more complex and taxonomically diverse than previously known.

Materials and Methods

Aphids. Aphid handling from the various samples was conducted on separate days to avoid cross-contamination between the samples. The aphid colonies were maintained isolated from each other spatially and/or temporally.

The four samples characterized are sourced from China, Ohio, Michigan, and

Iowa. The aphids from China were not frozen at the time of collection or international travel, but have been maintained at -80 °C since arriving in US. The Ohio and Michigan samples have been in colonies for at least 2 years in separate growth chamber. The Iowa sample is a lab colony annually supplemented with field-collected soybean aphid.

Sample collection, virus purification and viral RNA extraction. Two heavily infested trifoliate leafs and stems were collected from a colony for each U.S. location to mimic a field collected sample. The trifoliates were placed into a container at -20°C for at least one hour to kill the aphids present. The aphids were then isolated from the trifoliates by using a fresh paintbrush to remove the aphids from the leaves and into a microfuge tube.

A new paintbrush was used for each sample to minimize the chance of cross- contaminating the samples. Once harvested, the aphids were maintained at -80°C until the RNA extraction.

36

We partially purified virus particles (virions) by ultracentrifugation of an extract from 1-3 grams of aphids through a sucrose cushion, particles were treated with ribonuclease to eliminate the non-encapsidated (non-viral) RNAs and then treated with

Proteinase K to destroy the ribonuclease activity. RNA was extracted using TRIzol®

Reagent (Life Technologies, Grand Island, NY, USA) according to the manufacturer's protocol. The Trizol was heated to 65°C and the initial 5 minute incubation was also carried out at 65°C. The resultant RNA was salt/ethanol precipitated, resuspended in

UltraPure Distilled Water and was analyzed on a Nanodrop instrument.

Library preparation and sequencing. The RNA samples were submitted for library preparation at the ISU DNA Facility using the Illumina TruSeq library preparation kit beginning with the fractionation and adapter ligation with out the polyA selection. Then the samples were run in a single lane on the Illumina HiSeq 2000 for 100 bp paired-end sequencing. The data were returned as four files for the China, Michigan, Ohio, and Iowa samples, respectively.

Sequence assembly and full virus genome discovery. For each sample, the pair-end files were imported into CLC Genomics Workbench version 6.0.4 (CLC Bio, Boston,

MA, USA) as Illumina paired-end reads, discarding the read names but maintaining the quality scores. The reads can be trimmed in CLC by three ways, quality trimming, adaptor trimming and sequence filtering. We used the default trimming quality cutoff values 0.05 and the maximum number of ambiguities as 2 per 100bp read to process each sample. For adaptor trimming, choosing the Illumina adapter list to remove the adaptors from the sequencing reads. For sequence filtering, we filtered the reads below length 15 or above length 108. The output files were saved within the CLC database. In the output

37 files, several files give the summary of trimming data, including name of the sequence used as input, number of reads, average length of the reads, number of reads retained after trimming, the percentage of the input reads that are retained, and average length after trimming.

The paired trimmed sequence data from the trimming process was de novo assembled using CLC Genomics Workbench to form the contigs. CLC bio’s de novo assembly works by using de Bruijn garaphs. The basic idea is to make a table of all sub- sequences of a certain length (call words) found in the reads. The word size is determined automatically in CLC. When there is a SNP or a sequencing error, we get a so-called bubble. In this study, we used automatic values for word size and default value of bubble size 50. We also used the default value 200bp for the minimum contig length. Scaffolding was performed by CLC, in which the insert sizes don’t need to be specified. The insert size corresponds to the estimated distance between reads. CLC estimates the distance between paired reads automatically by a default distance estimation algorithm. After the initial contig creation, we selected the option to map reads back to contigs using default parameters as 2 for mismatch cost, 3 for insertion cost, 2 for deletion cost, 0.5 for length fraction, 0.8 for similarity fraction. The output file gives a figure showing the reads remapped to the assembled contigs, and a list showing the number of reads remapped to each contig.

The assembled contigs were queried against the NCBI nucleotide and protein database using BLASTn and BLASTx by CLC workbench. The filter parameters for

BLAST used default value. Contigs with significant hits (E-value < 1E-03 for BLASTx,

E-value < 1E-10 for BLASTn) to viruses from non-virus hits were isolated. These

38 contigs were re-assembled to generate longer contigs by de novo assembly as mentioned above. The longer contigs were queried against all organism database and virus database using BLAST with default parameter, and were isolated from all contigs by hitting to viruses (E-value < 1E-05). Further, the viral longer contigs were queried against all longer contigs to look for contigs that overlap viral contigs. This essential step will help to identify the novel viral sequence. Finally, we assemble the virus genome manually by checking the overlapping information. Once the longer congtigs was assembled, the whole sequence was further characterized by determining the potential ORFs using default parameters.

Phylogenetic and conservation analyses. Multiple sequence alignment, determination of identity and similarity percentages, and initial phylogenetic determination were performed using MEGA software [11]. ClustalW was conducted for multiple amino acid sequence alignment with default parameters by MEGA6. The sequences are the amino acid sequences of RNA-dependent RNA polymerase (RdRp) for each virus.

Phylogenetic trees were conducted using maximum likelihood method in MEGA 6. The test of phylogeny is bootstrap method with 500 replicates, Jones-Taylor-Thornton (JTT) model, uniform rates for rates pattern, and complete deletion for gaps/missing data treatment. For tree inference options, we chose Nearest-Neighbor-Interchange (NNI) as

ML heuristic method using default initial tree for ML.

Results

Deep sequencing analysis of virus hits. In the HiSeq data used in this study, a total of

263,808,602 reads was obtained from the samples, distributed as 18% from China, 26%

39 from Iowa, 24% from Ohio, and 32% from Michigan. After trimming, 0.9 % of the reads were removed because they failed our analysis criteria, leaving 47,539,052, 69,028,800,

82,248,632, and 62,848,082 reads, respectively. With initial de novo assembly, 64.2% of the raw reads were from contigs lengths from 157 bp to 27,554 bp (Fig. 1, Table 1). The reads distribution between the four samples, as well as the summary statistics for the de novo assembly, are reported in Table S1. As shown in workflow (Fig. 1), we performed

BLASTn and BLASTx to screen the contigs with potential viral sequences. Only 24.89% and 16.64% o f the reads analyzed by BLASTn and BLASTx, respectively, hit known viral gene sequences using e-value of 10-3 cutoff for significance (Fig. 1). Most of the reads hit aphid genome sequences (with < 1E-60). We obtained longer contigs by reassembling the potential viral contigs, to ultimately construct complete genomes. In an alternative route, reads were assembled by reference mapping using known virus genomes as references to assemble the complete genome. de novo assembly

For the de novo assembly, 4,932, 30,996, 35,065, 27,512 contigs were generated in the samples from China, Iowa, Michigan and Ohio, respectively (Table 1). The majority of the contigs were identified as belonging to aphid species. A large number of the contigs were bacterial or insect sequences. For example, nearly 66% (20,164) of contigs were identified as insect in the sample from Iowa. Merely 2.6% of the de novo assembled, BLAST identified contigs were determined to be of viral origin. These sequences were from members of Caulimoviridae, Dicistroviridae, Luteoviridae,

Iflaviridae, , Nodaviridae, , and

Permutotetraviridae families (Table S1).

40

Aphid lethal paralysis virus (ALPV), Rhopalosiphum padi virus (RhPV), and Big

Sioux River viruses (BSRV) were the most common virus represented by the contigs in all four samples. However, most of the viral contigs differed among the regions (Table

S1). Two large viral contigs ( >1000 bp) for Cotton leafroll dwarf virus and Maize yellow dwarf virus were found only in the China sample. Large contigs matching Drosophila A virus and Kashmir bee virus were found in the Iowa and Michigan samples. Contigs with sequence similarity to Gardner-Rasheed feline sarcoma virus (GR-FeSV) were found in the Michigan and Ohio samples. A 949 nt contig in the Ohio sample was identified as similar to Citrus leprosis virus cytoplasmic type 2.

Genome assembly of known and novel viruses

We used all potential viral contigs as well as all contigs that did not hit any sequence in the Genebank database as inputs for re-assembly. Six segments of length from 4839 to 10174 nt were identified (Table 2). Of the six segments, three (S2, S4, S5) were closely related to Aphid lethal paralysis virus AP1 (ALPV-AP1), RhPV, and CLDV with 88 %, 93.7%, and 93% full length amino acid identity respectively. These three segments had lengths close to those of the related virus genome.

The other three segments (S1, S3, S6) shared low aa identity with top BLAST hits. The top BLAST hit of S1 was Euprosterna elaeasa virus (EeV) with 33% aa identity, and similar length. Strikingly, S1 is firstly reported in soybean aphid sharing

98% similarity with Aphis glycines virus 1 (identified by Sijun Liu, unpublished). S1 was found in all four regions with abundant read counts. From the sample from Ohio, we assembled a 10,147 nt segment named S3, which has 34% aa identity with Citrus leprosis

41

C type 2 virus (CiLV-C2). This segment is similarly related to Negev virus, Loreto virus, and Hibiscus green spot virus with ~33% aa identity to all 3 viruses. In our study, S6 with length of 9445 nt, was most similar to the RhPV. In a previous study of viruses isolated from honey bee, four contigs of size 1473, 861, 1164, and 1311 nt (GenBank

JF423195-8) were designated from a hypothetical Big Sioux River virus (BSRV), after its place of discovery in South Dakota. Interestingly, all four contigs of BSRV have high aa identity to the sequence of S6 as shown in Fig. 2F.

Virus Genome Organizations

The genome organization of S1 is shown in Fig. 2. Four ORFs have sizes of 3438 nt of ORF1 (44-3,481 nt), 510 nt of ORF2 (1,945-2,454 nt), 735 nt of ORF3 (3,267-4,001 nt), and 810 nt of ORF4 (4,005-4,805 nt). Part of ORF1 of the S1 hit the sequences of putative replicase of EeV and RNA polymerase of Thosea asigna virus (TaV), while

ORF3 of S1 had highest similarity to the capsid protein of Bat sobemovirus and Pothos latent virus (Fig. 2A). In the BLAST analysis, the S2 sequence showed 95% and 94 % aa identity with the ORF1 (GenBank AFU81560) and ORF2 (GenBank AFU81561) of

ALPV-An, respectively (Fig. 2B).

S3 contains four large ORFs (ORF1 20-7,417 nt; ORF2 7,314-8,660 nt; ORF3

8,632-9,237 nt; ORF4 9,293-9,991 nt) as illustrated in Fig. 2C. The top BLAST hit of S3 is CiLV-C2, which is a bipartite RNA virus. RNA1 of CiLV-C2 (8,717 nt) contains two

ORFs, and RNA2 (4,989 nt) contains five ORFs [12]. The ORF1 and ORF2 of S3 showed 33% aa identity with the RNA1 of CiLV-C2, while there was no matches to the

RNA2 of CiLV-C2 in sequence of S3 (Fig. 2C).

42

The sequence of S4 contains two ORFs (ORF1 580-6,576 nt; ORF2 7,103-9,560 nt) with high similarity to RhPV (Fig. 2D). S5 showed similar genome organization as its top BLAST hit CLDV, having six ORFs (Fig. 2F). ORF0 to ORF5 encoded the silencing suppressor protein, RNA dependent RNA polymerase, capsid protein, movement protein, and read-through protein[13]. The sequence of S6 had two ORFs (ORF1 3-2,150 nt;

ORF2 2,753-8,752 nt) with 71% and 73% amino acid identity to the polyprotein

(ABX74939.1) and structural polyprotein (ABX74940.1) of RhPV respectively. We also identified that all four contigs of BSRV were included in S6 as shown as Fig. 2F.

Virus phylogenetic analysis

Based on the results of ClustalW multiple alignments, we constructed a maximum likelihood bootstrap phylogenetic tree for each sequence. In Fig. 3A, S1 was close to the branch of EeV and TaV, which belongs to , Alphapermutotetravirus.

The other two hitting viruses, Drosphila A virus is an unclassified dsRNA virus, and Bat sobemovirus is an unclassified Sobemovirus from bat feces (Fig. 2A, Fig. 3A).

S2, S4 and S6 all had high identity to RhPV and ALPV. As in the BLASTx result, S2 is on the same branch as ALPV, and S4 and RhPV were placed on the same branch respectively (Fig. 3B). These six sequences as well as DCV and CPV belong to

Cripavirus of the family Dicistroviridae.

A phylogenetic tree was constructed based on the multiple alignments of the

BLAST hits by the S3 sequence segment. S3 was not grouped to any branch in a phylogenetic tree, but it was more close to CiLV-C (Fig. 3C). The Lereto virus, Piura virus, Negewotan virus and Negevirus are novel insect viruses, which are related

43 distantly to CiLV-C[14]. In the other subgroup, there are all plant viruses, including

Tobacco mosaic virus, Hibiscus latent Fort Pierce virus, Yellow tailflower mild mottle virus and so on.

S5 is closely related to CLDV, and belongs to the same genus, Polerovirus

(Luteoviridae), as Maize yellow dwarf virus-RMV (Fig. 3D). Its close relatives include other plant poleroviruses, such as Melon aphid-born yellows virus or Pepper vein yellows virus.

Discussion

Our study is the first research survey on the viruses sequence of soybean aphids, and the first to identify any viruses sequences associated with soybean aphid. The large samples allowed us to be more informed on the viruses the soybean aphids carried, and the deep sequencing allowed us to study multiple viral populations in the most cost and time efficient way. The results revealed known and novel viruses, which provide valuable information on viral diversity and evolution.

In this study, we identified six complete or nearly complete virus genomes from different regions. Three out of six represent novel viruses (S1, S3, and S6), as they have low aa identity and low coverage with the known virus sequences. The top hitting sequence of S1 is insect virus EeV, a member of the Tetraviridae family[15], but with only 33% aa identity and 38% coverage at most. And their genome organization is not similar. S1 was found in all samples of US, and even partial sequence in the China sample. It indicates that the S1 may be a novel widespread virus in soybean aphid, so we named it as Aphis glycines virus 1 (AGV1). There were relatively few sequence reads

44 obtained from the China soybean aphid sample, possibly because the sample has been stored at room temperature, possibly causing selective degradation of less stable viruses, whereas the IA, MI, OH viruses were isolated from live aphid colonies.

S3 from the Ohio sample has little identity to any virus but has partial identity

(33%) to the plant virus CiLV-C2, with a longer genome 10,147 nt compared to 8,717 nt for CiLV-C2. In the phylogenetic tree (Fig. 3C), two viral subgroups are distantly related to S3. One subgroup contains all plant viruses belonging to cilevirus and higrevirus genera. The other subgroup, negevirus [14], contains only insect viruses. Negevirus is a new taxon of insect-specific viruses. Six novel insect viruses, Negev, Ngewotan, Piura,

Loreto, Dezidougou, and Santana, isolated from mosquitoes and phlebotomine sand flies form this new genus[14]. They are closest but still distant relative are CiLV-C, and viruses in genus Cilevirus. It is clear that S3 is a novel virus, but whether it is a plant virus or an insect virus is unknown. We designated S3 as Aphis glycines virus 2 (AGV2).

Also of interest, S6 has similarity to all four contigs of BSRV (Fig. 2F). Those contigs were obtained by deep sequencing of virus isolated from honey bee [16]. S6 shares 71% aa identity with RhPV, but share highest aa identity (85% to 93%) with each contigs of BSRV. We designate S6 as Aphis glycines virus 3 (AGV3). Which hosts

AGV3 actually infects are unknown, but given the high number of reads we obtained

(over 4,500 reads per sample), it is highly likely to infect soybean aphid.

Most of the virus contigs in our study hit the sequences of ALPV and RhPV.

Clearly these viruses are infecting soybean aphid, and are unlikely to be merely associated with it. Based on previous researches, RhPV was known only to infect cereal

45 aphids, including aphid R.padi (bird cherry-oat aphid) [17], R. rufiabdominalis (rice root aphid) [17], Diuraphis noxia (Russian wheat aphid) [18], R. maidis (green corn aphid) and Acyrthosiphum dirrhodum (grain aphid) [19], and Schizaphis graminum

(greenbug)[17,20]. The predicted ORFs of S4 both have >90% aa identity to RhPV, hence S4 is not a new virus with 90% similarity to RhPV. S1 is named as RhPV-AG, a new RhPV-like virus isolated from Aphis glycines.

ALPV, an aphid-pathogenic virus, was associated with rapid decline of the population of the major aphid species colonizing small grains under natural conditions

[21,22]. ALPV was first isolated from the aphid Rhopalosiphum padi. It is much more pathogenic to a R. padi than RhPV and is present at much higher concentrations in aphids than is RhPV [23]. Several ALPV-like viruses have been identified in organisms other than aphids, such as from bat fecal samples and honeybees [16,24-28]. Considering that the sequence identity of amino acid between S2 and ALPV was 92%, we named S2 as

ALPV-AG, a new isolates from A. glycines. Recently, a new isolate of ALPV (ALPV-

AP) was isolated from the pea aphid, Acyrthosiphon pisum, which is closely related to

ALPV-AM, an ALPV isolate fro honeybee [29]. The ALPV-AP genome is 88% identical to ALPV-AG from the sample of China, and is 86% identical to ALPV-AG from the sample of Michigan. These results indicate that the ALPV is highly adaptable to many insect species. ALPV cause disease symptoms and limit R.padi population in nature [23].

ALPV-An, a strain of ALPV, was highly pathogenic to Myzus persicae [25]. However, no acute pathogenicity was observed in ALPV-AP infection of A. pisum under ideal rearing conditions [29], and neither in ALPV-AG infection of A. glycines. It suggests that the diversity also exists in the biological properties of ALPVs such as host range.

46

S5 appeared only in the sample from China, sharing 93% aa identity with the plant virus

CLDV. CLDV is the causal agent of cotton blue disease (CBD) and is transmitted by the

Aphis gossypii Glover [13,30]. It is a (+)ssRNA virus in a new species within the genus

Polerovirus (Luteoviridae) [30]. These viruses are transmitted in a circulative, non- propagative manner. CBD is an important disease present in cotton crops in South

America, Africa and Asia [31]. The infected cotton plants show a moderate to severe stunting due to shortening of internodes, leaf rolling, vein yellowing and intensive dark green color of the foliage [32]. CBD only found in South America, Africa and Asia explained why we find this virus only in the sample from China. Presence of

CLDV in soybean aphid suggests soybean aphid may be a vector, which has not been reported.

In summary, our results (Table 2 & Table S1), suggest that a variety of viruses associate with soybean aphid, but it is not clear which actually replicate in this host. We assembled only six full viral genomes, but there are still a lot of small contigs in each sample that hit a wide range of viral genomes. However, most of the contigs have short hits length (< 100 nt), and low read count (< 50), due to several possible reasons: 1) sequence random matching because the hitting length is short even e-value less than 1E-10 the contig still has a low identity to the hitting virus; 2) samples are contaminated during the isolation; 3) some noncirculative virions remain on the surface of aphids.

Overall, this study provides the first information on the types of viruses that infect soybean aphid. By this information, we can better understand the possible plant or insect viruses soybean aphids may carry. Such viruses have potential for use as biological control agents or as sources of non-infectious transgenic aphicidal genes in plants. It also

47 shows that viral transmission and maintenance cycles in nature are likely to be highly complex and taxonomically diverse.

Acknowledgements

We would like to thank Dr. Xiulian Sun, and Andy Michel from Ohio State University for provision of soybean aphids from Ohio, Douglas Landis, from Michigan State

University for aphids from Michigan, Matt O’Neal for aphids from Iowa. We acknowledge the Iowa Soybean Association for the funding.

References

1. Ragsdale DW, Voegtlin DJ, O'Neil RJ (2004) Soybean aphid biology in North

America. Annals of the Entomological Society of America 97: 204-208.

2. Kim CH, Schaible G, Garrett L, Lubowski R, Lee D (2008) Economic impacts of the

US soybean aphid infestation: a multi-regional competitive dynamic anlysis.

Agric Res Econ Rev 37: 227-242.

3. Bol JF (2003) Alfalfa mosaic virus: coat protein-dependent initiation of infection. Mol

Plant Pathol 4: 1-8.

4. Jayaram C, Hill JH, Miller WA (1991) Nucleotide sequences of the coat protein genes

of two aphid-transmissible strains of soybean mosaic virus. J Gen Virol 72 ( Pt 4):

1001-1003.

5. Macedo TB, Bastos CS, Higley LG, Ostlie KR, Madhavan S (2003) Photosynthetic

responses of soybean to soybean aphid (Homoptera: Aphididae) injury. J Econ

Entomol 96: 188-193.

48

6. McIntyre JL, Dodds JA, Hare JD (1981) Effects of localized infections of Nicotiana

tabacum by tobacco mosaic virus on systemic resistance against diverse

pathogens and an insect. Phytopathology 71: 297-301.

7. Ellsbury K, Schneeweiss R, Montano DE, Gordon KC, Kuykendall D (1987) Gender

differences in practice characteristics of graduates of family medicine residencies.

J Med Educ 62: 895-903.

8. Mowry TM (1994) Russian Wheat Aphid (Homoptera, Aphididae) Survival and

Fecundity on Barley Yellow Dwarf Virus-Infected Wheat Resistant and

Susceptible to the Aphid. Environmental Entomology 23: 326-330.

9. Fiebig M, Poehling HM, Borgemeister C (2004) Barley yellow dwarf virus, wheat, and

Sitobion avenae: a case of trilateral interactions. Entomologia Experimentalis Et

Applicata 110: 11-21.

10. Donaldson JR, Gratton C (2007) Antagonistic effects of soybean viruses on soybean

aphid performance. Environ Entomol 36: 918-925.

11. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5:

molecular evolutionary analysis using maximum likelihood, evolutionary

distance, and maximum parsimony methods. Mol Biol Evol 28: 2731-2739.

12. Roy A, Choudhary N, Guillermo LM, Shao J, Govindarajulu A, et al. (2013) A novel

virus of the genus Cilevirus causing symptoms similar to citrus leprosis.

Phytopathology 103: 488-500.

13. Correa RL, Silva TF, Simoes-Araujo JL, Barroso PA, Vidal MS, et al. (2005)

Molecular characterization of a virus from the family Luteoviridae associated

with cotton blue disease. Arch Virol 150: 1357-1367.

49

14. Vasilakis N, Forrester NL, Palacios G, Nasar F, Savji N, et al. (2013) Negevirus: a

proposed new taxon of insect-specific viruses with wide geographic distribution. J

Virol 87: 2475-2488.

15. Zeddam JL, Gordon KH, Lauber C, Alves CA, Luke BT, et al. (2010) Euprosterna

elaeasa virus genome sequence and evolution of the Tetraviridae family:

emergence of bipartite genomes and conservation of the VPg signal with the

dsRNA Birnaviridae family. Virology 397: 145-154.

16. Runckel C, Flenniken ML, Engel JC, Ruby JG, Ganem D, et al. (2011) Temporal

analysis of the honey bee microbiome reveals four novel viruses and seasonal

prevalence of known viruses, Nosema, and Crithidia. PLoS One 6: e20656.

17. D'Arcy CJ, Burnett PA, Hewings AD (1981) Detection, biological effects, and

transmission of a virus of the aphid Rhopalosiphum padi. Virology 114: 268-272.

18. von Wechmar MB, Rybicki EP (1981) Aphid transmission of three viruses causes

Free State streak disease. S Afr J Sci 77: 488-492.

19. Williamson C, Wechmar MBV, Rybicki EP (1989) Further characterization of

Rhopalosiphum padi virus of aphids and comparison of isolates from South

Africa and Illinois. J Invertebr Pathol 54: 85-96.

20. Gildow FE, D'Arcy CJ (1988) Barley and oats as reservoirs for an aphid virus and the

influence on barley yellow dwarf virus transmission. Phytopathol 78: 811-816.

21. Laubscher JM, von Wechmar MB (1992) Influence of aphid lethal paralysis virus and

Rhopalosiphum padi virus on aphid biology at different temperatures. J Invert

Pathology 60: 134-140.

50

22. Laubscher JM, von Wechmar MB (1993) Assessment of aphid lethal paralysis virus

as an apparent population growth-limiting factor in grain aphids in the presence of

other natural enemies. Biocontrol Sci Tech 3: 455-466.

23. Williamson C, Rybicki EP, Kasdorf GGF, von Wechmar MB (1988) Characterization

of a new picorna-like virus isolated from aphids. J gen Virol 69: 787-795.

24. Willianson C, Rybicki EP, Kasdorf GF, Von Wechmar MB (1988) Characterization

of a new picorna-like virus isolated from aphids. J Gen Virol 69: 787-795.

25. Dombrovsky A, Luria N (2013) The Nerium oleander aphid Aphis nerii is tolerant to

a local isolate of Aphid lethal paralysis virus (ALPV). Virus Genes 46: 354-361.

26. Ge X, Li Y, Yang X, Zhang H, Zhou P, et al. (2012) Metagenomic analysis of viruses

from bat fecal samples reveals many novel viruses in insectivorous bats in China.

J Virol 86: 4620-4630.

27. Granberg F, Vicente-Rubiano M, Rubio-Guerri C, Karlsson OE, Kukielka D, et al.

(2013) Metagenomic detection of viral pathogens in Spanish honeybees: co-

infection by Aphid Lethal Paralysis, Israel Acute Paralysis and Lake Sinai

Viruses. PLoS One 8: e57459.

28. Ravoet J, Maharramov J, Meeus I, De Smet L, Wenseleers T, et al. (2013)

Comprehensive bee pathogen screening in Belgium reveals Crithidia mellificae as

a new contributory factor to winter mortality. PLoS One 8: e72443.

29. Liu S, Vijayendran D, Carrillo-Tripp J, Miller WA, Bonning BC (2014) Analysis of

new aphid lethal paralysis virus (ALPV) isolates suggests evolution of two ALPV

species. J Gen Virol 95: 2809-2819.

51

30. Distefano AJ, Bonacic Kresic I, Hopp HE (2010) The complete genome sequence of

a virus associated with cotton blue disease, cotton leafroll dwarf virus, confirms

that it is a new member of the genus Polerovirus. Arch Virol 155: 1849-1854.

31. Cauquil J (1977) Etudes sur une maladie d’origine virale du cotonnier: la maladie

bleue. Coton et Fibres Tropicales 32: 259-278.

32. Cauquil J, Vaissayre M (1971) La “maladie bleue” du cotonnier en Afrique:

transmission de cotonnier a cotonnier par Aphis gossypii glover. Coton et Fibres

Tropicales 26: 462-466.

52

Tables

Table 1. Summary of sequencing results by location

Sample Reads after No. of contigs by de Avg. length of No. of virus contigs location trimming novo assembly contigs (bp) BLASTn BLASTx

China 47,539,052 4,932 898 107 144

Iowa 69,028,800 30,997 546 285 52

Michigan 82,248,632 35,065 525 416 81

Ohio 62,848,082 27,512 542 326 67

53

Table 2. Virus genome assembly segments and BLASTx alignment results.

Sequence Top BLASTx hits

Segment Location Length Read Virus Genome Hit Identity

(nt) counts length length (%)

(nt) (aa)

IA 4839 35895 631 33 Euprosterna elaeasa S1 MI 4838 38515 5698 631 33 virus OH 4839 22194 631 33

IA 9120 603 1048 93

MI 9918 507 Aphid lethal 1501 95 S2 9835 OH 9075 567 paralysis virus 756 93

China 9814 2365146 1021 94

Citrus leprosis C S3 OH 10147 11924 8717 471 33 type 2 virus

MI 10045 1018 Rhopalosiphum 694 92 S4 10011 China 9494 3772552 padi virus 485 90

China 5841 6277 Cotton leafroll S5 5865 638 95 dwarf virus

IA 9445 4202 1990 73

MI 9443 4783 Rhopalosiphum 1990 73 S6 10011 OH 9445 4163 padi virus 1990 73

China 9445 20784187 1990 73

Table S1. Viruses hits to the assembled contigs by BLAST in different location

China Iowa Michigan Ohio Acyrthosiphon pisum virus Aphid lethal paralysis virus Aphid lethal paralysis virus Aphid lethal paralysis virus Aphid lethal paralysis virus Big Sioux River virus Big Sioux River virus Big Sioux River virus Barley yellow dwarf virus Cafeteria roenbergensis Bovine viral diarrhea virus-1 Bovine viral diarrhea virus-1 virus Big Sioux River virus Euprosterna elaeasa virus Euprosterna elaeasa virus Citrus leprosis virus cytoplasmic type 2 Brevicoryne brassicae picorna-like Influenza A virus Gardner_Rasheed feline Euprosterna elaeasa virus virus sarcoma virus Citrus leprosis virus C Rhopalosiphum padi virus Grapevine leafroll-associated Gardner-Rasheed feline sarcoma virus virus 7

Cotton leafroll dwarf virus Tanapox virus Kirsten murine sarcoma virus Kirsten murine sarcoma virus 54 Euprosterna elaeasa virus White bream virus Maize dwarf mosaic virus Micromonas pusilla virus 12T Flock house virus Micromonas pusilla virus 12T Rhopalosiphum padi virus Hana virus Rhopalosiphum padi virus Stealth virus 1 Hibiscus green spot virus Stealth virus 1 Lake Sinai virus 2 Yaba monkey tumor virus Loreto virus Maize yellow dwarf virus Modoc virus Negev virus Ngewotan virus Oyster mushroom isometric virus Peanut clump virus Rhopalosiphum padi virus Rosy apple aphid virus All hits to the assembly contigs by BLASTn or BLASTx were selected by e-value less then 1E-10.

Percentage)of)reads)remaining) Workflow)

Raw"reads" Treatment"of"raw"data"

Trimmed" IniDal"assembly" reads" BLAST" De#novo# assembly" IsolaDng"potenDal" 55 viruses"conDgs" BLASTn"to" virus" ReHassemble"to" form"longer"

BLASTx"to" conDgs" virus" Reference" BLAST"to"idenDfy" mapping"to" viruses" idenDfy"viruses" 0" 10" 20" 30" 40" 50" 60" 70" 80" 90" 100"

Figure 1. Workflow for processing deep sequencing data. Two strategies used to analyze the data were de novo assembly and reference mapping. Average percent remaining reads after each of the filtering steps. Low-quality reads are moved first, followed by

BLASTn and BLASTx comparison to virus sequences.

56

ORF2 ORF3 A.# ORF1 ORF4 S1

Euprosterna#elaeasa#virus#NP_573541.1#(33%)# Bat#sobemovirus#AGN73380.1#(31%)# 500#

Thosea#asigna#virus#AAQ14329.1#(32%)# Pothos#latent#virus#YP_9032636.1#(22%)#

ORF1 ORF2 B.# S2

ALPV#AFU81560.1##(95%)## ALPV#AIJ00044.1#(94%)## 1000#

C.# ORF1 ORF3 ORF2 ORF4 S3 1000# CLiV_C2#AGE82887.1#(33%)#

D.# ORF1 ORF2 S4

RhPV#NP_046155.1#(92%)# RhPV#NP_046156.1#(88%)# 1000# ORF4 E.# ORF1 ORF0 ORF2 ORF3 ORF5 S5

CLDV#YP_003915148.1#(95%)# CLDV#ACU82791.1#(93%)# 600# F.# ORF1 ORF2 S6

RhPV#ABX74939.1#(71%)# RhPV#ABX74940.1#(73%)#

BSRV1#(92%)# BSRV2#(93%)##BSRV3#(89%)# BSRV4#(85%)# AEH42813.1# AEH42814.1# AEH42816.1# AEH42817.1#

1000#

Figure 2. Genome organizations of the assembled virus segments S1-S6 and similar

RNA viruses. Viral sequences were positionally mapped to the genome of S1-S6 (A-F) based on the identities of the associated amino acid (aa) sequences to the corresponding reference genome proteins. Blue bar: assembled genome; red bar: viral sequences that share closest similarity with the assembled genome; parenthesis: percentage of aa identity to the corresponding sequences.

A.# B.#

88 Rhopalosiphum padi virus (NP_046155.1) 100 S4

100 S6 S2 100 Euprosterna elaeasa virus (NP_573541.1) 100 Aphid lethal paralysis virus (AFU81560.1) Thosea asigna virus (AAQ14329.1) Drosophila A virus (YP_003038595.1) 100 Drosophila C virus (NP_044945.1) 100 Drosophila melanogaster tetravirus SW-2009a (ACU32793.1) Cricket paralysis virus (NP_647481.1)

S1 55 Acute bee paralysis virus (NP_066241.1)

100 Kashmir bee virus (NP_851403.1) 0.1 97 Formica exsecta virus 1 (YP_008888535.1) Plautia stali intestine virus (NP_620555.1) (ABS82435.1) 100 77 Himetobi P virus (AHD25455.1) 54 Triatoma virus (NP_620562.1) 57 0.1

C.# D.#

S3 100 Cotton leafroll dwarf virus (YP_003915148.1) 100 Citrus leprosis virus C (ABA42875.1) 38 S5 100 Citrus leprosis virus cytoplasmic type 2 (AGE82887.1) Brassica yellows virus (ADW41603.1) Hibiscus green spot virus (YP_004928118.1) 26 100 Turnip yellows virus (AFZ39160.1) 86 Loreto virus (AFI24687.1) 97 Maize yellow dwarf virus-RMV (YP_008083739.1) Piura virus (AFI24678.1) 100 Beet western yellows virus (BAP16771.1) 100 Ngewotan virus (AFY98072.1) Melon aphid-borne yellows virus (ABV55637.1) 100 Negev virus (AFI24684.1) 100 100 Cucurbit aphid-borne yellows virus (AFM68932.1) 39 Passion fruit mosaic virus (YP_004465358.1) Potato leafroll virus (NP_056748.3) 55 Hibiscus latent Fort Pierce virus (ACI24009.1) 52 Cucumber green mottle mosaic virus (AGO32737.1) Cereal yellow dwarf virus-RPV (BAA01054.1)

100 Tobacco mild green mosaic virus (AGC82908.1) Sugarcane yellow leaf virus (CAJ26280.1) Yellow tailflower mild mottle virus (YP_008802584.1) 100 0.05 93 Bell pepper mottle virus (YP_001333649.1) 100 Tobacco mosaic virus (ABN79257.1)

0.1

58

Figure 3. Phylogenetic analysis of assembled virus segments. Phylogenetic trees comparing RNA-dependent RNA polymerase of (A) S1 with representative members of the Alphapermutotetravirus genus within the Permutotetraviridae family; (B) S2,

S4 and S6 with representative members of the Aparavirus and Cripavirus genus within the Dicistroviridae family; (C) S3 with representative members of the

Cilevirus, Negevirus, and Tobamovirus genus; and (D) S5 with representative members of Polerovirus genus within the Luteoviridae family. Phylogenetic trees were constructed by using the maximum likelihood method in the MEGA6 program[11] with 500 bootstrap replicates. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the braches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.

59

CHAPTER 3 GENERAL CONCLUSIONS

Soybean aphid has a negative effect on agricultural economy in North

America. Extensive studies focus on aphid damage and control strategies. Except chemical management or host plant resistance, biological control was seen as a great potential management strategy. A number of researches work on specific virus in insect or aphid as the transmission vector of specific plant virus. Now NGS technologies make identifying and discovering virus much more easier and faster. We can conduct metagenomic analyses for virus sequence discovery.

By now, there is no research on the investigation of the viruses that soybean aphid is carrying. In this study, we conduct the deep RNA sequencing of virus particle isolated from soybean aphids in Iowa, Michigan, Ohio and China by NGS. The reads of virus sequences from different samples was assembled by multiple methods.

Complete or nearly complete genome sequences were identified by de novo assembly method and reference mapping method. We discovered six full- or nearly full-length virus sequences, including dicistro-like virus, tetra-like virus, and a plant cile-like virus. We predict the genome organization and conduct the phylogenetic analysis for each virus. These results reveal the first viruses identified in soybean aphid and new plant viruses that may be vectored by soybean aphid. The information will shed light on the potential use of aphid viruses as biological control agents or as sources of non- infectious transgenic aphicidal genes in plants.

.

60

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. W. Allen Miller, for his guidance and patience and financial support during my graduate study. I am also grateful to my co- advisor Dr. Karin Dorman, for her guidance, encouragement and full support. I am also indebted to my committee member, Dr. Bryony Bonning who has given me a lot of suggestions and help.

I also want to thank our lab members Dr. Jelena Kraft, Dr. Jimena Carrillo-

Tripp, Alice Hui, Dr. Juliette Doumayrou, Dr. Sung ki Cho, Bliss Kernodle, and

Elizabeth J Carino, colleagues in Dr. Bonning's, and Dr. Dorman's lab, many faculty members and staffs in the program of Bioinformatics and Computational Biology for their help and valuable discussion and suggestion.

I would like to give my special thanks to my dear boyfriend Xiaochu Lou, friends Dr. Lei Gong, and Dr. Haibo Liu for their insightful discussion, unselfish help, and mental support.

Last but not least, I thanks to my parents for their encouragement, patience, respect and love. My love and gratitude for them can hardly be expressed in words. I dedicate this thesis to my parents: Donglian Feng and Lizhu Zhang.

.”