DISCOVERY OF KNOWN AND NOVEL VIRUSES IN WILD AND CULTIVATED BLUEBERRY THROUGH TRANSCRIPTOMIC AND VIRAL METAGENOMICS APPROACHES

By

NORSAZILAWATI SAAD

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Norsazilawati Saad

To my husband, my beautiful children, mother and father

ACKNOWLEDGMENTS

First of all, I would like to express my gratitude to God for instilling perseverance in my heart to embrace the challenging moments throughout my PhD journey. This venture of course would not have been possible without the blessing of my husband, Mohd Khairil Anwar Abdul Latif; I would like to thank him for the endless love, support and patience while being away from me and my children during my study here. My gratitude also goes to my beautiful children, Sofiyah and Syuaib, for accepting the way things were throughout their stay with me: the late pickups from school and sometimes the lack of time spent with them, especially when I was sick or too busy with exams and research. Despite all this, this hard work is for you, so that our family can have a better future.

I am undoubtedly indebted to my parents, Zaini Amir and Saad Osman, for their unconditional love and motivation throughout my life. My mother flew all the way here almost every year just to help me take care of my children, and I could never repay that. I am also grateful to my parents-in-law and other family members for their support along the way.

A very special acknowledgement goes to my committee chair, Dr Phil Harmon, and members, Dr Jeffrey Jones, Dr Arvind Varsani, Dr James Olmstead and Dr Svetlana Folimonova, for their guidance, feedback and support in my PhD research and dissertation. My appreciation also goes to the head of the Department of Plant Pathology, Dr Rosemary Loria, for her thoughtful concern and reassurance during my PhD years. Special gratitude goes to Ricardo Alcalá-Briseño and Jose Carlos Huguet Tapia for providing assistance and helpful feedback on my research, especially on bioinformatics analysis.

I would like to extend my recognition to the departmental staff for their assistance throughout the duration of my study, especially to Jessica Ulloa, Michael Morrow and Lauretta Rahmes. This acknowledgment would not be complete without thanking my friends for their encouragement and moral support, especially Sevgi Coskan, Maria Climent, Juliana Pereira, Salma Arous, as well as other members of the Harmon and Jones labs. I feel grateful to have had my friends and neighbors for support and help throughout my stay here.

My PhD education has been made possible with the funding provided by the Ministry of Higher Education Malaysia and Universiti Putra Malaysia (UPM). Hence my appreciation goes to the sponsors, the head of the Department of Plant Protection and the staff at UPM for their kind support.

Finally, I would like to thank Dr Jane Polston for the supervision and facilities to conduct most of my PhD research in the department. Also, I would like to thank Heather Capobianco and Camisha Alexis for their assistance during the collection and processing of samples.

Thank you to all for the inspiration and endless support.


TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ...... 4

LIST OF TABLES ...... 9

LIST OF FIGURES ...... 11

LIST OF ABBREVIATIONS ...... 13

ABSTRACT ...... 15

CHAPTER

1 LITERATURE REVIEW ...... 17

Introduction to Blueberry ...... 17

Southern Highbush Blueberries (SHB) in Florida ...... 18

New and Emerging Diseases of Blueberry ...... 19

Blueberry red ringspot virus ...... 20

Nepoviruses ...... 22

Strawberry latent ringspot virus ...... 28

Blueberry mosaic associated virus ...... 28

Blueberry latent virus ...... 30

Blueberry scorch virus ...... 31

Blueberry shock virus ...... 33

Blueberry virus A ...... 35

Unassigned viral families ...... 36

Next-Generation Sequencing (NGS) for Identification of Viruses ...... 39

Plant Viral Metagenomics ...... 40

Sample Preparation Methods to Generate Viral Metagenomes ...... 41

Analysis of Metagenome Data ...... 45

Applications of Viral Metagenomics ...... 49

Objectives ...... 52

2 TRANSCRIPTOMIC ANALYSIS OF EXISTING RNA-SEQ DATA ENABLES THE DISCOVERY OF PLANT VIRUSES ...... 56

Introduction ...... 56

Materials and Methods ...... 57

Source of Transcriptome Libraries ...... 57

Transcriptome Analysis ...... 58

Virus Validation in vitro ...... 60

Primer Design ...... 60

Detection of Viruses ...... 61

Sequence and Phylogenetic Analyses ...... 63

Results ...... 64

Transcriptome Analysis Led to Virus Identification ...... 64

Validation of a Putative Novel and Known Virus ...... 66

Sequence and Phylogenetic Analysis of Putative Novel and Known Virus ...... 67

Discussion ...... 69

Viral Transcriptome Analysis for Identification of Viruses ...... 69

Detection of a Novel Potyvirus in V. arboreum ...... 71

Detection of BRRV in Vc x Vd ‘Emerald’ ...... 72

Conclusion ...... 74

3 CHARACTERIZATION OF RNA PLANT VIROMES FROM WILD AND CULTIVATED V. corymbosum IN FLORIDA LEAD TO THE DISCOVERY OF KNOWN AND NOVEL VIRUSES ...... 84

Introduction ...... 84

Materials and Methods ...... 85

Plant Materials ...... 85

Sample Preparation and Generation of V. corymbosum RNA Plant Viromes ...... 85

Analyses of RNA Plant Viromes ...... 86

Sequence and Phylogenetic Analyses ...... 87

Virus Validation and Detection in vitro ...... 87

Results ...... 88

General Analyses of the Viromes ...... 88

Diversity of Virus Sequences and Comparison of Viral Populations among the Viromes ...... 89

Identification of Plant Viruses from the Viromes ...... 91

Sequence Comparison and Phylogenetic Analyses of the Complete Viral ...... 93

Discussions ...... 99

Viral Metagenomics Unraveled Plant Viral Diversity in Wild and Cultivated V. corymbosum in Florida ...... 99

Comparison of Virus Diversity among the Plant Viromes ...... 100

Identification of Plant Viruses from the Viromes ...... 101

Characterization of known viruses in V. corymbosum new to Florida ...... 102

Characterization of a novel in V. corymbosum ...... 105

Conclusion ...... 106

4 CHARACTERIZATION OF DNA PLANT VIROMES FROM WILD AND CULTIVATED V. corymbosum IN FLORIDA ...... 124

Introduction ...... 124

Materials and Methods ...... 125

Plant Materials ...... 125

Sample Preparation and Generation of V. corymbosum DNA Plant Viromes ...... 125

Analyses of DNA Plant Viromes ...... 126

Results ...... 128

General Analyses of the DNA Plant Viromes ...... 128

Characterization and Comparison of Plant Virus Diversity in the Viromes of Wild and Cultivated V. corymbosum ...... 128

Identification of Plant Viral Sequences from the Viromes ...... 130

Sequence Comparison and Phylogenetic Analyses of a Putative Novel Viral ...... 132

Discussion ...... 134

Conclusion ...... 136

5 SUMMARY ...... 146

APPENDIX

A CHAPTER 2 SUPPLEMENTARY DATA ...... 148

B CHAPTER 3 SUPPLEMENTARY DATA ...... 156

C CHAPTER 4 SUPPLEMENTARY DATA ...... 162

LIST OF REFERENCES ...... 165

BIOGRAPHICAL SKETCH ...... 185


LIST OF TABLES

Table page

1-1 Virus species reported in Vaccinium spp. in United States and around the world ...... 54

1-2 Major NGS platforms (Goodwin et al., 2016)...... 55

2-1 BLASTx analysis of V. arboreum and Vc x Vd ‘Emerald’ ...... 77

2-2 No and percentage of reads aligned to new species of , and BRRV ...... 77

2-3 Detection frequency (%) of total DNA samples from Vc x Vd ‘Emerald’ ...... 79

2-4 Detection of BRRV using RRSV3F/4R primers and back to back primer ...... 80

2-5 Nt length of each ORF in different BRRV isolates from Florida and other countries ...... 82

3-1 The number of processed reads and scaffolds, and the percentage (%) of putative plant virus scaffolds for each RNA library corresponding to different sampling sites ...... 109

3-2 BLASTx analyses of wild V. corymbosum virome from Gainesville...... 113

3-3 BLASTx analyses of wild V. corymbosum virome from High Springs...... 114

3-4 BLASTx analyses of wild V. corymbosum virome from Interlachen...... 115

3-5 BLASTx analyses of wild V. corymbosum virome from Island Grove...... 117

3-6 BLASTx analyses of cultivated V. corymbosum virome from Interlachen...... 117

3-7 BLASTx analyses of cultivated V. corymbosum virome from Island Grove...... 118

3-8 Complete viral genomes assembled from each plant virome ...... 119

3-9 Percentage average reads coverage and pairwise identity of each RNA segment of BlMaV...... 119

3-10 The length of each RNA segment and the encoded ORFs of BlMaV ...... 119

3-11 Nucleotide length of each ORFs in different BRRV isolates ...... 120

3-12 The nucleotide length and pairwise nucleotide comparison of BlVT and PrVT ...... 120

4-1 The number of reads and scaffolds, and the percentage (%) of associated plant virus scaffolds for each DNA library ...... 137

4-2 BLASTx analyses of the viromes of wild V. corymbosum from Gainesville ...... 138


4-3 BLASTx analyses of wild V. corymbosum virome from High Springs ...... 139

4-4 BLASTx analyses of wild V. corymbosum virome from Interlachen ...... 140

4-5 BLASTx analyses of wild V. corymbosum virome from Island Grove ...... 140

4-6 BLASTx analyses of cultivated V. corymbosum virome from Interlachen ...... 141

4-7 BLASTx analyses of cultivated V. corymbosum virome from Island Grove ...... 141

A-1 Blueberry root transcriptome libraries from V. arboreum and Vc x Vd ‘Emerald’ ...... 148

A-2 The no. of contigs obtained from de novo assembly of reads from V. arboreum and ‘Emerald’ libraries...... 149

A-3 Details of primer sequences used for the validation of viruses ...... 149

A-4 Pairwise identity of the NIb regions from the de novo assembled scaffold of putative new member in the family Potyviridae from Florida ...... 151

A-5 Whole genome nucleotide alignment of BRRV isolates ...... 153

B-1 Primer designed based on the de novo assembled complete genome of a putative novel Tepovirus ...... 156

B-2 Pairwise identity of BBLV isolates from Florida and other regions ...... 156

B-3 Pairwise identity of NP gene of BlMaV isolates from Florida and other regions ...... 157

B-4 Pairwise identity of complete genomes of BRRV isolates ...... 157

B-5 Pairwise identity of the RdRp of BlVT and members in the family . ....158

B-6 Pairwise identity of the CP of BlVT and members in the family Betaflexiviridae...... 159

C-1 Pairwise identity of Rep of BG-1, BG-2 and members in the family ...... 162


LIST OF FIGURES

Figure page

2-1 Transcriptome analysis pipeline used for data mining of viral sequences...... 75

2-2 Location of primers on the de novo assembled virus scaffold ...... 76

2-3 Reads coverage, expression level of transcripts from each CDS, region with high coverage and SNPs position in each Vc x Vd ‘Emerald’ ...... 78

2-4 Schematic representation of the in silico assembled BRRV genome ...... 80

2-5 Evolutionary analysis of the NIb region from the NP scaffold ...... 81

2-6 The evolutionary relationship between BRRV isolates ...... 82

3-1 Map showing the locations of the collected blueberry samples ...... 107

3-2 Virome analysis pipeline used for identification of viruses ...... 108

3-3 Viral populations in the viromes of wild and cultivated V. corymbosum ...... 110

3-4 Comparison of viral populations among the viromes of V. corymbosum ...... 111

3-5 Pairwise comparison and phylogenetic analysis of the respective viruses ...... 120

4-1 Viral populations in the DNA viromes of wild and cultivated V. corymbosum ...... 137

4-2 Genome organization of BG-1 and BG-2 ...... 142

4-3 Pairwise comparison of Rep of BG-1, BG-2 and members in the family Geminiviridae ...... 143

4-4 Alignment of the putative Reps of BG-1, BG-2 and members representing different genera in the family Geminiviridae ...... 144

4-5 Unrooted phylogenetic tree of members in the family Geminiviridae ...... 145

A-1 The number and percentage (%) of contigs produced by de novo assembly of reads .....150

A-2 Gel electrophoresis of the PCR optimization and amplification of the NP scaffold ...... 153

A-3 Gel electrophoresis of the DNA samples from ‘Emerald’ for BRRV detection...... 154

A-4 Gel electrophoresis showing amplification of full length of BRRV...... 154

A-5 Gel electrophoresis showing PCR using back to back and abutting primer to obtain full length of BRRV ...... 155


B-1 PCR to obtain the CP region of the putative new Tepovirus ...... 160

B-2 Symptoms on samples of V. corymbosum from Island Grove ...... 161


LIST OF ABBREVIATIONS

aa amino acid

Acc. no Accession number

BLAST Basic local alignment search tool

CDD Conserved Domain Database

CDS coding sequences

CDS Conserved Domain Search

CP coat protein

dsDNA double-stranded DNA

dsRNA double-stranded RNA

FAOSTAT Food and Agriculture Organization of the United Nations

Gb Gigabase

MP movement protein

mRNA messenger RNA

NCBI National Center for Biotechnology Information

NCR non-coding region

NGS Next-generation sequencing

NIb Nuclear inclusion body

nt nucleotide

ORF Open reading frame

RdRp RNA dependent RNA polymerase

Rep Replicase

RNA-seq RNA sequencing

rRNA ribosomal RNA

RT-PCR reverse transcription polymerase chain reaction

SHB Southern highbush blueberry

sRNA small RNA

ssDNA single-stranded DNA

ssRNA single-stranded RNA

TAV translational transactivator

USDA United States Department of Agriculture

UTR untranslated region


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DISCOVERY OF KNOWN AND NOVEL VIRUSES IN WILD AND CULTIVATED BLUEBERRY THROUGH TRANSCRIPTOMIC AND VIRAL METAGENOMICS APPROACHES

By

Norsazilawati Saad

December 2017

Chair: Philip F. Harmon

Major: Plant Pathology

In this study, transcriptomic and viral metagenomics approaches were used to 1) explore virus diversity and 2) discover known and novel viruses in wild and cultivated blueberry in Florida. The initial study involved transcriptomic analysis of existing blueberry root transcriptomes from the wild and cultivated species Vaccinium arboreum and V. corymbosum, respectively. This study confirmed the presence of Blueberry red ringspot virus (BRRV) and of one putative new virus species in blueberry, probably belonging to the family Potyviridae.

In addition to the transcriptomic analysis study, a viral metagenomic approach was utilized to characterize virus diversity in wild and cultivated blueberries, V. corymbosum, in Florida. RNA and DNA plant viromes from V. corymbosum collected from six locations in Florida were generated by Illumina HiSeq 2500 sequencing and analyzed using a virome analysis pipeline. Based on the RNA virome analyses, the viromes of both wild and cultivated V. corymbosum produced sequences with similarities to plant virus species from a diverse range of families, including , , , Geminiviridae, Ophioviridae, , . In the RNA viromes, four complete de novo assembled viral genomes were identified: Blueberry latent virus (BBLV), Blueberry mosaic associated virus (BlMaV), BRRV, and a new species in the genus Tepovirus. Similarly, the DNA viromes contained sequences with similarity to plant virus species from three virus families, including Caulimoviridae, Geminiviridae and , representing a total of nine virus genera. Additionally, three complete de novo assembled viral genomes were recovered from the DNA viromes, including BRRV and two putative novel virus species probably belonging to the family Geminiviridae.

This work resulted in several first reports of known and novel viruses in blueberry in Florida. The RNA virome analyses led to the first report of BBLV, BlMaV, and BRRV in blueberry in Florida. These analyses also resulted in the first report of a novel virus species in the genus Tepovirus in blueberry. Although further validation and infectivity assays are needed, the discovery of two putative novel virus species closely related to the geminiviruses from the DNA viromes marks the first report of a single-stranded DNA virus associated with blueberry.


CHAPTER 1 LITERATURE REVIEW

Introduction to Blueberry

The genus Vaccinium, in the Ericaceae or heath family, is organized into sections that include important fruit crops such as blueberries. Blueberries are mostly derived from the section Cyanococcus, cranberries from Oxycoccus, and lingonberries from Vitis-idaea (Stevens, 1969; Luby et al., 1991). Blueberries are produced in 28 countries worldwide, with the United States being the largest producer, yielding over 330,000 tons of cultivated and wild blueberries in 2015 (USDA, 2016; FAOSTAT, 2017). Other major producers of blueberries include Canada, Poland, Germany and Mexico (FAOSTAT, 2017). The United States and Canada accounted for almost 80% of global blueberry exports from 2008 to 2010 (Evans & Ballen, 2014). According to 2015 blueberry statistics, the five leading cultivated blueberry-producing states in the United States are California, Georgia, Michigan, Oregon, and Washington (USDA, 2016). Although total cultivated blueberry production in Florida is low compared to other states, the blueberry breeding group at the University of Florida has released southern highbush blueberry (SHB) cultivars that produce the earliest ripening blueberries in North America. These berries are harvested from late April to May, when few berries are available, and thus command the highest market price per pound (Williamson et al., 2012; USDA, 2016).

The three species of blueberries in the section Cyanococcus produced commercially are highbush (V. corymbosum L.), lowbush (V. angustifolium Ait.), and rabbiteye (V. virgatum Aiton) (Hancock et al., 2008; Ballington, 2008). Highbush is the most popular commercially cultivated blueberry in the world, being produced in Argentina, Australia, Canada, Chile, New Zealand, the United States, and several European countries (Strik, 2005; Strik and Yarborough, 2005). The northern parts of the United States (Michigan, New Jersey, North Carolina, Oregon, and Washington) produce northern highbush blueberry (V. corymbosum). The southern United States (Florida, Georgia and southern California) predominantly grows southern highbush blueberry (interspecific hybrids of V. virgatum, V. corymbosum, and V. darrowii Camp) (Hancock et al., 2008; Williamson et al., 2012). The main commercial blueberry areas in Florida include north-central counties (Alachua, Lake, Marion, Putnam, and Sumter), central counties (Hernando, Hillsborough, Orange, Pasco, and Polk), and south-central counties (Desoto, Hardee, Highlands, Manatee, and Sarasota). The north-central region accounts for 40% of the state blueberry acreage, the central region for 35%, and the south-central region for 25% (Williamson et al., 2012; Evans & Ballen, 2014).

Southern Highbush Blueberries (SHB) in Florida

Blueberry cultivars have been bred principally through interspecific hybridization by exploiting species in Vaccinium section Cyanococcus Gray (Coville, 1937; Ballington, 2001). In 1949, the University of Florida started its blueberry breeding program to obtain cultivars with adaptation to the local climate and improved disease resistance in north and central Florida. Interspecific hybridization was used to combine early-ripening, high-quality fruit traits (i.e., from northern highbush cultivars) with the low-chill trait of wild Vaccinium species native to the southeastern United States (Sharpe & Darrow, 1959; Lyrene, 1997). Wide interspecific crosses of northern highbush blueberry cultivars with lowbush blueberry, rabbiteye blueberry, and wild highbush blueberry native to southeast Georgia and northeast Florida (Alachua County and central Florida) consequently produced the low-chill SHB cultivars (Lyrene, 2002). The use of microsatellite markers showed that seven Vaccinium species from section Cyanococcus (V. angustifolium, V. constablaei Gray, V. corymbosum, V. darrowii, V. elliottii Chapman, V. tenellum Ait., and V. virgatum) constitute the current genetic base of cultivated SHB (Brevis et al., 2008).

In 1976 and 1977, following the establishment of the southern highbush blueberry breeding program at the University of Florida, Professor Ralph Sharpe released three cultivars, ‘Sharpblue’, ‘Flordablue’, and ‘Avonblue’ (i.e., intercrosses between V. corymbosum and (V. darrowii x V. virgatum)) (Lyrene & Williamson, 1997; Williamson & Lyrene, 2004). ‘Gulfcoast’ and ‘Misty’ were later introduced in the mid to late 1980s (Williamson & Lyrene, 2004). During the mid to late 1990s, the blueberry breeding program released the improved cultivars ‘Emerald’, ‘Star’, and ‘Jewel’. These three cultivars have been widely grown in Florida alongside other minor cultivars such as ‘Windsor’, ‘Springhigh’, ‘Primadonna’, ‘Snowchaser’, and ‘Sweetcrisp’ (Williamson et al., 2012).

In addition to the use of Vaccinium species from section Cyanococcus in the Florida blueberry breeding program, V. arboreum (sparkleberry), in the section Batodendron, has also been utilized to develop SHB cultivars. These cultivars have broader soil adaptation and are suitable for machine harvest. This research resulted in the release of a new commercial cultivar called ‘Meadowlark’ in 2009 (Lyrene, 1997; Olmstead et al., 2013).

New and Emerging Virus Diseases of Blueberry

Vaccinium spp. are exposed to existing and emerging viruses as a result of the expanding acreage of blueberry across the world, primarily in North America (Martin et al., 2009; 2012). Since blueberry is cultivated in areas where wild Vaccinium spp. occur, there is an increasing risk of virus movement between wild and cultivated blueberries. Viruses can in principle move from commercial cultivars to native species as well as from wild species to cultivated blueberries, spreading both existing and new viruses. The cost of cultivating and producing a perennial crop such as blueberry is significant; hence viral diseases in this crop can be economically devastating (Martin et al., 2012).

A number of viruses have been reported in Vaccinium spp. To date, blueberry is a host to 15 species of viruses from eight known and two unassigned genera (Table 1-1) (Caruso and Ramsdell, 1995; Martin et al., 2009; 2012; Thekke-Veetil et al., 2014; Woo and Pearson, 2014a; 2014b). Viral diseases usually produce a range of symptoms on plants, varying in severity from mild to severe, and can result in plant death. It is also possible for viruses to cause no symptoms. Variation in virus symptoms can be influenced by multiple factors such as the production system, the location, and the type and age of the cultivar (Martin et al., 2012).

Blueberry red ringspot virus

Blueberry red ringspot virus (BRRV), the causal agent of red ringspot disease in blueberry, is a pararetrovirus that belongs to the genus Soymovirus in the family Caulimoviridae (Gillet and Ramsdell, 1988; Geering and Hull, 2012). BRRV has an 8.3 kb circular double-stranded DNA genome encapsidated in a nonenveloped, icosahedral particle with a diameter of 42-46 nm that can exist as a virion or form inclusion bodies in the nucleus or cytoplasm, respectively (Kim et al., 1981; Glasheen et al., 2002). Members of the genus Soymovirus have a genome that encodes eight proteins and has discontinuities in both the transcribed and non-transcribed strands (Figure 2.1). These gaps are sealed once the virus enters the host cell. The closed dsDNA is then transcribed into mRNA in the nucleus by the host DNA-dependent RNA polymerase. This mRNA serves as a template for the synthesis of viral proteins and is reverse transcribed into new copies of the dsDNA genome. New virions are released following encapsidation of the new dsDNA genomes (Hohn & Richert-Poeggeler, 2006).


Red ringspot disease was originally described in New Jersey, with associated symptoms observed on highbush blueberry in the 1950s (Hutchinson & Varney, 1954), and has since rapidly expanded to other states in the US (Martin et al., 2012). Symptoms are usually seen in late summer and early fall on older leaves as red blotches resulting from the coalescence of round red spots. Also common is the appearance of pale green lesions surrounded by red rings with a diameter of 2-3 mm and 5-15 mm on leaves and stems, respectively (Cline et al., 2009). The red spots on leaves are a typical diagnostic characteristic of the disease and are commonly observed on the upper leaf surface, but both sides of the leaves can be symptomatic depending on the cultivar. Sometimes the red rings are also visible on ripening fruit but disappear as the fruit ripens. Infected fruits can also become distorted and unmarketable, as in the case of the cultivar ‘Ozarkblue’ (Martin et al., 2012).

Reliable diagnostic tests are available for BRRV. Testing commonly involves conventional polymerase chain reaction (PCR). BRRV-specific primers from New Jersey are used because no reliable routine enzyme-linked immunosorbent assay (ELISA) is available (Polashock et al., 2009) and the virus is not readily sap-transmissible (Caruso & Ramsdell, 1995). Another test used to validate BRRV is visualization of virus particles or inclusion bodies by transmission electron microscopy (TEM) of infected plant tissue. Virus particles appear as nonenveloped icosahedral particles 42-46 nm in diameter. However, PCR is routinely used since TEM is more time consuming and requires special equipment and skill.

To date, the vector for BRRV and other members of the genus Soymovirus remains unknown, though red ringspot disease can be transmitted through grafting and softwood cuttings (Hutchinson and Varney, 1954; Holland et al., 2013). Infected plants used in propagation can be a source of virus spread because symptoms vary within cultivars propagated from softwood cuttings and are undetectable on hardwood cuttings (Martin et al., 2012). Although aphids and mealybugs have been proposed to be responsible for BRRV transmission, there is no experimental or other evidence to support this assumption (Polashock et al., 2009). The failure to identify a BRRV vector suggests that vegetative propagation is the likely mode of virus spread; it has also limited understanding of the epidemiology of red ringspot disease, thus confounding the control of BRRV. It was reported recently that BRRV in the southeastern United States does not cause significant yield loss, owing to the relatively benign infection in the southern highbush blueberry cultivars ‘Star’ and ‘Jewel’, and, surprisingly, may cause early ripening of berries in ‘Star’ (Williford et al., 2016).

Nepoviruses

Members of the genus Nepovirus from the family Secoviridae are the most dominant viruses known to infect Vaccinium spp. (Table 1-1). The genomes of virus species in the genus Nepovirus comprise two positive-sense, single-stranded RNA molecules that are separately encapsidated in isometric particles, both of which are required for infectivity (Le Gall et al., 2005; Eastwell et al., 2012). In addition to serological relationships, nepoviruses are further assigned to three subgroups (A, B and C) based on the length and arrangement of RNA2 as well as their sequence relatedness (Fauquet et al., 2005). Both the M and B components of subgroup A contain RNA2 with Mr ≈ (1.3–1.5) × 10⁶, while only the M component of subgroups B and C contains RNA2, with Mr ≈ (1.4–1.6) × 10⁶ and Mr ≈ (1.9–2.2) × 10⁶, respectively (Le Gall et al., 2005; Digiaro et al., 2005). The species demarcation criteria for members of the family Secoviridae are based on amino acid sequence divergence of 25% in the CP and 20% in the Pro-Pol region, the latter defined by the conserved CG and GDD motifs (Sanfacon et al., 2009). While the majority of nepoviruses that infect blueberry belong to subgroup C, no complete genome sequence has been obtained from blueberry. Only three complete genome sequences are available for members of this group: Tomato ringspot virus (ToRSV) (Rott et al., 1991; 1995); Blackcurrant reversion virus (BRV) (Latvala-Kilby and Lehto, 1999; Pacot-Hiriart et al., 2001); and Grapevine Bulgarian latent virus (GBLV) (Elbeaino et al., 2011).
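The demarcation thresholds cited above for the family Secoviridae can be illustrated with a minimal sketch that computes percent amino acid divergence between two pre-aligned sequences and compares it with the 25% (CP) and 20% (Pro-Pol) values. The function names, the simple mismatch-based divergence measure, the separate reporting of each criterion, and the toy sequences are assumptions made for illustration only; they are not taken from the cited sources or from the analyses in this dissertation.

```python
# Thresholds as stated in the text (Sanfacon et al., 2009).
CP_THRESHOLD = 25.0       # percent amino acid divergence in the coat protein
PRO_POL_THRESHOLD = 20.0  # percent amino acid divergence in the Pro-Pol region


def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns that differ (columns gapped in both are skipped)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable positions")
    mismatches = sum(1 for a, b in columns if a != b)
    return 100.0 * mismatches / len(columns)


def check_demarcation(cp_a: str, cp_b: str, propol_a: str, propol_b: str) -> dict:
    """Report divergence per region and whether each threshold is exceeded."""
    cp_div = percent_divergence(cp_a, cp_b)
    propol_div = percent_divergence(propol_a, propol_b)
    return {
        "cp_divergence": cp_div,
        "pro_pol_divergence": propol_div,
        "cp_exceeds": cp_div > CP_THRESHOLD,
        "pro_pol_exceeds": propol_div > PRO_POL_THRESHOLD,
    }


# Hypothetical toy alignments (not real viral sequences).
result = check_demarcation("MKTAYIAKQR", "MRTAYLAKQK",
                           "GDDLVAGDDS", "GDDLVAGDDS")
print(result)
```

In practice the divergence values would come from a proper pairwise alignment of the CP and Pro-Pol amino acid sequences; the sketch only shows how the two thresholds are applied once those values are known.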

Blueberry latent spherical virus. Blueberry latent spherical virus (BLSV) was the first Nepovirus isolated from asymptomatic highbush blueberry in Japan (Isogai et al., 2012). The virus belongs to the subgroup C nepoviruses based on its genome organization. It shares closest sequence similarity with the Peach rosette mosaic virus RdRp (57%) and the Apricot latent ringspot virus CP (43%) (Isogai et al., 2012). Graft inoculation of six blueberry cultivars with BLSV failed to produce any symptoms, although reverse transcription-polymerase chain reaction (RT-PCR) was able to detect the virus in all of them (Isogai et al., 2012). Chenopodium quinoa Willd., Luffa cylindrica (L.) Roem., and Nicotiana benthamiana produced systemic symptoms following mechanical inoculation of BLSV to these herbaceous hosts. This indicated that the virus is associated with latent infection of blueberry (Isogai et al., 2012). Since this virus is a relatively new species in the genus Nepovirus, its transmission mode and epidemiology are yet to be resolved (Isogai et al., 2012).

Blueberry leaf mottle virus. Ramsdell and Stace-Smith (1979) described Blueberry leaf mottle virus (BLMoV) as an agent causing mottling and distortion on leaves of highbush blueberry. Symptoms were first observed in 1977 in Michigan. The virus also appeared to occur in grapevine in New York, but was reported by another research group as a strain of Grapevine Bulgarian latent virus (GBLV) at that time, based on their distant serological relationship (Ramsdell & Stace-Smith, 1979). However, the confusion between these two serologically distantly related viruses was resolved following characterization of the GBLV genome, which clearly showed that they are distinct species belonging to the same subgroup C of Nepovirus (Elbeaino et al., 2011). Partial genome sequences of RNA1 and RNA2 of BLMoV revealed the highest CP similarity with CLRV and Tomato ringspot virus (ToRSV), confirming that this virus belongs to subgroup C of Nepovirus (Bacher et al., 1994a). The sequences of the 3’ non-coding regions (NCR) in RNA1 and RNA2 of BLMoV are nearly identical, with only four nucleotide differences (Bacher et al., 1994b).

Although BLMoV is a member of a nematode-transmitted genus, it is spread randomly by honeybees through infected pollen and possibly by seeds (Childress & Ramsdell, 1986; 1987). Depending on the cultivar, infected blueberry bushes display different degrees of symptom severity, with symptoms in the cultivar ‘Rubel’ being the most severe (Caruso & Ramsdell, 1995). Sandoval et al. (1995) found BLMoV in commercial blueberry fields as well as in wild Vaccinium spp. bushes surrounding the fields, suggesting virus movement between cultivated and neighboring wild areas. Infected blueberry tissues can be tested for BLMoV using an available commercial ELISA kit or RT-PCR (Martin et al., 2012). However, these tests are not completely reliable because of the lack of virus population structure data for BLMoV. The occurrence of BLMoV is restricted to North America, thus confounding plant quarantine and certification programs (Martin et al., 2012).

Cherry leaf roll virus. In 1955, a disease caused by Cherry leaf roll virus (CLRV) was recorded in sweet cherry (Prunus avium L.) in England (Cropley, 1961). Following its discovery, the virus was reported in other regions of Europe, North America and other parts of the world, infecting a variety of herbaceous and woody plants, including fruit trees of important horticultural crops (Rebenstorf et al., 2006; Eastwell, 2010). CLRV can significantly affect the agricultural sector because of its ability to infect a broad range of hosts in different regions, particularly affecting cash crop production. In contrast with other nematode-transmitted nepoviruses, the vector of this virus is still unidentified, although it is known to be readily transmitted via mechanical inoculation and naturally by seed or pollen (Wang et al., 2002; Eastwell et al., 2012).

Recently, the first occurrence of CLRV in southern highbush blueberry (Vaccinium darrowii cv. Jubilee 83) and its complete genome sequence were reported in New Zealand, along with other isolates from different hosts (Woo et al., 2013; Woo & Pearson, 2014a). These isolates have long 3’ non-coding regions (~1.5 kb) that are conserved between the ~8 kb RNA1 and ~7 kb RNA2, a feature also displayed by isolates from cherry and rhubarb (Woo & Pearson, 2014a). Phylogenetic analysis of the CP and Pro-Pol regions confirms that CLRV is closely related to other members of subgroup C of Nepovirus (Eastwell et al., 2012; Woo & Pearson, 2014a).

Peach rosette mosaic virus. Peach rosette mosaic virus (PRMV) was initially identified as the causal agent of rosette mosaic of peach (Prunus persica) in the 1970s before being reported in grape (Vitis labrusca cv. Concord) and later in highbush blueberry (Dias & Cation, 1976; Ramsdell & Myers, 1974; Ramsdell & Gillet, 1981). The geographical distribution of PRMV is limited to Michigan, Ontario, and New York (Martin et al., 2012). In blueberry, PRMV has been described only in the cultivars ‘Jersey’ and ‘Berkeley’, which were planted at a vineyard near a PRMV-infested site in Michigan (Ramsdell & Gillet, 1981). Leaves of PRMV-infected bushes are malformed, distorted and unevenly distributed throughout the plant. The extent of yield losses caused by PRMV in blueberry is unknown, yet the virus is destructive in peach and grape (Dias & Cation, 1976).

Recognized as a soil-borne virus, PRMV is transmitted by two species of nematodes,

Xiphinema americanum Cobb and Longidorus diadecturus Eveleigh & Allen (Allen et al.,


1984). PRMV can be transferred mechanically to Chenopodium spp. and by grafting to peach and grapevine (Dias et al., 1975). The complete genome sequence of PRMV confirmed its placement in subgroup C of the genus Nepovirus (Lammers et al., 1999; Sanfacon et al., 2011). Serological assays and RT-PCR are commonly used to detect PRMV, although under certain conditions indexing on Chenopodium quinoa provides more reliable detection than ELISA (Ramsdell et al., 1979). However, detection outcomes should be interpreted carefully, since these methods were developed from a single virus isolate (Martin et al., 2012).

Tobacco ringspot virus. In the early 1960s, a necrotic ringspot disease of blueberry associated with Tobacco ringspot virus (TRSV) was described in New Jersey (Varney et al., 1960; Lister et al., 1963). The disease was then reported in six other states (Arkansas, Connecticut, Illinois, Michigan, New York, and Oregon) and in two other countries, Canada (e.g., New Brunswick) and Chile (Converse & Ramsdell, 1982; Jaswal, 1990; Caruso & Ramsdell, 1995; Medina et al., 2006; Fuchs et al., 2010). TRSV infection renders blueberry bushes unproductive, while symptom expression varies with cultivar (Caruso & Ramsdell, 1995). Cultivars susceptible to TRSV include ‘Collins’, ‘Concord’, ‘Pemberton’, ‘Rubel’, and ‘Stanley’, whereas no infection has been reported in rabbiteye or lowbush blueberry (Caruso & Ramsdell, 1995).

The nematode X. americanum is the vector of TRSV in blueberry, and the virus is also seed-transmitted in some weeds and crops (Stace-Smith, 1985). Plant sap containing

TRSV can be mechanically inoculated to a wide range of herbaceous hosts (Stace-Smith, 1985).

Based on serological relationships and the presence of RNA2 in both the M and B components of virus particles, TRSV is considered a distinct member of subgroup A of Nepovirus. Like the nepoviruses discussed above, TRSV can be detected by RT-PCR or a commercial ELISA kit. Primers have been developed for simultaneous detection of different subgroups of grapevine-infecting nepoviruses (Digiaro et al., 2007). However, detection of both TRSV and ToRSV is sometimes problematic because these viruses are unevenly distributed among plant tissues (Fuchs et al., 2010).

Tomato ringspot virus. Tomato ringspot virus (ToRSV) was first reported in blueberry in 1972 (Caruso & Ramsdell, 1995), but it has been restricted to highbush blueberry planted in Washington, New York, Oregon, Pennsylvania, Canada, and Chile (Converse & Ramsdell, 1982; Jaswal, 1990; Medina et al., 2006; Fuchs et al., 2010). Based on observations made in Oregon in the 1980s, the highbush cultivars most susceptible to ToRSV are ‘Berkeley’, ‘Earliblue’, ‘Pemberton’, and ‘Stanley’ (Caruso & Ramsdell, 1995). ToRSV-infected bushes develop necrotic ringspot symptoms comparable to those caused by TRSV, with some variation in severity between cultivars. The two viruses can be differentiated using nucleic acid probes or ELISA, since they are serologically unrelated (Martin et al., 2012).

Like TRSV, ToRSV is vectored by nematodes of Xiphinema spp. (Converse & Ramsdell, 1982; Forer & Stouffer, 1982). It is readily transmissible by sap inoculation to a variety of herbaceous species (Stace-Smith, 1985). Like most other blueberry-infecting nepoviruses, ToRSV belongs to subgroup C of the genus Nepovirus, based on the almost inseparable middle and bottom components of its nucleoprotein as well as the presence of a high-molecular-weight RNA2 (Stace-Smith, 1984). Complete sequences of RNA1 and RNA2 of ToRSV further validate the subgroup clustering (Rott et al., 1991; Sanfacon et al., 2011). Serological methods such as ELISA can readily detect ToRSV in infected blueberry tissues, since antibodies against its coat protein are commercially available.


Strawberry latent ringspot virus

In 1964 in Scotland, wild and cultivated rosaceous species (e.g., black currant, cherry, plum, raspberry, and strawberry) were found to be naturally infected with Strawberry latent ringspot virus (SLRSV) (Lister, 1964). Besides occurring throughout Europe, SLRSV has been reported on other continents, including Asia,

Oceania, and North America (Murant et al., 1974; CABI, 2003). Small fruit, stone fruit, vegetables, ornamentals (roses), grapevine, and olive were among the main crops infected with

SLRSV (Schmelzer, 1969; Tang et al., 2013). SLRSV in raspberry and strawberry is often symptomless, although it can also cause varying degrees of decline and mottling. Other symptoms include chlorotic ringspot and stunting in rose, line pattern of red horse-chestnut, strap leaf of celery, yellow mottle of common spindle, and mosaic of black locust (Murant et al.,

1974). SLRSV is seed-borne in some plant species such as raspberry and celery, but it is mainly transmitted by the nematodes, Xiphinema diversicaudatum and X. coxi (Murant et al., 1974).

SLRSV had not been reported in blueberry until recently, when Woo and Pearson (2014b) detected it in V. darrowii, a new host of SLRSV, in New Zealand. Based on phylogenetic analysis of the CP region, three SLRSV isolates from New Zealand cluster with isolates from North America, including one strawberry isolate from the United States. This suggests that the virus arrived in New Zealand via a single introduction event (Woo & Pearson, 2014b). SLRSV is currently unassigned to a genus within the family Secoviridae after previously being placed in the genera Nepovirus and Sadwavirus (Mayo, 2005; Tzanetakis, 2006; Sanfaçon et al., 2011).

Blueberry mosaic associated virus

Although mosaic disease of blueberry was at first thought to be a physiological disorder, it was shown in the 1950s to be associated with viruses because it is graft-transmissible (Varney, 1957). Since that first report, the disease has been described in different regions of North and South America and in other parts of the world, including Asia, Europe, New Zealand, and South Africa (Martin et al., 2009; Martin et al., 2012; Isogai et al., 2016). In the United

States, mosaic of blueberry has been recorded in blueberry cultivated areas in Indiana, Michigan,

New Jersey, New York, Oregon, Washington, and recently in Kentucky (Ramsdell and Stretch,

1987; Gauthier et al., 2015). Mosaic disease of blueberry is most common in the highbush cultivars ‘Bluecrop’, ‘Cabot’, ‘Concord’, ‘Earliblue’, ‘Jersey’, ‘Pioneer’, ‘Rubel’, and ‘Stanley’. The disease has not been observed on Vaccinium spp. other than highbush blueberry and, occasionally, V. pallidum, a lowbush dryland blueberry (Ramsdell and Stretch, 1987; Caruso & Ramsdell,

1995). Blueberry bushes with mosaic disease exhibit bright yellow to yellow-green coloration on leaves, giving rise to mosaic and mottling patterns that occasionally turn pink. Symptoms can develop in patches or be distributed throughout the infected bush, and they occur transiently depending on the time of year (Ramsdell & Stretch, 1987). Although data on the economic importance of blueberry mosaic disease are lacking, infected bushes have been reported to yield less and to produce poor-quality berries with delayed maturity (Caruso &

Ramsdell, 1995).

The causative agent associated with mosaic disease of blueberry has not yet been confirmed due to difficulties in characterizing the agent itself (Ramsdell and Stretch, 1987).

Most recently, however, a new virus designated Blueberry mosaic associated virus (BlMaV) has been detected in mosaic-diseased blueberries as well as in asymptomatic plants from North America. It was suggested as a putative causal agent of blueberry mosaic disease (Thekke-Veetil et al., 2014). Based on phylogenetic analysis of the RdRp region, BlMaV is proposed as a new member of the genus Ophiovirus, the only genus in the family Ophioviridae. Within the genus, BlMaV is closely related to Citrus psorosis virus (CPsV), based on phylogenetic clustering and genome organization (Thekke-Veetil et al., 2015). Nucleocapsids of ophiovirus virions are naked and flexuous, with a diameter of about 3 nm, forming kinked circles of at least two different contour lengths, the shortest being about 760 nm (Milne et al.,

2011). In the case of BlMaV, the genome consists of three segments of negative-strand ssRNA

(RNAs 1–3), encoding four proteins on the viral complementary strand. RNA1 has two

ORFs expressing a 272 kDa RdRp and a 23 kDa protein of unknown function. A 58 kDa movement protein (MP) and 40 kDa nucleocapsid protein (NP) are expressed from the ORFs on

RNAs 2 and 3, respectively (Thekke-Veetil et al., 2014).

As with its closest relative, CPsV, the natural vector of BlMaV is still unknown, although other ophioviruses are transmitted via fungal spores, suggesting that BlMaV could have a similar soil-borne vector (Milne et al., 2003). Asexual propagation in blueberry cultivation can spread the virus when infected stock is used to produce nursery plants. Hence, now that the virus has been characterized, a fast and reliable detection assay needs to be developed for BlMaV screening (Thekke-Veetil et al., 2014).

Blueberry latent virus

Blueberry latent virus (BBLV) was inadvertently discovered when a new disorder known as blueberry fruit drop disease was observed in the Pacific Northwest (Oregon, Washington and

British Columbia) in the early 2000s (Martin et al., 2009; 2011). Although it was later found that blueberry fruit drop was not related to BBLV, the virus was further characterized due to its prevalence in the preliminary survey (Martin et al., 2011). This led to the isolation of a 3.5 kb dsRNA molecule, which belonged to a virus currently recognized as BBLV (Martin et al., 2011).

The genome organization of BBLV is similar to that of Southern tomato virus (STV), which contains two partially overlapping ORFs encoding a replicase and a protein of unknown function (Martin et al., 2011; Sabanadzovic et al., 2009). While the genome organization of BBLV is analogous to that of viruses in the family Totiviridae, its RdRp was found to be closely related to those of members of the family Partitiviridae (Martin et al., 2011; Krupovic et al., 2015). Thus, BBLV is currently placed in the new genus Amalgavirus in the family Amalgaviridae, with STV as the type species (Adams et al., 2014).

The geographical distribution of BBLV is broad: it was present in both symptomatic and asymptomatic plants from Arkansas, Michigan, New Jersey and the Pacific Northwest, as well as in blueberry germplasm originating from North America (Martin et al., 2011; 2012). Comparison of partial and complete sequences of BBLV isolates from Japan and the US indicated that the virus has a very stable population structure, with less than 0.5% diversity among isolates (Isogai et al., 2011; Martin et al., 2011; 2012). BBLV is efficiently seed-transmitted despite lacking a movement protein, suggesting that it replicates in its host throughout cell division (Martin et al., 2011). Observation of some BBLV-infected highbush cultivars for almost a decade indicated that the presence of this virus in blueberries is not a concern, given the absence of symptoms (Martin et al., 2011).

Blueberry scorch virus

In the 1980s, blueberry scorch and Sheep Pen Hill diseases were documented on highbush blueberries in Washington and New Jersey, respectively. The diseases were caused by distinct strains of the same virus, a carlavirus known as Blueberry scorch virus (BlScV) (Stretch, 1983; Podleckis et al., 1986; Martin & Bristow, 1988; Martin et al., 1991; Cavileer et al., 1994). Virions of BlScV are non-enveloped, flexuous particles (690 nm long × 14 nm wide) composed of a 33.5 kDa capsid protein that encapsidates a positive-sense ssRNA of approximately 8.5 kb (Martin & Bristow, 1988; Cavileer et al., 1994). There are six ORFs in the BlScV genome. The first ORF expresses a putative polymerase of 223 kDa comprising motifs for


methyltransferase, NTP-binding/helicase and RdRp; ORFs 2-4 encode the triple gene block proteins (25, 12, and 7 kDa) associated with viral movement, followed by the CP and a cysteine-rich protein expressed from ORFs 5 and 6, respectively (Cavileer et al., 1994). According to the organization and sequence of the viral genome as well as serological relationships, the virus was placed in the family , with other members of the genus Carlavirus (Mayo et al., 2005). Sequence comparisons of BlScV from Washington and New Jersey confirmed that they are distinct strains with more than 10% divergence, while sequence analysis of the 3’-terminal and CP regions showed that, among carlaviruses, BlScV strains are more closely related to Potato virus S and Lily symptomless virus (Cavileer et al., 1994).

BlScV symptom expression is affected by a combination of factors such as season, cultivar and virus strain. Complete blighting of blossoms, necrosis of young foliage, and stem dieback are among the typical symptoms of BlScV in susceptible cultivars, although some cultivars may remain asymptomatic (Bristow et al., 2000). Moreover, some infected cultivars may develop a red line pattern or marginal leaf chlorosis (Martin & Bristow, 1995). Scorched flowers can either remain on the bushes until the following season or drop immediately (Martin et al., 2012). Symptoms appear some time after infection, indicating a latent period in disease development before the virus becomes distributed throughout the whole plant (Martin & Bristow, 1995). The disease can reduce yield in some cultivars and eventually kill the plant, as in the case of ‘Berkeley’, while other cultivars can remain productive for some time (Bristow et al., 2000; Martin et al., 2012). Blueberry scorch disease has been reported in commercial plantings of northern highbush blueberry, with more than 15 cultivars found to be susceptible to BlScV (Bristow et al., 2000).


In addition to the United States (e.g., Connecticut, Massachusetts, Michigan, New Jersey, Oregon, and Washington) and Canada (Martin et al., 2006), BlScV has been documented in the Netherlands (Ciuffo et al., 2005), Italy (Moretti et al., 2011), Poland (Paduch-Cichal, 2011) and Germany (Richert-Pöggeler, 2013). The aphid Ericaphis fimbriata has been shown to transmit BlScV inefficiently in a non-persistent manner, although the significance of its role in natural disease spread is unknown (Bristow et al., 2000; Martin et al., 2009). In addition to vector transmission, BlScV can be graft-transmitted to several half-high and southern blueberry cultivars (Bristow et al., 2000). Lawrence (1994) demonstrated that the virus can be transmitted mechanically via infectious transcripts.

Diagnostic techniques based on serology or nucleic acid detection are necessary for BlScV because indicator-host protocols are unreliable and some infected plants show no symptoms (Martin, 2006). Double antibody sandwich ELISA (DAS-ELISA) is the most common and inexpensive technique for detecting and identifying viruses and was used by Wegener et al. (2006) for mass detection of BlScV in blueberries (Paduch-Cichal, 2014). Nonetheless, RT-PCR provides higher sensitivity for BlScV detection, since the tissue source and sampling date affect the outcome of DAS-ELISA (Wegener et al., 2006). Because blueberry scorch disease can cause serious yield loss and BlScV can be distributed through infected nursery stock, it is important to control virus spread by including symptomless mother plants in virus testing (Oudemans et al., 2011).

Blueberry shock virus

The symptoms of Blueberry shock virus (BlShV) were at first confused with those of BlScV when the virus first occurred in Washington in 1980 (MacDonald et al., 1991). BlShV-infected blueberry produces a second flush of foliage after bloom and bears fewer berries in late summer, although the plants look normal; other symptoms are similar to those of BlScV (Martin & Bristow, 1995). It was also observed that after 1-3 years the flowers and fruit of infected blueberries developed normally without any additional symptoms (Bristow et al., 2002). BlShV has been recorded in the Pacific Northwest (Oregon, Washington and British Columbia), California, Nova Scotia (Canada), Pennsylvania, New York, and Michigan (Martin et al., 2012).

BlShV is primarily transmitted through pollen (Bristow & Martin, 1999). Although seed transmission of BlShV occurs at a low level, the virus is pollen-borne like other ilarviruses. The primary mechanism of BlShV transmission appears to be the transfer of BlShV-contaminated pollen by honeybees from flowers on infected plants to flowers on healthy plants (Bristow & Martin, 1999). The virions of BlShV are non-enveloped and quasi-spherical, approximately 26-29 nm in diameter, and composed of 180 CP subunits with a MW of about 27 kDa each (MacDonald et al., 1991). BlShV has a segmented, tripartite genome of positive-sense ssRNA. Serological tests by indirect ELISA demonstrated that the virus is distantly related to Prunus necrotic ringspot virus (PNRSV) and Apple mosaic virus (ApMV), members of subgroup 3 in the genus Ilarvirus (MacDonald et al., 1991). Hence, based on its physicochemical properties, BlShV is placed in the genus Ilarvirus. In the Pacific Northwest, BlShV can be detected using ELISA or RT-PCR in buds early in the season and in leaf tissue as the season progresses until August (Martin et al., 2012). Substantial yield loss in BlShV-infected blueberry may be temporary if plants recover from the disease and regain productivity (Bristow & Martin, 1999). Some fraction of the infected berries can be reproduced as demonstrated by Bristow and Martin (2002).

Management of BlShV involves removing infected plants to reduce virus spread once infection has become established in the field. However, this strategy cannot fully prevent the emergence of BlShV, because the virus is pollen-borne (it is transmitted via pollen before symptoms develop) and is unevenly distributed in blueberry bushes during the early infection period. Since replanting may require 4-6 years to reach full production after an infected field is removed, the more economical option is to let the virus run its course, as recommended in the Pacific Northwest (Martin et al., 2012). For these reasons, the best practice is to prevent the introduction of BlShV on nursery stock into new planting locations (Martin et al., 2012).

Blueberry virus A

Recently, Isogai et al. (2013) described the full genome of a new closterovirus, Blueberry virus A (BVA), obtained from dsRNA isolated from the highbush blueberry cultivar ‘Spartan’ in Japan. Although the virus was at first associated with leaf yellowing of blueberry, a later graft-transmission study showed that BVA causes latent infection in blueberry. The BVA genome is a positive-sense ssRNA molecule of approximately 17 kb with 10 open reading frames: ORF 1a encodes a 338-kDa protein containing motifs of papain-like proteases and MT and HEL domains; ORF 1b contains the RdRp domain; ORF 3 encodes a heat shock protein 70 homolog (HSP70h); ORF 4 encodes a putative 60 kDa protein; ORF 5 encodes a putative 23 kDa major CP; and ORFs 6-9 encode proteins of unknown function that lack similarity to other virus proteins, which is a signature of closteroviruses (Isogai et al., 2013). The putative MT, HEL, RdRp, HSP70h, and CP proteins of BVA are most similar to those of members of the genus Closterovirus but diverge from them by more than 10%. BVA was proposed as a new virus species in the family Closteroviridae because it failed to form a clade with other closteroviruses in phylogenetic analyses of the RdRp, HSP70h, and CP. In transmission studies, neither mechanical passage of purified virions to herbaceous hosts nor transmission by cotton aphids to blueberry seedlings has been successful (Isogai et al., 2013).


Unassigned viral families

Blueberry shoestring virus. Blueberry shoestring virus (BSSV), a sobemovirus, was discovered as the causal agent of shoestring disease in the 1970s (Lesney et al., 1976). It is one of the most prevalent viruses of cultivated highbush blueberries (V. corymbosum), causing up to 25% yield loss in infected bushes (Ramsdell, 1995). In 1981, a yield loss of approximately $3 million was documented in a blueberry field in Michigan due to shoestring disease, making it one of the most economically important diseases of highbush blueberry (Ramsdell, 1987). The virus only infected lowbush blueberry in Nova Scotia and is widely distributed in blueberry plants in Michigan, New Jersey, North Carolina, and Washington (Ramsdell, 1987; 1995).

Although nine highbush cultivars were susceptible to the virus, the cultivars ‘Blueray’ and

‘Atlantic’ displayed field immunity to shoestring disease (Ramsdell, 1987).

BSSV has a latent period of about four years before starting to cause symptoms on healthy blueberry in an infected field and is spread in a horizontal pattern from bush to bush

(Ramsdell, 1987). Current and one-year-old stems on infected blueberry plants showed elongated

(0.2 x 1.2 cm) reddish streaks that fade as the growing season continues. Flower ‘breaking’ can sometimes occur as longitudinal pink streaks on the petals. Infected leaves are narrow, giving the ‘shoestring’ symptom, or can be curled. The surface of immature berries on infected plants that is exposed to light may develop a premature reddish-purple cast (Ramsdell, 1979).

The aphid Illinoia pepperi transmits BSSV to blueberry plants in a persistent, circulative manner (Morimoto et al., 1985). While BSSV cannot be inoculated mechanically to herbaceous plants, blueberry seedlings or rooted softwood cuttings can be infected when inoculated manually with purified virus (Ramsdell, 1979).

The virion of BSSV is a non-enveloped isometric particle 28 nm in diameter. It has a monopartite ssRNA genome containing four putative ORFs (ORF1, ORF2a, ORF2b, and ORF3). A 15.4 kDa movement protein, a 65.55 kDa polyprotein (Protease‐VPg), a 62.22 kDa RNA‐dependent RNA polymerase, and a 27.85 kDa coat protein are encoded by ORF1 (nt 85‐492), ORF2a (nt 462‐2204), ORF2b (nt 1796‐3400), and ORF3 (nt 3186‐4001), respectively (Yanagisawa et al., 2016). Based on its physicochemical properties and genome organization, BSSV is assigned to the genus Sobemovirus (Ramsdell, 1979; Truve et al., 2012; Yanagisawa et al., 2016).
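As a rough consistency check on the ORF coordinates quoted above, the following Python sketch computes the length of each BSSV ORF and the overlap between neighboring ORFs. The coordinates are those given in the text (Yanagisawa et al., 2016); the function names and the dictionary layout are illustrative assumptions and do not come from any published pipeline.

```python
# ORF coordinates (1-based, inclusive) for BSSV as quoted in the text.
# The names and data layout below are hypothetical, for illustration only.
BSSV_ORFS = {
    "ORF1": (85, 492),
    "ORF2a": (462, 2204),
    "ORF2b": (1796, 3400),
    "ORF3": (3186, 4001),
}


def orf_length(start, end):
    """Length in nucleotides of an ORF with inclusive 1-based coordinates."""
    return end - start + 1


def overlap(a, b):
    """Number of nucleotides shared by two inclusive coordinate ranges."""
    shared_start = max(a[0], b[0])
    shared_end = min(a[1], b[1])
    return max(0, shared_end - shared_start + 1)


def summarize(orfs):
    """Return one summary row per ORF, in the order given."""
    names = list(orfs)
    rows = []
    for i, name in enumerate(names):
        start, end = orfs[name]
        length = orf_length(start, end)
        # Overlap with the next ORF in the list (0 for the last ORF).
        next_overlap = 0
        if i + 1 < len(names):
            next_overlap = overlap(orfs[name], orfs[names[i + 1]])
        rows.append({
            "orf": name,
            "length_nt": length,
            "codons": length // 3,
            "multiple_of_three": length % 3 == 0,
            "overlap_with_next_nt": next_overlap,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(BSSV_ORFS):
        print(row)
```

With these coordinates, every ORF length is a multiple of three and each ORF overlaps the one that follows it, which fits a compact genome with overlapping reading frames.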

BSSV can be detected in infected blueberry plants by commercial ELISA kits or by RT-PCR (Ramsdell, 1995; Yanagisawa et al., 2016). Because of the long latent period before symptoms are expressed in infected blueberries, roguing is not effective. Instead, the spread of shoestring disease can be controlled by using virus-free planting material and timely insecticide applications (Ramsdell, 1995).

Blueberry necrotic ring blotch virus. Blueberry necrotic ring blotch virus (BNRBV), the cause of blueberry necrotic ring blotch disease, was first observed in southern highbush blueberry in Georgia, United States, in 2006 (Martin et al., 2012). Since then, the disease has been reported in blueberry fields across the southeastern U.S., including Florida, Mississippi, North Carolina, and South Carolina (Martin et al., 2012). BNRBV has not been found in northern highbush blueberries or native rabbiteye blueberries (V. virgatum); the only cultivars known to be susceptible so far are southern highbush cultivars (Martin et al., 2012). Infected blueberry leaves exhibit discrete necrotic rings with green centers, which can resemble symptoms of fungal diseases as the rings fuse. Severe infection can result in early defoliation, which can be mistaken for septoria leaf spot disease. Unlike BRRV, which frequently produces symptoms only on the upper leaf surface and stems, BNRBV produces necrotic rings on both the upper and lower leaf surfaces, but stems do not develop symptoms (Martin et al., 2012).

The BNRBV genome is approximately 14 kb long and comprises four RNA segments with seven ORFs (RNA1, 2 and 4 have one ORF each and RNA3 has up to five ORFs) (Cantu-Iris et al., 2013; Quito-Avila et al., 2013). RNA1 encodes a putative 215 kDa protein containing methyltransferase (MTR), cysteine-protease (C-Pro) and helicase (HEL) domains, RNA2 encodes a putative 130 kDa protein containing HEL and RdRp domains, RNA3 encodes up to five small proteins of unknown function, and RNA4 encodes a 34 kDa protein containing conserved motifs of the 3A movement protein superfamily (Quito-Avila et al., 2013). Protein analysis of different

BNRBV genome segments showed amino acid relatedness to the -like supergroup protein domains that are conserved among the RNA viruses. Although BNRBV belongs to the same clade as virus species from the genera Cilevirus and Higrevirus based on phylogenetic analysis of the RdRp, it was recently assigned to a new genus, Blunervirus (Quito-Avila et al., 2013; Melzer et al., 2017).

BNRBV is most likely transmitted by an eriophyid mite, based on its protein sequence relatedness to Citrus leprosis virus (CiLV). To test this hypothesis, transmission studies of BNRBV using eriophyid mites are currently being conducted in Florida and Georgia (Burkle et al., 2012; Robinson et al., 2016). Recently, BNRBV was shown not to spread via vegetative propagation, and it was inferred that systemic infection does not occur in BNRBV-infected southern highbush plants (Holland et al., 2013; Robinson et al., 2016). Molecular techniques such as RT-PCR can detect BNRBV in symptomatic tissue (Cantu-Iris et al., 2013; Quito-Avila et al., 2013; Robinson et al., 2016).


Next-Generation Sequencing (NGS) for Identification of Viruses

New sequencing technologies have been developed over the last decade to overcome the limitations of the traditional Sanger method. This “massively parallel” technology, termed next-generation sequencing (NGS), has a daily throughput 100 to 1,000 times higher than that of the conventional Sanger method (Kircher & Kelso, 2010). Besides producing a large number of short reads from multiple samples in each run, NGS delivers these sequencing data rapidly and at lower cost (Metzker, 2010). Since its introduction more than a decade ago, NGS has become one of the most important experimental tools in research, shifting the conventional approach in many scientific fields, including plant virology.

Several NGS platforms are available to date (Table 1-2), each with a unique DNA sequencing procedure derived from a combination of shared features categorized based on the preparation of template, sequencing and imaging techniques, and analysis of data (Metzker,

2010). The underlying features of all these platforms include ligation of specific adapters to the fragmented templates for library construction, followed by amplification of the templates on a solid surface (i.e., glass slide or microbead) and automated sequencing reactions that produce millions of reads (Metzker, 2010).

The choice of a sequencing platform for a research project depends on multiple factors, such as the average size of the genome of interest, the complexity of the genome (e.g., GC content), the expected coverage, and the desired accuracy (Barba et al., 2014). The Illumina HiSeq 2500 provides the highest throughput per run at the lowest cost per Gb with adequate accuracy. Thus, Illumina has been the most widely used platform for plant virology research in many countries (Barba et al., 2014).


Plant Viral Metagenomics

Over the past decade, the rapid development of NGS technologies and the yearly decrease in sequencing cost have revolutionized plant virology by opening a new area referred to as metagenomics. Metagenomics refers to the culture-independent analysis of the nucleotide sequence content of an undefined mixture of microbial populations (Adams et al., 2009; Mokili et al., 2012). In plant virus metagenomics, a large sequence dataset representing the total nucleic acid of individual or pooled plant samples is generated by deep sequencing and analyzed with bioinformatics tools to identify viruses.

The application of metagenomics in virus discovery was pioneered by Allander et al. (2001), who used restriction enzyme digestion and sequence-independent single primer amplification (SISPA) following DNase treatment of serum samples. An improved viral metagenomics method was then developed using a shotgun cloning-based technique to study uncultured marine viral communities (Breitbart et al., 2002). Since then, a growing number of metagenomics-based approaches for discovering viral diversity in various environments and systems have been documented, including marine (Angly et al., 2006; Culley et al., 2006), freshwater (Desnues et al., 2008; Djikeng et al., 2009), reclaimed water (Rosario et al., 2009a), soil (Fierer et al., 2007; Kim et al., 2008), wastewater (Tamaki et al., 2012), human and animal fecal samples (Zhang et al., 2005; Nakamura et al., 2009; Victoria et al., 2009; Blinkova et al., 2010; Reyes et al., 2010) and insects (Ng et al., 2012; Rosario et al., 2014; Rosario et al., 2016).

The major difference between viral metagenomics using plant samples and studies using animals or other environmental sources is that plants are immobile, which makes resampling possible (Roossinck et al., 2015a). Exploiting this feature enables identification of the original host plant and its physical location and allows further characterization of the viruses discovered via metagenomics, an approach termed ecogenomics (Roossinck et al., 2010; 2015a).

Sample Preparation Methods to Generate Viral Metagenomes

Most microbes contain universal regions, such as the ribosomal RNA genes, that enable the use of ribosomal DNA profiling for their identification and taxonomic classification (Pace et al., 2012). Viruses, however, lack such universal genes. They also have smaller genomes that are usually present in lower proportion relative to those of the host or other cellular organisms, so viral nucleic acids are easily masked by non-viral sequences (Mokili et al., 2012). Therefore, enrichment methods such as centrifugation, filtration, and/or precipitation, as well as nuclease treatment, have been incorporated in viral metagenomics studies to concentrate virus particles and nucleic acids by removing host or other non-viral nucleic acids from a sample (Thurber et al., 2009; Hall et al., 2014). Consequently, the metagenome dataset is likely to contain a higher proportion of virus reads that can be assembled into larger virus contigs, increasing virus detection sensitivity while eliminating the need for resequencing virus genomes (Hall et al., 2014).

There are several approaches to obtain plant viromes, including the utilization of (i) total RNA or DNA, (ii) virion-associated nucleic acids (VANA) extracted from virus-like particles (VLPs), (iii) double-stranded RNAs (dsRNA), (iv) virus-derived small interfering RNAs (siRNAs), and (v) an in silico approach (Roossinck, 2015a).

(i) Total RNA/DNA. RNA or DNA viromes can be generated directly from total RNA or DNA extracted from plants, respectively. Several studies have used total RNA/DNA extracted from pooled or individual plants, coupled with sequence-independent amplification or high-throughput sequencing, to discover novel viruses, unravel the biodiversity of viruses, and elucidate the etiology of viral diseases (Rwahnih et al., 2009; Loconsole et al., 2012a; Wylie et al., 2014; Ong et al., 2016). However, the use of total RNA/DNA has a major drawback in that many of the sequences generated in the viromes are of non-viral origin. In addition, the viromes may not contain detectable viral reads for low-titer virus species or those with dsRNA genomes (Roossinck, 2015a). To overcome these drawbacks, some modifications have been made to reduce the copious amounts of host-derived sequences in the metagenomics data.

To enhance the proportion of virus-derived reads, polyadenylated RNAs can be enriched during sample preparation (Gu et al., 2014). Subtractive hybridization has also been employed to deplete host rRNA sequences in the RNA libraries by subtracting host cDNA in the infected plant using cDNA from an uninfected plant (Adams et al., 2009). Yet, these modified methods are not suitable for detecting viruses that lack poly-A tails or the latent or persistent plant viruses commonly found in uncultivated plants (Roossinck, 2012; 2015b). For the discovery of DNA viruses, circular DNA genomes in the extracted DNA are enriched by rolling circle amplification to increase the fraction of virus sequences in the DNA libraries, as demonstrated in metagenomics studies of different environments, including insects, freshwater systems, and plants (Zawar-Reza et al., 2014; Kraberger et al., 2015; Rosario et al., 2016).

(ii) VANA extracted from VLPs. The majority of viruses have nucleic acids that are protected within stable virus particles. Based on this principle, VLPs are purified from a metagenomics sample to concentrate virus particles while eliminating host nucleic acids prior to extraction of VANA. This technique has been widely applied in various environmental metagenomics studies, including marine systems, human clinical samples, and plants (Breitbart & Rohwer, 2005; Culley et al., 2007; Dombrovsky et al., 2013; Hany et al., 2014). Three basic steps are involved in concentrating VLPs prior to VANA extraction: centrifugation, filtration, and nuclease treatment. Both RNA and DNA viruses can be detected simultaneously from amplified VANA, as demonstrated by Melcher et al. (2008) and Candresse et al. (2014). Nevertheless, the VANA approach may fail to detect viruses with unstable particles or those with naked nucleic acids, such as endornaviruses, and it is not appropriate for plants with high levels of secondary compounds (Roossinck et al., 2015a). Besides being laborious, the approach cannot recover many viruses because most of them require a unique purification protocol (Wu et al., 2015).

(iii) dsRNA. The hallmark of virus infection in plants is the production of dsRNA molecules, either as a replicative intermediate or as the native genome of dsRNA viruses. Although plants can also form incomplete dsRNA molecules through the folding of other types of host RNAs, the presence of dsRNA in plant cells usually signifies virus infection (Roossinck, 2015a). The dsRNA approach is a well-developed protocol in plant virology, in which dsRNA is extracted by cellulose column chromatography following isolation of the total nucleic acid (Dodds et al., 1984). The dsRNA approach has enabled the discovery of novel virus species in crops (Villanueva et al., 2012; Elbeaino et al., 2014) and demonstrated that wild plants ordinarily harbor persistent viruses (Roossinck, 2012). Besides being the most suitable method for detecting mycoviruses (Rwahnih et al., 2011), the use of dsRNA has also been shown to increase the level of virus reads relative to total RNA extraction (Rwahnih et al., 2009). Yet, the dsRNA approach is arduous and inadequate for the detection of negative-sense ssRNA or DNA viruses (Roossinck, 2015a). A few studies have recently introduced modifications to the established dsRNA protocol to improve the sensitivity of virus detection in RNA libraries by exploiting bead-based homogenization (Kesanakurti et al., 2016) and anti-dsRNA monoclonal antibodies in a pull-down assay (Blouin et al., 2016).

(iv) siRNA. Plants produce virus-derived siRNAs of 21-24 nt in length through RNA silencing mechanisms in response to virus infection, by cleaving viral dsRNA using Dicer-like proteins. Nearly a decade ago, known and novel viruses in diseased plants were detected by assembling virus-derived siRNAs, exploiting the overlap between sense and antisense viral siRNAs (Donaire et al., 2009; Kreuze et al., 2009). This approach has been used in many virus detection studies since then (Pirovano et al., 2015; He et al., 2015; Morelli et al., 2017). It has enabled the detection of both ssDNA and dsDNA plant viruses via the assembly of virus-derived siRNA reads to reconstruct complete genomes (Kreuze et al., 2009; Loconsole et al., 2012b; Seguin et al., 2014; Maliogka et al., 2015). DNA viruses can be found in siRNA libraries because transcription of coding and noncoding regions of the circular dsDNA genome occurs in both directions, allowing the formation of dsRNA that initiates siRNA production (Seguin et al., 2014). While the siRNA method is capable of detecting both RNA and DNA viruses, complete viral genomes are difficult to assemble because of the short length of the reads in the metagenome data. Also, most persistent viruses possess an encapsidated dsRNA genome, a feature that can hinder the production of siRNAs and subsequently prevent the detection of these viruses in the metagenomics sequence data (Roossinck, 2015a).
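As a rough illustration of the size criterion described above, the following sketch selects reads in the 21-24 nt range from a set of adapter-trimmed small RNA reads and tallies their length distribution. The function names and the example reads are hypothetical and are not taken from any of the cited studies; real pipelines would normally read FASTQ files rather than Python lists.

```python
from collections import Counter


def select_sirna_sized_reads(reads, min_len=21, max_len=24):
    """Keep reads whose length falls in the siRNA size range (inclusive)."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def size_distribution(reads):
    """Count how many reads were observed at each length."""
    return Counter(len(read) for read in reads)


# Hypothetical trimmed reads of 20, 21, 24 and 25 nt.
example_reads = ["A" * 20, "C" * 21, "G" * 24, "T" * 25]
sirna_reads = select_sirna_sized_reads(example_reads)
print(size_distribution(sirna_reads))
```

Only the size-selected reads would then be passed to an assembler suited to short reads.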

(v) In silico approach. The generation of vast amounts of sequencing data by ‘omics’ approaches opens the door to countless possibilities. In addition to the in vitro methods described above for generating metagenomics data, publicly available sequence data from transcriptomics or genomics databases can provide an alternative source for data mining of viral sequences. Plant-derived sequence datasets generated for other projects can be searched for viral sequences through in silico analyses, as demonstrated by Jo et al. (2015). In that study, five RNA virus genomes were successfully obtained by de novo assembly of existing grapevine transcriptome data. Similar studies were conducted in different hosts, including pepper (Jo et al., 2016a) and apple and pear (Jo et al., 2016b). The latter included a comparison of two de novo assemblers, Trinity and Velvet, for identification and construction of viral genomes using RNA-Seq data. Besides mining plant host databases for viral sequences, the in silico approach has been applied in the context of metagenomics to unravel virus diversity in an environment. One example is a meta-analysis that characterized the viral communities of corals by incorporating different types of publicly available coral libraries, including transcriptome and metagenome datasets (Wood-Charlson et al., 2015). In addition, Roux et al. (2015) unraveled viral diversity and virus-host interactions by mining publicly available microbial genome datasets for viral sequences.

The foremost advantage of the in silico approach is its straightforwardness. No sample preparation step is required, which saves the time and cost involved in generating sequence data. However, this approach demands more time and effort during the bioinformatic analysis step because of the relatively high host background in the sequence data. In addition, some viruses might be missed depending on the type of libraries used for data mining. For instance, mRNA-Seq data may be inefficient for the discovery of RNA viruses, as most of them lack poly-A tails. Finally, the absence of plant tissue complicates the validation of endogenous viruses and the characterization of new virus species.

Analysis of Metagenome Data

The process of analyzing the enormous volume of short reads generated by NGS is perhaps the most critical part of viral metagenomics studies and necessitates proper computational resources and expertise (Scholz et al., 2012). The standard protocol in metagenomics data analysis involves three main steps: (i) preprocessing of raw reads, (ii) assembly of reads, and (iii) annotation of contigs and sequence characterization.

(i) Preprocessing of raw reads. The preprocessing step in metagenomics analyses involves trimming the adapters from the raw reads and quality filtering the reads to retain high-quality bases. Read trimming has been shown to improve the quality and reliability of viral metagenome analysis while reducing computational resource consumption and execution time (Del Fabbro et al., 2013). To increase the efficiency of downstream assembly, non-viral sequences can be removed from the viral metagenome data by in silico subtraction, in which reads are mapped to an available host genome (Mokili et al., 2012; Massart et al., 2014). However, filtering unwanted sequences comes at a cost: viral sequences that may have evolved from plants could be mistakenly removed during the in silico subtraction of host plant sequences (Roossinck, 2012).
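The quality-filtering idea can be sketched in a few lines of Python. The example below assumes Phred+33 encoded FASTQ quality strings and keeps a read only if a minimum percentage of its bases reach a minimum quality score. The 80% cutoff, the function names, and the toy records are assumptions made for illustration. This is not the implementation of any particular trimming tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality(quality_string, min_q=20, min_percent=80.0):
    """True if at least min_percent of bases have a Phred score >= min_q."""
    scores = phred_scores(quality_string)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_q)
    return 100.0 * good / len(scores) >= min_percent


def filter_fastq(lines, min_q=20, min_percent=80.0):
    """Return (header, sequence, quality) tuples for records that pass the filter.

    `lines` is a list of FASTQ lines, four lines per record.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    kept = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if passes_quality(qual, min_q, min_percent):
            kept.append((header, seq, qual))
    return kept


# Toy records: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
toy_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
    "@read3", "ACGT", "+", "II##",
]
print([header for header, _, _ in filter_fastq(toy_fastq)])
```

Host subtraction would follow as a separate step, by discarding reads that align to the host genome.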

(ii) Assembly. The preprocessing step is followed by assembly of the reads. There are two assembly strategies: de novo assembly and reference-based mapping (Thomas et al., 2012). In de novo assembly, contigs (consensus sequences) are generated from overlapping sequence reads to reconstruct the genomes of novel viruses for which no reference genome is available. A group of assembly programs based on k-mer de Bruijn graphs has been developed to reduce the huge computational resources typically required for de novo assembly (Thomas et al., 2012). Widely used de Bruijn graph-based programs include Velvet (Zerbino et al., 2008), SOAPdenovo (Li et al., 2010), ABySS (Simpson et al., 2009), Trinity (Grabherr et al., 2011), and SPAdes (Bankevich et al., 2012). The parameters used in the assembly of reads are determined by the assembly algorithm (Wu et al., 2015). Stringent assembly parameters can be applied to avoid producing chimeric sequences (Mokili et al., 2012).
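To make the de Bruijn graph idea concrete, the sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads, and then extends a contig along an unbranched path. It is a deliberately simplified toy: it ignores sequencing errors, reverse complements, and coverage, all of which the assemblers named above must handle. The function names and the reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` while each node has exactly one successor.

    The walk stops at a branch, at a dead end, or when a node repeats.
    """
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


# Three overlapping toy reads drawn from the sequence ATGGCGTGCAAT.
toy_reads = ["ATGGCGT", "GGCGTGCA", "GTGCAAT"]
toy_graph = build_de_bruijn(toy_reads, k=4)
toy_contig = extend_contig(toy_graph, "ATG")
print(toy_contig)
```

The choice of k controls the trade-off the text alludes to: larger values resolve more repeats but require longer exact overlaps between reads.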

The other assembly strategy is reference-based mapping for contigs with sequence homology to known viruses. In reference-based mapping, gene functions and taxonomic identity can be assigned to the virus sequence by aligning the viral reads to a closely related reference genome (Scholz et al., 2012). Once the virus has been assigned taxonomically, the coverage of different genomic regions can be determined from the distribution of viral reads across the genome. The level of viral replication in the plant tissue may also be estimated by calculating the percentage of virus-associated reads and the virus copy numbers (Jo et al., 2015). In addition, reference-based mapping can be used to detect virus variants by identifying single nucleotide variations in the viral genomes. Hence, the use of both reference-based mapping and de novo assembly may provide more insight into the viral genomes found in a metagenome dataset. The most widely used tools for reference-based mapping are Burrows–Wheeler aligners, including BWA (Li & Durbin, 2009) and Bowtie (Langmead & Salzberg, 2012).
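The coverage and read-percentage calculations mentioned above can be sketched as follows, assuming the alignments have already been reduced to zero-based start positions and aligned lengths on a single reference genome. The function names and the numbers are hypothetical; in practice these values would be extracted from the alignment output of a read mapper.

```python
def per_base_depth(ref_length, alignments):
    """Return the read depth at each reference position.

    `alignments` is a list of (start, aligned_length) pairs, zero-based.
    """
    depth = [0] * ref_length
    for start, length in alignments:
        for pos in range(max(start, 0), min(start + length, ref_length)):
            depth[pos] += 1
    return depth


def breadth_of_coverage(depth):
    """Percentage of reference positions covered by at least one read."""
    covered = sum(1 for d in depth if d > 0)
    return 100.0 * covered / len(depth)


def percent_viral_reads(mapped_reads, total_reads):
    """Percentage of all reads in the library that mapped to the virus."""
    return 100.0 * mapped_reads / total_reads


# Toy example: three reads mapped to a 10 nt reference in a 1,000-read library.
toy_alignments = [(0, 4), (2, 4), (8, 5)]
toy_depth = per_base_depth(10, toy_alignments)
print(toy_depth)
print(breadth_of_coverage(toy_depth))
print(percent_viral_reads(len(toy_alignments), 1000))
```

Positions with zero depth in such a profile correspond to the coverage gaps that can leave an assembled viral genome incomplete.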

Although the assembly process provides a means to construct viral genomes, it remains a challenge in viral metagenome data analysis. The titer of a virus species differs in each population and often fluctuates over time. Some viruses have multiple segments that vary in size and concentration because some genes are produced more abundantly. The variation in viral genome content present in a population at any given time may affect the number of reads produced in a metagenome dataset. Hence, some genomic positions may have low or no coverage during the assembly process because of the disproportionate representation of virus species and their genomes. This often results in an incomplete viral genome. Furthermore, artificial recombinants or chimeric viral genomes are likely to arise from in silico errors in de novo assembled contigs because of the complex population structures of viruses (Prosperi et al., 2011). These chimeric sequences can in turn lead to false inferences in phylogenetic and molecular evolution analyses of viruses (Martin et al., 2011). Thus, it is important to incorporate experimental validation steps in any viral metagenomic study to eliminate the possibility of constructing false in silico derived genomes.

(iii) Classification and annotation of contigs. The assembled contigs representing a partial or complete viral genome are taxonomically classified by homology searches against a local or public database (Mokili et al., 2012). The commonly used tools for homology searches are BLASTn and BLASTx, in which the contigs are aligned to nucleotide and amino acid sequences in the database, respectively (Wu et al., 2015). Viral sequences assembled from contigs are then assigned to families, genera, or species of known or novel viruses based on sequence and phylogenetic analysis of the virus hallmark genes or taxonomically informative genomic regions (Smits et al., 2014). However, a problem arises when de novo assembled sequences have no close homologs in the databases, resulting in an abundance of unknown sequences (usually 60-95%) known as viral dark matter (Hurwitz & Sullivan, 2013; Massart et al., 2014; Roux et al., 2015). Another main drawback in the data analysis is the lack of robust tools or automated programs for the classification of these unknown viral sequences obtained from metagenome data. Automated algorithms have been developed to eliminate steps such as manual sequence comparison of contigs for taxonomic classification (Simmonds, 2015). These programs assign viral sequences to taxa using distance-based methods; examples include Pairwise Sequence Comparison (PASC) (Bao et al., 2014) and DEmARC (Lauber & Gorbalenya, 2012). Apart from the development of robust programs for the classification of viral sequences from metagenome data, it has recently been proposed that these metagenome sequence data, which often lack biological and experimental characterization, should be incorporated into the International Committee on Taxonomy of Viruses (ICTV) taxonomy (Simmonds et al., 2017).

Applications of Viral Metagenomics

Viral discovery and diversity. Plant viral metagenomics offers several advantages over conventional methods. Its most striking advantage is that it requires no a priori knowledge of the host or pathogen (Adams et al., 2009; Kreuze et al., 2009). Because detection is sequence-independent, known and novel viruses or viroids that are present in a plant species can be detected and identified to the species level in symptomatic as well as asymptomatic plants (Elbeaino et al., 2014; Wylie et al., 2014; Kraberger et al., 2015; He et al., 2015; Ong et al., 2016; Igori et al., 2017). The identification of new plant viruses has been described in more than fifty peer-reviewed publications (Barba et al., 2014; Roossinck et al., 2015a). While wild plant species in natural settings are known to potentially harbor uncharacterized viruses, most metagenomics studies have focused on cultivated plant species (Roossinck et al., 2015a). Recently, however, there has been an increase in viral metagenomics studies of wild plants, owing to the great viral diversity demonstrated in several extensive studies of various wild plant species (Muthukumar et al., 2009; Roossinck et al., 2010; Wylie et al., 2012; Ong et al., 2016; Koh et al., 2016). Although plant viruses may not cause disease in wild plants, as shown by the absence of virus-like symptoms in infected plants (Muthukumar et al., 2009), the likelihood that these viruses may cause disease in cultivated plants cannot be ruled out (Min et al., 2012; Bernardo et al., 2013). Likewise, mutualistic interactions between viruses and hosts can be beneficial to the plants by protecting the hosts from extreme environmental conditions and against other pathogenic viruses (Stobbe & Roossinck, 2014).

Viral diagnostics. Host range shifts are possible when viruses are introduced into new environments. Because of this threat to cultivated plants, plant viral metagenomics may be a suitable method for viral surveillance in agricultural settings (Roossinck et al., 2015a). Asymptomatic infected plant material distributed through germplasm exchange may also carry viruses that can eventually cause disease in the presence of favorable new environments and the right vectors. Thus, metagenomics may provide a way of indexing viruses or viroids in plants, since its broad-spectrum detection capabilities are useful for plant viral diagnostics applications including plant quarantine, phytosanitary and certification programs, and seed lot screening (Massart et al., 2014). Rwahnih et al. (2015) showed that the metagenomics approach is far more efficient than the conventional bioassay for phytosanitary testing of propagative materials for commercial grapevine. Ho et al. (2015) developed a bioinformatics pipeline that can be incorporated into the certification scheme for virus detection in berry crops. It has also been shown that metagenomics is a robust tool for detecting new pathogens that can be overlooked in a plant quarantine setting (Candresse et al., 2014). Hence, the combination of bioassay and metagenomics approaches can support economic stability and food security by preventing new disease outbreaks in important agricultural crops.

Viral etiology of diseases. Metagenome data contain an enormous number of reads that can be used to determine the possible viral etiology of an unknown disease. A number of studies have successfully identified the causal agents of new viral diseases in various plant species, including grapevine (Rwahnih et al., 2009), tomato (Li et al., 2012), maize (Adams et al., 2013), quince (Morelli et al., 2017), cherry (Candresse et al., 2013), citrus (Loconsole et al., 2012a; Loconsole et al., 2012b; Roy et al., 2013; Vives et al., 2013), coffee (Romalho et al., 2014), and nectarine (Villamor et al., 2016). The metagenomic approach has significantly shortened the time required to determine the viral etiology of new diseases in these plants, which would otherwise have been impossible to achieve using the conventional approach. The information gained from the identification of new viral diseases in different crops can subsequently be incorporated into viral diagnostic procedures by developing broad-range primers for virus detection.

Molecular evolution of viruses. Viral genomes are highly divergent within and between infected hosts because of high mutation and recombination rates, rapid replication, and large population sizes (Duffy et al., 2008). Consequently, virus populations in a host can exhibit a high level of genetic diversity due to the constant production of genetic variants (Massart et al., 2014). Because of this diversity, virus populations within a host are often referred to as mutant clouds, swarms, or quasi-species (Beerenwinkel et al., 2012). NGS has demonstrated that the evolutionary dynamics of viral quasi-species under natural selection and genetic drift can affect virus emergence (Fabre et al., 2012). Hence, viral metagenomics coupled with NGS can be used to assess molecular evolutionary events in a viral quasi-species within and between hosts in various environments. This can provide information to help prevent future disease epidemics. In addition, the data gained from viral population genetics can further assist in the taxonomic assignment of viruses.

Virus ecology and virus-vector interactions. Plant viruses have been found in diverse environments through metagenomics studies, showing that plant viruses are not exclusively limited to plants. Zhang et al. (2006) showed that plant RNA viruses were abundantly present in human feces, implying that plant viruses can occur outside of their natural plant hosts. Besides human feces, Pepper mild mottle virus (PMMoV) was also detected in food products (Colson et al., 2010), wastewater (Rosario et al., 2009b), and drinking water sources (Haramoto et al., 2013). In addition to these environments, plant viruses were found in other hosts such as bats (Donaldson et al., 2010) and in ancient animal feces (Ng et al., 2014).

Metagenomic approaches have also been extended to insects for the identification of plant viruses present in the environment through ‘vector-enabled metagenomics’ (VEM). The VEM approach was initiated by Ng et al. (2011) to describe DNA viruses present in whiteflies (Bemisia tabaci) in agricultural areas. Following that first VEM study, a novel mastrevirus and alphasatellite were discovered in dragonflies (Rosario et al., 2013) and a carlavirus in whiteflies (Rosario et al., 2014). Recently, a worldwide survey of viral communities in the Asian citrus psyllid, Diaphorina citri, revealed several novel viruses using VEM (Nouri et al., 2016). In addition to viral discovery and deciphering virus-vector interactions, the VEM approach is a promising tool for virus control through the management of insect vectors by incorporating techniques such as RNAi or genome editing (Kaur et al., 2015).

Objectives

Fifteen virus species, including RNA and DNA viruses from seven known and two unassigned viral families, have been reported in highbush, lowbush, and rabbiteye blueberries worldwide. Only two of these species, BNRBV and BRRV, have been documented and observed in Florida, respectively (Cantu-Iris et al., 2013). No studies have been conducted to determine the diversity of viral populations in blueberry in Florida, since viral diseases are currently not a limiting factor for the blueberry industry in this state. Hence, we developed the following objectives to determine the diversity of viral populations in wild and cultivated blueberries in Florida:

1. Identify viral sequences through analysis of existing root transcriptomes of V. arboreum and V. corymbosum x V. darrowi ‘Emerald’

2. Characterize plant RNA viromes from wild and cultivated V. corymbosum in Florida

3. Characterize plant DNA viromes from wild and cultivated V. corymbosum in Florida

Table 1-1. Virus species reported in Vaccinium spp. in the United States and around the world.

Type of genome | Family | Genus | Virus | References
dsDNA (RT) | Caulimoviridae | Soymovirus | Blueberry red ringspot virus | Hutchinson & Varney (1954), Gillet & Ramsdell (1988), Kim et al. (1981)
dsRNA | Amalgaviridae | Amalgavirus | Blueberry latent virus | Martin et al. (2010)
ssRNA (-) | Ophioviridae | Ophiovirus | Blueberry mosaic associated virus | Varney (1957), Thekke-Veetil et al. (2014)
ssRNA (+) | Betaflexiviridae | Carlavirus | Blueberry scorch virus | Martin & Bristow (1988), Bristow et al. (2000)
ssRNA (+) | | Ilarvirus | Blueberry shock virus | MacDonald et al. (1991), Bristow & Martin (1999)
ssRNA (+) | | Unassigned | Blueberry virus A | Isogai et al. (2013)
ssRNA (+) | Secoviridae | Nepovirus | Blueberry latent spherical virus | Isogai et al. (2012)
ssRNA (+) | Secoviridae | Nepovirus | Blueberry leaf mottle virus | Ramsdell & Stace-Smith (1979), Childress & Ramsdell (1986; 1987)
ssRNA (+) | Secoviridae | Nepovirus | Cherry leaf roll virus | Woo et al. (2013), Woo & Pearson (2014)
ssRNA (+) | Secoviridae | Nepovirus | Peach rosette mosaic virus | Ramsdell & Gillet (1981), Lammers et al. (1999), Allen et al. (1984)
ssRNA (+) | Secoviridae | Nepovirus | Tobacco ringspot virus | Lister et al. (1963), Converse & Ramsdell (1982)
ssRNA (+) | Secoviridae | Nepovirus | Tomato ringspot virus | Converse & Ramsdell (1982)
ssRNA (+) | Secoviridae | Unassigned | Strawberry latent ringspot virus | Woo & Pearson (2014)
ssRNA (+) | Unassigned | Sobemovirus | Blueberry shoestring virus | Ramsdell (1979), Lesney & Ramsdell (1976)
ssRNA (+) | Unassigned | Blunervirus | Blueberry necrotic ring blotch virus | Quito-Avila et al. (2013), Cantu-Iris et al. (2013)

dsDNA (RT): double-stranded DNA reverse-transcribing; dsRNA: double-stranded RNA; ssRNA (-): negative sense single-stranded RNA; ssRNA (+): positive sense single-stranded RNA.

Table 1-2. Major NGS platforms (Goodwin et al., 2016).

Technique | Platform | Read length (bp) | Throughput (Gb) | Time (h/d) | Error rate (%) | Cost per Gb (US$, approx.)
SBL | SOLiD 5500xl | 75 (SE) | 240 | 10d | ≤0.1 | 70
SBL | BGISEQ-500 | 50-100 (SE/PE) | 8-40 | 1 | ≤0.1 | 70
SBS (CRT) | Illumina HiSeq2500 | 150 (PE) | 650-750 | 1-3.5d | 0.1 | 22
SBS (SNA) | Ion Torrent | 400 (SE) | 1-2 | 7.3h | 1 | 450-800
SBS (SNA) | Ion Proton | 200 (SE) | 10 | 2-4h | 1 | 80
SMRT | PacBio | ~20Kb | 0.5-1 | 4 | 13 | 1000

Approx., approximate; bp, base pairs; CRT, cyclic reversible termination; d, days; Gb, gigabase pairs; h, hours; Kb, kilobase pairs; PE, paired-end sequencing; SBS, sequencing by synthesis; SE, single-end sequencing; SNA, single nucleotide addition.

CHAPTER 2
TRANSCRIPTOMIC ANALYSIS OF EXISTING RNA-SEQ DATA ENABLES THE DISCOVERY OF PLANT VIRUSES

Introduction

Sparkleberry (Vaccinium arboreum) is one of the native Vaccinium species that has been used in the southern highbush blueberry (SHB) breeding program at the University of Florida (Lyrene et al., 2003). Sparkleberry has deep roots, favorable soil adaptation characteristics, and an upright growth habit conducive to machine harvesting (Olmstead et al., 2013). Sparkleberry is being investigated as a rootstock for SHB cultivars (Casamali et al., 2016). The southern highbush blueberry cultivar Vc x Vd ‘Emerald’ is an interspecific hybrid of V. corymbosum and V. darrowi that is grown commercially in Florida. While symptoms characteristic of viral infection are often apparent on Vc x Vd ‘Emerald’ in production and in natural populations of V. darrowi, the makeup of, and relationship between, the viral populations affecting these communities of plants are unknown. Viral diseases are not currently limiting factors in blueberry production in Florida, nor have ecological impacts of viral disease been documented in native Vaccinium species. However, practices including grafting and interspecific hybridization of native species certainly provide mechanistic avenues for spillover of recalcitrant pathogens to occur (Power and Mitchell, 2004).

Vast amounts of sequence data generated through various ‘omics’ approaches today open doors to many possibilities for post hoc analyses. One such possibility involves utilizing publicly available datasets from transcriptomics or genomics projects to mine for viral sequences. Plant transcriptome data generated by horticulturalists and other plant scientists have been used to search for viral sequences using in silico analyses. In one study, three nearly complete genomes (Grapevine rupestris stem pitting-associated virus, Grapevine pinot gris virus, and Potato virus Y) were obtained from de novo assembled contigs of an existing grapevine transcriptome (Jo et al., 2015). Similar studies were conducted later using publicly available pepper, apple, and pear transcriptomes. Nearly complete genomes of Bell pepper endornavirus and Apple stem grooving virus were assembled in these studies (Jo et al., 2016a; 2016b). In addition, the apple and pear transcriptome study included a comparison of two de novo assemblers, Trinity and Velvet, for their ability to identify and construct viral genomes from RNA-Seq data (Jo et al., 2016b). The authors found that Trinity and Velvet were suitable for de novo assembly of contigs from mRNA and sRNA sequence reads, respectively, and that Trinity produced fewer but longer contigs, while Velvet produced more but shorter contigs. Overall, these studies illustrate that plant transcriptome data can serve as resources for gaining insight into the viral communities affecting plants.

The objective of this study is to identify viral sequences through transcriptome analysis of 32 root RNA libraries from V. arboreum and Vc x Vd ‘Emerald’ SHB generated by Olmstead et al. (unpublished data) as part of an unrelated study. Additional analyses were performed on the complete genomes of Blueberry red ringspot virus isolates assembled from the transcriptomes to determine gene expression profiles and single nucleotide polymorphisms (SNPs) of the virus. Secondary objectives include in vitro validation of selected viral sequences identified in the hosts, followed by sequence and phylogenetic analysis to compare the viral sequences obtained from this study with other published viral genomes.

Materials and Methods

Source of Transcriptome Libraries

In brief, two Vaccinium species, V. arboreum and Vc x Vd ‘Emerald’, were used to produce eight clonal plants for each genotype. Total RNA was extracted from a subsample of root tissue from each plant using a Plant/Fungi Total RNA Purification Kit cat #25800 (Norgen Biotek Corp., Canada) following the manufacturer’s instructions. The RNA extracts were subjected to rRNA depletion using Epicentre Ribo-Zero™ rRNA Removal Kits followed by RNA library construction using the Epicentre ScriptSeq v2 RNA-Seq library preparation kit according to the manufacturer’s protocol. Two libraries containing 100 bp paired-end reads were generated for each plant in replicate sequencing reactions, producing 16 transcriptomes for each genotype. Sequencing was done on the Illumina HiSeq 2000 platform at the Interdisciplinary Center for Biotechnology Research (ICBR) Gene Expression Core, University of Florida (UF), producing a total of 32 transcriptomes.

Transcriptome Analysis

Raw reads from each of the transcriptome libraries were filtered for quality using a minimum quality score of 20 with the FASTQ Quality Filter in FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit). Universal and indexed adapters were trimmed from the 5’ and 3’ ends using the FASTA/Q Clipper in FASTX-Toolkit. The reads were analyzed according to the transcriptome analysis pipeline illustrated in Figure 2-1. VelvetOptimiser version 2.2.5 (https://github.com/Victorian-Bioinformatics-Consortium/VelvetOptimiser.git) was used to select a k-mer value of 53 to optimize the de novo assemblies. Reads from each library were assembled de novo independently using Velvet (Zerbino and Birney, 2008), resulting in 32 sets of contigs, 16 from each plant species. Contigs ≥ 500 nucleotides (nt) in length were extracted from each assembled dataset and then combined by plant species, producing one contig set each for V. arboreum and Vc x Vd ‘Emerald’. With this approach, contigs with viral hits can be traced back to their corresponding libraries.
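The length filtering and pooling step described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the function names, the `library|contig` naming scheme, and the in-memory FASTA input are assumptions made for illustration. Only the 500 nt cutoff comes from the text.

```python
from typing import Dict, Iterator, List, Tuple

MIN_CONTIG_LENGTH = 500  # nt; the cutoff stated in the text


def parse_fasta(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def pool_contigs(libraries: Dict[str, List[str]],
                 min_len: int = MIN_CONTIG_LENGTH) -> Dict[str, str]:
    """Pool contigs of at least min_len nt from several libraries.

    Each contig name is prefixed with its library ID (hypothetical
    naming scheme) so that a viral hit can be traced back to the
    library it came from.
    """
    pooled = {}
    for lib_id, fasta_lines in libraries.items():
        for name, seq in parse_fasta(fasta_lines):
            if len(seq) >= min_len:
                pooled[f"{lib_id}|{name}"] = seq
    return pooled
```

Prefixing the library ID keeps contig names unique after pooling, which is what allows a BLASTx hit on the combined set to be assigned to a single library.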

Contigs were compared to a local plant virus database (Zheng et al., 2017) by BLASTx (Altschul et al., 1997) with an e-value of <10⁻⁵. Contigs with similarity to plant viruses were organized by family, genus, and species according to the 2016 Virus Taxonomy Release on the International Committee on Taxonomy of Viruses (ICTV) website (https://talk.ictvonline.org/taxonomy/) for approved virus species. Contigs with hits to virus species not yet approved by ICTV were assigned to their corresponding genus and family based on the information provided in the NCBI Taxonomy Database (https://www.ncbi.nlm.nih.gov/taxonomy).
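As an illustration of the e-value screen described above, the sketch below filters BLAST results in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and keeps the best-scoring hit per contig. The use of tabular output, the function name, and the best-hit rule are assumptions for illustration; the text specifies only the e-value threshold.

```python
from typing import Dict, Iterable, NamedTuple

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


class Hit(NamedTuple):
    contig: str
    subject: str
    evalue: float
    bitscore: float


def best_viral_hits(rows: Iterable[str],
                    cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Hit]:
    """Return the highest bit-score hit per contig with e-value < cutoff.

    Assumes tab-separated rows in the standard 12-column BLAST layout,
    where column 11 is the e-value and column 12 is the bit score.
    """
    best: Dict[str, Hit] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.rstrip("\n").split("\t")
        hit = Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))
        if hit.evalue >= cutoff:
            continue
        current = best.get(hit.contig)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.contig] = hit
    return best
```

Contigs retained by such a screen would still need the second comparison against the non-redundant protein database to exclude host-derived sequences.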

Scaffolds with the same viral hits were assembled from overlapping contigs using the assembler built into Geneious 9.1.6. Contigs with similarity to plant viruses by BLASTx against the local plant virus database, and the corresponding scaffolds, were then compared to the sequences in the non-redundant GenBank protein database using BLASTx. This step was performed to confirm that the assembled contigs and scaffolds produced hits only to plant virus sequences and not to host sequences. Contigs and scaffolds with homology to viruses were then selected based on the longest nt length for further sequence analysis in Geneious 9.1.6 and in vitro validation.

Reference-based mapping was performed using Bowtie2 (Langmead & Salzberg, 2012). In the mapping process, reads from the libraries corresponding to individual plants were aligned to the de novo assembled scaffold generated in the step above, which served as the reference sequence, to obtain a complete or partial viral genome. In addition, further analyses based on the RNA-seq mapping data from the libraries corresponding to individual Vc x Vd ‘Emerald’ plants were performed in Geneious 9.1.6. These analyses included calculation of the average read coverage for the libraries corresponding to each individual plant. The expression level of transcripts from each coding DNA sequence (CDS) was estimated for each plant. Gene regions with high read coverage and SNPs were also identified for each plant from the RNA-seq mapping data.
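The coverage calculations described above were performed in Geneious. The sketch below only illustrates the underlying arithmetic, assuming that a per-base read depth list for the reference scaffold is already available. The function names and the region coordinates in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def mean_coverage(depth: List[int]) -> float:
    """Average read depth over a list of per-base depths."""
    return sum(depth) / len(depth) if depth else 0.0


def coverage_by_region(depth: List[int],
                       regions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Average depth for each named region.

    Regions use 1-based inclusive (start, end) coordinates on the
    reference, so the Python slice is depth[start - 1:end].
    """
    return {
        name: mean_coverage(depth[start - 1:end])
        for name, (start, end) in regions.items()
    }


# Usage with made-up numbers: a 200 nt reference with two regions.
example_depth = [10] * 100 + [50] * 100
example_regions = {"region_1": (1, 100), "region_2": (101, 200)}
per_region = coverage_by_region(example_depth, example_regions)
```

Comparing the per-region averages within a library gives a rough indication of which CDS carries more mapped reads, which is the comparison the expression analysis relies on.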


Virus Validation in vitro

Sample source and extraction of nucleic acid. Root samples of individual V. arboreum and Vc x Vd ‘Emerald’ plants were used for validation experiments.

V. arboreum. Total RNA was extracted from 50 mg of frozen ground root tissue of V. arboreum using a Plant/Fungi Total RNA Purification Kit cat #25710 (Norgen Biotek Corp., Canada) following the manufacturer’s recommended instructions. Each RNA sample was quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., DE, USA). The extracted RNA samples (40 ng/µl each) were pooled in equal amounts for further validation studies.

Vc x Vd ‘Emerald’. Total DNA was extracted from 30 mg of ground root tissue of Vc x Vd ‘Emerald’ plants using a CTAB protocol. DNA from leaves (ECS1-L) and roots (ECS1-R) of BRRV-symptomatic ‘Emerald’ was included as the BRRV-positive control. DNA from leaves of Southernbelle (SB) was included as the BRRV-negative control for the amplification of full-length BRRV from pooled Vc x Vd ‘Emerald’ root DNA using primers BRRV_b2 and BRRV_Ab1 (Table A-3). Each DNA sample was quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., DE, USA). Individual-plant as well as pooled Vc x Vd ‘Emerald’ DNA samples (20 ng/µl) were used for the detection and differentiation of episomal and endogenous BRRV.

Primer Design

Primer design for the detection of a potyvirus in V. arboreum. A set of primers (NP1_F and NP_1R) was designed based on the NIb region of a de novo assembled scaffold belonging to the family Potyviridae for the detection of a putative novel virus sequence (Figure 2-2a, Table A-3).

Primer design for the detection of BRRV in Vc x Vd ‘Emerald’. A set of published primers (RRSV3F/RRSV4R) (Polashock et al., 2009) was used for the detection of the BRRV obtained from Vc x Vd ‘Emerald’ transcriptome data analysis. To differentiate the episomal and endogenous forms of the pararetrovirus sequence, two sets of back-to-back primers (BRRV_b1 and BRRV_b2) and one set of abutting primers (BRRV_Ab1) were designed in ORFs V and VI of the de novo assembled BRRV genome (Figure 2-2b, Table A-3). All primers were designed with Primer3 in Geneious 9.1.6 (Biomatters Ltd).

Detection of Viruses

Detection of a potyvirus in V. arboreum. First-strand cDNA was synthesized with ImProm-II reverse transcriptase (Promega, USA) in an RT step. Next, DNA was amplified by PCR with Taq DNA Polymerase (New England Biolabs, USA) according to the manufacturer’s instructions. The RT reaction was carried out by first incubating a 10 µl reaction mixture containing 360 ng total RNA and 1 µM Oligo (dT)21v primer at 70°C for 10 min. Following the addition of a 10 µl solution giving final concentrations of 6 mM MgCl2, 1 mM dNTP mix, 20 U RNasin, and 1 µl reverse transcriptase, the mixture was incubated at 25°C for 5 min, 42°C for 1 h, and 70°C for 15 min. Subsequently, PCR was carried out using the cDNA template (2 µl) in a reaction mixture containing 2.5 mM MgCl2, 0.5 mM dNTP mix, 0.5 µM of forward and reverse primer, and 0.625 U Taq DNA Polymerase. The cycling conditions were: initial denaturation at 94°C for 3 min; 35 cycles of 94°C for 30 s, 62°C for 45 s, and 72°C for 1 min 40 s; and a final extension at 72°C for 5 min. The RT-PCR was carried out in a Veriti 96-Well Thermal Cycler (Applied Biosystems Corp., Foster City, CA).

Detection of BRRV in Vc x Vd ‘Emerald’. Detection of BRRV was carried out in a 20 µl reaction mixture containing 20 ng of total DNA, 3.1 mM MgCl2, 0.5 mM dNTP mix, 1.25 µM of forward and reverse primer, 1.25 µM spermidine, and 0.625 U Taq DNA Polymerase (New England Biolabs, USA). The cycling conditions for PCR were as follows: 94°C for 3 min; 35 cycles of 94°C for 30 s, 57°C for 45 s, and 72°C for 45 s; and a final extension at 72°C for 5 min. PCR products obtained with primers RRSV3F/RRSV4R were resolved on a 1.0% agarose gel, and the expected amplicon was purified using illustra GFX PCR DNA and Gel Band Purification Kits (GE Healthcare Life Sciences, UK). Next, Sanger sequencing was performed at Eurofins MWG Operon LLC (Eurofins Scientific, USA) to confirm that the PCR products were BRRV sequences.

Attempts were also made to differentiate the episomal and endogenous forms of BRRV. BRRV-positive (ECS1-L and ECS1-R) and BRRV-negative (SB) control DNA samples, as well as pooled DNA extracted from Vc x Vd ‘Emerald’ roots, were enriched for circular genomes by rolling circle amplification (RCA) using the illustra TempliPhi DNA Amplification Kit (GE Healthcare, Little Chalfont, Buckinghamshire, UK). The RCA products were diluted 1:1 in nuclease-free water before being used as template in PCR. PCR was carried out as described below.

BRRV_b1 forward and reverse primer: PCR was carried out in a 25 µl reaction mixture containing 20 ng of total DNA, 0.2 mM dNTP mix, 0.5 µM of forward and reverse primer, and 0.5 U KAPA HiFi DNA Polymerase (Kapa Biosystems Inc., USA). The following cycling conditions were used: 95°C for 5 min; 35 cycles of 98°C for 20 s, 64°C for 15 s, and 72°C for 4 min 15 s; and a final extension at 72°C for 10 min.

BRRV_b2 and BRRV_Ab1 forward and reverse primer: PCR was carried out in a 25 µl reaction mixture containing 20 ng of total DNA or 1 µl of the diluted RCA product, 0.2 mM dNTP mix, 0.5 µM of forward and reverse primer, and 0.5 U Hot Start II High-Fidelity DNA Polymerase (Thermo Fisher Scientific Inc., USA). The following cycling conditions were used: 98°C for 3 min; 35 cycles of 98°C for 10 s, 62°C for 30 s, and 72°C for 4 min 15 s; and a final extension at 72°C for 20 min.


All PCR products for differentiation of the BRRV forms were resolved on a 0.8% agarose gel. The expected full-length BRRV amplicon (~8.3 kb) produced by PCR with the back-to-back primer set BRRV_b1 from pooled Vc x Vd ‘Emerald’ DNA was gel purified using illustra GFX PCR DNA and Gel Band Purification Kits (GE Healthcare Life Sciences, UK). The purified amplicon was used as the insert in a cloning reaction with the TOPO-XL cloning kit (Invitrogen, USA) following the manufacturer’s instructions. The clone was sequenced by the Sanger method at Eurofins MWG Operon LLC (Eurofins Scientific, USA).

Sequence and Phylogenetic Analyses

Sequence and phylogenetic analysis of a potyvirus in V. arboreum. Pairwise amino acid identities of the NIb protein between the putative new member of the Potyviridae obtained in this study and selected members of different genera in the family Potyviridae were computed from a multiple alignment generated with MUSCLE (Edgar, 2004). Phylogenetic analysis of the amino acid sequences of the NIb protein region was performed by the neighbor-joining method in MEGA (version 7.0) (Kumar et al., 2016), using a bootstrap test with 1000 replicates, to infer the relationship between the putative new virus scaffold and the selected members representing eight genera in the family Potyviridae.

Sequence and phylogenetic analysis of BRRV in Vc x Vd ‘Emerald’. Amino acid identities of the ORFs with known protein functions between the in silico assembled BRRV genomes obtained from each library and other BRRV isolates were computed from multiple alignments generated with MUSCLE (Edgar, 2004). Evolutionary relationships among the amino acid sequences of these ORFs were inferred by constructing phylogenetic trees in MEGA (version 7.0) (Kumar et al., 2016) by the neighbor-joining method, using a bootstrap test with 1000 replicates.
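For readers unfamiliar with how a pairwise identity value is derived from an alignment, the following minimal sketch computes percent identity between two rows of an existing alignment. It assumes one common convention, in which columns containing a gap in either sequence are excluded from the denominator; other tools may count gaps differently, and this is not necessarily the convention applied in this study. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns with a gap ('-') in either sequence are skipped,
    so the denominator is the number of gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For example, the aligned pair `MKT-LV` and `MRTALV` has five gap-free columns, four of which match, giving 80% identity under this convention.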


Results

Transcriptome Analysis Led to Virus Identification

Mining of plant virus sequences in the transcriptome. Plant virus sequences were identified using a metagenomics approach through analysis of existing transcriptome libraries from blueberry roots that were not known to harbor any viruses. A total of 1,410,287 and 4,317,432 contigs (length ≥ 500 nt) were obtained from de novo assembly of reads from the V. arboreum and Vc x Vd ‘Emerald’ libraries, respectively (Table A-2). The number of contigs obtained from each library is shown in Figure A-1. Comparison of the contigs and scaffolds generated from assembly of the V. arboreum and Vc x Vd ‘Emerald’ reads to the local plant virus protein database by BLASTx produced hits to virus species belonging to four families of viruses (Table 2-1). Three of the viral sequence contigs, from the same families (Caulimoviridae, Partitiviridae, and an unassigned family of Varicosavirus), were identified in both plants, whereas virus hits from the Potyviridae were identified only in V. arboreum (Table 2-1). Based on the BLASTx results, the longest scaffolds with viral hits, to Wheat eqlid mosaic virus (WEqMV) from V. arboreum and to BRRV from Vc x Vd ‘Emerald’, were selected for further sequence analysis and validation in vitro. Local pairwise alignment of the 2003 nt scaffold (NP scaffold) from V. arboreum, obtained from 4 contigs of 569–1622 nt in length, showed 43% nt identity covering the complete NIb region of WEqMV (Acc. no NC009805). However, local pairwise alignment of the amino acid (aa) sequence of the NP scaffold showed low aa identity (26%) to the complete NIb region of WEqMV (YP001468095), which contains a conserved reverse transcriptase (RT)-like superfamily domain. Mapping of reads from the V. arboreum transcriptomes to the NP scaffold as the reference sequence using Bowtie2 showed that 2542 of 174,092,337 reads (0.01 %) aligned to the scaffold, with the majority of the reads derived from the a7 and a8 libraries (Table 2-2). Reads from the a7 and a8 libraries produced up to 246x and 144x coverage, respectively, distributed along the NP scaffold. In contrast, the number of reads from libraries a1 to a6 that aligned to the NP scaffold was very low, producing 24x to 44x lower coverage than in a7 and a8.

De novo assembly of reads from the Vc x Vd ‘Emerald’ libraries yielded a complete genome of BRRV, genus Soymovirus, family Caulimoviridae, a plant virus with an open circular ~8.3 kb dsDNA genome. A whole genome of BRRV was initially obtained by scaffolding 75 contigs ranging from 500 to 1659 nt in size, producing a sequence of 8293 nt. Reference-based mapping was then performed independently for each library using the in silico assembled BRRV scaffold as the reference sequence, which resulted in eight complete genomes of BRRV. Each contained eight ORFs: ORF I (movement protein), A, B, C, IV (coat protein), V (reverse transcriptase), VI (translational transactivator), and VII. Whole genome analysis of the nt sequences showed that these BRRV genomes shared the highest nt identity (97%) with the published sequence of BRRV from Poland (JN205460). A total of 900,057 of 367,406,012 reads (1.7%) from eight Vc x Vd ‘Emerald’ libraries were mapped to the BRRV scaffold, with a markedly higher proportion of reads derived from the e11 library (Table 2-2).

Analysis of the transcript expression level for each library showed that the coat protein (CP), translational transactivator (TAV), and movement protein (MP) genes were the most highly expressed in all libraries (Figure 2-3a). In contrast, the reverse transcriptase (RT) gene and a hypothetical gene encoded by ORF B were consistently expressed at low levels in all libraries, while the other CDS showed variable levels of expression (Figure 2-3a). In addition, regions with the highest read coverage were identified in the CP and TAV regions in all libraries (Figure 2-3a). The RT region contained areas of high read coverage in all libraries except e9 and e10, whereas the MP region contained high read coverage in libraries e10, e12, e15 and e16. The mapping of reads from each library to the BRRV scaffold also showed that library e11 had the highest average read coverage, 8 to 88 times greater than that of the other libraries, which is in line with the percentage of mapped reads (Figure 2-3b). Furthermore, identification of single nucleotide polymorphisms (SNPs) in reads mapped to the BRRV scaffold indicated that 2 to 21 SNPs were present in all libraries except e11, which did not display any SNPs (Figure 2-3b).

Validation of a Putative Novel and Known Virus

Detection of a potyvirus in V. arboreum. A ~1.6 kb amplicon from the NP scaffold, which had its highest BLASTx hit to WEqMV among the contigs assembled from the V. arboreum root transcriptome, was validated by RT-PCR and Sanger sequencing. Pairwise nt sequence alignment of the amplicon and the NP scaffold by MUSCLE indicated that the amplicon was 100% identical to the assembled NP scaffold. BLASTx analysis of the consensus sequence obtained from the pairwise alignment against the GenBank protein database showed similarity to the NIb of WEqMV (YP001468095) as the only hit with significant alignment. Subsequently, a conserved domain search (CDS) against the conserved domain database (CDD) at NCBI (Marchler-Bauer et al., 2017) was performed using the translated protein sequence and indicated the presence of an RT-like superfamily domain containing the RNA-dependent RNA polymerase (RdRp) domain, thus validating the NP scaffold as a viral sequence.

Detection of BRRV in Vc x Vd ‘Emerald’. BRRV was detected by PCR using the published virus-specific primers RRSV3F/RRSV4R (Polashock et al., 2009) in 100% of the root samples of the Vc x Vd ‘Emerald’ plants from which the RNA-seq libraries were obtained, generating a 549 nt amplicon. One of the amplicons was submitted for direct sequencing, producing a sequence with highest similarity (99%) to the transcriptional transactivator gene of BRRV isolate UF (JF917085).


Since virus species in the Caulimoviridae are capable of integration into their host genome, back-to-back primers were used to differentiate between episomal and endogenous forms of the BRRV genome. A full-length BRRV genome, indicated by an 8.3 kb band, was obtained by PCR using the back-to-back primer set BRRV_b1, with a detection frequency of more than half (>60%) of the total number of PCR reactions (Table 2-3). Pairwise alignment of the cloned BRRV sequence obtained from the BRRV_b1-amplified PCR product of pooled Vc x Vd ‘Emerald’ root DNA showed more than 98% nt identity to the in silico assembled BRRV genome (Figure 2-4). However, approximately 250 nt were missing at the 5’ end of the cloned sequence, in the BRRV_b1 reverse primer region.

In addition, PCR with the back-to-back primer set BRRV_b2 did not produce a band of the expected size in any tested sample, except when the RCA and DNA templates from ECS1-L and pooled Vc x Vd ‘Emerald’ root DNA were used. However, when tested with primer set BRRV_Ab1, only DNA from ECS1-L produced a band of the expected size (Table 2-4).

Sequence and Phylogenetic Analysis of Putative Novel and Known Virus

Sequence and phylogenetic analysis of the novel potyvirus. Twenty-one amino acid sequences of the NIb region from members of eight genera (Brambyvirus, Bymovirus, Ipomovirus, Poacevirus, Potyvirus, Macluravirus, Rymovirus, and Tritimovirus) and the NP scaffold were aligned with MUSCLE. Pairwise comparison of the amino acid sequences showed that the NP scaffold has low amino acid identity to other potyviruses, ranging from 14% to 24% (Table A-4). In the phylogenetic analysis of the NIb amino acid sequences of the NP scaffold and the other selected members of the family Potyviridae, members belonging to the same genus clustered within the same subgroup with high bootstrap support (Figure 2-5). The NP scaffold obtained from this study formed a distinct node yet clustered with members of the genus Tritimovirus, thus supporting the initial BLASTx results showing its closest similarity to WEqMV.

Sequence and phylogenetic analysis of BRRV. Although the genome organization of the BRRV Florida isolates is similar to that of previously reported isolates, there are slight differences in the lengths of the ORFs from Florida (Figure 2-4, Table 2-5). ORF I of the BRRV Florida isolates contains the putative ‘transport domain’ (GNLKYGVIKFDV; aa 196–207), which is important for the movement of viruses within the host. ORFs A, B, and C of BRRV, which are homologs of ORFs Ib, II, and III in Soybean chlorotic mottle virus (SbCMV), encode proteins of as yet unknown function. The coat protein (CP) gene of the Florida isolates, encoded by ORF IV, contains the RNA binding domain (CWICQEDGHYANEC; aa 411-425), a conserved motif among the caulimoviruses (Glasheen et al., 2002). ORF V encodes the putative reverse transcriptase, which contains the putative protease (YIDTGASLC; aa 31-39) and core reverse transcriptase (YVDDIIIF; aa 356-363) domains that are conserved among caulimoviruses. Another domain conserved among the caulimoviruses, GLADTIY (aa 226–232), is also found in the ORF VI coding region of the BRRV Florida isolates, which encodes the putative translational transactivator protein. However, there is an amino acid insertion (E/Glu; aa 418) in the e9-derived ORF VI and two amino acid substitutions in the e12-derived ORF VI, at aa 13 (S/Ser replaced by I/Ile) and aa 238 (V/Val replaced by I/Ile). The Florida isolates have the longest ORF VII among the isolates compared; ORF VII is the least conserved region among the caulimoviruses and encodes a protein of unknown function (Glasheen et al., 2002).

The BRRV isolates from Florida (FL) clustered in the same clade as the isolates from the Czech Republic (CZ, HM159264), New Jersey (NJ, NC003138), and Poland (PL), and were distantly related to the isolate from Slovenia (SL, JF421559), in the phylogenetic tree generated from the ORF V (RT) amino acid sequences (Figure 2-6). In ORF V (RT), the FL isolates showed 99% aa identity to the isolates from CZ, NJ, and PL and 97% aa identity to the isolate from SL. Based on the species demarcation criterion for members of the family Caulimoviridae, defined as 80% nt identity in ORF V (RT), the results clearly demonstrate that the scaffolds obtained from the blueberry transcriptomes are isolates of BRRV. Further amino acid sequence alignment of the BRRV isolates using the ORFs with known protein functions (I, IV, V, VI and VII) showed that the BRRV genomes from Florida were identical to each other, as depicted in the phylogenetic analyses, where the local isolates clustered together in the same group (Figure 2-6). There was also a single amino acid variation in the ORF VI (TAV) sequences within the FL isolates, which separated the scaffolds obtained from the e9 and e12 libraries into distinct nodes. The isolate from NJ had the amino acid sequence most distantly related to FL based on the phylogeny of the different protein regions except ORF V (RT), with the highest divergence in the ORF I (MP) region. Multiple alignments of amino acid sequences from the different ORFs are in agreement with the whole genome alignment (Table A-5) of the BRRV isolates, indicating that the isolates from FL had the highest homology with the isolate from PL and the lowest homology with the isolates from SL and NJ.

Discussion

Viral Transcriptome Analysis for Identification of Viruses

There are several approaches to obtaining viral metagenomes, including the use of total RNA or DNA, virion-associated nucleic acids (VANA) extracted from virus-like particles (VLPs), double-stranded RNAs (dsRNA), virus-derived small interfering RNAs (siRNAs), and data mining of available NGS sequence data (e.g., transcriptome or genome databases) (Roossinck, 2015; Jo et al., 2015a). It was previously demonstrated that plant mRNA libraries can be used for host plant studies as well as providing a source for viral metagenome studies (Jo et al., 2015a, 2015b; Jo et al., 2016; Jo et al., 2017). Data mining of virus sequences was conducted in this study using available root transcriptomes generated from two blueberry species used in the blueberry breeding program in Florida, V. arboreum and Vc x Vd ‘Emerald’. This study differs from the previous studies (Jo et al., 2015a, 2015b) in that we included transcriptome data generated from root tissue of a wild relative of the cultivated blueberry species. We also exploited the availability of the root tissue samples used to generate the transcriptomes for in vitro validation of the viruses found in this study, a component that appeared to be missing from the aforementioned studies.

Analysis of V. arboreum and Vc x Vd ‘Emerald’ root transcriptomes in both libraries produced contigs with high sequence similarity to one known pararetrovirus, BRRV, family Caulimoviridae. It also produced contigs with low similarity to five virus species belonging to four families of viruses, Partitiviridae, Potyviridae, Rhabdoviridae, and an unassigned family of Varicosavirus. While most of the contigs assembled from V. arboreum were short, we were able to assemble eight complete viral genomes from Vc x Vd ‘Emerald’. A previous study showed that de novo assembly using Velvet was not able to assemble a complete viral genome because of the shorter contig length (Jo et al., 2016b). However, we successfully assembled complete viral genomes in this study. The virus titer in a sample might play the most important role in de novo viral genome assembly. Variation in the viral genome content of a population through time may also affect the number of reads produced in a transcriptome dataset. Hence, some sequences may have low or no coverage during the assembly process, resulting in an incomplete viral genome. Nonetheless, we have shown that analysis of available plant transcriptomes can identify contigs that potentially belong to novel virus species, in addition to known viruses. The presence of contigs with similarities to RNA and DNA viruses further demonstrated that available plant transcriptome data can be used to identify both types of viruses, since DNA viruses are known to replicate via an RNA intermediate. This was shown in a recent study that identified viruses with ssRNA, dsRNA, ssDNA, and dsDNA genomes by screening a pepper transcriptome for viruses (Jo et al., 2017).

Possible viral movement between wild and cultivated plant species has been observed in several studies; for example, potyviruses were found in two separate studies of legumes and orchids in Australia (Wylie et al., 2010; Kehoe et al., 2012). The presence of four out of six contigs with homology to viral species belonging to the same families (Caulimoviridae, Partitiviridae, and an unassigned family of Varicosavirus) in both libraries suggests that the wild and cultivated blueberry species could share similar viral populations. This is not surprising, since wild blueberry species native to Florida are routinely used in the UF blueberry breeding program (Lyrene, 1997) and thus could provide a means for viral movement. In addition, cultivated blueberry farms are frequently located near native blueberry relatives, which potentially serve as reservoirs for the exchange of viruses.

Detection of a Novel Potyvirus in V. arboreum

Of the viral contigs identified by viral metagenome analysis of the blueberry transcriptome data, those with homology to WEqMV from V. arboreum and to BRRV from Vc x Vd ‘Emerald’ were selected for validation in vitro because they produced the longest scaffolds among the contigs in each library. The presence of a de novo assembled scaffold with similarity to WEqMV in V. arboreum was validated in vitro. Pairwise sequence comparisons showed that the scaffold has 43% nt and 24% amino acid identity, suggesting that the in silico assembled scaffold might represent a partial genome of a novel virus species that likely belongs to a new genus in the family Potyviridae. This suggestion is based on Adams et al. (2005), who showed that the percent nt and amino acid identities in the NIb region ranged from 42.2–59.4 and 29.2–58.1, respectively, for members of different genera. Further phylogenetic analysis based on the NIb amino acid sequences showed that the scaffold appears to be a distinct member of a new genus in the family Potyviridae, closely related to members of the genus Tritimovirus.

Detection of BRRV in Vc x Vd ‘Emerald’

In addition to the finding of a contig that possibly represents a novel virus species in V. arboreum, viral metagenome analysis of the Vc x Vd ‘Emerald’ transcriptomes also led to the assembly of eight complete BRRV genomes of 8293-8296 nt in length. These results provide the first complete genomes of BRRV from Florida. Mapping the reads from each library to the BRRV scaffold built from overlapping contigs allowed us not only to determine the number of mapped reads and the average read coverage, but also to estimate the transcript expression level of each gene and to identify genomic regions of high read coverage as well as SNPs. One library, e11, contained a markedly greater number of mapped reads and higher average read coverage than the other libraries, which suggests a high level of virus transcripts in the corresponding plant. Furthermore, the mutation rate of the BRRV genome assembled from each library, calculated from the number of SNPs identified, ranged from 0 to 0.25%. The BRRV genomes assembled from the e10 and e16 libraries contained SNPs in ORF B and TAV, respectively, that produced frameshift mutations and thus could affect protein function. The absence of SNPs in the e11 library suggests that the BRRV genome assembled from this library may be the dominant genome among the isolates from Florida. Analysis of the transcript expression level for each assembled BRRV genome provided insights into the expression profile of each viral gene. Of all the genes in the BRRV genome, the CP, TAV, and MP genes were consistently expressed at high levels, while the RT gene consistently showed low expression.
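The mutation rate range reported above can be reproduced with simple arithmetic if the rate is taken to be the number of SNPs divided by the genome length. That definition is an assumption here, since the text states only that the rate was calculated from the number of SNPs. The sketch below uses the reported SNP counts (0 to 21) and the 8293 nt scaffold length; the function name is hypothetical.

```python
def mutation_rate_percent(snp_count: int, genome_length: int) -> float:
    """SNPs per genome position, expressed as a percentage.

    Assumed definition: 100 * snp_count / genome_length.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return 100.0 * snp_count / genome_length


GENOME_LENGTH = 8293  # nt; length of the assembled BRRV scaffold

lowest = mutation_rate_percent(0, GENOME_LENGTH)    # library with no SNPs
highest = mutation_rate_percent(21, GENOME_LENGTH)  # library with 21 SNPs
print(f"{lowest:.2f}% to {highest:.2f}%")  # prints "0.00% to 0.25%"
```

Under this assumption, 21 SNPs over 8293 nt corresponds to about 0.25%, which matches the upper bound given in the text.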

Additional sequence and phylogenetic analyses were performed in this study to compare the similarity and relationship between the genomes of BRRV isolates from FL to other BRRV isolates from other regions. Overall, whole genome alignment and phylogenetic analysis showed that the BRRV isolates from FL were closely related to the BRRV isolate from Poland with 97% nt identity, rather than 94% nt identity with another isolate from the US, implying that there could have been exchange of plant stock or germplasm between these regions.

Integrated viral sequences in host genomes, known as endogenous pararetrovirus sequences (EPRS), have been reported from all genera in the family Caulimoviridae except the genus Soymovirus (Eid & Pappu, 2014). Although these EPRS are usually present in the host genome in degenerate, fragmented, and repositioned forms, their complete genomes can frequently be assembled in silico (Teycheney et al., 2011). The endogenous and episomal forms of a pararetrovirus are usually differentiated using methods such as genomic Southern hybridization, reverse transcription-PCR (RT-PCR), immunocapture-PCR (IC-PCR), and rolling circle amplification (RCA) (Bhat et al., 2016). In addition to these methods, back-to-back primers and abutting primers have been used to amplify full-length circular viral genomes, including those in the families Geminiviridae and Caulimoviridae (Kraberger et al., 2015; Stainton et al., 2015).

Efforts were therefore made to determine whether the BRRV found in this study is integrated into the host genome or present in an episomal form. Differentiation of the BRRV forms was carried out using both types of primers, which were designed in the RT and TAV regions. PCR with the back-to-back primer set BRRV_b1 using total DNA showed that the expected full-length BRRV amplicon was detected in each root sample in more than 50% of the total number of PCR reactions. The full-length 8.3 kb sequence cloned from pooled Vc x Vd ‘Emerald’ root DNA by PCR using primer set BRRV_b1 further confirms the presence of BRRV and suggests that BRRV was present in an episomal form in these samples.

We then tested an additional set of back-to-back primers (BRRV_b2f) and abutting primers (BRRV_Ab1), including RCA products in addition to total DNA as PCR templates, to confirm our results with primer set BRRV_b1. However, the expected full-length 8.3 kb band was absent in PCR reactions using RCA templates from the control samples of BRRV-symptomatic Vc x Vd ‘Emerald’ and from pooled Vc x Vd ‘Emerald’ root samples when tested with the BRRV_b2f and BRRV_Ab1 primers, which did not agree with the previous findings. Additional research is needed to determine why this was the case. We could not rule out the possibility that the virus exists in both episomal and endogenous forms in Vc x Vd ‘Emerald’ roots. Previous research has shown that EPRS can be activated into the episomal form in response to abiotic or genomic stresses, which can subsequently lead to the appearance of symptoms and increased transcript levels (Noreen et al., 2007; Iskra-Caruana et al., 2010).

Conclusion

A partial genome of a novel virus species in the family Potyviridae was de novo assembled from the wild blueberry relative V. arboreum transcriptome. Eight genome sequences of BRRV were discovered through analysis of blueberry root transcriptomes. Hence, these results demonstrate the usefulness of exploiting existing or publicly-available transcriptome data for the rapid and inexpensive discovery of known and new viruses.


Figure 2-1. Transcriptome analysis pipeline used for data mining of viral sequences. Raw reads from each library of V. arboreum and Vc x Vd ‘Emerald’ were processed by filtering the reads based on quality and trimming the adapter sequences. The processed reads were then assembled de novo to produce contigs. These contigs were aligned to a local plant virus database by BLASTx analysis to identify contigs with homology to viruses. Contigs with homology to the same virus species were then assembled into scaffolds to obtain complete or partial viral genomes. Apart from de novo assembly, a reference-based mapping approach was also used to obtain the complete viral genomes of known and unknown viruses by mapping the reads to a reference genome or to a scaffold, respectively. The resulting contigs and scaffolds were subjected to further sequence analysis, which included alignment to the non-redundant GenBank protein database by BLASTx. Finally, selected scaffolds with homology to plant viruses in BLASTx analyses against both the local plant virus database and the non-redundant GenBank protein database were validated in vitro.


Figure 2-2. Location of primers on the de novo assembled virus scaffolds. (a) Pairwise local alignment of the 2003 nt scaffold of a new virus species to a Wheat eqlid mosaic virus (Potyviridae) reference genome. A ~1.6 kb amplicon of the putative new Potyviridae member was obtained from V. arboreum root tissue by RT-PCR and sequenced. (b) Sequence map of the BRRV scaffold with the corresponding primers used for virus detection and full genome amplification. (c) The upper and lower diagrams show close-ups of the positions on the BRRV scaffold of the set 1 back-to-back primers and of the set 2 back-to-back and abutting primers, respectively.


Table 2-1. Virus hits obtained by BLASTx analysis of the contigs and scaffolds obtained from V. arboreum and Vc x Vd ‘Emerald’ to the non-redundant protein sequence database in GenBank.

| Plant species | BLASTx | Contig/Scaffold length (nt) | Family/Genus | Region of the genome | Query cov (%) | Identity (%) | E-value |
|---|---|---|---|---|---|---|---|
| V. arboreum | Wheat eqlid mosaic virus* | 2003 | Potyviridae | NIb | 32 | 26 | 3E-06 |
| V. arboreum | Grapevine partitivirus | 675 | Partitiviridae | RdRp | 87 | 35 | 8E-32 |
| V. arboreum | Black grass varicosa-like virus | 548 | Rhabdoviridae | Polyprotein | 97 | 38 | 2E-28 |
| V. arboreum | Blueberry red ringspot virus* | 509 | Caulimoviridae | MP | 83 | 99 | 3E-98 |
| Vc x Vd ‘Emerald’ | Blueberry red ringspot virus* | 8293 | Caulimoviridae | Complete genome | 100 | 97 | 0 |
| Vc x Vd ‘Emerald’ | Black grass varicosa-like virus | 992 | Rhabdoviridae | Polyprotein | 70 | 37 | 6E-39 |
| Vc x Vd ‘Emerald’ | Vicia faba partitivirus | 861 | Partitiviridae | Putative RdRp | 89 | 40 | 3E-60 |
| Vc x Vd ‘Emerald’ | Persimmon latent virus | 755 | Rhabdoviridae | RdRp | 96 | 24 | 4E-08 |
| Vc x Vd ‘Emerald’ | Persimmon latent virus | 533 | Rhabdoviridae | Pro-ala | 26 | 47 | 3E-05 |

*Refers to scaffold. MP: movement protein; RdRp: RNA-dependent RNA polymerase; NIb: nuclear inclusion protein b; Pro-ala: proline-alanine-rich protein.

Table 2-2. No. and percentage of reads aligned to the scaffold of the putative new viral species of Potyviridae and to the scaffold of BRRV. The percentage of mapped reads was calculated by dividing the no. of mapped reads by the total no. of reads in each library.

| Scaffold | Library | Total no. of reads | No. of mapped reads | % of mapped reads |
|---|---|---|---|---|
| NP | a1 | 21,632,606 | 1 | 4.62265E-06 |
| NP | a2 | 20,532,381 | 11 | 5.35739E-05 |
| NP | a3 | 22,241,592 | 2 | 8.99216E-06 |
| NP | a4 | 18,985,683 | 1 | 5.26713E-06 |
| NP | a5 | 19,085,495 | 1 | 5.23958E-06 |
| NP | a6 | 23,596,288 | 4 | 1.69518E-05 |
| NP | a7 | 21,291,728 | 1553 | 0.007293912 |
| NP | a8 | 26,726,564 | 969 | 0.003625606 |
| NP | Total | 174,092,337 | 2542 | 0.011014166 |
| BRRV | e9 | 46540462 | 13339 | 0.028661082 |
| BRRV | e10 | 42116082 | 63519 | 0.150818872 |
| BRRV | e11 | 55345060 | 622031 | 1.123914221 |
| BRRV | e12 | 45679471 | 84076 | 0.184056422 |
| BRRV | e13 | 55634226 | 44187 | 0.07942413 |
| BRRV | e14 | 30322048 | 12758 | 0.042074994 |
| BRRV | e15 | 46545303 | 52544 | 0.112887868 |
| BRRV | e16 | 45223360 | 7603 | 0.016812108 |
| BRRV | Total | 367,406,012 | 900057 | 1.738649697 |

Libraries a1-a8 correspond to V. arboreum plant no. 1-8; libraries e9-e16 correspond to Vc x Vd ‘Emerald’ plant no. 1-8; NP: scaffold of putative new viral species of Potyviridae; BRRV: Blueberry red ringspot virus.
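The per-library percentage in Table 2-2 is a simple ratio, and a minimal sketch of it may help readers reproduce the values. The function name `percent_mapped` is hypothetical and is not part of the original analysis; the three example libraries are copied from Table 2-2.

```python
def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Percentage of the reads in one library that mapped to a scaffold."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# (total no. of reads, no. of mapped reads), copied from Table 2-2.
libraries = {
    "a1": (21_632_606, 1),
    "a7": (21_291_728, 1553),
    "e11": (55_345_060, 622_031),
}

for name, (total, mapped) in libraries.items():
    print(f"{name}: {percent_mapped(mapped, total):.6g}% of reads mapped")
```

The sketch only covers the per-library rows; it makes no assumption about how the "Total" rows of the table were derived.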


Figure 2-3. Read coverage, expression level of transcripts from each CDS, regions with high coverage, and SNP positions in each Vc x Vd ‘Emerald’ library, based on RNA-seq mapping to the BRRV scaffold. a) The blue graph indicates the distribution of read coverage along the BRRV scaffold. The expression level of transcripts for each CDS (green) was measured as transcripts per million (TPM), using the normalized total transcript count in the following formula: TPM = (CDS read count * mean read length * 10^6) / (CDS length * total transcript count). The CDS annotated on the reference BRRV scaffold are colored by transcript expression level, with blue, white, and red denoting low, normal, and highest TPM, respectively. Regions with high coverage and SNPs are displayed in yellow and in red characters, respectively. b) Chart showing the average read coverage and no. of SNPs in each library. All analyses were performed using Geneious 9.1.6. Libraries e9-e16 correspond to Vc x Vd ‘Emerald’ plant no. 1-8.
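The TPM formula in the caption can be written out as a short sketch. This is not the Geneious implementation. It assumes that the "total transcript count" is the sum over all CDS of (read count * mean read length / CDS length), which is a common TPM normalization but is not stated in the caption. The read counts are hypothetical; the three CDS lengths are the Florida MP, CP, and RT lengths from Table 2-5, used only for illustration.

```python
def total_transcript_count(cds_table, mean_read_length):
    """Assumed normalizer: sum over CDS of read_count * mean_read_length / cds_length."""
    return sum(reads * mean_read_length / length
               for reads, length in cds_table.values())


def tpm(cds_read_count, cds_length, mean_read_length, total_count):
    """TPM exactly as written in the Figure 2-3 caption."""
    return (cds_read_count * mean_read_length * 1e6) / (cds_length * total_count)


# name: (hypothetical read count, CDS length in nt)
cds_table = {"MP": (1200, 1098), "CP": (300, 1488), "RT": (4500, 2007)}
mean_read_length = 100  # hypothetical

total = total_transcript_count(cds_table, mean_read_length)
tpm_values = {name: tpm(reads, length, mean_read_length, total)
              for name, (reads, length) in cds_table.items()}

for name, value in tpm_values.items():
    print(f"{name}: {value:.1f} TPM")
```

Under the assumed normalizer the TPM values of all CDS sum to one million, which is a convenient sanity check on any implementation.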

[Figure 2-3 image. Panel a: coverage plots for libraries e9 to e16. Panel b: bar charts of average coverage (axis 0 to 10000) and no. of SNPs (axis 0 to 25) for libraries e9 to e16.]

Table 2-3. Detection frequency (%) of total DNA samples from Vc x Vd ‘Emerald’ root that were positive for amplification of full-length BRRV with back-to-back primer set 1 (BRRV_b1f/BRRV_b1r) in different PCR reactions.

| Library | Plant no. | PCR 1 | PCR 2 | PCR 3 | PCR 4 | PCR 5 | PCR 6 | PCR 7 | PCR 8 | PCR 9 | Detection frequency (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| e9 | 28 | + | + | + | + | + | + | + | - | + | 89 |
| e10 | 30 | + | + | + | + | NI | NI | + | + | + | 100 |
| e11 | 37 | + | + | - | - | NI | + | + | - | + | 63 |
| e12 | 41 | + | + | - | + | NI | NI | + | - | - | 57 |
| e13 | 22 | + | + | + | + | NI | NI | + | - | - | 71 |
| e14 | 25 | - | + | - | + | + | - | + | + | + | 67 |
| e15 | 32 | + | + | - | + | - | + | + | + | - | 67 |
| e16 | 34 | - | + | + | + | + | + | - | - | + | 67 |

Libraries e9-e16 correspond to Vc x Vd ‘Emerald’ plant no. 1-8; (+): positive band of ~8.3 kb; (-): absence of expected band; NI: not included.
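The detection frequencies in Table 2-3 are consistent with counting positive reactions over the reactions in which a sample was included. The sketch below makes that reading explicit; it assumes that NI reactions are excluded from the denominator and that halves round up, and the function name `detection_frequency` is hypothetical.

```python
import math


def detection_frequency(results):
    """Percent of scored PCR reactions ('+' or '-') that were positive.

    Reactions marked 'NI' (not included) are left out of the denominator,
    which is an assumption inferred from the values in Table 2-3.
    """
    scored = [r for r in results if r in ("+", "-")]
    if not scored:
        raise ValueError("no scored reactions")
    percent = 100.0 * scored.count("+") / len(scored)
    return math.floor(percent + 0.5)  # round half up, e.g. 62.5 -> 63


# Rows copied from Table 2-3.
table_2_3 = {
    "e9": "+ + + + + + + - +",
    "e10": "+ + + + NI NI + + +",
    "e11": "+ + - - NI + + - +",
    "e12": "+ + - + NI NI + - -",
}

for library, row in table_2_3.items():
    print(library, detection_frequency(row.split()))
```

Python's built-in `round` uses banker's rounding and would give 62 for library e11, so the explicit half-up rounding is a deliberate choice to match the table.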


Table 2-4. Detection of BRRV using published RRSV3F/4R primers and amplification of full-length BRRV by back-to-back primer set 2 (BRRV_b2f/BRRV_b2r) and abutting primers (BRRV_Ab1f/BRRV_Ab1r).

| Source | Samples | RRSV3F/RRSV4R | BRRV_b2f/BRRV_b2r | BRRV_Ab1f/BRRV_Ab1r |
|---|---|---|---|---|
| RCA | ECS1-L | + | + | - |
| RCA | ECS1-R | + | - | - |
| RCA | Pool eme | + | - | - |
| RCA | SB | + | - | - |
| DNA | ECS1-L | + | + | + |
| DNA | ECS1-R | + | - | - |
| DNA | Pool eme | + | + | - |
| DNA | SB | - | - | - |

ECS1-L: DNA from leaves of BRRV symptomatic ‘Emerald’ (#1); ECS1-R: DNA from root of BRRV symptomatic ‘Emerald’ (#1); Pool eme: pooled DNA from Vc x Vd ‘Emerald’ root samples; SB: DNA from leaves of Southernbelle (presumed to be BRRV negative).

Figure 2-4. Schematic representation of the in silico assembled BRRV genome and the corresponding back-to-back primer set 1 (red) used to produce the forward and reverse sequences (blue) by cloning and Sanger sequencing. Local alignment of both sequences showed more than 98% similarity to the genome scaffold.


[Figure 2-5 tree image. Labels, in the order shown: Potyvirus, Bymovirus, Rymovirus, Ipomovirus, Brambyvirus, Poacevirus, scaffold obtained from this study, Tritimovirus, Macluravirus.]

Figure 2-5. Evolutionary analysis based on amino acid sequences of the NIb region from the NP scaffold obtained in this study and from selected members of different genera in the family Potyviridae, aligned by MUSCLE in MEGA7 (Kumar et al., 2016). The bootstrap consensus phylogenetic tree was constructed by the neighbor-joining method with Poisson correction, based on 1000 replicates; bootstrap values above 75% are shown at the branch nodes.


Table 2-5. Nt length of each ORF in different BRRV isolates from Florida and other countries.

| Isolates | I (MP) | A | B | C | IV (CP) | V (RT) | VI (TA) | VII | Total length |
|---|---|---|---|---|---|---|---|---|---|
| CZ | 1101 | 312 | 561 | 600 | 1488 | 2004 | 1284 | 462 | 8302 |
| FL | 1098 | 369 | 561 | 597 | 1488 | 2007 | 1284-1287 | 477 | 8293-8296 |
| NJ | 939 | 369 | 561 | 600 | 1461 | 1974 | 1287 | 429 | 8303 |
| PL | 939 | 369 | 561 | 594 | 1455 | 1974 | 1284 | 462 | 8265 |
| SL | 1110 | 369 | 561 | 588 | 1476 | 2043 | 1284 | 462 | 8299 |

CZ: Czech Republic; FL: Florida; NJ: New Jersey; PL: Poland; SL: Slovenia.

Figure 2-6. The evolutionary relationship between BRRV isolates from Florida and other isolates based on the pairwise distances of the amino acid sequences of a) CP, b) MP, c) RT, and d) TA, using SbCMV as an outgroup. The bootstrap consensus phylogenetic trees were constructed by the neighbor-joining method from a matrix of pairwise distances estimated using a p-distance model with 1000 replicates; bootstrap values (%) are shown at the branch nodes. The tables below the phylogenetic trees show the percent identity of the amino acid sequences of the corresponding proteins between the BRRV isolates, as estimated by MUSCLE. CZ: Czech Republic; FL: Florida; NJ: New Jersey; PL: Poland; SL: Slovenia; CP: coat protein; MP: movement protein; RT: reverse transcriptase; TA: transcriptional activator.

a) CP

| Isolates | CZ | FL | NJ | PL | SL |
|---|---|---|---|---|---|
| CZ | | 97 | 92 | 97 | 95 |
| FL | 97 | | 92 | 97 | 95 |
| NJ | 92 | 92 | | 93 | 92 |
| PL | 97 | 97 | 93 | | 95 |
| SL | 95 | 95 | 92 | 95 | |

b) MP

| Isolates | CZ | FL | NJ | PL | SL |
|---|---|---|---|---|---|
| CZ | | 98 | 88 | 94 | 96 |
| FL | 98 | | 88 | 95 | 98 |
| NJ | 88 | 88 | | 91 | 87 |
| PL | 94 | 95 | 91 | | 94 |
| SL | 96 | 98 | 87 | 94 | |

c) RT

| Isolates | CZ | FL | NJ | PL | SL |
|---|---|---|---|---|---|
| CZ | | 99 | 98 | 99 | 97 |
| FL | 99 | | 99 | 99 | 97 |
| NJ | 98 | 98 | | 99 | 97 |
| PL | 99 | 99 | 99 | | 98 |
| SL | 97 | 97 | 97 | 98 | |

d) TA

| Isolates | CZ | FL | NJ | PL | SL |
|---|---|---|---|---|---|
| CZ | | 96-97 | 96 | 97 | 95 |
| FL | 96-97 | | 95-96 | 96 | 94-95 |
| NJ | 96 | 95-96 | | 97 | 95 |
| PL | 97 | 96 | 97 | | 96 |
| SL | 95 | 94-95 | 95 | 96 | |


CHAPTER 3 CHARACTERIZATION OF RNA PLANT VIROMES FROM WILD AND CULTIVATED V. corymbosum IN FLORIDA LEADS TO THE DISCOVERY OF KNOWN AND NOVEL VIRUSES

Introduction

Vaccinium corymbosum (V. corymbosum) is a diverse species that includes the low-chill southern highbush blueberries (SHB; interspecific hybrids of Vaccinium corymbosum L.). These cultivars were developed through interspecific hybridization between the northern highbush (V. corymbosum), lowbush (V. angustifolium), rabbiteye (V. virgatum), and wild highbush blueberry (Vaccinium spp.) native to southeast Georgia and northeast Florida. SHB is cultivated near wild plants of the same and related species in Florida, which increases the potential risk of viruses moving between wild and cultivated plants. The diverse communities of native Vaccinium species and Florida’s adjacent blueberry production areas could serve as reservoirs for a diverse assemblage of viruses in these species, leading to spillover and spillback between the wild and cultivated hosts (Roossinck & García-Arenal, 2015). Another pathway that could increase the diversity of viruses in cultivated Vaccinium spp. is the lack of virus screening before native, wild blueberries are used in the development of new SHB cultivars.

Fifteen virus species, including both RNA and DNA viruses, from eight known and two unassigned genera have been reported in highbush, lowbush, and rabbiteye blueberries worldwide. However, only two of these species, BNRBV and BRRV, have been documented in Florida (Cantu-Iris et al., 2013). Viral diseases are currently not a major threat to the blueberry industry in Florida, so no studies have been conducted to determine the viral populations in wild or commercial blueberries in the state. The objectives of this study were to: 1) characterize the RNA viromes generated from wild and cultivated V. corymbosum in Florida using a metagenomic approach; and 2) identify novel and known viruses to better understand the diversity of viruses and their contribution to the viral population in blueberry in Florida. To achieve these objectives, we generated RNA viromes from V. corymbosum plants collected from six locations in north central Florida, including two cultivated sites and four wild sites. These RNA viromes were sequenced on the Illumina HiSeq 2500 platform and analyzed using a metagenome analysis pipeline.

Materials and Methods

Plant Materials

A total of 20 samples per site (n=120), with and without virus-like symptoms, were collected from wild and cultivated blueberries (i.e., V. corymbosum) in central Florida. The wild blueberry samples were collected from: 1) O’Leno State Park, High Springs, a wild site not adjacent to commercial blueberries; 2) Morningside Nature Center, Gainesville, a wild site adjacent to residences; 3) Interlachen, FL, a wild site neighboring commercial blueberry production; and 4) Island Grove, FL, another wild site neighboring commercial blueberry production (Fig. 3-1). The cultivated blueberries were collected from commercial plantings in Interlachen, FL and Island Grove, FL. All samples were collected during Fall 2014 (September-October), except the wild and cultivated samples from Island Grove, which were collected in Fall 2015 (October).

Sample Preparation and Generation of V. corymbosum RNA Plant Viromes

Twenty 0.5 g samples of leaf tissue were pooled by location. Extracted total nucleic acid was enriched for dsRNA following the protocol described by Ho et al. (2015). The quality and concentration of dsRNA were estimated by gel electrophoresis (1% agarose) and with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., DE, USA), respectively. Samples of dsRNA (100 ng) from each site were sent to Macrogen, Inc. (Seoul, Korea) for sequencing with the TruSeq RNA Sample Prep Kit v2 on the Illumina HiSeq 2500 platform to generate RNA libraries containing 150 bp paired-end sequences.


Analyses of RNA Plant Viromes

Reads from the RNA libraries were analyzed according to the virome analysis pipeline illustrated in Figure 3-2. Raw reads from each library were processed using Trimmomatic (Bolger et al., 2014) to trim adapters and to filter out low-quality reads. The minimum length of reads retained from the libraries was 100 nt. The quality-filtered reads were assembled de novo into contigs and scaffolds with the SPAdes assembler (Bankevich et al., 2012) using k-mer sizes of 55, 77, and 99. Only scaffolds with a length ≥ 500 nt were used for downstream analysis. These scaffolds were subjected to a two-step BLASTx search (Altschul et al., 1997) to identify the plant viruses with the highest sequence match: the scaffolds were compared first to a local plant virus protein database (Zheng et al., 2017) and then to the non-redundant GenBank protein database, using an e-value threshold of 10^-5.
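The two filtering steps described above (scaffold length ≥ 500 nt and BLASTx e-value ≤ 10^-5) can be sketched in a few lines. This is an illustrative sketch, not the pipeline used in the study. It assumes that scaffolds are available as FASTA text and that BLASTx results were exported in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where column 11 is the e-value and column 12 the bit score. The function names and the choice to keep the highest bit score per scaffold are assumptions.

```python
def parse_fasta(text):
    """Return {sequence name: sequence} from FASTA-formatted text."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def filter_by_length(records, min_length=500):
    """Keep only scaffolds of at least min_length nt."""
    return {n: s for n, s in records.items() if len(s) >= min_length}


def best_hits(blast_tab_text, max_evalue=1e-5):
    """Best hit per query (highest bit score) among hits passing the e-value cutoff."""
    best = {}
    for line in blast_tab_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 12 or line.startswith("#"):
            continue
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

In a two-step search, the same `best_hits` filter would be applied once to the output of the local plant virus database search and again to the output of the non-redundant database search for the scaffolds retained from the first step.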

Scaffolds with similarity to plant viruses were organized by family, genus, and species according to the 2016 Virus Taxonomy Release on the International Committee on Taxonomy of Viruses (ICTV) website (https://talk.ictvonline.org/taxonomy/) for approved virus species. Scaffolds with hits to virus species not yet approved by the ICTV were assigned to the corresponding genus and family based on the information provided in the NCBI Taxonomy Database (https://www.ncbi.nlm.nih.gov/taxonomy). Overlapping scaffolds with the same viral hits were assembled into longer scaffolds using the Geneious assembler in Geneious 9.1.6. Scaffolds were mapped against the corresponding reference genome using the Geneious mapper in Geneious 9.1.6 to assess genome coverage. The scaffolds and the reference sequences were aligned to predict the ORFs and coding sequences (CDS). Finally, reads were aligned to published viral sequences and to scaffolds in a reference-based mapping approach using Bowtie2 (Langmead & Salzberg, 2012) to obtain complete viral genomes for known and novel viruses, respectively.


Sequence and Phylogenetic Analyses

The respective ORFs and RNA segments of the viruses obtained in this study and selected published reference sequences were aligned using the Geneious local Smith-Waterman alignment in Geneious 9.1.6. The alignment was further refined using MUSCLE (Edgar, 2004). Pairwise identity comparisons between selected virus sequences obtained in this study and other selected members of the corresponding viral genera were performed using Sequence Demarcation Tool 1.2 (SDT) (Muhire et al., 2014). Phylogenetic analysis of the nucleotide and amino acid (aa) sequences of selected ORFs was performed by the neighbor-joining method in MEGA (version 7.0) (Kumar et al., 2016) using bootstrap tests with 1000 replicates. The phylogenetic trees were generated to infer the relationship of the known and new virus sequences to other published viral sequences from the corresponding viral genera.
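For readers unfamiliar with pairwise identity scores, the sketch below shows one common way to compute percent identity from two already aligned sequences. It is not the SDT or MEGA implementation. It assumes that positions where either sequence has a gap are skipped (pairwise deletion), and the function name and toy sequences are hypothetical.

```python
def pairwise_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


identity = pairwise_identity("ACGT-A", "ACGA-A")  # toy sequences
p_distance = 1.0 - identity / 100.0  # proportion of differing sites
print(f"identity = {identity:.1f}%, p-distance = {p_distance:.2f}")
```

The same function works for aa alignments, since it only compares characters column by column.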

Virus Validation and Detection in vitro

Sample source and extraction of nucleic acid. Total RNA was extracted from 30 mg of frozen leaf tissue of cultivated V. corymbosum collected from Island Grove using a Plant/Fungi Total RNA Purification Kit (catalog no. 25710; Norgen Biotek Corp., Canada) following the manufacturer’s instructions. Each RNA sample was quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., DE, USA). Equal concentrations (20 ng/µl) of pooled RNA and of individual samples were used to validate the CP gene and to determine the incidence of the virus, respectively.

Primer design. A set of primers (NT_F and NT_R) (Table B-1) covering the whole CP region of the novel Tepovirus discovered in this study was designed to validate and detect the presence of this virus. These primers were designed from the MP region and the 3’ UTR of the de novo assembled complete genome of the putative novel Tepovirus. The expected amplicon size using these primers was ~1 kb.


Detection of putative novel Tepovirus. First-strand cDNA was synthesized with ImProm-II reverse transcriptase (Promega, USA) in an RT step, followed by DNA amplification by PCR with Taq DNA Polymerase (New England Biolabs, USA), according to the manufacturers’ instructions. For the RT reaction, a 10 µl reaction mixture containing 180 ng total RNA and 1 µM Oligo (dT)21v primer was first incubated at 70°C for 10 min. Following the addition of 10 µl of a solution containing final concentrations of 6 mM MgCl2, 1 mM dNTP mix, 20 U RNasin, and 1 µl reverse transcriptase, the mixture was incubated at 25°C for 5 min, 42°C for 1 h, and 70°C for 15 min. Subsequently, PCR was carried out using the cDNA template (2 µl) in a reaction mixture containing 2.5 mM MgCl2, 0.5 mM dNTP mix, 0.5 µM each of forward and reverse primer, and 0.625 U Taq DNA Polymerase. The cycling conditions were: initial denaturation at 94°C for 3 min; 35 cycles of 94°C (30 sec), 62°C (45 sec), and 72°C (1 min 10 sec); and final extension at 72°C for 10 min. The RT-PCR was carried out in an Eppendorf Mastercycler (Eppendorf AG, Hamburg, Germany).

The PCR products were visualized on a 0.8% agarose gel. The expected full-length CP amplicon of the putative novel Tepovirus (~1 kb), produced by PCR with primers NT_F and NT_R from pooled RNA of V. corymbosum, was purified using the QIAquick PCR Purification Kit (Qiagen, Germany) and used as the insert in a cloning reaction with the pGEM-T Easy cloning kit (Promega, USA) following the manufacturer’s instructions. The clone was sequenced by the Sanger method at Eurofins MWG Operon LLC (Eurofins Scientific, USA).

Results

General Analyses of the Viromes

The RNA viromes generated from wild and cultivated V. corymbosum contained approximately 198 and 75 million paired-end reads of 150 nt in length, respectively. After the read processing described in section 3.2, approximately 183 and 69 million reads remained in the wild and cultivated V. corymbosum viromes, respectively (Table 3-1). De novo assembly of the reads using the SPAdes assembler produced between 3485 and 4890 scaffolds longer than 500 nt per virome from the wild V. corymbosum and between 2790 and 3361 from the cultivated V. corymbosum. Comparison of the scaffolds to the local plant virus protein database followed by the non-redundant GenBank protein database in two-step BLASTx analyses yielded a total of 224 (1.4%) and 43 (0.7%) putative plant virus scaffolds with matches to known viral sequences from the wild and cultivated V. corymbosum, respectively. The percentage of putative plant virus scaffolds with sequence similarity to known viruses, calculated from the total scaffolds at each wild and cultivated site, varied between locations, ranging from 0.60% to 1.78%.

Diversity of Virus Sequences and Comparison of Viral Populations among the Viromes

Scaffolds identified from both wild and cultivated V. corymbosum viromes showed sequence similarity to plant virus species from a total of 12 virus families, representing 28 virus genera (Figure 3-3). Of the 28 viral genera, scaffolds with sequence similarity to virus species from 24 and 14 viral genera were identified in wild and cultivated V. corymbosum viromes, respectively. Plant viruses from 10 viral genera belonging to a diverse range of viral families, including Amalgaviridae, Caulimoviridae, Endornaviridae, Geminiviridae, Ophioviridae, Partitiviridae, and Virgaviridae, were identified in both wild and cultivated V. corymbosum viromes, as shown in the overlapping region of the Venn diagram (Figure 3-3b). Although there was considerable overlap between the wild and cultivated viromes, each had unique viral genera: fourteen in the wild and four in the cultivated viromes (Figure 3-4b). Scaffolds with sequence similarity to species in the families Betaflexiviridae and Bunyaviridae were found only in viromes from cultivated V. corymbosum. Scaffolds with sequence similarity to species in the families Bromoviridae, Closteroviridae, and Reoviridae were found only in viromes from wild V. corymbosum (Figure 3-3a).
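The genus counts reported for the wild and cultivated viromes can be cross-checked with simple set arithmetic. The sketch below uses only the three counts given in the text (24 genera in wild viromes, 14 in cultivated viromes, 10 shared); the function name `venn_counts` is hypothetical.

```python
def venn_counts(n_wild: int, n_cultivated: int, n_shared: int) -> dict:
    """Sizes of the three Venn regions and of the union, from group sizes and overlap."""
    if n_shared > min(n_wild, n_cultivated):
        raise ValueError("overlap cannot exceed either group")
    return {
        "wild_only": n_wild - n_shared,
        "cultivated_only": n_cultivated - n_shared,
        "shared": n_shared,
        "total": n_wild + n_cultivated - n_shared,
    }


counts = venn_counts(n_wild=24, n_cultivated=14, n_shared=10)
print(counts)
```

With these inputs the union is 28 genera, matching the total stated in the text, with 14 genera found only in wild viromes and 4 found only in cultivated viromes.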

In addition, although almost all wild V. corymbosum viromes produced scaffolds with similarity to plant virus sequences from an equally wide range of viral families, the virome of wild V. corymbosum collected from Island Grove had the fewest putative plant virus scaffolds with matches to known plant virus sequences, representing fewer viral genera than the other wild sites (Table 3-1, Figure 3-3a). The cultivated V. corymbosum viromes produced an equivalent number of scaffolds with similarity to plant virus sequences, representing an equal number of viral families, in both locations.

Viral populations among the viromes of wild and cultivated V. corymbosum varied by location, as represented by the scaffolds with sequence similarity to virus species from different viral genera (Figure 3-4). Scaffolds with sequence similarity to virus species from the genus Ophiovirus, unclassified partitiviruses, and unclassified viruses were identified in viromes of wild V. corymbosum from all locations. In contrast, scaffolds with similarity to virus sequences from nine viral genera, including Ampelovirus, Closterovirus, Badnavirus, Bromovirus, Caulimovirus, Fijivirus, Ourmiavirus, and Tobravirus, were identified only in viromes of wild V. corymbosum collected from Gainesville, High Springs, and Interlachen (Figure 3-4a). Likewise, the viromes of cultivated V. corymbosum from both locations contained scaffolds with similarity to virus sequences from the genera Amalgavirus, Capulavirus, and Mastrevirus. The remaining genera were identified in only one of the two locations (Figure 3-4b).

Viral populations in the viromes of V. corymbosum collected from neighboring wild and cultivated sites at Interlachen and Island Grove were further compared. Scaffolds with sequence similarity to virus species from five viral genera (Amalgavirus, Capulavirus, Cilevirus, Endornavirus, and Tobamovirus) were identified in viromes of both wild and cultivated V. corymbosum collected from Interlachen. Scaffolds with sequence similarity to virus species from three viral genera (Capulavirus, Ophiovirus, and unclassified partitivirus) were identified in viromes of both wild and cultivated V. corymbosum from Island Grove (Figure 3-4c, d). At Interlachen, the virome from the wild site produced scaffolds with similarity to virus species from ten genera, while the virome from the cultivated site produced scaffolds with similarity to virus species from four genera. Similarly, viromes of V. corymbosum collected from Island Grove contained scaffolds closely related to virus sequences from two viral genera identified exclusively in the wild site and five viral genera identified exclusively in the cultivated site (Figure 3-4d).

Identification of Plant Viruses from the Viromes

Of the putative plant virus scaffolds from the wild V. corymbosum viromes mentioned earlier, the largest numbers were derived from virus species in the family Partitiviridae (68), the family Ophioviridae (52), and unclassified viruses (29) (Figure 3-3a). Scaffolds with closest sequence similarity to more than ten different partitivirus species were identified at all locations where wild V. corymbosum was collected, with the highest number of scaffolds detected at Interlachen (43), followed by Gainesville (15), High Springs (7), and Island Grove (3). The longest scaffold, 2009 nt in length, was assembled de novo from the wild V. corymbosum virome collected at Interlachen, with the highest aa identity of 49% to Raphanus sativus cryptic virus 1 (AAX51289.2) (Table 3-4). Besides the partitiviruses, abundant scaffolds with high sequence similarity to a member of the family Ophioviridae, Blueberry mosaic associated virus (BlMaV), were also identified in all wild V. corymbosum viromes. The highest number of scaffolds was assembled from plants collected at High Springs (20), followed by Interlachen (17), Island Grove (10), and Gainesville (5). The longest scaffold, 7946 nt in length, was assembled de novo from the virome of wild V. corymbosum collected at High Springs, with the highest aa identity of 98% to BlMaV (AIF28241.1) (Table 3-3). A high number of scaffolds with similarity to an unclassified virus, Persimmon latent virus, which has not been approved by the ICTV, were also found in all the wild V. corymbosum viromes. The largest number of these scaffolds was derived from Gainesville (16), followed by High Springs (6), Island Grove (5), and Interlachen (2). The virome of wild V. corymbosum collected at High Springs produced the longest scaffold, 6476 nt in length, with the highest aa identity of 53% to Persimmon latent virus (BAM36036.2) (Table 3-3).

As for the cultivated V. corymbosum viromes, the largest numbers of scaffolds overall were derived from virus species in the families Betaflexiviridae (10), Virgaviridae (8), and Ophioviridae (6) (Figure 3-3a). The virome of V. corymbosum collected at Island Grove produced the greatest number of scaffolds with closest sequence similarity to Prunus virus T (PrVT) (AHM92769.1), a member of the family Betaflexiviridae, and to BlMaV (AIF28241.1), a member of the family Ophioviridae. For Island Grove, a large number of scaffolds had similarity to Tobacco mosaic virus (TMV), a member of the family Virgaviridae (Table 3-6, 3-7).

The longest scaffold for PrVT was 2504 nt, with the highest aa identity of 86%. The longest scaffold for BlMaV was 2429 nt, with the highest aa identity of 96% (Table 3-7). Similarly, the longest scaffold assembled for TMV was 3060 nt, with 100% aa identity (Table 3-6).

In addition to recognizing the viral genera that accounted for the majority of viral sequence matches in the BLASTx analyses of the scaffolds assembled from all of the viromes, we observed an abundance of scaffolds with a low percentage of aa identity to known viral sequences, particularly in the wild viromes. In almost all of the viromes, more than 70% of the closely related virus species identified for the scaffolds by BLASTx had less than 65% aa identity to the scaffolds. The percentage of virus species with low aa identity to the corresponding scaffolds was noticeably higher in viromes of wild V. corymbosum. The virome of cultivated V. corymbosum collected from Island Grove had the lowest percentage of virus species (50%) with low aa identity (<65%) to known plant viruses compared to the other viromes.

Sequence Comparison and Phylogenetic Analyses of the Complete Viral Genomes

A total of ten complete viral genomes of five virus species, representing five viral genera, were assembled from the six RNA viromes using the virome analysis pipeline described previously. The viruses were: Blueberry latent virus (BBLV), Amalgavirus; BlMaV, Ophiovirus; Blueberry red ringspot virus (BRRV), Soymovirus; a putative new species closely related to PrVT, Tepovirus; and TMV, Tobamovirus (Table 3-8). As shown in Table 3-8, at least one whole viral genome was assembled in silico from the virome of each sampling location, except for the virome of wild V. corymbosum collected from Island Grove. In contrast, the virome of cultivated V. corymbosum collected from Island Grove yielded four complete viral genomes, the largest number assembled from any virome.

Blueberry latent virus. The complete genome of BBLV was assembled from three scaffolds (792-116 nt in length) and two scaffolds (1132-2401 nt in length) obtained from the viromes of cultivated V. corymbosum from Interlachen and Island Grove, respectively. Mapping of reads from Interlachen to the BBLV reference genome (NC014593) using Bowtie2 showed that 258 reads (0.00062%) aligned to the reference sequence. The Island Grove virome had an approximately 11x higher percentage of reads mapped to BBLV than the Interlachen virome, with 1943 mapped reads (0.007%). The average coverage of reads mapped to the reference genome was 10x for Interlachen and 75x for Island Grove. However, 4 nucleotides were missing between bases 1066-1069 in the complete BBLV genome assembled from the virome of cultivated V. corymbosum at Interlachen. This might be due to insufficient read coverage in this region, as reflected by the lower percentage and average coverage of mapped reads compared to those from Island Grove.
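As a rough plausibility check on the reported average coverage, the expected mean coverage can be estimated as (number of mapped reads * read length) / genome length. The sketch below is an approximation, not the calculation performed by the mapping software. It assumes a genome length of 3432 nt for both viromes (the size of the Island Grove assembly) and brackets the read length between the 100 nt minimum retained after trimming and the full 150 nt.

```python
def expected_mean_coverage(mapped_reads: int, read_length: int, genome_length: int) -> float:
    """Approximate mean depth if every mapped read aligns over its full length."""
    return mapped_reads * read_length / genome_length


GENOME_LENGTH = 3432  # nt; assumed for both viromes

for site, mapped in {"Interlachen": 258, "Island Grove": 1943}.items():
    low = expected_mean_coverage(mapped, 100, GENOME_LENGTH)   # shortest retained reads
    high = expected_mean_coverage(mapped, 150, GENOME_LENGTH)  # untrimmed reads
    print(f"{site}: roughly {low:.1f}x to {high:.1f}x")
```

Under these assumptions the reported values of 10x (Interlachen) and 75x (Island Grove) fall inside the estimated ranges.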

The genome assembled from the virome of cultivated V. corymbosum at Island Grove is 3432 nt long, including the 5’ (167 nt) and 3’ (99 nt) untranslated regions. Similar to the genome organization of other published BBLV sequences, the genome contains two ORFs that overlap by 314 nt (Martin et al., 2011). A putative protein of 375 aa is encoded by ORF1 (1125 nt), while a putative fusion protein of 1054 aa is encoded by ORF2 (3162 nt). The fusion protein contains the RNA-dependent RNA polymerase (RdRp) domain (aa 586-782) (Martin et al., 2011; Marchler-Bauer et al., 2016).

Based on the pairwise nucleotide identity of the viral genomes, represented by the color-coded blocks, there was little nucleotide divergence between the BBLV sequences obtained from viromes of cultivated V. corymbosum and other published sequences, as shown by the high pairwise identity (>99%) (Figure 3-5a). Phylogenetic analysis of the deduced aa sequences from the whole genomes of the BBLV sequences obtained in this study and of other published isolates showed that these sequences clustered in the same clade with high bootstrap support (Figure 3-5a).

Blueberry mosaic associated virus. Complete genomes of BlMaV, each consisting of three RNA segments, were assembled from six scaffolds (1376-2429 nt in length) from the virome of V. corymbosum collected from the cultivated site at Island Grove and from twenty scaffolds (519-7946 nt in length) from the virome collected from O’Leno State Park at High Springs. In the virome of wild V. corymbosum from Interlachen, however, only the RNA3 segment encoding the complete nucleocapsid (NP) sequence was assembled. The mapping and average coverage of reads from the Island Grove and High Springs viromes on each RNA segment assembled from these viromes are shown in Table 3-9. The percentage of reads mapped to RNA1 was higher in the High Springs virome, while the Island Grove virome had a higher percentage of mapped reads for RNA2 and RNA3 (Table 3-9).

The length of each RNA segment assembled from each virome varied, although the sizes of the corresponding ORFs were similar to those of the reference sequence (Table 3-10). The complete segmented genomes of BlMaV assembled from the Island Grove and High Springs viromes totaled 11,271 nt and 11,565 nt, respectively, and differed by 196 and 98 nt from the reference genome (KJ04366-8) described previously (Thekke-Veetil et al., 2014). RNA1 assembled from Island Grove is approximately 7747 nt long, including the 3’ untranslated region, and encodes the RdRp (7014 nt) and a 23 kDa protein (585 nt), which are separated by an intergenic region of 123 bases (Table 3-10). However, the 5’ untranslated region (UTR) and a stretch of 30 nt between bases 7316 and 7345 could not be recovered for RNA1. Likewise, RNA1 assembled from High Springs is approximately 7747 nt long, including the 5’ (190 nt) and 3’ (43 nt) UTRs, and encodes the RdRp (7014 nt) and a 23 kDa protein (579 nt), which are separated by an intergenic region of 140 bases (Table 3-10).

RNA2 assembled from the Island Grove virome is 1981 nt long and differs by 47 nt from the reference sequence, whereas RNA2 from High Springs is 1939 nt long and differs by 103 nt from the reference sequence. Both RNAs contain a 1545 nt ORF encoding a movement protein. RNA2 assembled from the Island Grove virome has a 5’ UTR of 334 nt and a 3’ UTR of 60 nt. Likewise, RNA2 assembled from the High Springs virome has a 5’ UTR of 337 nt and a 3’ UTR of 99 nt.

Viromes of V. corymbosum collected from Island Grove, High Springs, and Interlachen produced RNA3 segments ranging from 1543 to 1660 nt in size, which differ by 27 to 60 nt from the reference sequence. RNA3 contains an ORF of 1368 nt that is similar in size across the viromes and encodes the NP. The 5’ and 3’ UTRs of the RNA3 assembled from Island Grove are 111 and 64 nt long, respectively, both shorter than those of the reference sequence. The RNA3 constructed from High Springs has 5’ and 3’ UTRs of 165 and 127 nt, respectively, both longer than those of the reference sequence. The RNA3 assembled from Interlachen contains 5’ and 3’ UTRs of 163 and 61 nt, respectively; its 5’ UTR is longer and its 3’ UTR shorter than those of the reference sequence.

Each RNA sequence obtained from the virome was compared to the reference sequence (KJ04366-8) by pairwise local alignment. RNA1, RNA2, and RNA3 showed approximately 80-81%, 80%, and 83% nucleotide identity, respectively (Table 3-9). Pairwise nucleotide analysis was performed in SDT using the NP ORF sequences obtained from the viromes of cultivated V. corymbosum from Island Grove and wild V. corymbosum from High Springs and Interlachen. As shown in Figure 3-5b, the NP sequences obtained from the Island Grove and Interlachen viromes shared the highest pairwise nucleotide identity, of more than 90%, and both sequences shared around 86% pairwise identity with the isolates from Arkansas (KJ04368) and Japan (LC066301). However, the NP sequence assembled from the High Springs virome appeared to have diverged considerably from the other isolates, including those from Florida, sharing just ~81% pairwise identity. The results of the pairwise analysis using SDT were reflected in the phylogenetic analysis of the deduced NP aa sequences, in which the isolates from Arkansas and Japan grouped in the same clade as the isolates from Island Grove and Interlachen with high bootstrap support, although the two pairs formed separate subgroups (Figure 3-5b). As expected, the NP sequence from High Springs formed a clade distinct from the other isolates, suggesting that the BlMaV sequence detected in this wild V. corymbosum had diverged from the other isolates.
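As a rough illustration of how a pairwise identity score can be computed from two aligned sequences, the sketch below counts matches over alignment columns in which neither sequence has a gap, a convention similar to the one described for SDT. The sequences are toy examples, not data from this study, and the function name is hypothetical.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Toy aligned sequences (hypothetical, for illustration only).
ref = "ATGGCTAGCT-AGGA"
query = "ATGGCAAGCTTAGCA"
print(f"{pairwise_identity(ref, query):.1f}% identity")
```

Divergence, as used in the comparisons above, is then simply 100 minus the identity.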

Blueberry red ringspot virus. A complete genome of BRRV was assembled only from the virome of cultivated V. corymbosum at Island Grove, from a single scaffold of 8392 nt; 0.95% of reads mapped to the scaffold by Bowtie2, giving 4500x average coverage.

Similar to the genome organization of other published sequences, the BRRV scaffold encodes eight ORFs: ORF I (movement protein), A, B, C, IV (coat protein), V (reverse transcriptase, RT), VI (translational transactivator), and VII. However, there were some differences in the lengths of ORFs I, C, V (RT), and VII, and the whole genome is the longest among the isolates compared (Table 3-11).

Whole genome pairwise comparison using SDT showed that the BRRV scaffold shared the highest nt identity (97%) with the published sequences of BRRV isolates from the Czech Republic (HM159264) and Poland (JN205460) (Figure 3-5c). In addition, phylogenetic analysis using the whole genomes of different BRRV isolates indicated that the BRRV genome recovered in this study grouped in the same clade as the isolates from the Czech Republic and Slovenia, while the isolate from New Jersey grouped with the isolate from Poland.

Tentative new species in the genus Tepovirus. Apart from the known viruses discussed so far, a total of 10 scaffolds from the virome of cultivated V. corymbosum collected at Island Grove, the longest of which was 2504 nt, had closest sequence similarity to Prunus virus T (PrVT) in the BLASTx analyses. These scaffolds were represented by 18.5% of total mapped reads (74,000x average coverage). Mapping of these scaffolds against the reference sequence (PrVT isolate Aze239, NC024686), which is 6835 nt in length, using the Geneious assembler showed that all 10 scaffolds spanned the whole genome of PrVT, including the 5’ and 3’ UTRs. The scaffolds were de novo assembled using the Geneious assembler to produce a larger scaffold of 7200 nt. Reads were mapped against the 7200 nt scaffold by Bowtie2 to resolve nucleotide ambiguities. This produced consensus sequences that were used to draft the new viral genome for further analysis and validation. The ORFs of the draft genome were predicted in Geneious by comparing the sequence to the PrVT genome.

Primers were designed in the movement protein (MP) ORF and 3’ UTR regions of the draft genome to amplify the whole CP by PCR for validation and genome completion of the new virus. The sequence of the cloned fragment (1010 bp) showed more than 96% pairwise identity to the draft viral genome. Alignment of the sequence obtained by the Sanger method to the draft genome allowed us to assemble the complete genome of the new virus species, tentatively named Blueberry virus T (BlVT). This virus has a genome organization similar to that of previously described members of the genus Tepovirus in the family Betaflexiviridae. The genome of 7200 nt contains three overlapping ORFs encoding the RNA-dependent RNA polymerase (RdRp) (5457 nt), MP (1146 nt), and CP (663 nt), as well as the 5’ (109 nt) and 3’ (264 nt) UTRs. The RdRp and MP ORFs overlap by 89 nt (bases 5478 to 5566), whereas the MP and CP ORFs overlap by 350 nt (bases 6274 to 6623). The differences in length between BlVT and PrVT in the RdRp, MP, and CP regions were 120, 6, and 3 nt, respectively (Table 3-6).
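The ORF coordinates implied by these lengths can be cross-checked arithmetically. The sketch below derives 1-based start and end positions from the reported UTR and ORF lengths, under the assumptions that the RdRp ORF begins immediately after the 5’ UTR, the CP ORF ends immediately before the 3’ UTR, and the MP ORF ends at the last base of the MP/CP overlap (6623). Variable and function names are illustrative.

```python
GENOME_LEN = 7200
UTR5_LEN, UTR3_LEN = 109, 264
RDRP_LEN, MP_LEN, CP_LEN = 5457, 1146, 663
MP_END = 6623  # last base of the MP/CP overlap reported in the text

def span(start: int, length: int) -> tuple[int, int]:
    """Return inclusive 1-based (start, end) for a feature of a given length."""
    return start, start + length - 1

def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two inclusive intervals (0 if none)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

rdrp = span(UTR5_LEN + 1, RDRP_LEN)                     # starts after the 5' UTR
mp = span(MP_END - MP_LEN + 1, MP_LEN)                  # anchored on its end
cp = span(GENOME_LEN - UTR3_LEN - CP_LEN + 1, CP_LEN)   # ends before the 3' UTR

print("RdRp", rdrp, "MP", mp, "CP", cp)
print("RdRp/MP overlap:", overlap(rdrp, mp), "nt")
print("MP/CP overlap:", overlap(mp, cp), "nt")
```

Under these assumptions the derived overlaps are 89 nt and 350 nt, matching the reported overlap lengths.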

Pairwise nt comparison between each ORF of BlVT and PrVT by MUSCLE indicated that these viruses shared 61-70% pairwise identity in the corresponding ORFs, and 68% and 59% pairwise identity in the 5’ and 3’ UTRs, respectively (Table 3-12). Pairwise aa comparison of the putative RdRp and CP encoded by BlVT with the corresponding proteins of selected sequences representing members of the family Betaflexiviridae using SDT indicated that BlVT shared the highest pairwise identity with the RdRp (55%) and CP (64%) of PrVT (Figure 3-5d). Furthermore, phylogenetic analysis of the putative RdRp encoded by BlVT showed that this virus grouped, with highly significant bootstrap support, in the same clade as PrVT and Potato virus T (PVT), the only members of the genus Tepovirus (Figure 3-5d). Phylogenetic analysis of the putative CP encoded by BlVT again clustered this virus with PrVT; however, both viruses were separated from PVT. The disparity between the phylogenetic groupings of the RdRp and CP of these viruses might be due to the differences in aa identity of the RdRp and CP between PrVT and PVT, as reported earlier (Marais et al., 2015). The nt identities of the RdRp and CP of PrVT and PVT were slightly higher and slightly lower, respectively, than the genus demarcation criterion, which is defined as sharing less than about 45% nt identity in these genes (Marais et al., 2015; Adams et al., 2016).

Additionally, the presence of BlVT in the cultivated V. corymbosum samples at Island Grove was determined by PCR using the primers designed earlier for validation of the CP gene. The same leaf samples used to generate the virome of V. corymbosum collected from Island Grove were used to determine the incidence of BlVT. We detected BlVT in 3 of the 20 samples (15%) from the five cultivars tested: two samples from the cultivar ‘Gulf Coast’ and one sample from ‘Windsor’ tested positive.

Discussions

Viral Metagenomics Unraveled Plant Viral Diversity in Wild and Cultivated V. corymbosum in Florida

Plant viral metagenomics has no doubt contributed to our understanding of viral populations and to unravelling the etiology of viral diseases in various plant species. This is supported by the exponential increase in the discovery of novel plant viruses, described thoroughly by Barba et al. (2014) and Roossinck et al. (2015a). In this study, viral populations in wild and cultivated V. corymbosum collected from different locations in Florida were characterized using a viral metagenomic approach. Using the developed metagenome analysis pipeline, less than 2% of the de novo assembled scaffolds from all the RNA viromes were considered putative plant virus scaffolds. This is because a large percentage of scaffolds showed sequence similarity to either insect or fungal viruses. In addition, the two-step BLASTx analysis in the metagenome analysis pipeline used in this study may have contributed to the reduction in plant virus associated scaffolds, because plant-derived scaffolds were filtered out following the BLASTx analysis against the GenBank non-redundant protein database.
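The effect of the two-step BLASTx filter can be pictured with a small sketch. It assumes each contig carries an optional best hit from the local plant virus database and an optional best hit from the non-redundant database, and it keeps a contig only when both steps point to a plant virus. The data structure, field names, and example records are hypothetical and do not reproduce the actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contig:
    name: str
    plant_virus_hit: Optional[str]   # best hit in the local plant virus database
    nr_hit: Optional[str]            # best hit in the non-redundant database
    nr_hit_is_plant_virus: bool      # whether that nr hit is a plant virus

def two_step_filter(contigs: list[Contig]) -> list[Contig]:
    """Keep contigs that pass both BLASTx steps as putative plant virus hits."""
    kept = []
    for contig in contigs:
        if contig.plant_virus_hit is None:
            continue  # step 1: no similarity to any plant virus protein
        if contig.nr_hit is None or not contig.nr_hit_is_plant_virus:
            continue  # step 2: best nr hit is host, insect, fungal, etc.
        kept.append(contig)
    return kept

# Hypothetical example records.
contigs = [
    Contig("c1", "Blueberry latent virus", "Blueberry latent virus", True),
    Contig("c2", "Partitivirus RdRp", "fungal virus RdRp", False),
    Contig("c3", None, "plant ribosomal protein", False),
]
print([c.name for c in two_step_filter(contigs)])
```

The second step is deliberately conservative: a contig that resembles a plant virus in the small database but has a better non-viral or non-plant-virus match in the larger database is discarded, which is one way the count of putative plant virus scaffolds can shrink.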

Comparison of Virus Diversity among the Plant Viromes

Most metagenomics studies have focused only on cultivated plant species and crops, although native wild plant species are known to potentially harbor uncharacterized viruses (Roossinck et al., 2015a). Recently, several extensive studies of various wild plant species through viral metagenomics have demonstrated that the viral populations in these plants comprise diverse viral species (Muthukumar et al., 2009; Roossinck et al., 2010; Wylie et al., 2012; Kehoe et al., 2014; Ong et al., 2016; Koh et al., 2016).

The metagenome analysis of the viromes of wild and cultivated V. corymbosum demonstrated that the population of viruses in this host is highly diverse. This is indicated by the presence of scaffolds with sequence similarity to a total of 28 viral genera. Greater virus diversity was observed in the viral population of wild V. corymbosum native to Florida, as shown by the number of viral genera, which was about twice that of the cultivated viromes. Previous studies of the luteovirus complex of Barley yellow dwarf virus and Cereal yellow dwarf virus in grasslands showed that interactions within and among plant communities, insects, herbivores, abiotic factors, and the composition of the wild plant communities significantly affect the prevalence and spread of these viruses (Power et al., 2011; Moore et al., 2011; Moore & Borer, 2012). Hence, several conditions could contribute to the greater diversity of viruses in the viral population of wild V. corymbosum compared to the cultivated ones. Such conditions include the presence of different wild Vaccinium species and other plant communities within and surrounding the areas, the occurrence of vectors, and abiotic factors at these locations. For example, genotypic variation is known to occur in the native population of V. corymbosum from the Florida peninsula due to natural introgression between this species and V. darrowi Camp (Lyrene, 1997). Continuous gene flow between V. corymbosum and its related wild species could eventually shape the viral population in wild V. corymbosum. In addition, the diverse plant communities within and surrounding wild V. corymbosum may act as virus reservoirs, increasing the heterogeneity of the virus population through movement of viruses between these plants. The occurrence of insect vectors in the wild areas may be another important factor contributing to the greater virus diversity in wild V. corymbosum by facilitating the movement of viruses between plants. On the other hand, the movement of viruses in cultivated plants may be restricted by agricultural management practices. Two key practices are the application of insecticides and the treatment of soil with fungicides, which limit the movement of viruses that require insect or fungal vectors for their transmission.

Identification of Plant Viruses from the Viromes

The majority of scaffolds assembled from the viromes of wild and cultivated V. corymbosum shared very low sequence similarity to known plant virus sequences from diverse viral genera in the BLASTx analyses, implying the presence of novel viruses in these plants. In addition, some of these scaffolds were present at low frequency and showed sequence similarity to different virus species within the same viral genus. This suggests the presence of heterogeneous virus species or sequence variants of novel viruses. As shown by the BLASTx results from the wild V. corymbosum viromes, scaffolds with closest sequence similarity to partitiviruses, also known as persistent viruses, dominated these viromes. These results are supported by a previous study in the Tallgrass Prairie Preserve of northeastern Oklahoma, which found that persistent viruses, particularly partitiviruses, are predominantly found in wild plants, although their presence in cultivated plants is not unusual (Roossinck, 2012). The detection of these viruses was undoubtedly favored by the use of dsRNA-enriched samples, since many of them have dsRNA genomes (Roossinck, 2012).

Characterization of known viruses in V. corymbosum new to Florida

The metagenome analysis of the viromes of wild and cultivated V. corymbosum enabled the reconstruction of ten complete viral genomes of five virus species: BBLV, BlMaV, BRRV, TMV, and a putative new species closely related to PrVT. Although complete genomes of TMV were detected in four of the six viromes, TMV was not included in further sequence analysis, since tobamoviruses are well known to be frequently found in various environments due to their high stability (Roossinck, 2012). Moreover, the TMV strains detected in these viromes were the ‘lab strains’, suggesting possible contamination by this virus during sample preparation.

The complete genomes of BBLV, an Amalgavirus, were recovered from both viromes of cultivated V. corymbosum. Pairwise sequence comparison between the whole genomes of BBLV from Florida and other published isolates showed more than 99% nucleotide identity. This finding is expected, since BBLV was previously reported to have a very stable population structure, with less than 0.5% diversity among partial and complete sequences of BBLV isolates from Japan and the US (Isogai et al., 2011; Martin et al., 2011; 2012). Phylogenetic analysis of the complete genomes of BBLV isolates from Arkansas, Florida, Michigan, and Oregon showed that these isolates clustered together, suggesting that the complete genome of BBLV recovered from Florida represents another isolate of BBLV. Besides the identification of BBLV from the cultivated sites, a total of four scaffolds with low aa sequence similarity (27-37%) to BBLV, Southern tomato virus, and Rhododendron virus A were identified by BLASTx in the viromes of wild V. corymbosum from Gainesville and Interlachen. These results suggest the presence of novel viruses distantly related to amalgaviruses in the virus population of the wild plants.

Scaffolds with high aa sequence similarity (96-97%) to an Ophiovirus, BlMaV, were identified in all the viromes except the cultivated V. corymbosum virome at Interlachen. The presence of scaffolds closely related to BlMaV in all the viromes of wild V. corymbosum suggests that the occurrence of this virus in wild Vaccinium might be common. Although BlMaV was identified in almost all the viromes, the complete genomes of BlMaV, which consist of three segments of negative-strand ssRNA (RNAs 1–3), were only recovered from the viromes of cultivated and wild V. corymbosum at Island Grove and High Springs, respectively. BLASTx results and reference-based mapping indicated that the virome of V. corymbosum from High Springs produced the highest number of scaffolds and overall mapped reads to the BlMaV genomes, suggesting that the wild V. corymbosum collected from this location probably contained a high titer of BlMaV. The lengths of the complete segmented BlMaV genomes from Island Grove and High Springs differed by 196 and 98 nt from the reference genome (KJ04366-8), respectively. The large difference in complete genome length for BlMaV from Island Grove is clearly due to the absence of the 5’ UTR of RNA1. Pairwise comparison using the NP regions of the BlMaV isolates assembled in this study (Island Grove, Interlachen and High Springs) and those from Arkansas and Japan indicated that the isolates from Island Grove and Interlachen had 14% sequence divergence from the rest of the isolates. This result is supported by a previous study, which found that BlMaV had low genetic diversity among isolates, as shown by 13% nt divergence in the NP regions (Thekke-Veetil et al., 2015). In contrast, the isolate from High Springs showed up to ~20% nt divergence from the other isolates, suggesting that this sequence might have evolved through reassortment between BlMaV variants in a mixed infection. In addition, phylogenetic analysis using deduced aa sequences of the NP regions from Arkansas, Florida and Japan showed that the isolate from High Springs formed a separate branch, while the isolates from Island Grove and Interlachen clustered in the same subgroup. This result suggests that BlMaV could have become established earlier in the wild V. corymbosum at High Springs, allowing time for diversification to occur.

The analysis of the RNA viromes additionally allowed the assembly of a complete genome (8392 nt) of BRRV, a dsDNA virus, from the virome of cultivated V. corymbosum at Island Grove. Prior to this study, blueberry plants with BRRV symptoms had been observed in Florida, but the complete genome had never been documented in this state. The detection of BRRV in the RNA virome is not surprising, since the virus produces RNA/DNA intermediates during its replication cycle, making it possible for dsRNA complexes to form at some point. The genome organization and ORFs of the BRRV isolate from Island Grove are similar to those of previously published isolates. Overall, whole genome pairwise comparison of the BRRV isolates from the Czech Republic, Island Grove, New Jersey, Poland and Slovenia showed that these isolates had less than 10% nt divergence, implying low genetic diversity and suggesting that this virus has a very stable population structure. In addition, pairwise comparison and phylogenetic analysis indicated that the BRRV isolate from Island Grove was closely related to an isolate from the Czech Republic, suggesting that there may have been exchange of BRRV-infected material between these regions.

Characterization of a novel Tepovirus in V. corymbosum

In addition to the detection of known blueberry viruses that are new to Florida, this study also led to the discovery of a novel virus closely related to a member of the genus Tepovirus in the family Betaflexiviridae. The use of viral metagenomics coupled with PCR-based targeted resequencing allowed the reconstruction of a complete novel viral genome. Partial resequencing validated the CP region of the de novo assembled scaffold representing the full-length viral genome obtained by metagenome analysis, an approach also demonstrated in a previous study (Marais et al., 2015).

Pairwise aa comparison of the RdRp and CP of the putative novel virus with the corresponding proteins of selected members representing different genera in the family Betaflexiviridae demonstrated that this novel virus was distantly related to other members of the family, though higher sequence similarity was observed with PrVT. Furthermore, pairwise nt comparison showed that the putative novel virus shared only 61% and 65% identity with the RdRp and CP of PrVT, respectively, values that fall well below the currently accepted species demarcation criterion in the genus Tepovirus, described as less than 72% nt identity in the RdRp and CP genes (Adams et al., 2016). Phylogenetic analyses of these genes demonstrated that the novel virus consistently grouped with PrVT. The similarity in size and organization of the new viral genome to other members of the family Betaflexiviridae, as well as the statistically supported phylogenetic grouping, further suggests that this novel virus, proposed as BlVT, should be considered a new species in the genus Tepovirus.
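The species demarcation argument reduces to a threshold comparison. The sketch below applies the 72% nt identity cutoff quoted above to the RdRp and CP identities reported for BlVT versus PrVT. The function name is illustrative, and requiring both genes to fall below the cutoff is a conservative assumption rather than a statement of the official rule.

```python
SPECIES_NT_CUTOFF = 72.0  # % nt identity cutoff, as quoted in the text

def below_species_cutoff(identities: dict[str, float],
                         cutoff: float = SPECIES_NT_CUTOFF) -> bool:
    """True if every gene's nt identity falls below the cutoff."""
    return all(value < cutoff for value in identities.values())

blvt_vs_prvt = {"RdRp": 61.0, "CP": 65.0}  # % nt identity reported in the text
print(below_species_cutoff(blvt_vs_prvt))
```

A sequence-identity check of this kind is only one line of evidence; the proposal above also rests on genome organization and phylogenetic grouping.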

Biological information regarding BlVT is still lacking, because its vector is unknown and knowledge of the spread of this virus in the field is limited. Although molecular screening of BlVT in 20 V. corymbosum samples collected from Island Grove, Florida, indicated a 15% virus incidence, BlVT infection could not yet be associated with specific symptoms due to mixed viral infections in the plants. In the near future, the primers developed in this study could be used for the detection of BlVT, to identify more isolates or related virus variants and to associate the virus with specific symptoms.

Conclusion

We showed that metagenomic analysis of viromes generated from dsRNA-enriched samples is a powerful tool for exploring the diversity of not only RNA viruses of all genome types (dsRNA, positive- and negative-sense ssRNA) but also, to some extent, DNA viruses. Based on metagenome analysis of the viromes, viral diversity varied by location and was greater in wild V. corymbosum than in cultivated V. corymbosum, with considerable overlap as well as unique genera represented in both.

Overall, this study has led to the discovery of three known blueberry viruses (BBLV, BlMaV and BRRV) that are new to Florida, as well as a tentative novel Tepovirus (BlVT) that has never been reported in blueberry. In addition, this study has demonstrated the occurrence of BlMaV in wild highbush blueberry, V. corymbosum, for the first time, and has provided evidence that BBLV and BRRV may occur only in cultivated blueberry plants. Taken together, the data obtained from this work suggest that tomorrow’s virus problem for blueberry producers in Florida may be lurking in the fence rows and natural areas today.

Figure 3-1. Map showing the locations of the collected blueberry samples.

Figure 3-2. Virome analysis pipeline used for identification of viruses. Raw reads from metagenome data were processed by filtering the reads based on quality and trimming the adapter sequences. These reads were de novo assembled to produce contigs that were then subjected to a two-step BLASTx analysis to identify plant viruses, consisting of comparison to a local plant virus database followed by comparison to the non-redundant GenBank protein database. Contigs with homology to the same virus species were then assembled to produce scaffolds. Apart from de novo assembly, a reference-based mapping approach was also used to obtain complete or partial viral genomes. Published viral genomes and scaffolds were used as reference sequences for known and novel viruses, respectively.

Table 3-1. The number of processed reads and scaffolds, and the percentage (%) of putative plant virus scaffolds for each RNA library corresponding to different sampling sites.

| Library | No. of reads (raw) | No. of reads (processed) | No. of scaffolds ≥ 500 | No. of putative plant virus scaffolds | % of putative plant virus scaffolds |
|---|---|---|---|---|---|
| GV | 52,166,912 | 48,198,079 | 4256 | 52 | 1.22 |
| HS | 50,757,234 | 46,322,605 | 3485 | 60 | 1.72 |
| IL | 44,181,866 | 41,424,158 | 4890 | 87 | 1.78 |
| IG | 50,931,730 | 47,527,410 | 3899 | 25 | 0.64 |
| ILC | 44,632,114 | 41,527,161 | 3361 | 20 | 0.60 |
| IGC | 30,729,018 | 27,504,778 | 2790 | 23 | 0.82 |

GV: Gainesville; HS: High Springs; IL: Interlachen; IG: Island Grove; ILC: Interlachen cultivated site; IGC: Island Grove cultivated site.
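The percentage column of Table 3-1 can be recomputed from the scaffold counts as a consistency check. The sketch below uses the counts listed in the table; the dictionary layout and variable names are illustrative.

```python
# (scaffolds >= 500, putative plant virus scaffolds) per library, from Table 3-1
libraries = {
    "GV": (4256, 52),
    "HS": (3485, 60),
    "IL": (4890, 87),
    "IG": (3899, 25),
    "ILC": (3361, 20),
    "IGC": (2790, 23),
}

# Percentage of scaffolds per library that are putative plant virus scaffolds.
percent = {
    name: round(100.0 * virus / total, 2)
    for name, (total, virus) in libraries.items()
}

for name, value in percent.items():
    print(f"{name}: {value:.2f}%")

# Check that every library stays below 2%.
print(all(value < 2.0 for value in percent.values()))
```

The recomputed values agree with the tabulated percentages to two decimal places.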

[Figure 3-3, panel a: bar chart; x-axis: No. of scaffolds (0-100); y-axis: Locations (GV, HS, IL, IG, ILC, IGC); legend (virus families): Amalgaviridae, Betaflexiviridae, Bromoviridae, Bunyaviridae, Caulimoviridae, Closteroviridae, Endornaviridae, Geminiviridae, Ophioviridae, Partitiviridae, Unassigned, Unclassified, Virgaviridae. Panel b: image not recoverable from the extracted text.]

Figure 3-3. Viral populations in the viromes of wild and cultivated V. corymbosum. a) Putative viral scaffolds with similarity to plant virus species from a range of virus families at each sampling location. b) Diversity of virus sequences, represented by the range of viral genera of closely related viruses identified by BLASTx analyses. The overlapping region indicates viral genera present in both sites. GV: Gainesville; HS: High Springs; IL: Interlachen; IG: Island Grove; ILC: Interlachen cultivated site; IGC: Island Grove cultivated site. Unclassified refers to viruses that have not been approved by the ICTV.

Figure 3-4. Comparison of viral populations among the viromes of wild and cultivated V. corymbosum from each location represented by scaffolds with closest sequence similarity to virus species belonging to different viral genera as identified by BLASTx analyses. a) Scaffolds with sequence similarity to virus species representing different viral genera identified from the wild viromes. b) Scaffolds with sequence similarity to virus species representing different viral genera identified from the cultivated viromes. c) Scaffolds with sequence similarity to virus species representing different viral genera identified from viromes of wild and cultivated V. corymbosum from Interlachen. d) Scaffolds with sequence similarity to virus species representing different viral genera identified from viromes of wild and cultivated V. corymbosum from Island Grove. Unclassified virus genera and unclassified virus correspond to non-approved virus species with known and unknown genera, respectively.

[Panel a: bar chart; x-axis: No. of scaffolds (0-58); y-axis: Genera; series: Gainesville, High Springs, Interlachen, Island Grove.]

[Panel b: bar chart; x-axis: No. of scaffolds (0-12); y-axis: Genera; series: Interlachen, Island Grove.]

[Panel c: bar chart; x-axis: No. of scaffolds (0-36); y-axis: Genera; series: Interlachen wild, Interlachen cultivated.]

[Panel d: bar chart; x-axis: No. of scaffolds (0-18); y-axis: Genera; series: Island Grove wild, Island Grove cultivated.]

Table 3-2. Plant viruses with its corresponding viral genera which produced closest sequence similarity to the scaffolds (>500 nt in length) as identified by BLASTx analyses of wild V. corymbosum virome from Gainesville.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Tobacco mosaic virus | Tobamovirus | 5 | 100 | 0 | 100 | 2687 | MP, RdRp |
| Blueberry mosaic associated virus | Ophiovirus | 5 | 96 | 0 | 94 | 2244 | 23kDa, RdRp |
| Raphanus sativus cryptic virus 1 | Unclassified partitivirus | 5 | 65 | 2.16E-168 | 100 | 1681 | CP, RdRp |
| Persimmon latent virus | Unclassified virus | 16 | 54 | 0 | 100 | 6997 | RdRp, PArp |
| Vicia faba partitivirus 1 | Unclassified partitivirus | 1 | 49 | 4.82E-88 | 96 | 908 | RdRp |
| Grapevine partitivirus | Unclassified partitivirus | 1 | 48 | 1.19E-80 | 84 | 1066 | RdRp |
| Rose partitivirus | Alphapartitivirus | 3 | 42 | 3.09E-107 | 86 | 1655 | CP, RdRp |
| Cassia yellow blotch virus | Bromovirus | 1 | 40 | 1.02E-33 | 86 | 577 | RdRp |
| Raphanus sativus cryptic virus 2 | Unclassified partitivirus | 4 | 40 | 2.63E-74 | 93 | 1239 | RdRp |
| Rice grassy stunt virus | Tenuivirus | 1 | 36 | 9.66E-19 | 90 | 518 | RdRp |
| Frangipani mosaic virus | Tobamovirus | 1 | 36 | 1.57E-14 | 80 | 517 | RdRp |
| Pepper cryptic virus 1 | Deltapartitivirus | 1 | 33 | 1.09E-15 | 60 | 672 | RdRp |
| Ambrosia asymptomatic virus 2 UKM-2007 | Badnavirus | 1 | 31 | 1.49E-18 | 82 | 631 | RT |
| Sugarcane white streak virus | Mastrevirus | 1 | 30 | 3.18E-09 | 71 | 551 | CP |
| Southern tomato virus | Amalgavirus | 1 | 29 | 9.69E-11 | 64 | 850 | FP |
| Rice black streaked dwarf virus | Fijivirus | 3 | 28 | 1.18E-75 | 81 | 4333 | RdRp |
| Blueberry latent virus | Amalgavirus | 1 | 27 | 6.60E-09 | 60 | 680 | FP |
| Maize rough dwarf virus | Fijivirus | 1 | 20 | 3.83E-06 | 61 | 2742 | P2 |

Table 3-3. Plant viruses with its corresponding viral genera which produced closest sequence similarity to the scaffolds (>500 nt in length) as identified by BLASTx analyses of wild V. corymbosum virome from High Springs.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Tobacco mosaic virus | Tobamovirus | 4 | 100 | 0 | 97 | 3514 | CP, RdRp |
| Blueberry mosaic associated virus | Ophiovirus | 20 | 98 | 0 | 100 | 7946 | 23kDa, MP, NP, RdRp |
| Grapevine partitivirus | Unclassified partitivirus | 2 | 77 | 4.58E-53 | 100 | 812 | RdRp |
| Cassava associated gemycircularvirus 1 | Gemycircularvirus | 2 | 63 | 5.23E-41 | 73 | 666 | Rep |
| Persimmon latent virus | Unclassified virus | 6 | 53 | 0 | 99 | 6476 | PArp, RdRp |
| Figwort mosaic virus | Caulimovirus | 1 | 48 | 1.23E-83 | 44 | 1270 | RT |
| Raphanus sativus cryptic virus 1 | Unclassified partitivirus | 1 | 45 | 1.38E-59 | 73 | 975 | RdRp |
| Diuris pendunculata cryptic virus | Unclassified partitivirus | 1 | 43 | 3.20E-72 | 97 | 745 | RdRp |
| Pinus sylvestris partitivirus NL-2005 | Unclassified partitivirus | 2 | 42 | 3.74E-32 | 89 | 602 | RdRp |
| Broad bean necrosis virus | Pomovirus | 1 | 39 | 1.47E-20 | 43 | 875 | RdRp |
| Citrus leprosis virus C | Cilevirus | 1 | 38 | 4.62E-32 | 71 | 855 | RdRp |
| Tobacco rattle virus | Tobravirus | 1 | 36 | 3.37E-24 | 50 | 1008 | RdRp |
| Strawberry vein banding virus | Caulimovirus | 1 | 35 | 1.27E-06 | 38 | 519 | RT |
| Rice grassy stunt virus | Tenuivirus | 5 | 33 | 1.22E-82 | 92 | 7978 | P1.339K, RdRp |
| Rose rosette virus | Emaravirus | 1 | 32 | 7.83E-08 | 49 | 506 | RdRp |
| Rice stripe virus | Tenuivirus | 5 | 31 | 2.24E-25 | 94 | 1403 | RdRp |
| Pineapple mealybug wilt-associated virus 3 | Ampelovirus | 1 | 30 | 5.63E-19 | 62 | 1020 | RdRp |
| Yerba mate endornavirus 1 | Unclassified endornavirus | 1 | 29 | 7.32E-05 | 48 | 794 | PP |
| Sugarcane white streak virus | Mastrevirus | 1 | 27 | 4.83E-09 | 66 | 554 | CP |
| Vicia faba partitivirus 1 | Unclassified partitivirus | 1 | 27 | 4.24E-13 | 83 | 543 | RdRp |
| Bromus catharticus striate mosaic virus | Mastrevirus | 1 | 26 | 3.23E-11 | 73 | 677 | CP |
| Tobacco streak virus | Ilarvirus | 1 | 23 | 4.64E-08 | 80 | 645 | RdRp |

Table 3-4. Plant viruses, with their corresponding viral genera, that produced the closest sequence similarity to the scaffolds (>500 nt in length), as identified by BLASTx analyses of the wild V. corymbosum virome from Interlachen.
Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins
Tobacco mosaic virus | Tobamovirus | 8 | 100 | 0 | 100 | 1316 | CP, MP, RdRp
Blueberry mosaic associated virus | Ophiovirus | 17 | 97 | 0 | 100 | 3458 | 23kDa, MP, NP, RdRp
Grapevine partitivirus | Unclassified partitivirus | 5 | 80 | 6.90E-101 | 100 | 1792 | RdRp
Diuris pendunculata cryptic virus | Unclassified partitivirus | 3 | 74 | 1.19E-85 | 100 | 1874 | CP, PP (RdRp)
Rose partitivirus | Unclassified partitivirus | 2 | 65 | 2.13E-161 | 92 | 1867 | RdRp
Raphanus sativus cryptic virus 1 | Unclassified partitivirus | 9 | 49 | 1.95E-143 | 100 | 2009 | CP, RdRp
Radish partitivirus JC-2004 | Unclassified partitivirus | 1 | 47 | 6.33E-18 | 46 | 523 | RdRp
Persimmon latent virus | Unclassified virus | 2 | 46 | 2.04E-64 | 98 | 798 | RdRp
Persimmon cryptic virus | Unclassified partitivirus | 2 | 45 | 3.18E-47 | 82 | 1316 | RdRp
Raphanus sativus cryptic virus 2 | Unclassified partitivirus | 4 | 43 | 1.53E-54 | 90 | 1578 | RdRp
Pinus sylvestris partitivirus NL-2005 | Unclassified partitivirus | 4 | 42 | 1.31E-38 | 99 | 933 | RdRp
Vicia faba partitivirus 1 | Unclassified partitivirus | 1 | 42 | 8.44E-71 | 93 | 908 | RdRp
Arhar cryptic virus-I | Unclassified partitivirus | 1 | 40 | 1.58E-66 | 87 | 1243 | RdRp
Beet cryptic virus 1 | Alphapartitivirus | 4 | 38 | 6.06E-46 | 88 | 789 | RdRp
Vicia cryptic virus | Alphapartitivirus | 1 | 38 | 3.72E-21 | 69 | 551 | RdRp
Pepper cryptic virus 1 | Deltapartitivirus | 2 | 38 | 1.93E-79 | 99 | 1364 | RdRp
Black raspberry cryptic virus | Unclassified partitivirus | 1 | 38 | 9.57E-34 | 97 | 557 | RdRp
Southern tomato virus | Amalgavirus | 1 | 37 | 5.72E-41 | 70 | 867 | RdRp
Ourmia melon virus | Ourmiavirus | 3 | 37 | 7.15E-27 | 58 | 2079 | RdRp
Pepper cryptic virus 2 | Deltapartitivirus | 1 | 36 | 1.25E-26 | 95 | 615 | RdRp
Citrus leprosis virus C | Cilevirus | 1 | 35 | 4.35E-21 | 89 | 835 | RdRp
Apple mosaic virus | Ilarvirus | 1 | 35 | 3.37E-14 | 20 | 1910 | 1a (VM)
Rhododendron virus A | Amalgavirus | 1 | 32 | 1.71E-07 | 45 | 599 | RdRp
Phaseolus vulgaris endornavirus 2 | Endornavirus | 1 | 32 | 3.11E-20 | 14 | 4100 | PP (VH)
Rose cryptic virus 1 | Unclassified partitivirus | 1 | 32 | 9.58E-20 | 95 | 609 | RdRp
Fig cryptic virus | Deltapartitivirus | 1 | 31 | 2.84E-12 | 87 | 522 | RdRp
Epirus cherry virus | Ourmiavirus | 2 | 31 | 1.43E-12 | 54 | 1334 | RdRp
Persea americana endornavirus 1 | Unclassified endornavirus | 1 | 31 | 4.18E-26 | 98 | 602 | PP (RdRp)
Grapevine leafroll-associated virus 2 | Closterovirus | 1 | 28 | 3.98E-07 | 78 | 510 | RdRp
Beet virus Q | Pomovirus | 1 | 28 | 4.29E-39 | 26 | 9107 | VH, RdRp
Oryza sativa endornavirus | Endornavirus | 1 | 27 | 7.30E-11 | 99 | 1024 | PP
Euphorbia caput-medusae latent virus | Capulavirus | 1 | 26 | 7.15E-13 | 62 | 746 | CP
Vicia faba endornavirus | Endornavirus | 1 | 26 | 2.87E-08 | 83 | 599 | PP
Grapevine leafroll-associated virus 4 | Ampelovirus | 1 | 25 | 8.34E-14 | 84 | 1022 | RdRp


Table 3-5. Plant viruses, with their corresponding viral genera, that produced the closest sequence similarity to the scaffolds (>500 nt in length), as identified by BLASTx analyses of the wild V. corymbosum virome from Island Grove.
Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins
Blueberry mosaic associated virus | Ophiovirus | 10 | 97 | 0 | 100 | 1680 | 23kDa, MP, NP, RdRp
Persimmon latent virus | Unclassified virus | 5 | 54 | 0 | 100 | 4832 | PArp, RdRp
Oryza rufipogon endornavirus | Endornavirus | 1 | 51 | 2.15E-18 | 34 | 898 | PP
Bell pepper endornavirus | Endornavirus | 2 | 37 | 5.74E-32 | 89 | 779 | PP (RdRp)
Raphanus sativus cryptic virus 2 | Unclassified partitivirus | 1 | 30 | 1.49E-09 | 88 | 537 | RdRp
Lagenaria siceraria endornavirus-California | Endornavirus | 1 | 29 | 9.84E-06 | 28 | 928 | PP
Fragaria chiloensis cryptic virus | Unclassified partitivirus | 2 | 29 | 7.16E-14 | 92 | 699 | RdRp
Euphorbia caput-medusae latent virus | Capulavirus | 1 | 27 | 9.28E-13 | 58 | 798 | CP
Phaseolus vulgaris endornavirus 1 | Endornavirus | 1 | 27 | 5.72E-13 | 56 | 1305 | PP

Table 3-6. Plant viruses, with their corresponding viral genera, that produced the closest sequence similarity to the scaffolds (>500 nt in length), as identified by BLASTx analyses of the cultivated V. corymbosum virome from Interlachen.
Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins
Blueberry latent virus | Amalgavirus | 3 | 100 | 0 | 100 | 1116 | CP, RdRp
Tobacco mosaic virus | Tobamovirus | 7 | 100 | 0 | 100 | 3060 | CP, MP, RdRp
Tomato spotted wilt virus | Tospovirus | 1 | 100 | 7.90E-87 | 74 | 521 | NP
Oryza sativa endornavirus | Endornavirus | 2 | 40 | 5.74E-114 | 58 | 7644 | PP, RdRp
Zucchini green mottle mosaic virus | Tobamovirus | 1 | 35 | 4.67E-26 | 88 | 696 | RdRp
Citrus leprosis virus C | Cilevirus | 1 | 32 | 1.24E-23 | 79 | 934 | RdRp
Sweet potato vein clearing virus | Solendovirus | 1 | 28 | 3.63E-11 | 99 | 555 | CP
Euphorbia caput-medusae latent virus | Capulavirus | 1 | 26 | 1.00E-12 | 53 | 879 | CP
Rice grassy stunt virus | Tenuivirus | 1 | 26 | 7.69E-19 | 96 | 766 | P1.339K
Bromus catharticus striate mosaic virus | Mastrevirus | 1 | 25 | 1.70E-12 | 75 | 751 | CP
Phaseolus vulgaris endornavirus 1 | Endornavirus | 1 | 22 | 4.34E-07 | 28 | 3537 | PP

Table 3-7. Plant viruses, with their corresponding viral genera, that produced the closest sequence similarity to the scaffolds (>500 nt in length), as identified by BLASTx analyses of the cultivated V. corymbosum virome from Island Grove.
Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins
Blueberry latent virus | Amalgavirus | 2 | 99 | 0 | 95 | 2401 | CP, RdRp
Blueberry mosaic associated virus | Ophiovirus | 6 | 96 | 0 | 100 | 2429 | 23kDa, MP, NP, RdRp
Blueberry red ringspot virus | Soymovirus | 1 | 93 | 0 | 24 | 8392 | CP, HP, MP, RT, TA
Prunus virus T | Tepovirus | 10 | 86 | 0 | 100 | 2504 | CP, MP, RdRp
Vicia faba partitivirus 1 | Unclassified partitivirus | 1 | 55 | 8.01E-52 | 86 | 570 | RdRp
Magnaporthe oryzae ourmia-like virus | Unclassified ourmiavirus | 1 | 37 | 1.79E-19 | 97 | 532 | RdRp
Euphorbia caput-medusae latent virus | Capulavirus | 1 | 26 | 4.27E-13 | 30 | 759 | CP
Bromus catharticus striate mosaic virus | Mastrevirus | 1 | 25 | 1.15E-12 | 41 | 688 | CP
CP: Coat protein; FP: Fusion protein; HP: Hypothetical protein; MP: Movement protein; NP: Nucleocapsid protein; P2: Major core protein; PArp: Proline-alanine-rich protein; PP: Polyprotein; Rep: Replicase; RdRp: RNA-dependent RNA polymerase; RT: Reverse transcriptase; TA: Translational activator; VH: Viral helicase; VM: Viral methyltransferase. No. of hits corresponds to the number of scaffolds that produced sequence similarity to the closely related virus species by BLASTx.


Table 3-8. Complete viral genomes assembled from each plant virome using a virome analysis pipeline. The tick mark indicated the complete viral genome that were successfully assembled from each plant virome that corresponds to the sampling locations. Virus Cultivated sites Wild sites Genus Vector Transmission species IL IG GV HS IL IG No Transmitted through BLV Amalgavirus   seed UK Other ophioviruses are BlMaV Ophiovirus transmitted via fungal   spores BRRV Soymovirus UK Vegetative propagation  UK One member in the PrVT Tepovirus genus is transmitted by  pollen and seed TMV Tobamovirus No Mechanical     BlMaV: Blueberry mosaic associated virus; BLV: Blueberry latent virus; BRRV: Blueberry red ringspot virus; GV: Gainesville; HS: High Springs; IG: Island Grove; IL: Interlachen; PrVT: Prunus virus T; TMV: Tobacco mosaic virus; UK: unknown vector.

Table 3-9. Percentage (%) and average coverage of mapped reads to each RNA segment of BlMaV obtained from virome of cultivated V. corymbosum from Island Grove and wild V. corymbosum from High Springs and Interlachen, and the percentage of pairwise nt identity between each RNA to the corresponding reference sequences.
 | % of mapped reads (average coverage) | | | % pairwise nt identity to ref. seq. | |
RNA segments | IGC | HS | ILW | IGC | HS | ILW
1 | 0.01 (47x) | 0.03 (271x) | - | 81.2 | 80.2 | -
2 | 18.5 (211k x) | 14.5 (269k x) | - | 79.5 | 79.7 | -
3 | 0.003 (60x) | 14.6 (360k x) | 13.1 | 83.4 | 82.8 | 84.7
Reference sequence (Ref. seq.): Accession number KJ_04366-8; IGC: cultivated V. corymbosum from Island Grove; HS: High Springs; ILW: wild V. corymbosum from Interlachen; k: kilo.

Table 3-10. The length of each RNA segment and the encoded ORFs of the reference sequence and BlMaV obtained from virome of cultivated V. corymbosum from Island Grove and wild V. corymbosum from High Springs and Interlachen.
 | | Length of RNA (nt) | | | | Length of CDS (nt) | | |
RNA segments | ORFs | RS | IGC | HS | ILW | RS | IGC | HS | ILW
1 | RdRp/23kDa | 7963 | 7747 | 7966 | - | 7014/585 | 7014/585 | 7014/579 | -
2 | MP | 1934 | 1981 | 1939 | - | 1548 | 1545 | 1545 | -
3 | NP | 1570 | 1543 | 1660 | 1592 | 1368 | 1368 | 1368 | 1368
Reference sequence (RS): Accession number KJ_04366-8; IGC: cultivated V. corymbosum from Island Grove; HS: High Springs; ILW: wild V. corymbosum from Interlachen; k: kilo.


Table 3-11. Nucleotide length of each ORF in different BRRV isolates from Florida and other regions.
Isolates | I (MP) | A | B | C | IV (CP) | V (RT) | VI (TA) | VII | Total length
CZ | 1101 | 312 | 561 | 600 | 1488 | 2004 | 1284 | 462 | 8302
IGC-FL | 1197 | 369 | 561 | 597 | 1488 | 2007 | 1284 | 522 | 8392
NJ | 939 | 369 | 561 | 600 | 1461 | 1974 | 1287 | 429 | 8303
PL | 939 | 369 | 561 | 594 | 1455 | 1974 | 1284 | 462 | 8265
SL | 1110 | 369 | 561 | 588 | 1476 | 2043 | 1284 | 462 | 8299
CZ: Czech Republic; IGC-FL: Island Grove, Florida; NJ: New Jersey; PL: Poland; SL: Slovenia.

Table 3-12. The nucleotide length and pairwise nucleotide comparison of each ORF and the UTR of BlVT and PrVT (NC_024686).
Type of region | Region | Length (nt), PrVT | Length (nt), BlVT | % Pairwise nt identity
ORFs | RdRp | 5337 | 5457 | 61
ORFs | MP | 1152 | 1146 | 70
ORFs | CP | 666 | 663 | 65
UTRs | 5’ | 46 | 109 | 68
UTRs | 3’ | 79 | 264 | 59

Figure 3-5. Pairwise comparison analysis in SDT and phylogenetic analysis in MEGA7 using nucleotide and amino acid sequences of different genome regions of the respective viruses. Pairwise sequence identities between the viruses are represented in different colors. The bootstrap consensus phylogenetic trees were constructed by Neighbor Joining using the Maximum Composite Likelihood and Poisson correction methods for nucleotide and amino acid sequences, respectively, based on 1000 replicates, showing branch nodes with more than 75% bootstrap support. a) Pairwise identity and phylogenetic analysis using the full genome of BBLV isolates and selected members of the genus Amalgavirus. b) Pairwise identity and phylogenetic analysis using the NP gene of BlMaV isolates and selected members of the genus Ophiovirus. c) Pairwise identity and phylogenetic analysis using the full genome of BRRV isolates and selected members of the genus Soymovirus. d) Pairwise identity and phylogenetic analysis using the RdRp and CP proteins of the putative novel Tepovirus and selected members representing different genera in the family Betaflexiviridae. Accession numbers are shown in the figure. ACLSV- Apple chlorotic leaf spot virus; ASGV- Apple stem grooving virus; ASPV- Apple stem pitting virus; AVCaV- Apricot vein clearing associated virus; BanMMV- Banana mild mosaic virus; BanVX- Banana virus X; BBLV- Blueberry latent virus; BlMaV- Blueberry mosaic associated virus; BlVT- Blueberry virus T; BRRV- Blueberry red ringspot virus; CarChV1- Carrot Ch virus 1; CLBV- Citrus leaf blotch virus; CNRMV- Cherry necrotic rusty mottle virus; CPsV- Citrus psorosis virus; DVA- Diuris virus A; GarCLV- Garlic common latent virus; GVA- Grapevine virus A; LRNV- Lettuce ring necrosis virus; PCSV- Peanut chlorotic streak virus; PrVT- Prunus virus T; PVT- Potato virus T; RVA- Rhododendron virus A; SbCMV- Soybean chlorotic mottle virus; SCSMaV- Sugarcane striate mosaic-associated virus; STV- Southern tomato virus.

[Figure 3-5 panels: a) Full genome; b) Nucleocapsid; c) Full genome; d) RNA-dependent RNA polymerase, Coat protein]

CHAPTER 4 CHARACTERIZATION OF DNA PLANT VIROMES FROM WILD AND CULTIVATED V. corymbosum IN FLORIDA

Introduction

Viral metagenomics has allowed the detection of novel ssDNA viruses in various environments, revealing the widespread nature of these viruses (Rosario et al., 2012). Because the taxonomically approved ssDNA virus species have circular genomes, these viruses can be enriched by rolling circle amplification (RCA) during sample preparation, prior to the construction of DNA libraries. Viral metagenomics combined with the enrichment of circular genomes has thus enabled the discovery of several novel ssDNA viruses belonging to the families Geminiviridae and Genomoviridae from wild and cultivated plant species (Dayaram et al., 2012; Candresse et al., 2014; Kraberger et al., 2015; Susi et al., 2017; Varsani et al., 2017).

Although blueberry is known to host 15 species of viruses from eight known and two unassigned genera, only one double-stranded DNA (dsDNA) reverse-transcribing (RT) virus, also known as a pararetrovirus, BRRV, had been reported in this plant. A novel pararetrovirus, Blueberry fruit drop associated virus, which causes blueberry fruit drop disease, was recently identified in blueberry, nearly three decades after the disease was first observed (Diaz-Lara & Martin, 2016). In addition to the very few DNA viruses identified in blueberry, no single-stranded DNA (ssDNA) virus is known to infect this plant to date, making viral metagenomics an appealing approach to explore the DNA viral population in this plant. Thus, an objective was set to characterize and describe viral diversity in the DNA plant viromes generated from wild and cultivated V. corymbosum in Florida through a metagenomic approach that incorporated the enrichment of virus particles and circular molecules using viral purification and RCA, respectively. This led to further objectives, which include the identification of novel and known viruses to better understand the molecular evolutionary role of these viruses in the viral communities of blueberry. To achieve these objectives, DNA plant viromes from V. corymbosum plants collected from six locations in north central Florida, comprising two cultivated and four wild sites, were generated by sequencing on the Illumina HiSeq 2000 platform and analyzed using a metagenome analysis pipeline.

Materials and Methods

Plant Materials

A total of 20 samples per site (n=120), with and without virus-like symptoms, were collected from wild and cultivated blueberries (i.e., V. corymbosum) in central Florida. The wild blueberry samples were collected from: 1) O’Leno State Park, High Springs, a wild site not adjacent to commercial blueberries; 2) Morningside Nature Center, Gainesville, a wild site adjacent to residences; 3) Interlachen, FL, a wild site neighboring commercial blueberry production; and 4) Island Grove, FL, another wild site neighboring commercial blueberry production (Fig. 3-1). The cultivated blueberries were collected from commercial plantings in Interlachen, FL and Island Grove, FL. All samples were collected during Fall 2014 (September-October), except the wild and cultivated samples from Island Grove, which were collected in Fall 2015 (October).

Sample Preparation and Generation of V. corymbosum DNA Plant Viromes

Forty grams of pooled leaves, consisting of 2 g of fresh leaf tissue from each sample at a given location, were used for preparation of DNA. Virus particles were purified by adapting the protocols of Gillet (1988) and Dayaram et al. (2014). Briefly, leaves were homogenized in 240 ml extraction buffer (0.1 M sodium phosphate buffer pH 7.2 containing 0.01 M 2-mercaptoethanol and 0.005 M thioglycolic acid) and filtered through cheesecloth. The leaf sap was then adjusted to 6% (w/v) urea, 2.5% (v/v) Triton X-100 and 8% (v/v) n-butanol. Following overnight stirring at 4 °C, the sap was centrifuged (5000 rpm for 10 min) in a Sorvall RC-5B Superspeed centrifuge (Thermo Fisher Scientific Inc.). Twenty percent (w/v) polyethylene glycol 8000 was added to the sap, followed by high-speed centrifugation (16k for 10 min). Finally, 3 ml SM buffer [0.1 M NaCl, 50 mM Tris/HCl (pH 7.5), 10 mM MgSO4] was added to the pellet, and the suspension was passed through a 0.45 µm and subsequently a 0.2 µm syringe filter. The filtrate was then subjected to DNA extraction using the High Pure Viral Nucleic Acid kit (Roche Diagnostics, Basel, Switzerland) following the manufacturer’s recommendations and was enriched for circular molecules by RCA using the illustra TempliPhi DNA Amplification Kit (GE Healthcare, Little Chalfont, Buckinghamshire, UK). The quality and concentration of the RCA products were estimated by gel electrophoresis (1% agarose) and NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., DE, USA), respectively. The RCA products (1000 ng) from each location were sent for sequencing using the TruSeq DNA PCR-Free kit on the Illumina HiSeq 2500 platform (Macrogen, Inc., Seoul, Korea) to generate DNA libraries containing 150 bp paired-end sequences.

Analyses of DNA Plant Viromes

Reads from DNA libraries were analyzed according to the virome analysis pipeline illustrated in Figure 3-2 in Chapter 3. Raw reads from each library were processed using Trimmomatic (Bolger et al., 2014) to remove adapters and low-quality reads by trimming and quality filtering, respectively. The minimum length of reads retained from the DNA libraries was 150 nucleotides. These reads were de novo assembled into contigs and scaffolds using the SPAdes assembler (Bankevich et al., 2012) with k-mer sizes of 55, 77 and 99. Only scaffolds longer than 1000 nucleotides were used for downstream analysis. These scaffolds were subjected to a two-step BLASTx (Altschul et al., 1997) to identify plant viruses with the highest sequence match by comparing the scaffolds to a local plant virus protein database (Zheng et al., 2017) followed by the non-redundant GenBank protein database, using a threshold e-value of 10⁻⁵. Scaffolds with similarity to plant viruses were organized by family, genus and species according to the 2016 Virus Taxonomy Release on the International Committee on Taxonomy of Viruses (ICTV) website (https://talk.ictvonline.org/taxonomy/) for approved virus species. Scaffolds with hits to virus species not yet approved by the ICTV were assigned to their corresponding genus and family based on the information provided in the NCBI Taxonomy Database (https://www.ncbi.nlm.nih.gov/taxonomy). Overlapping scaffolds with the same viral hits were merged using the assembler built into Geneious 9.1.6 to produce longer scaffolds. Scaffolds were mapped against the corresponding reference genome using the Geneious mapper in Geneious 9.1.6 to assess genome coverage. The scaffolds and the reference sequences were compared by alignment to predict the open reading frames (ORFs) and coding sequences (CDS). Finally, reads were aligned to published viral sequences and to scaffolds in a reference-based mapping approach using Bowtie2 (Langmead & Salzberg, 2012) to obtain complete viral genomes for known and novel viruses, respectively.
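The two filtering rules stated above (scaffolds longer than 1000 nt, BLASTx hits at or below the e-value threshold) can be sketched in a few lines of Python. This is only an illustration and not the pipeline used in this study: the function names are hypothetical, and the code assumes that BLAST results were written in the default 12-column tabular layout (-outfmt 6), which the text does not specify.

```python
from typing import Dict, List, Tuple

MIN_SCAFFOLD_LEN = 1000  # scaffolds must be longer than this (nt)
MAX_EVALUE = 1e-5        # BLASTx e-value threshold stated in the text


def long_scaffolds(scaffolds: Dict[str, str],
                   min_len: int = MIN_SCAFFOLD_LEN) -> Dict[str, str]:
    """Keep scaffolds strictly longer than min_len nucleotides."""
    return {name: seq for name, seq in scaffolds.items() if len(seq) > min_len}


def best_hits(blast_tab_lines: List[str],
              max_evalue: float = MAX_EVALUE) -> Dict[str, Tuple[str, float, float]]:
    """Return the lowest e-value hit per query as (subject, % identity, e-value).

    Assumed column order (BLAST -outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        pident, evalue = float(cols[2]), float(cols[10])
        if evalue > max_evalue:
            continue  # hit does not pass the threshold
        if query not in best or evalue < best[query][2]:
            best[query] = (subject, pident, evalue)
    return best
```

In a two-step search of the kind described, the same filter would be applied first to hits against the local plant virus database and then to hits against the non-redundant database.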

The respective ORF sequences of the viruses obtained from this study and selected published reference sequences were aligned using Geneious local alignment (Smith-Waterman) in Geneious 9.1.6, and the alignment was further refined using MUSCLE (Edgar, 2004). Pairwise identity comparisons between selected virus sequences obtained from this study and other selected members of the corresponding viral genera were performed using Sequence Demarcation Tool (SDT) 1.2 (Muhire et al., 2014). Phylogenetic analysis of the nucleotide and amino acid (aa) sequences of selected ORFs was performed by the neighbor-joining method in MEGA (version 7.0) (Kumar et al., 2016), using a bootstrap test with 1000 replicates, to infer the relationship of the known and new virus sequences to other published viral sequences from the corresponding viral genera.
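To make the notion of pairwise identity concrete, the following minimal Python sketch scores two sequences that have already been aligned. It is an assumption-laden illustration rather than the SDT implementation: identity is taken here as the share of matching positions among the alignment columns in which neither sequence has a gap, and the function name is hypothetical.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of which match.
print(pairwise_identity("ATG-CA", "ATGTCC"))
```

The same function works for nucleotide or amino acid alignments, since it only compares characters column by column.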


Results

General Analyses of the DNA Plant Viromes

The DNA viromes generated from wild and cultivated V. corymbosum contained approximately 199 and 110 million paired-end reads of 150 nt in length, respectively. These were reduced to approximately 148 and 79 million reads in the wild and cultivated V. corymbosum viromes, respectively, following the read processing previously described in section 3.2 (Table 4-1). De novo assembly of the reads using the SPAdes assembler produced a total of 99,944 and 170,395 scaffolds of more than 1000 nt in length from the wild and cultivated V. corymbosum viromes, respectively. Comparison of the scaffolds to the local plant virus protein database followed by the non-redundant GenBank protein database in a two-step BLASTx analysis yielded a total of 39 (0.04%) and 9 (0.005%) putative plant virus scaffolds with similarity to known viral sequences from the wild and cultivated V. corymbosum, respectively. The percentage of putative plant virus scaffolds, calculated as a fraction of the total scaffolds in the wild and cultivated sites, varied between locations, ranging from 0.004 to 3.15%.
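The overall percentages quoted above follow directly from the scaffold counts. A short Python check, using only the totals reported in this paragraph (the function name is hypothetical):

```python
def percent_viral(viral_scaffolds: int, total_scaffolds: int) -> float:
    """Share of scaffolds with a plant virus hit, as a percentage."""
    return 100.0 * viral_scaffolds / total_scaffolds


wild = percent_viral(39, 99_944)        # wild V. corymbosum viromes
cultivated = percent_viral(9, 170_395)  # cultivated V. corymbosum viromes
print(f"wild: {wild:.3f}%  cultivated: {cultivated:.4f}%")
```

Rounded, these values correspond to the 0.04% and 0.005% given in the text.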

Characterization and Comparison of Plant Virus Diversity in the Viromes of Wild and Cultivated V. corymbosum

Overall, scaffolds identified from wild and cultivated V. corymbosum viromes produced sequence similarity to plant virus species from a total of 3 viral families (Caulimoviridae, Geminiviridae and Genomoviridae) representing a total of 9 viral genera (Figure 4-1a). Scaffolds with sequence similarity to virus species from all three families were identified in the wild V. corymbosum viromes, whereas only two families were represented in the cultivated viromes. Plant viruses from three viral genera belonging to the families Caulimoviridae (Badnavirus and Caulimovirus) and Geminiviridae (Begomovirus) were identified in both wild and cultivated V. corymbosum viromes, as shown in the stacked chart (Figure 4-1b). Although there was some overlap between the wild and cultivated plant viromes, each had unique viral genera: scaffolds with sequence similarity to virus species from six genera (Becurtovirus, Capulavirus, Curtovirus, Gemycircularvirus, Solendovirus and Soymovirus) were found only in the wild plant viromes, and from one genus (Cavemovirus) only in the cultivated plant viromes (Figure 3-4b).

Viral populations among the viromes of wild and cultivated V. corymbosum varied by location, as represented by the scaffolds with sequence similarity to virus species from different viral families and genera (Figure 4-1b). Scaffolds with sequence similarity to virus species from the genera Curtovirus, Gemycircularvirus and Solendovirus, and to an unclassified virus from the family Genomoviridae, were identified only in the virome of wild V. corymbosum collected from Gainesville. Similarly, scaffolds with sequence similarity to virus species from the genus Becurtovirus and to an unclassified virus of unknown family were uniquely present in the virome of wild V. corymbosum collected from Interlachen. The viromes of wild V. corymbosum collected from Island Grove and High Springs, however, only produced scaffolds with sequence similarity to virus species from the genera Soymovirus and Begomovirus, respectively. Conversely, the viromes of cultivated V. corymbosum collected from Island Grove and Interlachen both produced scaffolds with sequence similarity to virus species from the genus Caulimovirus. Within the viromes of cultivated plants, scaffolds with sequence similarity to virus species from the genus Begomovirus, to an unassigned virus in the family Geminiviridae, and to unclassified viruses were present in the virome of cultivated V. corymbosum collected from Island Grove, whereas scaffolds with sequence similarity to virus species from the genera Badnavirus and Cavemovirus were uniquely present in the virome of cultivated V. corymbosum collected from Interlachen.

Viral populations between the viromes of V. corymbosum collected from neighboring wild and cultivated sites at Interlachen and Island Grove were further compared. The viral populations in the wild and cultivated V. corymbosum from Island Grove and Interlachen were different, as shown by the viral genera present in each site (Figure 4-1b). However, scaffolds with sequence similarity to virus species from the genus Caulimovirus were identified in both wild and cultivated V. corymbosum collected from Interlachen.

Identification of Plant Viral Sequences from the Viromes

Wild plant viromes. Of the 39 de novo assembled scaffolds from the wild plant viromes putatively belonging to plant viruses, 49%, 44% and 5% had similarity to plant viral sequences from the families Caulimoviridae, Geminiviridae and Genomoviridae, respectively (Figure 4-1a). A large proportion of the scaffolds with sequence similarity to plant viruses from the family Caulimoviridae produced matches against unclassified virus species (53%), followed by members of the genera Caulimovirus (21%), Badnavirus (11%), Soymovirus (11%) and Solendovirus (5%) (Figure 4-1b, Table 4-2 to 4-5). Among the scaffolds with matches to virus species in the family Caulimoviridae, the highest similarity was observed for Blueberry red ringspot virus, with 98-100% aa identity (Table 4-2, 4-5). Mapping of reads in the virome of V. corymbosum from Island Grove to the BRRV reference genome sequence (JF421559), an isolate from Slovenia, indicated that 1.4% of reads were mapped, with an average genome coverage of 7300x. Additionally, a consensus sequence of 8309 bp obtained from de novo assembly of reads, representing the full genome of BRRV, was shown to encode the eight ORFs typical of BRRV and shared 95% pairwise nucleotide identity with the genome of the Slovenian isolate in a pairwise alignment using MUSCLE. Although one scaffold obtained from the virome of V. corymbosum from Gainesville produced high sequence similarity to BRRV, a full genome could not be de novo assembled from this virome because of the very low average coverage shown by the mapped reads. In contrast to BRRV, other scaffolds produced low sequence similarity and query coverage to other members of the family Caulimoviridae by BLASTx analysis, suggesting the presence of completely novel viruses.
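As a rough guide to how an average genome coverage such as the one reported for BRRV relates to the number of mapped reads, mean depth can be approximated as the total number of mapped bases divided by the genome length. The Python sketch below makes that approximation explicit. The function name and the read count in the example are hypothetical, and the calculation assumes that every mapped read aligns over its full 150-nt length.

```python
def mean_coverage(mapped_reads: int, read_length: int, genome_length: int) -> float:
    """Approximate mean depth: total mapped bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return mapped_reads * read_length / genome_length


# Hypothetical example: 400,000 mapped 150-nt reads on an 8309-bp genome
# (8309 bp is the consensus length reported above; the read count is invented).
depth = mean_coverage(400_000, 150, 8309)
print(f"approximate mean coverage: {depth:.0f}x")
```

Mapping tools report per-base depth directly, so this estimate is mainly useful as a sanity check on the order of magnitude.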

As for the scaffolds with sequence similarity to viruses from the family Geminiviridae, most produced matches against members of the genus Begomovirus (47%), unassigned virus species in the family Geminiviridae (29%) and the genus Capulavirus (12%). For the Genomoviridae, only one virus species, Common bean-associated gemycircularvirus, had similarity to the query scaffold (Table 4-2). Like most of the caulimovirus-like sequences, scaffolds with sequence similarity to virus species in the families Geminiviridae and Genomoviridae had very low aa identity and query coverage, again suggesting the presence of completely novel viruses.

Cultivated plant viromes. Similar to the putative plant virus scaffolds assembled from the wild plant viromes, the highest proportion of the scaffolds from the cultivated plant viromes had sequence similarity to viral sequences from the family Caulimoviridae (67%), followed by Geminiviridae (22%) (Figure 4-1a). Half of the scaffolds with sequence similarity to virus species in the Caulimoviridae produced matches against members of the genus Caulimovirus (50%), while the remaining scaffolds were divided equally among virus species in the genera Badnavirus and Cavemovirus and an unclassified caulimovirus (Figure 4-1b). The aa identity and query coverage of the scaffolds to members of the Caulimoviridae were relatively low, at no more than 55% and 45%, respectively (Table 4-6, 4-7). As for the Geminiviridae, an equal number of scaffolds produced similarity to members of the genus Begomovirus and to an unassigned virus species in the family Geminiviridae, with no more than 59% aa identity and 31% query coverage. In addition, a scaffold of 5860 nt in length with 52% similarity to an unclassified ssDNA virus, Temperate fruit decay-associated virus, was also identified in the virome of cultivated V. corymbosum from Island Grove (Table 4-7). These results again showed that the overall population of DNA viruses in the viromes of V. corymbosum was composed of novel viruses that are highly divergent from known viruses.

Sequence Comparison and Phylogenetic Analyses of a Putative Novel Viral Genome

Of all the putative plant virus scaffolds identified in the V. corymbosum viromes, the search for novel viral sequences was narrowed down based on the length of the scaffold, the presence of signature viral domains and motifs, and the genome arrangement compared to the corresponding viral genome obtained by BLASTx analysis. Although there was a relatively high number of scaffolds with similarity to viruses in the family Caulimoviridae, these scaffolds covered only a small part of the genome size typical of pararetroviruses, which made de novo assembly of complete viral genomes difficult. In contrast, the scaffolds with sequence similarity to the geminiviruses had higher genome coverage, spanning almost the complete genome size typical of geminiviruses.

Hence, further molecular characterization was performed on two scaffolds with low sequence similarity to Rhynchosia mild mosaic virus (RhMMV), a virus species in the genus Begomovirus. These scaffolds were de novo assembled from the viromes of wild V. corymbosum collected from Gainesville and High Springs, with lengths of 2592 and 2643 nt, respectively. They shared very low similarity to RhMMV (Acc. no. NC_015488) in BLASTx analysis, with aa identity of just 33-34% and query coverage of 35-37% (Table 4-2, 4-3). Pairwise comparison of these scaffolds using MUSCLE showed that the sequences shared ~91% nucleotide identity with each other. Thus, these scaffolds were tentatively named Blueberry geminivirus-1 (BG-1; 2592 nt scaffold) and Blueberry geminivirus-2 (BG-2; 2643 nt scaffold). As in other geminiviruses, a probable virion-strand origin of replication, represented by a stem-loop structure containing the conserved nonanucleotide sequence TAATATTAC, was identified within the intergenic region of both BG-1 and BG-2 (Zerbini et al., 2017) (Figure 4-2). Three major ORFs encoding a putative replicase (Rep), AC4 and capsid protein (CP) were identified in the de novo assembled viral genomes of BG-1 and BG-2. Two additional minor ORFs, as well as a putative intron, were also identified in these viral genomes. Although the minor ORFs identified in the two genomes were similar in size, their arrangement differed (Figure 4-2).
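Locating the conserved nonanucleotide in an assembled scaffold is a simple string search, with the caveat that a circular genome may have been linearized at an arbitrary position, so the motif can span the ends of the sequence. The Python sketch below illustrates one way to handle this; it is not the procedure used in this study, the function names are hypothetical, and the toy sequence in the example is invented.

```python
NONANUCLEOTIDE = "TAATATTAC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_circular(genome: str, motif: str = NONANUCLEOTIDE):
    """Return sorted (0-based start, strand) hits of motif in a circular genome.

    The genome is extended by len(motif) - 1 bases from its start so that
    occurrences spanning the linearization point are also found.
    """
    genome = genome.upper()
    extended = genome + genome[: len(motif) - 1]
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = extended.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = extended.find(pattern, start + 1)
    return sorted(hits)


# Invented 17-nt circle in which the motif wraps around the ends.
print(find_circular("ATTACGGGGCCCCTAAT"))
```

A hit on the "-" strand is reported at the position where the reverse-complemented motif starts on the given strand.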

Pairwise comparison of the putative Rep ORF of BG-1 and BG-2 with the corresponding ORF of selected members of the family Geminiviridae using SDT 1.2 showed that these viral genomes shared the highest aa identity (~39%) with the Rep of RhMMV (NC_015488) (Figure 4-3). The putative CP of BG-1 and BG-2, however, showed no similarity to known viruses in a BLASTx search against the non-redundant GenBank protein database. A BLASTp search nevertheless revealed that the putative CP of these viral genomes shared very low similarity with one unclassified virus, Lake Sarah-associated circular virus-26, with 28% identity and 78% query coverage. Alignment of the putative Reps encoded by BG-1, BG-2 and other members of the family Geminiviridae with MUSCLE indicated that the Reps of BG-1 and BG-2 contain the motifs involved in rolling-circle replication (RCR), recognized as motifs I, II and III, as well as the geminivirus Rep sequence (GRS), a conserved domain unique to geminiviruses (Rosario et al., 2012; Nash et al., 2011) (Figure 4-4). In addition to the RCR domain, three conserved motifs (Walker-A, Walker-B and motif C) of the superfamily 3 (SF3) helicase domain, typically found in circular replication-associated protein-encoding ssDNA (CRESS-DNA) viruses, were also identified in the putative Rep ORF of BG-1 and BG-2 (Rosario et al., 2012; Koonin, 1993; Gorbalenya et al., 1990). Although the Walker-A motif and motif C identified in the putative Rep of BG-1 and BG-2 are similar to those of geminiviruses, the Walker-B motif is more similar to those of circoviruses (Figure 4-4).
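To make the pairwise identity values concrete, the following minimal sketch computes percent identity between two already aligned sequences. It assumes the convention generally attributed to SDT, in which identity is calculated only over alignment columns where neither sequence has a gap; that convention, the function name and the toy sequences are assumptions for illustration, not a description of the software's internals or of the data in this study.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so the denominator
    is the number of columns in which both sequences have a residue.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

With the toy alignment `pairwise_identity("MPR-KF", "MPRAKY")`, five columns are compared and four match, giving 80% identity. Other tools may count gaps differently, which is one reason identity values from BLASTx and from an alignment-based comparison need not agree.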

A phylogenetic tree inferred in MEGA from the Rep amino acid sequences of BG-1, BG-2 and selected members representing various genera in the family Geminiviridae showed a grouping similar to the tree described by the ICTV (Figure 4-5) (Zerbini et al., 2017). The Rep sequences of BG-1 and BG-2 grouped in the same clade with a high bootstrap value, suggesting that these de novo assembled viral genomes are closely related. The tree further showed that the clade containing the BG-1 and BG-2 Rep sequences arose from a branch that appears to be basal to the group formed by members of the genera Begomovirus, Curtovirus, Eragrovirus, Topocuvirus and Turncurtovirus and an unclassified virus.
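Bootstrap support of the kind reported for the BG-1/BG-2 clade comes from re-inferring the tree on resampled alignments. The sketch below shows only the resampling step, under the usual assumption that alignment columns are drawn with replacement; it is not MEGA's implementation, and the function name and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap replicate of an alignment.

    Columns are sampled with replacement, and the same sampled columns are
    used for every sequence, so the replicate has the original length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    sampled = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in sampled) for name, seq in alignment.items()}
```

A typical use would be to call `bootstrap_alignment(aln, random.Random(seed))` once per replicate (for example 1000 times), build a tree from each replicate, and report for each clade the percentage of replicate trees in which it appears.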

Discussion

Metagenome analyses of the DNA viromes of wild and cultivated V. corymbosum collected from different locations in Florida showed that, in the majority of the viromes, less than 0.5% of the scaffolds were associated with plant viruses, suggesting the presence of novel viruses that are highly divergent from currently recognized virus species. Compared to the plant RNA viruses, only two families of ssDNA viruses (Geminiviridae and Genomoviridae) and one family of dsDNA (RT) viruses (Caulimoviridae) are known to infect plants. Hence, it is not surprising that this study yielded a very low percentage of plant virus-associated scaffolds, since current knowledge of DNA viruses infecting plants is very limited owing to the inadequate characterization of plant viral communities (Roossinck, 2015c).

BLASTx analysis showed that a large proportion of the scaffolds in the viromes of both wild and cultivated V. corymbosum had sequence similarity to virus species in the family Caulimoviridae. However, only two scaffolds, from the viromes of wild V. corymbosum from Gainesville and Island Grove, showed significant similarity to BRRV, a known member of the genus Soymovirus, leading to the recovery of a complete BRRV genome by de novo assembly of the reads from the virome of wild V. corymbosum collected at Island Grove. This finding is the first report of BRRV in wild blueberry. Apart from BRRV, the remaining scaffolds displayed very low similarity to other caulimoviruses in the BLASTx results. Viral sequences from members of all genera in the family Caulimoviridae except the genus Soymovirus have been reported to be integrated into host genomes as endogenous pararetrovirus sequences (EPRS) (Teycheney et al., 2011; Eid & Pappu, 2014). In view of this, viral purification coupled with RCA was incorporated in the sample preparation step to generate viromes containing sequences likely to be derived from intact virus particles and episomal viruses. However, the presence of host sequences identified in the viromes by BLASTx suggests that the viral enrichment method might not be completely effective or, alternatively, that the viral particles may contain cellular DNA (Rosario et al., 2012). Therefore, it remains possible that the scaffolds with similarity to members of Caulimoviridae identified in this study were derived from integrated viral sequences.

Besides the caulimovirus-like sequences in the viromes of wild and cultivated V. corymbosum, a high proportion of scaffolds also displayed similarity to viral sequences from species in the family Geminiviridae. Two of these scaffolds, BG-1 and BG-2, were selected for further sequence characterization because their lengths fall within the size range of geminivirus genomes. In addition to the appropriate genome coverage, the presence of the nonanucleotide motif as well as the RCR and helicase motifs in the Rep sequences of the de novo assembled viral genomes strongly suggests the presence of a novel ssDNA virus, since these motifs are typically found in CRESS-DNA viruses. Despite the presence of the Rep and CP ORFs and the canonical motifs in the Rep sequences, the genome organization of BG-1 and BG-2 differs from that of established geminiviruses, as shown by the presence of minor ORFs on the complementary- and virion-sense strands. Furthermore, analysis of the Rep sequences revealed that BG-1 and BG-2 are closely related and share just ~39% aa identity with members of the genus Begomovirus, suggesting that these viral genomes are highly divergent from known begomoviruses. Phylogenetic analysis of the Rep protein further suggested that these de novo assembled viral genomes could be novel viral species belonging to a completely different genus in the family Geminiviridae. However, whether BG-1 and BG-2 are variants of the same virus species or represent different species remains to be investigated, since the species demarcation cut-off varies among genera in the family Geminiviridae.

Conclusion

This study demonstrates the discovery of known and potentially novel viruses through the enrichment of virus particles and circular genomes prior to the construction of DNA libraries for metagenome sequencing. DNA viromes obtained from wild and cultivated blueberry (V. corymbosum) showed that the population of DNA viruses is potentially composed of completely novel viruses that are highly divergent from currently recognized species. The de novo assembled scaffolds might represent new viruses that belong to new viral genera or even families. This study also revealed the presence of BRRV, a common virus infecting commercial blueberries, for the first time in wild V. corymbosum. In addition, two de novo assembled viral genomes were discovered in this study, possibly representing novel virus species in the family Geminiviridae. Further validation, however, is needed to confirm the presence of these putative novel viruses in V. corymbosum, given the widespread nature of CRESS-DNA viruses.

Table 4-1. The number of reads and scaffolds, and the percentage (%) of associated plant virus scaffolds for each DNA library corresponding to different sampling sites.

| Libraries | No. of reads (raw) | No. of reads (processed) | No. of scaffolds ≥ 1000 nt | No. of putative plant virus scaffolds | % of putative plant virus scaffolds |
|---|---|---|---|---|---|
| GV | 48,794,662 | 40,043,706 | 825 | 26 | 3.15 |
| HS | 55,664,198 | 36,181,124 | 54494 | 2 | 0.04 |
| IL | 50,506,462 | 39,785,759 | 15468 | 10 | 0.06 |
| IG | 44,384,140 | 31,688,521 | 29157 | 1 | 0.03 |
| ILC | 59,203,488 | 41,255,540 | 84134 | 3 | 0.004 |
| IGC | 50,410,720 | 37,512,513 | 86261 | 6 | 0.007 |

GV: Gainesville; HS: High Springs; IL: Interlachen; IG: Island Grove; ILC: Interlachen cultivated site; IGC: Island Grove cultivated site.
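As a worked example of the last column of Table 4-1, the sketch below computes the percentage of putative plant virus scaffolds as the number of such scaffolds divided by the number of scaffolds of at least 1000 nt. That this is how the column was derived is an assumption; it appears to reproduce the GV and ILC rows. The function name is hypothetical.

```python
def percent_plant_virus(n_virus_scaffolds: int, n_scaffolds: int) -> float:
    """Percentage of scaffolds that gave a putative plant virus hit."""
    if n_scaffolds <= 0:
        raise ValueError("number of scaffolds must be positive")
    return 100.0 * n_virus_scaffolds / n_scaffolds


# Counts taken from the GV and ILC rows of Table 4-1.
gv = percent_plant_virus(26, 825)
ilc = percent_plant_virus(3, 84134)
print(f"GV: {gv:.2f}%  ILC: {ilc:.3f}%")
```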

Figure 4-1. Viral populations in the DNA viromes of wild and cultivated V. corymbosum collected from different locations in Florida represented by scaffolds with sequence similarity to various plant viruses obtained by BLASTx analyses. (a) Putative viral scaffolds with similarity to plant virus species from different viral families. (b) Putative viral scaffolds with similarity to plant virus species from different viral genera. GV: Gainesville; HS: High Springs; IL: Interlachen; IG: Island Grove; ILC: Interlachen cultivated site; IGC: Island Grove cultivated site.

[Panel a: number of scaffolds (axis 0-30) per location (GV, HS, IL, IG, ILC, IGC), grouped by family: Caulimoviridae, Geminiviridae, Genomoviridae, Unclassified. Panel b: number of scaffolds (axis 0-12) per genus for each location; categories: Unclassified Genomoviridae, Unclassified Caulimoviridae, Unclassified Begomovirus, Unassigned Geminiviridae, Unclassified, Soymovirus, Solendovirus, Gemycircularvirus, Curtovirus, Cavemovirus, Caulimovirus, Capulavirus, Begomovirus, Becurtovirus, Badnavirus.]

Table 4-2. Scaffolds (>1000 nt in length) representing different species of plant viruses from the corresponding viral genera obtained by BLASTx analyses of the viromes of wild V. corymbosum from Gainesville.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Blueberry red ringspot virus | Soymovirus | 1 | 100 | 0 | 62 | 1501 | MP |
| Aristotelia chilensis virus 1 | Unclassified Caulimoviridae | 5 | 77 | 6.80E-73 | 21 | 3314 | MP |
| Strawberry vein banding virus | Caulimovirus | 1 | 68 | 3.35E-33 | 27 | 1076 | RT |
| Spinach severe curly top virus | Curtovirus | 1 | 63 | 7.63E-08 | 5 | 2730 | Rep |
| Citrus chlorotic dwarf associated virus | Unassigned Geminiviridae | 4 | 58 | 3.08E-33 | 18 | 5567 | Rep |
| Cauliflower mosaic virus | Caulimovirus | 1 | 47 | 9.54E-58 | 26 | 2764 | RT |
| Common bean-associated gemycircularvirus | Unclassified Genomoviridae | 1 | 47 | 1.89E-28 | 19 | 1855 | Rep |
| Blueberry fruit associated virus | Unclassified Caulimoviridae | 1 | 44 | 5.63E-17 | 24 | 1509 | PP (RT) |
| Yacon necrotic mottle virus | Badnavirus | 1 | 41 | 1.16E-12 | 16 | 1181 | ORF3 |
| Tomato leaf curl New Delhi virus | Begomovirus | 1 | 38 | 3.94E-09 | 15 | 1535 | MP |
| Sida Brazil virus | Unclassified Begomovirus | 1 | 37 | 2.11E-08 | 10 | 2609 | MP |
| Commelina yellow mottle virus | Badnavirus | 1 | 36 | 3.08E-33 | 5 | 3739 | RT |
| Sweet potato vein clearing virus | Solendovirus | 1 | 36 | 8.89E-29 | 57 | 1245 | Rep |
| East African cassava mosaic Cameroon virus | Begomovirus | 1 | 35 | 6.20E-11 | 31 | 1064 | MP |
| Rhynchosia mild mosaic virus | Begomovirus | 1 | 34 | 6.03E-38 | 37 | 2592 | Rep |
| Rudbeckia flower distortion virus | Unclassified Caulimoviridae | 1 | 33 | 6.05E-50 | 51 | 2430 | PP (RT) |
| Euphorbia caput-medusae latent virus | Capulavirus | 1 | 28 | 4.05E-11 | 30 | 1603 | CP |
| Watermelon chlorotic stunt virus | Begomovirus | 1 | 27 | 3.00E-08 | 36 | 1126 | MP |
| Cassava associated gemycircularvirus 1 | Gemycircularvirus | 1 | 27 | 6.73E-07 | 33 | 1540 | CP |

Table 4-3. Scaffolds (>1000 nt in length) representing different species of viruses from the corresponding viral genera obtained by BLASTx analyses of wild V. corymbosum virome from High Springs.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Rhynchosia mild mosaic virus | Begomovirus | 1 | 33 | 2.72E-37 | 35 | 2643 | Rep |
| Corchorus yellow vein virus | Begomovirus | 1 | 30 | 3.16E-30 | 72 | 1351 | Rep |

Table 4-4. Scaffolds (>1000 nt in length) representing different species of viruses from the corresponding viral genera obtained by BLASTx analyses of wild V. corymbosum virome from Interlachen.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Euphorbia caput-medusae latent virus | Capulavirus | 1 | 70 | 1.04E-07 | 6 | 1827 | Rep |
| Strawberry vein banding virus | Caulimovirus | 2 | 56 | 3.27E-83 | 50 | 2113 | RT |
| Blueberry fruit drop associated virus | Unclassified Caulimoviridae | 2 | 51 | 5.79E-59 | 41 | 2141 | Rep, RT |
| Rudbeckia flower distortion virus | Unclassified Caulimoviridae | 1 | 48 | 1.18E-26 | 32 | 1020 | PP (RT) |
| Citrus chlorotic dwarf associated virus | Unassigned Geminiviridae | 1 | 45 | 6.47E-16 | 10 | 2650 | Rep |
| Piper DNA virus 1 | Unclassified virus | 1 | 40 | 2.58E-26 | 29 | 1266 | HP |
| Beet curly top Iran virus | Becurtovirus | 1 | 36 | 5.38E-50 | 24 | 3729 | Rep |
| Chayote enation yellow mosaic virus | Unclassified Begomovirus | 1 | 35 | 9.84E-08 | 7 | 4142 | MP |

Table 4-5. Scaffolds (>1000 nt in length) representing different species of viruses from the corresponding viral genera obtained by BLASTx analyses of wild V. corymbosum virome from Island Grove.

| Closely related virus sp. | Genus | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Blueberry red ringspot virus | Soymovirus | 1 | 98 | 0 | 62 | 6090 | RT, TA |

Table 4-6. Scaffolds (>1000 nt in length) representing different species of viruses from the corresponding viral genera obtained by BLASTx analyses of cultivated V. corymbosum virome from Interlachen.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Cauliflower mosaic virus | Caulimovirus | 1 | 46 | 1.57E-56 | 27 | 2595 | RT |
| Blackberry Virus F | Badnavirus | 1 | 43 | 1.40E-06 | 11 | 1852 | PP (P3) |
| Cassava vein mosaic virus | Cavemovirus | 1 | 39 | 1.93E-10 | 22 | 1447 | MP |

Table 4-7. Scaffolds (>1000 nt in length) representing different species of viruses from the corresponding viral genera obtained by BLASTx analyses of cultivated V. corymbosum virome from Island Grove.

| Closely related virus sp. | Genera | No. of hits | Highest identity (%) | Highest E-value | Max query cov (%) | Max scaffold length (nt) | Proteins |
|---|---|---|---|---|---|---|---|
| Citrus chlorotic dwarf associated virus | Unassigned Geminiviridae | 1 | 59 | 1.41E-43 | 31 | 1331 | Rep |
| Aristotelia chilensis virus 1 | Unclassified Caulimoviridae | 1 | 55 | 4.29E-79 | 45 | 1523 | MP |
| Temperate fruit decay-associated virus | Unclassified virus | 1 | 52 | 8.48E-91 | 16 | 5860 | Rep |
| Dahlia mosaic virus D10 | Caulimovirus | 1 | 49 | 1.30E-29 | 31 | 1405 | PP (RT) |
| Cauliflower mosaic virus | Caulimovirus | 1 | 40 | 1.68E-53 | 26 | 3318 | RT |
| Tomato yellow leaf curl Kanchanaburi virus | Begomovirus | 1 | 38 | 7.49E-18 | 25 | 1491 | MP |

CP: Coat protein; FP: Fusion protein; HP: hypothetical protein; MP: Movement protein; NP: Nucleocapsid protein; P2: major core protein; PArp: proline-alanine-rich protein; PP: polyprotein; Rep: Replicase; RdRp: RNA-dependent RNA polymerase; RT: Reverse transcriptase; TA: Translational activator; VH: Viral helicase; VM: Viral methyltransferase. No. of hits corresponds to the number of scaffolds that produced sequence similarity to the closely related virus species by BLASTx.

Figure 4-2. Genome organization of BG-1 and BG-2 showing the stem-loop structure containing the nonanucleotide sequence located within the intergenic region, and putative ORFs encoding a putative replicase (Rep), AC4, capsid protein (CP) and hypothetical proteins. Legend: putative Rep; putative AC4; putative CP; hypothetical protein; intergenic region; putative stem loop.

Figure 4-3. Pairwise comparison analysis in SDT using putative Rep amino acid sequences of BG-1 and BG-2 and the Rep of selected members representing each genus in the family Geminiviridae. Pairwise sequence identities between the viruses are represented by different colors. Node_17771_length_2592_MS refers to BG-1; Node_29_length_2643_HS refers to BG-2.

Figure 4-4. Alignment of the putative Reps encoded by BG-1 and BG-2 with those of other members representing different genera in the family Geminiviridae, showing that these sequences contain rolling-circle replication (RCR) motifs I, II and III, the geminivirus Rep sequence (GRS), and the superfamily 3 (SF3) helicase motifs (Walker-A, Walker-B and motif C). NODE17771 refers to BG-1; NODE29 refers to BG-2.

Figure 4-5. Unrooted phylogenetic tree constructed from Rep amino acid sequences of selected members representing various genera in the family Geminiviridae by the Neighbor-Joining method in MEGA 7 using the p-distance method based on 1000 replicates. Only nodes with bootstrap values above 80% are shown. OW: Old World; NW: New World.

[Groups labeled on the tree: Curtovirus, Turncurtovirus, Topocuvirus, Begomovirus (NW), Begomovirus (OW), Unclassified, Eragrovirus, BG-1, BG-2, Mastrevirus, Grablovirus, Capulavirus, Becurtovirus, Unassigned.]

CHAPTER 5 SUMMARY

The advent of plant viral metagenomics has greatly changed our approach to tackling various issues in plant pathology. The overwhelming amount of data can lead to virus discovery and can facilitate the unraveling of viral disease etiology. This new approach has had a huge impact on the taxonomic classification of viruses, because so much new sequence data has been, and continues to be, produced.

In this study, two different approaches were used to explore virus diversity in wild and cultivated blueberry (Vaccinium spp.) in Florida. The initial study involved transcriptomic analysis of existing blueberry root transcriptomes from wild and cultivated Vaccinium species. We identified a novel virus species probably belonging to the family Potyviridae and characterized eight complete genomes of BRRV. The study also characterized the RNA and DNA viromes of wild and cultivated blueberry, V. corymbosum, through a viral metagenomics approach. Analyses of the RNA viromes led to the discovery of well-characterized blueberry-infecting viruses, namely BBLV, BlMaV and BRRV. Furthermore, a putative novel virus species possibly belonging to the genus Tepovirus in the family Betaflexiviridae was also identified and characterized from the RNA viromes. Likewise, analyses of the DNA viromes uncovered a putative novel virus genome potentially representing a new species in the family Geminiviridae, in addition to the recovery of a complete BRRV genome.

Although both transcriptomics and viral metagenomics are reliable for virus discovery, the viral metagenomics approach, with its incorporation of viral enrichment steps, appears better suited to studies of virus diversity, as shown by the number of scaffolds related to plant viruses from a wide range of viral genera found by the latter approach. This is largely due to the huge amount of host sequence in the transcriptomes, in addition to the drawback that transcriptomes cannot detect viruses lacking poly-A tails, or the latent or persistent plant viruses that are commonly found in wild plants, as evidenced in this study.

Prior to this study, only Blueberry necrotic ring blotch virus had been documented in blueberry in Florida. The use of the latest sequencing technology coupled with bioinformatic analyses enabled the discovery of three known blueberry viruses new to Florida as well as three putative novel viruses belonging to different recognized virus families. Altogether, these findings demonstrate that transcriptomic analysis of existing transcriptome data can be used as an inexpensive tool for virus discovery, whereas viral metagenomics through the enrichment of viral nucleic acids is a powerful tool for the discovery of known and novel viruses, as well as for understanding plant virus biodiversity.


APPENDIX A CHAPTER 2 SUPPLEMENTARY DATA

Table A-1. Blueberry root transcriptome libraries from V. arboreum and Vc x Vd ‘Emerald’.

| Plant species | Libraries | Raw reads, Lane 1 R1 | Raw reads, Lane 1 R2 | Raw reads, Lane 2 R1 | Raw reads, Lane 2 R2 | Cleaned reads, Lane 1 R1 | Cleaned reads, Lane 1 R2 | Cleaned reads, Lane 2 R1 | Cleaned reads, Lane 2 R2 |
|---|---|---|---|---|---|---|---|---|---|
| V. arboreum | a1 | 8715881 | 8715881 | 8646785 | 8646785 | 8185645 | 4286568 | 8134082 | 4217906 |
| | a2 | 7960170 | 7960170 | 7888050 | 7888050 | 7658340 | 3842439 | 7604867 | 3774102 |
| | a3 | 7455758 | 7455758 | 7416425 | 7416425 | 7136873 | 3639093 | 7115630 | 3583433 |
| | a4 | 7590739 | 7590739 | 7560208 | 7560208 | 7125919 | 3738080 | 7107536 | 3699130 |
| | a5 | 7390883 | 7390883 | 7318853 | 7318853 | 7071891 | 3597709 | 7017794 | 3531683 |
| | a6 | 9332976 | 9332976 | 9359210 | 9359210 | 8805565 | 4558711 | 8832075 | 4534510 |
| | a7 | 8656556 | 8656556 | 8591613 | 8591613 | 8141778 | 4280046 | 8087350 | 4209323 |
| | a8 | 10620343 | 10620343 | 10566429 | 10566429 | 10007672 | 5152331 | 9970701 | 5078793 |
| Vc x Vd ‘Emerald’ | e9 | 18384178 | 18384178 | 18262153 | 18262153 | 17412946 | 8951340 | 17305356 | 8817647 |
| | e10 | 16945940 | 16945940 | 16748977 | 16748977 | 16106215 | 8315739 | 15942835 | 8140610 |
| | e11 | 21742327 | 21742327 | 21878400 | 21878400 | 20588410 | 10694012 | 20723622 | 10669926 |
| | e12 | 17737529 | 17737529 | 17606724 | 17606724 | 16946223 | 8696389 | 16849588 | 8555510 |
| | e13 | 22467923 | 22467923 | 22400614 | 22400614 | 20873172 | 10934562 | 20831485 | 10814613 |
| | e14 | 12151144 | 12151144 | 12051449 | 12051449 | 11406139 | 5928887 | 11326171 | 5833078 |
| | e15 | 18531797 | 18531797 | 18386427 | 18386427 | 17405258 | 8988151 | 17287284 | 8846509 |
| | e16 | 17948384 | 17948384 | 17946456 | 17946456 | 16870384 | 8738272 | 16890100 | 8673219 |

Table A-2. The number of contigs obtained from de novo assembly of reads from each lane (1 and 2) of the V. arboreum and ‘Emerald’ libraries.

| Lane | V. arboreum | Vc x Vd ‘Emerald’ |
|---|---|---|
| Lane 1 | 703837 | 2906867 |
| Lane 2 | 706450 | 1410565 |
| Total no. of contigs | 1410287 | 4317432 |

Table A-3. Details of primer sequences used for the validation of viruses found in this study.

| Primer name | Length | Sequence | Binding region | Position in scaffold | Expected size (bp) |
|---|---|---|---|---|---|
| NP1_F | 20 | GTCGGTGTGGAATTGGCAAC | NIb | 35-54 | 1616 |
| NP1_R | 24 | GCACTCTCTAAACACTCATCGTAC | NIb | 1674-1651 | |
| RRSV3F | 24 | ATCAGCCCCAGAAGAAAAGAAGTA | TA | 4,718-4,741 | 549 |
| RRSV4R | 24 | ATCCGAGAAATAGATAGTGTCAGC | TA | 5,244-5,267 | |
| BRRV_b1f | 21 | GACAGCAGTCACCCCAGTACA | RT | 4,128-4,148 | 8293-8296 |
| BRRV_b1r | 20 | ATAACTGCTCCCCAATGGCC | RT | 4,127-4,108 | |
| BRRV_b2f | 23 | TCGCCAAAGAAAGACCATGCTAC | TA | 5,332-5,354 | 8293-8296 |
| BRRV_b2r | 24 | ACCTCCTGAAGTAATTGTTGATGG | TA | 5,331-5,308 | |
| BRRV_Ab1f | 23 | AGGTAGGATGGGTACTGAATAATTTGAT | TA | 5,332-5,354 | 8293-8296 |
| BRRV_Ab1r | 28 | AGGTAGGATGGGTACTGAATAATTTGAT | TA | 5,382-5,355 | |

NIb: Nuclear inclusion b; TA: Transcriptional transactivator; RT: Reverse transcriptase.

Figure A-1. The number and percentage (%) of contigs produced by de novo assembly of reads from the following libraries: (a) V. arboreum (lane 1), (b) V. arboreum (lane 2), (c) Vc x Vd ‘Emerald’ (lane 1), and (d) Vc x Vd ‘Emerald’ (lane 2).

Table A-4. Pairwise identity of the amino acid sequences in the NIb regions from the de novo assembled scaffold of the putative new member of the family Potyviridae from Florida and other selected potyviruses.

| | NP scaffold | NC_018455 | NC_003483 | NC_004016 | NC_002990 | NC_012799 | NC_014037 | NC_009805 | NC_001886 | NC_005136 | NC_003797 | NC_006941 | NC_008558 | NC_001814 | NC_005904 | NC_005903 | NC_001616 | NC_005304 | NC_001768 | NC_003536 | NC_001445 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NP scaffold | - | 14 | 17 | 16 | 15 | 17 | 18 | 24 | 20 | 20 | 16 | 18 | 16 | 18 | 18 | 18 | 17 | 18 | 17 | 18 | 17 |
| NC_018455 | 14 | - | 42 | 46 | 46 | 30 | 30 | 25 | 27 | 28 | 30 | 30 | 29 | 31 | 30 | 29 | 30 | 29 | 29 | 28 | 29 |
| NC_003483 | 17 | 42 | - | 58 | 57 | 31 | 31 | 30 | 32 | 32 | 30 | 31 | 30 | 31 | 30 | 29 | 30 | 32 | 30 | 30 | 31 |
| NC_004016 | 16 | 46 | 58 | - | 78 | 30 | 29 | 29 | 32 | 31 | 30 | 30 | 30 | 31 | 29 | 28 | 30 | 31 | 29 | 29 | 29 |
| NC_002990 | 15 | 46 | 57 | 78 | - | 30 | 28 | 28 | 29 | 28 | 27 | 29 | 29 | 30 | 29 | 27 | 27 | 30 | 28 | 28 | 28 |
| NC_012799 | 17 | 30 | 31 | 30 | 30 | - | 65 | 39 | 41 | 42 | 39 | 43 | 37 | 37 | 36 | 37 | 35 | 37 | 37 | 37 | 37 |
| NC_014037 | 18 | 30 | 31 | 29 | 28 | 65 | - | 39 | 41 | 42 | 38 | 42 | 36 | 36 | 36 | 35 | 34 | 37 | 33 | 33 | 35 |
| NC_009805 | 24 | 25 | 30 | 29 | 28 | 39 | 39 | - | 62 | 42 | 38 | 42 | 33 | 37 | 38 | 37 | 35 | 39 | 38 | 39 | 38 |
| NC_001886 | 20 | 27 | 32 | 32 | 29 | 41 | 41 | 62 | - | 42 | 40 | 44 | 36 | 40 | 39 | 41 | 37 | 40 | 37 | 40 | 41 |
| NC_005136 | 20 | 28 | 32 | 31 | 28 | 42 | 40 | 60 | 84 | - | 39 | 42 | 36 | 39 | 39 | 40 | 37 | 40 | 38 | 40 | 39 |
| NC_003797 | 16 | 30 | 30 | 30 | 27 | 39 | 38 | 38 | 40 | 39 | - | 52 | 32 | 38 | 39 | 39 | 36 | 38 | 36 | 38 | 37 |
| NC_006941 | 18 | 30 | 31 | 30 | 29 | 43 | 42 | 42 | 44 | 42 | 52 | - | 36 | 37 | 39 | 38 | 38 | 39 | 36 | 37 | 39 |
| NC_008558 | 16 | 29 | 30 | 30 | 29 | 37 | 36 | 33 | 36 | 36 | 32 | 36 | - | 40 | 43 | 42 | 39 | 41 | 40 | 41 | 42 |
| NC_001814 | 18 | 31 | 31 | 31 | 30 | 37 | 36 | 37 | 40 | 39 | 38 | 37 | 40 | - | 66 | 65 | 51 | 52 | 50 | 52 | 54 |
| NC_005904 | 18 | 30 | 30 | 29 | 29 | 36 | 36 | 38 | 39 | 39 | 39 | 39 | 43 | 66 | - | 80 | 55 | 57 | 56 | 56 | 58 |
| NC_005903 | 18 | 29 | 29 | 28 | 27 | 37 | 35 | 37 | 41 | 40 | 39 | 38 | 42 | 65 | 80 | - | 55 | 57 | 53 | 55 | 58 |
| NC_001616 | 17 | 30 | 30 | 30 | 27 | 35 | 34 | 35 | 37 | 37 | 36 | 38 | 39 | 51 | 55 | 55 | - | 59 | 61 | 58 | 62 |
| NC_005304 | 18 | 29 | 32 | 31 | 30 | 37 | 37 | 39 | 40 | 40 | 38 | 39 | 41 | 52 | 57 | 57 | 59 | - | 58 | 59 | 62 |
| NC_001768 | 17 | 29 | 30 | 29 | 28 | 37 | 33 | 38 | 37 | 38 | 36 | 36 | 40 | 50 | 56 | 53 | 61 | 58 | - | 62 | 61 |
| NC_003536 | 18 | 28 | 30 | 29 | 28 | 37 | 33 | 39 | 40 | 40 | 38 | 37 | 41 | 52 | 56 | 55 | 58 | 59 | 62 | - | 64 |
| NC_001445 | 17 | 29 | 31 | 29 | 28 | 37 | 35 | 38 | 41 | 39 | 37 | 39 | 42 | 54 | 58 | 58 | 62 | 62 | 61 | 64 | - |

NC_005903: Agropyron mosaic virus; NC_003483: Barley mild mosaic virus; NC_002990: Barley yellow mosaic virus; NC_005304: Beet mosaic virus; NC_008558: Blackberry virus Y; NC_018455: Chinese yam necrotic mosaic virus; NC_003536: Clover yellow vein virus; NC_006941: Cucumber vein yellowing virus; NC_005904: Hordeum mosaic virus; NC_004016: Oat mosaic virus; NC_005136: Oat necrotic mottle virus; NC_001445: Plum pox virus; NC_001616: Potato virus Y; NC_001814: Ryegrass mosaic virus; NC_014037: Sugarcane streak mosaic virus; NC_003797: Sweet potato mild mottle virus; NC_001768: Tobacco vein mottling virus; NC_012799: Triticum mosaic virus; NC_009805: Wheat eqlid mosaic virus; NC_001886: Wheat streak mosaic virus.

Table A-5. Whole genome nucleotide alignment of BRRV isolates by MUSCLE showing % nt identity between each pair of isolates.

| | CZ | FL | NJ | PL | SL |
|---|---|---|---|---|---|
| CZ | - | 96 | 94 | 96 | 95 |
| FL | 96 | - | 94 | 97 | 94 |
| NJ | 94 | 94 | - | 95 | 93 |
| PL | 96 | 97 | 95 | - | 96 |
| SL | 95 | 94 | 93 | 95 | - |

Figure A-2. Gel electrophoresis image of the PCR optimization and amplification of the NP scaffold. a) PCR optimization to obtain the NP scaffold, with an expected band of ~1.6 kb, using the NP1 primer set in gradient PCR. An annealing temperature of 62ºC was selected for further use in PCR. b) Gel electrophoresis showing amplification of the NP scaffold using pooled RNA samples extracted from V. arboreum root; the product was used for direct sequencing. M: 1 kb DNA ladder. Courtesy of Norsazilawati Saad.

Figure A-3. Gel electrophoresis image of the DNA samples extracted from ‘Emerald’ for BRRV detection using the RRSV3F/R primer set, producing an amplicon of 546 bp. Courtesy of Norsazilawati Saad.

Figure A-4. Gel electrophoresis image showing amplification of full-length BRRV (~8.3 kb) using primer set BRRV_b1. Samples used in PCR are shown next to the gel. Two samples from V. arboreum were included in the reaction for comparison of the episomal and endogenous forms of BRRV, whereas the ‘Southernbelle’ cultivar was included as a negative control. +ve ctrl: Positive control; -ve ctrl: Negative control. Courtesy of Norsazilawati Saad.

Figure A-5. Gel electrophoresis image showing PCR using a) the back-to-back primer set BRRV_b2 and b) the abutting primer set BRRV_Ab1 to obtain full-length BRRV (~8.3 kb) in an attempt to differentiate the episomal and endogenous forms of BRRV. M: 1 kb DNA ladder. 1: ‘Southernbelle’ negative control; 2: Pooled DNA extracted from ‘Emerald’ root; 3: DNA from root of BRRV symptomatic ‘Emerald’ (ECS1-R); 4: DNA from leaves of BRRV symptomatic ‘Emerald’ (ECS1-L); 5: ¼ diluted RCA product of ‘Southernbelle’ negative control; 6: ¼ diluted RCA product of pooled DNA extracted from ‘Emerald’ root; 7: ¼ diluted RCA product of ECS1-R; 8: ¼ diluted RCA product of ECS1-L; 9: ½ diluted RCA product of ‘Southernbelle’ negative control; 10: ½ diluted RCA product of pooled DNA extracted from ‘Emerald’ root; 11: ½ diluted RCA product of ECS1-R; 12: ½ diluted RCA product of ECS1-L. Courtesy of Norsazilawati Saad.

APPENDIX B CHAPTER 3 SUPPLEMENTARY DATA

Table B-1. Forward and reverse primers designed based on the de novo assembled complete genome of a putative novel Tepovirus. The forward and reverse primers bind to the MP and 3’UTR, respectively.

| Primer name | Length | Sequence | Binding region | Position in scaffold |
|---|---|---|---|---|
| NT_F | 20 | AGGGGTGCGAATTTTAGGCA | MP | 5,967-5,986 |
| NT_R | 24 | AACTAGACGAGGCTCTGGTG | 3’UTR | 6,976-6,957 |

Table B-2. Pairwise identity of complete genomes of BBLV isolates from Florida and other regions, and selected members in the genus Amalgavirus.

| | STV | BBLV Oregon | BBLV Michigan | BBLV IGC Florida | BBLV Japan | BBLV Arkansas | BBLV ILF Florida | BBLV Michigan-2 | RVA |
|---|---|---|---|---|---|---|---|---|---|
| STV | 100 | | | | | | | | |
| BBLV Oregon | 58 | 100 | | | | | | | |
| BBLV Michigan | 58 | 100 | 100 | | | | | | |
| BBLV IGC Florida | 58 | 100 | 100 | 100 | | | | | |
| BBLV Japan | 58 | 100 | 100 | 100 | 100 | | | | |
| BBLV Arkansas | 58 | 100 | 100 | 100 | 100 | 100 | | | |
| BBLV ILF Florida | 58 | 100 | 100 | 100 | 100 | 100 | 100 | | |
| BBLV Michigan-2 | 58 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | |
| RVA | 57 | 58 | 58 | 58 | 58 | 58 | 58 | 58 | 100 |

Table B-3. Pairwise identity of NP gene of BlMaV isolates from Florida and other regions, and selected members of the genus Ophiovirus.

| | BlMaV HS Florida | BlMaV IGC Florida | BlMaV ILW Florida | BlMaV Arkansas | BlMaV Japan | CPsV | LRNV |
|---|---|---|---|---|---|---|---|
| BlMaV HS Florida | 100 | | | | | | |
| BlMaV IGC Florida | 81 | 100 | | | | | |
| BlMaV ILW Florida | 81 | 92 | 100 | | | | |
| BlMaV Arkansas | 81 | 84 | 85 | 100 | | | |
| BlMaV Japan | 79 | 85 | 84 | 94 | 100 | | |
| CPsV | 54 | 60 | 58 | 59 | 59 | 100 | |
| LRNV | 50 | 58 | 57 | 58 | 57 | 55 | 100 |

Table B-4. Pairwise identity of complete genomes of BRRV isolates from Florida and other regions, and selected members of the genus Soymovirus.

| | SbCMV | BRRV New Jersey | BRRV Slovenia | BRRV Czech Republic | BRRV Poland | BRRV IGC Florida | PCSV |
|---|---|---|---|---|---|---|---|
| SbCMV | 100 | | | | | | |
| BRRV New Jersey | 62 | 100 | | | | | |
| BRRV Slovenia | 62 | 94 | 100 | | | | |
| BRRV Czech Republic | 62 | 95 | 95 | 100 | | | |
| BRRV Poland | 62 | 95 | 96 | 97 | 100 | | |
| BRRV IGC Florida | 62 | 95 | 95 | 97 | 97 | 100 | |
| PCSV | 61 | 62 | 61 | 62 | 62 | 62 | 100 |

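Table B-5. Pairwise identity of the RdRp amino acid sequences of BlVT and selected members in the family Betaflexiviridae.

| | ASPV | GarCLV | CNRMV | SCSMaV | BanMMV | DVA | CarChV1 | AVCaV | CLBV | ASGV | ACLSV | BlVT | PrVT | PVT | GVA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ASPV | 100 | | | | | | | | | | | | | | |
| GarCLV | 41 | 100 | | | | | | | | | | | | | |
| CNRMV | 40 | 37 | 100 | | | | | | | | | | | | |
| SCSMaV | 38 | 35 | 35 | 100 | | | | | | | | | | | |
| BanMMV | 37 | 35 | 36 | 34 | 100 | | | | | | | | | | |
| DVA | 34 | 31 | 34 | 33 | 35 | 100 | | | | | | | | | |
| CarChV1 | 33 | 30 | 33 | 31 | 33 | 34 | 100 | | | | | | | | |
| AVCaV | 36 | 33 | 35 | 33 | 33 | 32 | 35 | 100 | | | | | | | |
| CLBV | 32 | 31 | 33 | 30 | 33 | 33 | 35 | 46 | 100 | | | | | | |
| ASGV | 32 | 30 | 31 | 32 | 29 | 30 | 33 | 34 | 32 | 100 | | | | | |
| ACLSV | 34 | 32 | 33 | 31 | 34 | 35 | 34 | 37 | 34 | 34 | 100 | | | | |
| BlVT | 32 | 30 | 32 | 30 | 32 | 35 | 33 | 35 | 34 | 33 | 34 | 100 | | | |
| PrVT | 33 | 33 | 33 | 30 | 31 | 33 | 33 | 35 | 34 | 34 | 33 | 55 | 100 | | |
| PVT | 33 | 31 | 33 | 30 | 32 | 33 | 34 | 32 | 33 | 32 | 33 | 38 | 36 | 100 | |
| GVA | 32 | 29 | 32 | 29 | 29 | 31 | 31 | 32 | 33 | 30 | 31 | 33 | 33 | 33 | 100 |

Table B-6. Pairwise identity of the CP amino acid sequences of BlVT and selected members in the family Betaflexiviridae.

| | ACLSV | PrVT | BlVT | ASGV | GVA | PVT | AVCaV | CNRMV | SCSMaV | BanMMV | GarCLV | BanVX | CLBV | ASPV | DVA | CarChV1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACLSV | 100 | | | | | | | | | | | | | | | |
| PrVT | 32 | 100 | | | | | | | | | | | | | | |
| BlVT | 32 | 64 | 100 | | | | | | | | | | | | | |
| ASGV | 32 | 27 | 24 | 100 | | | | | | | | | | | | |
| GVA | 28 | 28 | 32 | 34 | 100 | | | | | | | | | | | |
| PVT | 25 | 31 | 30 | 29 | 35 | 100 | | | | | | | | | | |
| AVCaV | 24 | 26 | 26 | 24 | 25 | 29 | 100 | | | | | | | | | |
| CNRMV | 24 | 14 | 14 | 25 | 16 | 23 | 16 | 100 | | | | | | | | |
| SCSMaV | 15 | 13 | 18 | 20 | 21 | 20 | 14 | 32 | 100 | | | | | | | |
| BanMMV | 22 | 22 | 19 | 29 | 23 | 19 | 18 | 32 | 31 | 100 | | | | | | |
| GarCLV | 20 | 16 | 19 | 19 | 20 | 21 | 22 | 29 | 28 | 34 | 100 | | | | | |
| BanVX | 15 | 13 | 18 | 24 | 23 | 17 | 18 | 30 | 28 | 33 | 25 | 100 | | | | |
| CLBV | 25 | 22 | 21 | 24 | 23 | 23 | 23 | 27 | 33 | 33 | 23 | 33 | 100 | | | |
| ASPV | 26 | 24 | 21 | 24 | 25 | 22 | 22 | 30 | 33 | 29 | 30 | 31 | 27 | 100 | | |
| DVA | 22 | 27 | 21 | 29 | 26 | 28 | 22 | 23 | 20 | 19 | 20 | 18 | 25 | 27 | 100 | |
| CarChV1 | 22 | 21 | 24 | 22 | 21 | 22 | 19 | 17 | 18 | 17 | 21 | 20 | 20 | 18 | 23 | 100 |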

Figure B-1. PCR to obtain the CP region of the putative new Tepovirus. a) PCR optimization to obtain the CP region of the putative new Tepovirus using the NT primer set in gradient PCR. An annealing temperature of 62ºC was selected for further use in PCR. b) Amplification of the CP region of the putative new Tepovirus using pooled RNA samples extracted from V. corymbosum leaves collected from Island Grove, producing the expected band at ~1 kb that was used as the insert for cloning. c) Screening of RNA samples extracted from V. corymbosum leaves collected from Island Grove for the presence of the putative new Tepovirus. Cultivar names are shown at the top of the gel. M: 1 kb DNA ladder. Courtesy of Norsazilawati Saad.

Figure B-2. Symptoms on samples of V. corymbosum from Island Grove that tested positive for the putative new Tepovirus by PCR, October 5, 2015. Samples shown: Gulfcoast-6, Gulfcoast-8 and Windsor-11. Photo courtesy of Norsazilawati Saad.

APPENDIX C CHAPTER 4 SUPPLEMENTARY DATA

Table C-1. Pairwise identity of putative Rep amino acid sequences of BG-1 and BG-2 and the Rep of selected members representing each genus in the family Geminiviridae using SDT. AF449 NC004 NC015 NC015 FM877 KM386 KC706 FJ665 X84 NC031 GU456 KC108 KT388 KT388 GU734 192 097 490 488 473 645 535 283 735 452 685 902 086 088 126 AF449 100 192 NC004 98 100 097 NC015 80 81 100 490 NC015 85 85 92 100 488 FM877 67 67 70 69 100 473 KM386 64 65 66 67 64 100 645 KC706 64 64 68 68 65 59 100 535 FJ6652 62 62 68 65 63 59 80 100 83 X8473 67 68 71 71 65 63 76 75 100 5 NC031 60 61 63 61 57 57 72 70 72 100 452 GU456 54 55 58 57 59 55 60 58 62 52 100 685 KC108 54 55 57 56 57 54 59 57 61 51 93 100 902 KT388 55 55 60 59 60 56 64 63 65 55 76 75 100 086 KT388 56 56 58 57 57 54 60 59 62 56 68 66 75 100 088 GU734 52 52 53 54 52 50 57 59 59 54 49 49 53 55 100 126

162

Table C-1. Continued. HCU49 FJ6656 FJ6656 AF003 DQ458 EF536 KJ4376 JX0942 KT214 KT214 KT214 HQ443 KP410 JQ9204 907 34 30 952 791 860 71 80 373 386 389 515 285 90 HCU49 100 907 FJ66563 46 100 4 FJ66563 45 91 100 0 AF0039 29 26 26 100 52 DQ4587 27 25 28 42 100 91 EF5368 30 28 29 41 39 100 60 KJ4376 30 25 26 39 41 38 100 71 JX0942 33 30 30 35 35 33 38 100 80 KT2143 42 42 43 36 32 33 33 55 100 73 KT2143 41 40 39 34 33 33 34 59 77 100 86 KT2143 40 42 42 30 35 33 35 50 68 73 100 89 HQ4435 26 26 27 28 29 27 31 28 30 27 29 100 15 KP4102 28 26 26 26 29 30 29 28 30 27 28 77 100 85 JQ9204 28 31 32 32 30 30 34 32 30 31 31 36 31 100 90

163

Table C-1. Continued. KP303687 JX559642 NC022002 BG-1 BG-2

KP303687 100

JX559642 33 100

NC022002 33 99 100

BG-1 32 29 28 100 BG-2 32 29 27 97 100 Abbreviation: AF003952 Maize streak virus KT214373 Alfalfa leaf curl virus AF449192 Macroptilium mosaic Puerto Rico virus[Bean] KT214386 Euphorbia caputmedusae latent virus DQ458791 Chickpea chlorotic dwarf virus KT214389 Plantago lanceolata latent virus EF536860 Wheat dwarf virus KT388086 Turnip leaf roll virus FJ665283 Bean golden mosaic virus KT388088 Turnip leaf roll virus FJ665630 Eragrostis curvula streak virus NC004097 Macroptilium mosaic Puerto Rico virus FJ665634 Eragrostis curvula streak virus NC015488 Rhynchosai mild mosaic virus FM877473 African cassava mosaic virus NC015490 Merremia mosaic Puerto Rico virus GU456685 Turnip curly top virus NC022002 Grapevine red blotch associated virus GU734126 Spinach severe curly top virus NC031452 Macroptilium bright mosaic virus HCU49907 Horseradish curly top virus X84735 Tomato pseudocurly top virus HQ443515 Spinach curly top Arizona virus JQ920490 Citrus chlorotic dwarf associated virus JX094280 French bean severe leaf curl virus JX559642 Grapevine geminivirus KC108902 Turnip curly top virus KC706535 Sida micrantha mosaic virus KJ437671 Axonopus compressus streak virus KM386645 Apple geminivirus KP303687 Mulberry mosaic dwarf associated virus KP410285 Beet curly top Iran virus

LIST OF REFERENCES

Adams, I., Miano, D., Kinyua, Z., Wangai, A., Kimani, E., Phiri, N., Reeder, R., Harju, V., Glover, R., Hany, U., 2013. Use of next‐generation sequencing for the identification and characterization of Maize chlorotic mottle virus and Sugarcane mosaic virus causing maize lethal necrosis in Kenya. Plant Pathology 62, 741-749.

Adams, I.P., Glover, R.H., Monger, W.A., Mumford, R., Jackeviciene, E., Navalinskiene, M., Samuitiene, M., Boonham, N., 2009. Next-generation sequencing and metagenomic analysis: a universal diagnostic tool in plant virology. Mol Plant Pathol 10, 537-545.

Adams, M., Lefkowitz, E., King, A.M.Q., Carstens, E., 2014. Ratification vote on taxonomic proposals to the International Committee on Taxonomy of Viruses (2014). Archives of virology 159, 2831-2841.

Adams, M.J., Antoniw, J.F., Fauquet, C.M., 2005. Molecular criteria for genus and species discrimination within the family Potyviridae. Arch Virol 150, 459-479.

Adams, M.J., Lefkowitz, E.J., King, A.M., Harrach, B., Harrison, R.L., Knowles, N.J., Kropinski, A.M., Krupovic, M., Kuhn, J.H., Mushegian, A.R., 2016. Ratification vote on taxonomic proposals to the International Committee on Taxonomy of Viruses (2016). Archives of virology 161, 2921-2949.

Al Rwahnih, M., Daubert, S., Golino, D., Islas, C., Rowhani, A., 2015. Comparison of Next-Generation Sequencing Versus Biological Indexing for the Optimal Detection of Viral Pathogens in Grapevine. Phytopathology 105, 758-763.

Al Rwahnih, M., Daubert, S., Golino, D., Rowhani, A., 2009. Deep sequencing analysis of RNAs from a grapevine showing Syrah decline symptoms reveals a multiple virus infection that includes a novel virus. Virology 387, 395-401.

Al Rwahnih, M., Daubert, S., Urbez-Torres, J.R., Cordero, F., Rowhani, A., 2011. Deep sequencing evidence from single grapevine plants reveals a virome dominated by mycoviruses. Arch Virol 156, 397-403.

Allander, T., Emerson, S.U., Engle, R.E., Purcell, R.H., Bukh, J., 2001. A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species. Proceedings of the National Academy of Sciences 98, 11609-11614.

Allen, W., Schagen, J.V., Ebsary, B., 1984. Comparative transmission of the peach rosette mosaic virus by Ontario populations of Longidorus diadecturus and Xiphinema americanum (Nematoda: Longidoridae). Canadian Journal of Plant Pathology 6, 29-32.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 25, 3389-3402.

Angly, F.E., Felts, B., Breitbart, M., Salamon, P., Edwards, R.A., Carlson, C., Chan, A.M., Haynes, M., Kelley, S., Liu, H., 2006. The marine viromes of four oceanic regions. PLoS biol 4, e368.

Bacher, J., Warkentin, D., Ramsdell, D., Hancock, J., 1994a. Sequence analysis of the 3′ termini of RNA1 and RNA2 of blueberry leaf mottle virus. Virus research 33, 145-156.

Bacher, J., Warkentin, D., Ramsdell, D., Hancock, J., 1994b. Selection versus recombination: what is maintaining identity in the 3′ termini of blueberry leaf mottle nepovirus RNA1 and RNA2? Journal of general virology 75, 2133-2137.

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology 19, 455-477.

Bao, Y., Chetvernin, V., Tatusova, T., 2014. Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification. Archives of virology 159, 3293-3304.

Barba, M., Czosnek, H., Hadidi, A., 2014. Historical perspective, development and applications of next-generation sequencing in plant virology. Viruses 6, 106-136.

Beerenwinkel, N., Günthard, H.F., Roth, V., Metzner, K.J., 2012. Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Frontiers in microbiology 3, 329.

Bernardo, P., Muhire, B., François, S., Deshoux, M., Hartnady, P., Farkas, K., Kraberger, S., Filloux, D., Fernandez, E., Galzi, S., 2016. Molecular characterization and prevalence of two capulaviruses: Alfalfa leaf curl virus from France and Euphorbia caput-medusae latent virus from South Africa. Virology 493, 142-153.

Bhat, A.I., Hohn, T., Selvarajan, R., 2016. Badnaviruses: The Current Global Scenario. Viruses 8.

Blinkova, O., Victoria, J., Li, Y., Keele, B.F., Sanz, C., Ndjango, J.-B.N., Peeters, M., Travis, D., Lonsdorf, E.V., Wilson, M.L., 2010. Novel circular DNA viruses in stool samples of wild-living chimpanzees. Journal of General Virology 91, 74-86.

Blouin, A.G., Ross, H.A., Hobson‐Peters, J., O'Brien, C.A., Warren, B., MacDiarmid, R., 2016. A new virus discovered by immunocapture of double‐stranded RNA, a rapid method for virus enrichment in metagenomic studies. Molecular Ecology Resources.

Bolger, A.M., Lohse, M., Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Breitbart, M., Rohwer, F., 2005. Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. BioTechniques 39, 729-736.

Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J.M., Segall, A.M., Mead, D., Azam, F., Rohwer, F., 2002. Genomic analysis of uncultured marine viral communities. Proceedings of the National Academy of Sciences 99, 14250-14255.

Bristow, P., Martin, R., 1999. Transmission and the role of honeybees in field spread of blueberry shock ilarvirus, a pollen-borne virus of highbush blueberry. Phytopathology 89, 124-130.

Bristow, P.R., Martin, R.R., Windom, G.E., 2000. Transmission, field spread, cultivar response, and impact on yield in highbush blueberry infected with Blueberry scorch virus. Phytopathology 90, 474-479.

Burkle, C., Olmstead, J., Harmon, P., 2012. A potential vector of Blueberry necrotic ring blotch virus and symptoms on various host genotypes. Phytopathology 102, S4.

CABI. 2003. Strawberry latent ringspot virus. In: Crop Protection Compendium. Wallingford, UK, CAB International. www.cabi.org/cpc.

Candresse, T., Filloux, D., Muhire, B., Julian, C., Galzi, S., Fort, G., Bernardo, P., Daugrois, J.H., Fernandez, E., Martin, D.P., Varsani, A., Roumagnac, P., 2014. Appearances can be deceptive: revealing a hidden viral infection with deep sequencing in a plant quarantine context. PLoS One 9, e102945.

Candresse, T., Marais, A., Faure, C., Gentit, P., 2013. Association of Little cherry virus 1 (LChV1) with the Shirofugen stunt disease and characterization of the genome of a divergent LChV1 isolate. Phytopathology 103, 293-298.

Cantu-Iris, M., Harmon, P.F., Londono, A., Polston, J.E., 2013. A variant of blueberry necrotic ring blotch virus associated with red lesions in blueberry. Arch Virol 158, 2197-2200.

Caruso, F.L., Ramsdell, D.C., 1995. Compendium of blueberry and cranberry diseases. American Phytopathological Society.

Casamali, B., Williamson, J.G., Kovaleski, A.P., Sargent, S.A., Darnell, R.L., 2016. Mechanical Harvesting and Postharvest Storage of Two Southern Highbush Blueberry Cultivars Grafted onto Vaccinium arboreum Rootstocks. HortScience 51, 1503-1510.

Cavileer, T.D., Halpern, B.T., Lawrence, D.M., Podleckis, E.V., Martin, R.R., Hillman, B.I., 1994. Nucleotide sequence of the carlavirus associated with blueberry scorch and similar diseases. Journal of General Virology 75, 711-720.

Childress, A., Ramsdell, D., 1986. Detection of blueberry leaf mottle virus in highbush blueberry pollen and seed. Virus 100, 10.

Childress, A., Ramsdell, D., 1987. Bee-mediated transmission of blueberry leaf mottle virus via infected pollen in highbush blueberry. Phytopathology 77, 167-172.

Ciuffo, M., Pettiti, D., Gallo, S., Masenga, V., Turina, M., 2005. First report of Blueberry scorch virus in Europe. Plant pathology 54, 565-565.

Cline, W., 2012. New and emerging diseases of blueberry, X International Symposium on Vaccinium and Other Superfruits 1017, pp. 45-49.

Colson, P., Richet, H., Desnues, C., Balique, F., Moal, V., Grob, J.-J., Berbis, P., Lecoq, H., Harlé, J.-R., Berland, Y., 2010. Pepper mild mottle virus, a plant virus associated with specific immune responses, fever, abdominal pains, and pruritus in humans. PloS one 5, e10041.

Converse, R., Ramsdell, D., 1982. Occurrence of tomato and tobacco ringspot viruses and of dagger and other nematodes associated with cultivated highbush blueberries in Oregon. Plant Disease 66, 710-712.

Cropley, R., 1961. Cherry leaf‐roll virus. Annals of Applied Biology 49, 524-529.

Culley, A.I., Lang, A.S., Suttle, C.A., 2006. Metagenomic analysis of coastal RNA virus communities. Science 312, 1795-1798.

Culley, A.I., Lang, A.S., Suttle, C.A., 2007. The complete genomes of three viruses assembled from shotgun libraries of marine RNA virus communities. Virology journal 4, 69.

Dayaram, A., Opong, A., Jäschke, A., Hadfield, J., Baschiera, M., Dobson, R.C., Offei, S.K., Shepherd, D.N., Martin, D.P., Varsani, A., 2012. Molecular characterisation of a novel cassava associated circular ssDNA virus. Virus research 166, 130-135.

Del Fabbro, C., Scalabrin, S., Morgante, M., Giorgi, F.M., 2013. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS One 8, e85024.

Desnues, C., Rodriguez-Brito, B., Rayhawk, S., Kelley, S., Tran, T., Haynes, M., Liu, H., Furlan, M., Wegley, L., Chau, B., 2008. Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature 452, 340-343.

Dias, H., 1975. Peach rosette mosaic virus. CMI/AAB Descriptions of plant viruses 150.

Dias, H.F., Cation, D., 1976. The characterization of a virus responsible for peach rosette mosaic and grape decline in Michigan. Canadian Journal of Botany 54, 1228-1239.

Diaz-Lara, A., Martin, R.R., 2016. Blueberry fruit drop-associated virus: A New Member of the Family Caulimoviridae Isolated From Blueberry Exhibiting Fruit-Drop Symptoms. Plant Disease 100, 2211-2214.

Digiaro, M., Elbeaino, T., Martelli, G.P., 2007. Development of degenerate and species-specific primers for the differential and simultaneous RT-PCR detection of grapevine-infecting nepoviruses of subgroups A, B and C. Journal of virological methods 141, 34-40.

Djikeng, A., Kuzmickas, R., Anderson, N.G., Spiro, D.J., 2009. Metagenomic analysis of RNA viruses in a fresh water lake. PLOS one 4, e7264.

Dodds, J., Morris, T., Jordan, R., 1984. Plant viral double-stranded RNA. Annual review of phytopathology 22, 151-168.

Dombrovsky, A., Glanz, E., Lachman, O., Sela, N., Doron-Faigenboim, A., Antignus, Y., 2013. The complete genomic sequence of Pepper yellow leaf curl virus (PYLCV) and its implications for our understanding of evolution dynamics in the genus Polerovirus. PloS one 8, e70722.

Donaire, L., Wang, Y., Gonzalez-Ibeas, D., Mayer, K.F., Aranda, M.A., Llave, C., 2009. Deep-sequencing of plant viral small RNAs reveals effective and widespread targeting of viral genomes. Virology 392, 203-214.

Donaldson, E.F., Haskew, A.N., Gates, J.E., Huynh, J., Moore, C.J., Frieman, M.B., 2010. Metagenomic analysis of the viromes of three North American bat species: viral diversity among different bat species that share a common habitat. Journal of virology 84, 13004-13018.

Duffy, S., Shackelton, L.A., Holmes, E.C., 2008. Rates of evolutionary change in viruses: patterns and determinants. Nature Reviews Genetics 9, 267-276.

Eastwell, K., Howell, W., 2010. Characterization of Cherry leafroll virus in sweet cherry in Washington State. Plant Disease 94, 1067-1067.

Eastwell, K.C., Mekuria, T.A., Druffel, K.L., 2012. Complete nucleotide sequences and genome organization of a cherry isolate of cherry leaf roll virus. Archives of virology 157, 761-764.

Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32, 1792-1797.

Eid, S., Pappu, H.R., 2014. Expression of endogenous para-retroviral genes and molecular analysis of the integration events in its plant host Dahlia variabilis. Virus Genes 48, 153-159.

Elbeaino, T., Digiaro, M., Fallanaj, F., Kuzmanovic, S., Martelli, G.P., 2011. Complete nucleotide sequence and genome organisation of grapevine Bulgarian latent virus. Archives of virology 156, 875-879.

Elbeaino, T., Giampetruzzi, A., De Stradis, A., Digiaro, M., 2014. Deep-sequencing analysis of an apricot tree with vein clearing symptoms reveals the presence of a novel betaflexivirus. Virus Res 181, 1-5.

Evans, E.A., Ballen, F.H., 2014. An Overview of US Blueberry Production, Trade, and Consumption, with Special Reference to Florida. EDIS #FE952. UF/IFAS Extension, Gainesville, FL. http://edis.ifas.ufl.edu/pdffiles/FE/FE95200.pdf

Fabre, F., Montarry, J., Coville, J., Senoussi, R., Simon, V., Moury, B., 2012. Modelling the evolutionary dynamics of viruses within their hosts: a case study using high-throughput sequencing. PLoS Pathog 8, e1002654.

FAOSTAT. 2015. Production: Crops: Blueberry. Food and Agriculture Organization of the United Nations, Rome, Italy.

Fauquet, C.M., Mayo, M.A., Maniloff, J., Desselberger, U., Ball, L.A., 2005. Virus taxonomy: VIIIth report of the International Committee on Taxonomy of Viruses. Academic Press.

Fierer, N., Breitbart, M., Nulton, J., Salamon, P., Lozupone, C., Jones, R., Robeson, M., Edwards, R.A., Felts, B., Rayhawk, S., 2007. Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil. Applied and environmental microbiology 73, 7059-7066.

Forer, L., Stouffer, R., 1982. Xiphinema spp. associated with tomato ringspot virus infection of Pennsylvania fruit crops. Plant Disease 66, 735-736.

Fuchs, M., Abawi, G., Marsella-Herrick, P., Cox, R., Cox, K., Carroll, J., Martin, R., 2010. Occurrence of Tomato ringspot virus and Tobacco ringspot virus in highbush blueberry in New York State. Journal of Plant Pathology, 451-459.

Gauthier, N., Polashock, J., Veetil, T., Martin, R., Beale, J., 2015. First report of blueberry mosaic disease caused by blueberry mosaic associated virus in Kentucky. Plant Disease 99, 421-421.

Geering, A.D., Scharaschkin, T., Teycheney, P.Y., 2010. The classification and nomenclature of endogenous viruses of the family Caulimoviridae. Arch Virol 155, 123-131.

Gillett, J., Ramsdell, D., 1988. Blueberry red ringspot virus. AAB Descriptions of plant viruses 327.

Glasheen, B.M., Polashock, J.J., Lawrence, D.M., Gillett, J.M., Ramsdell, D.C., Vorsa, N., Hillman, B.I., 2002. Cloning, sequencing, and promoter identification of Blueberry red ringspot virus, a member of the family Caulimoviridae with similarities to the "Soybean chlorotic mottle-like" genus. Arch Virol 147, 2169-2186.

Goodwin, S., McPherson, J.D., McCombie, W.R., 2016. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17, 333-351.

Gorbalenya, A.E., Koonin, E.V., Wolf, Y.I., 1990. A new superfamily of putative NTP-binding domains encoded by genomes of small DNA and RNA viruses. FEBS letters 262, 145-148.

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-652.

Gu, Y.-H., Tao, X., Lai, X.-J., Wang, H.-Y., Zhang, Y.-Z., 2014. Exploring the polyadenylated RNA virome of sweet potato through high-throughput sequencing. PloS one 9, e98884.

Hall, R.J., Wang, J., Todd, A.K., Bissielo, A.B., Yen, S., Strydom, H., Moore, N.E., Ren, X., Huang, Q.S., Carter, P.E., Peacey, M., 2014. Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery. J Virol Methods 195, 194-204.

Hancock, J., Lyrene, P., Finn, C., Vorsa, N., Lobos, G., 2008. Blueberries and cranberries, Temperate fruit crop breeding. Springer, pp. 115-150.

Hany, U., Adams, I., Glover, R., Bhat, A., Boonham, N., 2014. The complete genome sequence of Piper yellow mottle virus (PYMoV). Archives of virology 159, 385-388.

Haramoto, E., Kitajima, M., Kishida, N., Konno, Y., Katayama, H., Asami, M., Akiba, M., 2013. Occurrence of pepper mild mottle virus in drinking water sources in Japan. Applied and environmental microbiology 79, 7413-7418.

He, Y., Yang, Z., Hong, N., Wang, G., Ning, G., Xu, W., 2015. Deep sequencing reveals a novel closterovirus associated with wild rose leaf rosette disease. Molecular plant pathology 16, 449-458.

Ho, T., Martin, R.R., Tzanetakis, I.E., 2015. Next-generation sequencing of elite berry germplasm and data analysis using a bioinformatics pipeline for virus detection and discovery. Methods Mol Biol 1302, 301-313.

Holland, R., Christiano, R., Scherm, H., 2013. Transmission of bacterial leaf scorch, Blueberry red ringspot virus, and Blueberry necrotic ringblotch-associated virus through softwood cuttings. Location, transmission, and impact of Xylella fastidiosa in southern highbush blueberries, 41.

Hurwitz, B.L., Sullivan, M.B., 2013. The Pacific Ocean Virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology. PloS one 8, e57355.

Hutchinson, M., Varney, E., 1954. Ringspot-A virus disease of cultivated blueberry.

Igori, D., Lim, S., Baek, D., Seo, E., Cho, I.-S., Choi, G.-S., Lim, H.-S., Moon, J.S., 2017. Complete nucleotide sequence and genome organization of peach virus D, a putative new member of the genus Marafivirus. Archives of Virology, 1-4.

Iskra-Caruana, M.-L., Baurens, F.-C., Gayral, P., Chabannes, M., 2010. A four-partner plant–virus interaction: enemies can also come from within. Molecular plant-microbe interactions 23, 1394-1402.

Isogai, M., Matsuhashi, Y., Suzuki, K., Yashima, S., Watanabe, M., Yoshikawa, N., 2016. Occurrence of blueberry mosaic associated virus in highbush blueberry trees with blueberry mosaic disease in Japan. Journal of General Plant Pathology, 1-3.

Isogai, M., Muramatu, S., Watanabe, M., Yoshikawa, N., 2013. Complete nucleotide sequence and latency of a novel blueberry-infecting closterovirus. Journal of General Plant Pathology 79, 123-127.

Isogai, M., Nakamura, T., Ishii, K., Watanabe, M., Yamagishi, N., Yoshikawa, N., 2011. Histochemical detection of Blueberry latent virus in highbush blueberry plant. Journal of General Plant Pathology 77, 304-306.

Isogai, M., Tatuto, N., Ujiie, C., Watanabe, M., Yoshikawa, N., 2012. Identification and characterization of blueberry latent spherical virus, a new member of subgroup C in the genus Nepovirus. Arch Virol 157, 297-303.

Jaswal, A.S., 1990. Occurrence of blueberry leaf mottle, blueberry shoestring, tomato ringspot and tobacco ringspot viruses in eleven halfhigh blueberry clones grown in New Brunswick, Canada. Canadian Plant Disease Survey 70, 113.

Jo, Y., Choi, H., Cho, J.K., Yoon, J.Y., Choi, S.K., Cho, W.K., 2015a. In silico approach to reveal viral populations in grapevine cultivar Tannat using transcriptome data. Sci Rep 5, 15841.

Jo, Y., Choi, H., Cho, W.K., 2015b. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data. Genome Announc 3.

Jo, Y., Choi, H., Kim, S.M., Kim, S.L., Lee, B.C., Cho, W.K., 2016b. Integrated analyses using RNA-Seq data reveal viral genomes, single nucleotide variations, the phylogenetic relationship, and recombination for Apple stem grooving virus. BMC Genomics 17, 579.

Jo, Y., Choi, H., Kim, S.-M., Kim, S.-L., Lee, B.C., Cho, W.K., 2017. The pepper virome: natural co-infection of diverse viruses and their quasispecies. BMC Genomics 18, 453.

Jo, Y., Choi, H., Yoon, J.Y., Choi, S.K., Cho, W.K., 2016a. In silico identification of Bell pepper endornavirus from pepper transcriptomes and their phylogenetic and recombination analyses. Gene 575, 712-717.

Kaur, N., Hasegawa, D.K., Ling, K.S., Wintermantel, W.M., 2016. Application of Genomics for Understanding Plant Virus-Insect Vector Interactions and Insect Vector Control. Phytopathology 106, 1213-1222.

Kehoe, M., Coutts, B., Buirchell, B., Jones, R., 2014. Hardenbergia mosaic virus: crossing the barrier between native and introduced plant species. Virus research 184, 87-92.

Kesanakurti, P., Belton, M., Saeed, H., Rast, H., Boyes, I., Rott, M., 2016. Screening for plant viruses by next generation sequencing using a modified double strand RNA extraction protocol with an internal amplification control. J Virol Methods 236, 35-40.

Kim, K., Ramsdell, D., Gillett, J., Fulton, J., 1981. Virions and Ultrastructural Changes Associated With Blueberry Red Ringspot Disease. Phytopathology 71, 673-678.

Kim, K.-H., Chang, H.-W., Nam, Y.-D., Roh, S.W., Kim, M.-S., Sung, Y., Jeon, C.O., Oh, H.-M., Bae, J.-W., 2008. Amplification of uncultured single-stranded DNA viruses from rice paddy soil. Applied and environmental microbiology 74, 5975-5985.

Kircher, M., Kelso, J., 2010. High-throughput DNA sequencing--concepts and limitations. Bioessays 32, 524-536.

Koh, S.H., Ong, J.W., Admiraal, R., Sivasithamparam, K., Jones, M.G., Wylie, S.J., 2016. A novel member of the Tombusviridae from a wild legume, Gompholobium preissii. Arch Virol 161, 2893-2898.

Koonin, E.V., 1993. A common set of conserved motifs in a vast variety of putative nucleic acid-dependent ATPases including MCM proteins involved in the initiation of eukaryotic DNA replication. Nucleic acids research 21, 2541-2547.

Kraberger, S., Farkas, K., Bernardo, P., Booker, C., Argüello-Astorga, G.R., Mesléard, F., Martin, D.P., Roumagnac, P., Varsani, A., 2015. Identification of novel Bromus-and Trifolium-associated circular DNA viruses. Archives of virology 160, 1303-1311.

Kreuze, J.F., Perez, A., Untiveros, M., Quispe, D., Fuentes, S., Barker, I., Simon, R., 2009. Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: a generic method for diagnosis, discovery and sequencing of viruses. Virology 388, 1-7.

Krupovic, M., Dolja, V.V., Koonin, E.V., 2015. Plant viruses of the Amalgaviridae family evolved via recombination between viruses with double-stranded and negative-strand RNA genomes. Biol Direct 10, 12.

Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular biology and evolution, msw054.

Labonté, J.M., Suttle, C.A., 2013. Previously unknown and highly divergent ssDNA viruses populate the oceans. The ISME journal 7, 2169.

Lammers, A.H., Allison, R.F., Ramsdell, D.C., 1999. Cloning and sequencing of peach rosette mosaic virus RNA1. Virus research 65, 57-73.

Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359.

Latvala-Kilby, S., Lehto, K., 1999. The complete nucleotide sequence of RNA2 of blackcurrant reversion nepovirus. Virus research 65, 87-92.

Lauber, C., Gorbalenya, A.E., 2012. Partitioning the genetic diversity of a virus family: approach and evaluation through a case study of picornaviruses. Journal of virology 86, 3890-3904.

Lawrence, D.M., Hillman, B.I., 1994. Synthesis of infectious transcripts of blueberry scorch carlavirus in vitro. Journal of general virology 75, 2509-2512.

Le Gall, O., Iwanami, T., Jones, A., Lehto, K., Sanfacon, H., Wellink, J., Wetzel, T., Yoshikawa, N., 2005. Comoviridae, Virus Taxonomy, VIIIth Report of the International Committee on Taxonomy of Viruses. Elsevier/Academic Press, pp. 807-818.

Lesney, M., Ramsdell, D., 1976. Purification and some properties of blueberry shoestring virus, I International Symposium on Small Fruit Diseases 66, pp. 105-109.

Li, H., Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754-1760.

Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., 2010. De novo assembly of human genomes with massively parallel short read sequencing. Genome research 20, 265-272.

Lister, R., 1964. Strawberry latent ringspot: a new nematode‐borne virus. Annals of Applied Biology 54, 167-176.

Lister, R., Varney, E., Raniere, L., 1963. Relationships of viruses associated with ringspot diseases of blueberry. Phytopathology 53, 1031.

Loconsole, G., Önelge, N., Potere, O., Giampetruzzi, A., Bozan, O., Satar, S., De Stradis, A., Savino, V., Yokomi, R., Saponari, M., 2012b. Identification and characterization of Citrus yellow vein clearing virus, a putative new member of the genus Mandarivirus. Phytopathology 102, 1168-1175.

Loconsole, G., Saldarelli, P., Doddapaneni, H., Savino, V., Martelli, G.P., Saponari, M., 2012a. Identification of a single-stranded DNA virus associated with citrus chlorotic dwarf disease, a new member in the family Geminiviridae. Virology 432, 162-172.

López-Bueno, A., Tamames, J., Velázquez, D., Moya, A., Quesada, A., Alcamí, A., 2009. High diversity of the viral community from an Antarctic lake. Science 326, 858-861.

Luby, J.J., Ballington, J.R., Draper, A.D., Pliszka, K., Austin, M.E., 1991. Blueberries and cranberries (Vaccinium). Genetic Resources of Temperate Fruit and Nut Crops 290, 393-458.

Lyrene, P., Vorsa, N., Ballington, J., 2003. Polyploidy and sexual polyploidization in the genus Vaccinium. Euphytica 133, 27-36.

Lyrene, P.M., 1997. Value of various taxa in breeding tetraploid blueberries in Florida. Euphytica 94, 15-22.

MacDonald, S., Martin, R., Bristow, P., 1991. Characterization of an ilarvirus associated with a necrotic shock reaction in blueberry. Phytopathology 81, 210-214.

Maliogka, V.I., Olmos, A., Pappi, P.G., Lotos, L., Efthimiou, K., Grammatikaki, G., Candresse, T., Katis, N.I., Avgelis, A.D., 2015. A novel grapevine badnavirus is associated with the Roditis leaf discoloration disease. Virus research 203, 47-55.

Marais, A., Faure, C., Mustafayev, E., Barone, M., Alioto, D., Candresse, T., 2015. Characterization by Deep Sequencing of Prunus virus T, a Novel Tepovirus Infecting Prunus Species. Phytopathology 105, 135-140.

Marchler-Bauer, A., Bo, Y., Han, L., He, J., Lanczycki, C.J., Lu, S., Chitsaz, F., Derbyshire, M.K., Geer, R.C., Gonzales, N.R., 2016. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic acids research 45, D200-D203.

Martin, R., 2006. Blueberry scorch virus. AAB Descriptions of Plant Viruses.

Martin, R., MacDonald, S., Podleckis, E., 1991. Relationships between blueberry scorch and Sheep Pen Hill viruses of highbush blueberry, XV International Symposium on Small Fruit Virus Diseases 308, pp. 131-140.

Martin, R., Tzanetakis, I., Caruso, F., Polashock, J., 2009. Emerging and reemerging virus diseases of blueberry and cranberry, IX International Vaccinium Symposium 810, pp. 299-304.

Martin, R.R., Bristow, P.R., 1988. A carlavirus associated with blueberry scorch disease. Phytopathology 78, 1636-1640.

Martin, R.R., Bristow, P.R., 1995. Scorch. Compendium of Blueberry and Cranberry Diseases. FL Caruso and DC Ramsdell, eds. The American Phytopathological Society, St. Paul, MN, 51-52.

Martin, R.R., Polashock, J.J., Tzanetakis, I.E., 2012. New and emerging viruses of blueberry and cranberry. Viruses 4, 2831-2852.

Martin, R.R., Zhou, J., Tzanetakis, I.E., 2011. Blueberry latent virus: an amalgam of the Partitiviridae and Totiviridae. Virus Res 155, 175-180.

Massart, S., Olmos, A., Jijakli, H., Candresse, T., 2014. Current impact and future directions of high throughput sequencing in plant virus diagnostics. Virus Res 188, 90-96.

Mayo, M., 2005. Changes to virus taxonomy 2004. Archives of Virology 150, 189-198.

Medina, C., Matus, J., Zúñiga, M., San-Martin, C., Arce-Johnson, P., 2006. Occurrence and distribution of viruses in commercial plantings of Rubus, Ribes and Vaccinium species in Chile. Ciencia e Investigación Agraria 33, 23-28.

Melcher, U., Muthukumar, V., Wiley, G.B., Min, B.E., Palmer, M.W., Verchot-Lubicz, J., Ali, A., Nelson, R.S., Roe, B.A., Thapa, V., Pierce, M.L., 2008. Evidence for novel viruses by analysis of nucleic acids in virus-like particle fractions from Ambrosia psilostachya. J Virol Methods 152, 49-55.

Melzer, M., Freitas-Astúa, J., Kitajima, E., Rodrigues, J., Roy, A., Wei, G., 2017. ICTV taxonomic proposal 2016.011a-dP.A.v1.Blunervirus. Create the unassigned genus Blunervirus.

Metzker, M.L., 2010. Sequencing technologies - the next generation. Nat Rev Genet 11, 31-46.

Michenaud-Rague, A., Robinson, S., Landsberger, S., 2012. Trace elements in 11 fruits widely-consumed in the USA as determined by neutron activation analysis. Journal of Radioanalytical and Nuclear Chemistry 291, 237-240.

Milne, R.G., García, M.L., Moreno, P., 2003. Citrus psorosis virus. CMI/AAB (eds) Descriptions of plant viruses.

Milne, R.G., Garcia, M.L., Vaira, A.M., 2011. Ophiovirus, The Springer Index of Viruses. Springer, pp. 995-1003.

Mokili, J.L., Rohwer, F., Dutilh, B.E., 2012. Metagenomics and future perspectives in virus discovery. Curr Opin Virol 2, 63-77.

Moore, S.M., Borer, E.T., 2012. The influence of host diversity and composition on epidemiological patterns at multiple spatial scales. Ecology 93, 1095-1105.

Moore, S.M., Manore, C.A., Bokil, V.A., Borer, E.T., Hosseini, P., 2011. Spatiotemporal model of barley and cereal yellow dwarf virus transmission dynamics with seasonality and plant competition. Bulletin of mathematical biology 73, 2707-2730.

Morelli, M., Giampetruzzi, A., Laghezza, L., Catalano, L., Savino, V.N., Saldarelli, P., 2017. Identification and characterization of an isolate of apple green crinkle associated virus involved in a severe disease of quince (Cydonia oblonga, Mill.). Archives of Virology 162, 299-306.

Moretti, M., Ciuffo, M., Gotta, P., Prodorutti, D., Bragagna, P., Turina, M., 2011. Molecular characterization of two distinct strains of blueberry scorch virus (BlScV) in northern Italy. Archives of virology 156, 1295-1297.

Muhire, B.M., Varsani, A., Martin, D.P., 2014. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation. PloS one 9, e108277.

Murant, A., 1974. Strawberry latent ringspot virus. CMI/AAB Descriptions of plant viruses.

Muthukumar, V., Melcher, U., Pierce, M., Wiley, G.B., Roe, B.A., Palmer, M.W., Thapa, V., Ali, A., Ding, T., 2009. Non-cultivated plants of the Tallgrass Prairie Preserve of northeastern Oklahoma frequently contain virus-like sequences in particulate fractions. Virus Res 141, 169-173.

Nakamura, S., Yang, C.-S., Sakon, N., Ueda, M., Tougan, T., Yamashita, A., Goto, N., Takahashi, K., Yasunaga, T., Ikuta, K., 2009. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PloS one 4, e4219.

Nash, T.E., Dallas, M.B., Reyes, M.I., Buhrman, G.K., Ascencio-Ibañez, J.T., Hanley-Bowdoin, L., 2011. Functional analysis of a novel motif conserved across geminivirus Rep proteins. Journal of virology 85, 1182-1192.

Ng, T.F., Marine, R., Wang, C., Simmonds, P., Kapusinszky, B., Bodhidatta, L., Oderinde, B.S., Wommack, K.E., Delwart, E., 2012. High variety of known and new RNA and DNA viruses of diverse origins in untreated sewage. J Virol 86, 12161-12175.

Ng, T.F.F., Chen, L.-F., Zhou, Y., Shapiro, B., Stiller, M., Heintzman, P.D., Varsani, A., Kondov, N.O., Wong, W., Deng, X., 2014. Preservation of viral genomes in 700-y-old caribou feces from a subarctic ice patch. Proceedings of the National Academy of Sciences 111, 16842-16847.

Ng, T.F.F., Duffy, S., Polston, J.E., Bixby, E., Vallad, G.E., Breitbart, M., 2011. Exploring the diversity of plant DNA viruses and their satellites using vector-enabled metagenomics on whiteflies. PloS one 6, e19050.

Noreen, F., Akbergenov, R., Hohn, T., Richert‐Pöggeler, K.R., 2007. Distinct expression of endogenous Petunia vein clearing virus and the DNA transposon dTph1 in two Petunia hybrida lines is correlated with differences in histone modification and siRNA production. The Plant Journal 50, 219-229.

Nouri, S., Salem, N., Nigg, J.C., Falk, B.W., 2015. A diverse array of new viral sequences identified in worldwide populations of the Asian citrus psyllid (Diaphorina citri) using viral metagenomics. Journal of virology, JVI.02793-15.

Olmstead, J.W., Armenta, H.P.R., Lyrene, P.M., 2013. Using sparkleberry as a genetic source for machine harvest traits for southern highbush blueberry. HortTechnology 23, 419-424.

Ong, J., Phillips, R., Dixon, K., Jones, M.G., Wylie, S., 2016. Characterization of the first two viruses described from wild populations of hammer orchids (Drakaea spp.) in Australia. Plant Pathology 65, 163-172.

Oudemans, P.V., Hillman, B.I., Linder-Basso, D., Polashock, J.J., 2011. Visual inspections of nursery stock fail to protect new plantings from Blueberry scorch virus infection. Crop protection 30, 871-875.

Pace, N.R., Sapp, J., Goldenfeld, N., 2012. Phylogeny and beyond: Scientific, historical, and conceptual significance of the first tree of life. Proceedings of the National Academy of Sciences 109, 1011-1018.

Pacot-Hiriart, C., Latvala-Kilby, S., Lehto, K., 2001. Nucleotide sequence of black currant reversion associated nepovirus RNA1. Virus research 79, 145-152.

Paduch-Cichal, E., Chodorska, M., Kalinowska, E., Komorowska, B., 2014. Year-round blueberry scorch virus detection in highbush blueberry. Acta Scientiarum Polonorum. Hortorum Cultus 13.

Paduch-Cichal, E., Kalinowska, E., Chodorska, M., Sala-Rejczak, K., Nowak, B., 2011. Detection and identification of viruses of highbush blueberry and cranberry using serological ELISA test and PCR technique. Acta Scientiarum Polonorum-Hortorum Cultus 10, 201-215.

Pirovano, W., Miozzi, L., Boetzer, M., Pantaleo, V., 2014. Bioinformatics approaches for viral metagenomics in plants using short RNAs: model case of study and application to a Cicer arietinum population. Front Microbiol 5, 790.

Podleckis, E., Davis, R., Stretch, A., Schulze, C., 1986. Flexuous rod particles associated with Sheep Pen Hill Disease of highbush blueberries, Phytopathology. American Phytopathological Society, St. Paul, MN, pp. 1065-1065.

Polashock, J.J., Ehlenfeldt, M.K., Crouch, J.A., 2009. Molecular Detection and Discrimination of Blueberry red ringspot virus Strains Causing Disease in Cultivated Blueberry and Cranberry. Plant Disease 93, 727-733.

Power, A.G., Borer, E.T., Hosseini, P., Mitchell, C.E., Seabloom, E.W., 2011. The community ecology of barley/cereal yellow dwarf viruses in Western US grasslands. Virus Research 159, 95-100.

Power, A.G., Mitchell, C.E., 2004. Pathogen spillover in disease epidemics. The American Naturalist 164, S79-S89.

Prosperi, M.C., Prosperi, L., Bruselles, A., Abbate, I., Rozera, G., Vincenti, D., Solmone, M.C., Capobianchi, M.R., Ulivi, G., 2011. Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing. BMC bioinformatics 12, 5.

Quito-Avila, D.F., Brannen, P.M., Cline, W.O., Harmon, P.F., Martin, R.R., 2013. Genetic characterization of Blueberry necrotic ring blotch virus, a novel RNA virus with unique genetic features. J Gen Virol 94, 1426-1434.

Ramalho, T., Figueira, A., Sotero, A., Wang, R., Duarte, P.G., Farman, M., Goodin, M., 2014. Characterization of Coffee ringspot virus-Lavras: A model for an emerging threat to coffee production and quality. Virology 464, 385-396.

Ramsdell, D., Gillett, J., 1981. Peach rosette mosaic virus in highbush blueberry. Plant Disease 65, 757-758.

Ramsdell, D., Myers, R., 1974. Peach rosette mosaic virus, symptomatology, and nematodes associated with grapevine degeneration in Michigan. Phytopathology 64, 1174-1178.

Ramsdell, D., Stace-Smith, R., 1979. Blueberry leaf mottle, a new disease of highbush blueberry, II International Symposium on Small Fruit Virus Diseases 95, pp. 37-48.

Ramsdell, D., Stretch, A., 1987. Blueberry mosaic. Agriculture handbook-United States Department of Agriculture, Combined Forest Pest Research and Development Program (USA).

Ramsdell, D.C., 1979. Physical and chemical properties of blueberry shoestring virus. Phytopathology 69, 1087-1091.

Rebenstorf, K., Candresse, T., Dulucq, M.J., Büttner, C., Obermeier, C., 2006. Host species-dependent population structure of a pollen-borne plant virus, Cherry leaf roll virus. Journal of virology 80, 2453-2462.

Reyes, A., Haynes, M., Hanson, N., Angly, F.E., Heath, A.C., Rohwer, F., Gordon, J.I., 2010. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334-338.

Richert-Pöggeler, K., Turhal, A.-K., Schuhmann, S., Maaß, C., Blockus, S., Zimmermann, E., Eastwell, K., Martin, R., Lockhart, B., 2012. Carlavirus biodiversity in horticultural host plants: Efficient virus detection and identification combining electron microscopy and molecular biology tools, XIII International Symposium on Virus Diseases of Ornamental Plants 1072, pp. 37-45.

Robinson, T.S., Scherm, H., Brannen, P., Holland, R.M., Deom, C.M., 2016. Blueberry necrotic ring blotch virus in southern highbush blueberry: insights into in-planta and in-field movement. Plant Disease.

Roossinck, M.J., 2012. Persistent plant viruses: molecular hitchhikers or epigenetic elements?, Viruses: Essential Agents of Life. Springer, pp. 177-186.

Roossinck, M.J., 2012. Plant virus metagenomics: biodiversity and ecology. Annu Rev Genet 46, 359-369.

Roossinck, M.J., 2015b. A new look at plant viruses and their potential beneficial roles in crops. Molecular plant pathology 16, 331-333.

Roossinck, M.J., 2015c. Plants, viruses and the environment: Ecology and mutualism. Virology 479, 271-277.

Roossinck, M.J., García-Arenal, F., 2015d. Ecosystem simplification, biodiversity loss and plant virus emergence. Current opinion in virology 10, 56-62.

Roossinck, M.J., Martin, D.P., Roumagnac, P., 2015a. Plant Virus Metagenomics: Advances in Virus Discovery. Phytopathology 105, 716-727.

Roossinck, M.J., Saha, P., Wiley, G.B., Quan, J., White, J.D., Lai, H., Chavarria, F., Shen, G., Roe, B.A., 2010. Ecogenomics: using massively parallel pyrosequencing to understand virus ecology. Mol Ecol 19 Suppl 1, 81-88.

Rosario, K., Capobianco, H., Ng, T.F., Breitbart, M., Polston, J.E., 2014. RNA viral metagenome of whiteflies leads to the discovery and characterization of a whitefly-transmitted carlavirus in North America. PLoS One 9, e86748.

Rosario, K., Duffy, S., Breitbart, M., 2012. A field guide to eukaryotic circular single-stranded DNA viruses: insights gained from metagenomics. Archives of Virology 157, 1851-1871.

Rosario, K., Marr, C., Varsani, A., Kraberger, S., Stainton, D., Moriones, E., Polston, J.E., Breitbart, M., 2016. Begomovirus-Associated Satellite DNA Diversity Captured Through Vector-Enabled Metagenomic (VEM) Surveys Using Whiteflies (Aleyrodidae). Viruses 8, 36.

Rosario, K., Nilsson, C., Lim, Y.W., Ruan, Y., Breitbart, M., 2009a. Metagenomic analysis of viruses in reclaimed water. Environmental microbiology 11, 2806-2820.

Rosario, K., Padilla-Rodriguez, M., Kraberger, S., Stainton, D., Martin, D.P., Breitbart, M., Varsani, A., 2013. Discovery of a novel mastrevirus and alphasatellite-like circular DNA in dragonflies (Epiprocta) from Puerto Rico. Virus Res 171, 231-237.

Rosario, K., Symonds, E.M., Sinigalliano, C., Stewart, J., Breitbart, M., 2009b. Pepper mild mottle virus as an indicator of fecal pollution. Applied and environmental microbiology 75, 7261-7267.

Rott, M.E., Gilchrist, A., Lee, L., Rochon, D.A., 1995. Nucleotide sequence of tomato ringspot virus RNA1. Journal of General Virology 76, 465-473.

Rott, M.E., Tremaine, J., Rochon, D., 1991. Nucleotide sequence of tomato ringspot virus RNA-2. Journal of general virology 72, 1505-1514.

Roux, S., Hallam, S.J., Woyke, T., Sullivan, M.B., 2015. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife 4.

Roy, A., Choudhary, N., Guillermo, L.M., Shao, J., Govindarajulu, A., Achor, D., Wei, G., Picton, D., Levy, L., Nakhla, M., 2013. A novel virus of the genus Cilevirus causing symptoms similar to citrus leprosis. Phytopathology 103, 488-500.

Sabanadzovic, S., Valverde, R.A., Brown, J.K., Martin, R.R., Tzanetakis, I.E., 2009. Southern tomato virus: the link between the families Totiviridae and Partitiviridae. Virus research 140, 130-137.

Sandoval, C.R., Ramsdell, D.C., Hancock, J.F., 1995. Infection of wild and cultivated Vaccinium spp. with blueberry leaf mottle nepovirus. Annals of applied biology 126, 457-464.

Sanfaçon, H., Iwanami, T., Karasev, A., Van der Vlugt, R., Wellink, J., Wetzel, T., Yoshikawa, N., 2011. Family Secoviridae, Virus Taxonomy. Elsevier, pp. 881-900.

Sanfaçon, H., Wellink, J., Le Gall, O., Karasev, A., Van der Vlugt, R., Wetzel, T., 2009. Secoviridae: a proposed family of plant viruses within the order that combines the families Sequiviridae and Comoviridae, the unassigned genera Cheravirus and Sadwavirus, and the proposed genus Torradovirus. Archives of virology 154, 899-907.

Schmelzer, K., 1969. Strawberry latent ringspot virus in Euonymous, Acacia, and Aesculus. Phytopathol. Z 66, 1-24.

Scholz, M.B., Lo, C.C., Chain, P.S., 2012. Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. Curr Opin Biotechnol 23, 9-15.

Seguin, J., Rajeswaran, R., Malpica-Lopez, N., Martin, R.R., Kasschau, K., Dolja, V.V., Otten, P., Farinelli, L., Pooggin, M.M., 2014. De novo reconstruction of consensus master genomes of plant RNA and DNA viruses from siRNAs. PLoS One 9, e88513.

Simmonds, P., 2015. Methods for virus classification and the challenge of incorporating metagenomic sequence data. J Gen Virol 96, 1193-1206.

Simmonds, P., Adams, M.J., Benko, M., Breitbart, M., Brister, J.R., Carstens, E.B., Davison, A.J., Delwart, E., Gorbalenya, A.E., Harrach, B., Hull, R., King, A.M., Koonin, E.V., Krupovic, M., Kuhn, J.H., Lefkowitz, E.J., Nibert, M.L., Orton, R., Roossinck, M.J., Sabanadzovic, S., Sullivan, M.B., Suttle, C.A., Tesh, R.B., van der Vlugt, R.A., Varsani, A., Zerbini, F.M., 2017. Consensus statement: Virus taxonomy in the age of metagenomics. Nat Rev Microbiol 15, 161-168.

Smits, S.L., Bodewes, R., Ruiz-Gonzalez, A., Baumgartner, W., Koopmans, M.P., Osterhaus, A.D., Schurch, A.C., 2014. Assembly of viral genomes from metagenomes. Front Microbiol 5, 714.

Stace-Smith, R., 1984. Tomato ringspot virus. CMI/AAB descriptions of plant viruses 290.

Stace-Smith, R., 1985. Tobacco ringspot virus. CMI/AAB Descriptions of Plant Viruses No. 309 (No. 17 revised). Association of Applied Biologists, Wellesbourne, UK.

Stainton, D., Collings, D.A., Varsani, A., 2015. Genome sequence of banana streak MY virus from the Pacific Ocean island of Tonga. Genome announcements 3, e00543-15.

Stobbe, A.H., Roossinck, M.J., 2014. Plant virus metagenomics: what we know and why we need to know more. Front Plant Sci 5, 150.

Strik, B., 2005. Blueberry: an expanding world berry crop. Chronica Horticulturae 45, 7-12.

Strik, B.C., Yarborough, D., 2005. Blueberry production trends in North America, 1992 to 2003, and predictions for growth. HortTechnology 15, 391-398.

Susi, H., Laine, A.-L., Filloux, D., Kraberger, S., Farkas, K., Bernardo, P., Frilander, M.J., Martin, D.P., Varsani, A., Roumagnac, P., 2017. Genome sequences of a capulavirus infecting Plantago lanceolata in the Åland archipelago of Finland. Archives of Virology, 1-5.

Tamaki, H., Zhang, R., Angly, F.E., Nakamura, S., Hong, P.Y., Yasunaga, T., Kamagata, Y., Liu, W.T., 2012. Metagenomic analysis of DNA viruses in a wastewater treatment plant in tropical climate. Environmental microbiology 14, 441-452.

Tang, J., Ward, L.I., Clover, G.R., 2013. The diversity of Strawberry latent ringspot virus in New Zealand. Plant disease 97, 662-667.

Teycheney, P.-Y., Geering, A.D., 2011. Endogenous viral sequences in plant genomes. Recent Advances in Plant Virology, 343-362.

Thekke-Veetil, T., Ho, T., Keller, K.E., Martin, R.R., Tzanetakis, I.E., 2014. A new ophiovirus is associated with blueberry mosaic disease. Virus Res 189, 92-96.

Thekke-Veetil, T., Polashock, J., Marn, M.V., Pleško, I.M., Keller, K.E., Martin, R.R., Ho, T., Tzanetakis, I., 2014. Molecular Characterization and Population Structure of Blueberry Mosaic Associated Virus.

Thekke-Veetil, T., Polashock, J.J., Marn, M.V., Plesko, I.M., Schilder, A.C., Keller, K.E., Martin, R.R., Tzanetakis, I.E., 2015. Population structure of blueberry mosaic associated virus: Evidence of reassortment in geographically distinct isolates. Virus research 201, 79-84.

Thomas, T., Gilbert, J., Meyer, F., 2012. Metagenomics - a guide from sampling to data analysis. Microb Inform Exp 2, 3.

Thurber, R.V., Haynes, M., Breitbart, M., Wegley, L., Rohwer, F., 2009. Laboratory procedures to generate viral metagenomes. Nat Protoc 4, 470-483.

Truve, E., Fargette, D., 2012. Genus Sobemovirus. Virus Taxonomy Classification and Nomenclature of Viruses: ninth Report of the International Committee on Taxonomy of Viruses, 1185-1189.

Tzanetakis, I.E., Postman, J.D., Gergerich, R.C., Martin, R.R., 2006. A virus between families: nucleotide sequence and evolution of Strawberry latent ringspot virus. Virus research 121, 199-204.

USDA/ERS, 2013. Fruit and tree nuts yearbook tables: Berries: Blueberries.

Van Steenis, C.G.G.A., 1972. Flora Malesiana. Wolters-Noordhoff Publ. Co. Groningen, The Netherlands.

Varney, E., Raniere, L., 1960. Necrotic ringspot, a new virus disease of cultivated Blueberry. Phytopathology 50.

Varsani, A., Roumagnac, P., Fuchs, M., Navas-Castillo, J., Moriones, E., Idris, A., Briddon, R.W., Rivera-Bustamante, R., Zerbini, F.M., Martin, D.P., 2017. Capulavirus and Grablovirus: two new genera in the family Geminiviridae. Archives of Virology 162, 1819-1831.

Victoria, J.G., Kapoor, A., Li, L., Blinkova, O., Slikas, B., Wang, C., Naeem, A., Zaidi, S., Delwart, E., 2009. Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis. J Virol 83, 4642-4651.

Villamor, D., Mekuria, T., Pillai, S., Eastwell, K., 2016. High-Throughput sequencing identifies novel viruses in nectarine: insights to the etiology of stem-pitting disease. Phytopathology 106, 519-527.

Villanueva, F., Sabanadzovic, S., Valverde, R.A., Navas-Castillo, J., 2012. Complete genome sequence of a double-stranded RNA virus from avocado. J Virol 86, 1282-1283.

Vives, M.C., Velázquez, K., Pina, J.A., Moreno, P., Guerri, J., Navarro, L., 2013. Identification of a new enamovirus associated with citrus vein enation disease by deep sequencing of small RNAs. Phytopathology 103, 1077-1086.

Wegener, L.A., Punja, Z., Martin, R., Bernardy, M., MacDonald, L., 2006. Epidemiology and identification of strains of Blueberry scorch virus on highbush blueberry in British Columbia, Canada. Canadian journal of plant pathology 28, 250-262.

Williamson, J., Olmstead, J., Lyrene, P., 2012. Florida's commercial blueberry industry.

Williford, L., Savelle, A., Scherm, H., 2016. Effects of Blueberry red ringspot virus on Yield and Fruit Maturation in Southern Highbush Blueberry. Plant Disease 100, 171-174.

Woo, E., Pearson, M., 2014a. Comparison of complete nucleotide sequences and genome organization of six distinct cherry leaf roll virus isolates from New Zealand. Archives of virology 159, 3443-3445.

Woo, E., Pearson, M., 2014b. First Report of Strawberry latent ringspot virus in Vaccinium darrowii. Journal of Phytopathology 162, 820-823.

Woo, E., Ward, L., Pearson, M., 2013. First Report of Cherry leaf roll virus in Vaccinium darrowii. New Disease Reports 27.

Wood-Charlson, E.M., Weynberg, K.D., Suttle, C.A., Roux, S., van Oppen, M.J., 2015. Metagenomic characterization of viral communities in corals: mining biological signal from methodological noise. Environ Microbiol 17, 3440-3449.

Wu, Q., Ding, S.W., Zhang, Y., Zhu, S., 2015. Identification of viruses and viroids by next-generation sequencing and homology-dependent and homology-independent algorithms. Annu Rev Phytopathol 53, 425-444.

Wylie, S., Nouri, S., Coutts, B., Jones, M., 2010. Narcissus late season yellows virus and Vallota speciosa virus found infecting domestic and wild populations of Narcissus species in Australia. Archives of virology 155, 1171-1174.

Wylie, S.J., Li, H., Jones, M.G., 2014. Yellow tailflower mild mottle virus: a new tobamovirus described from Anthocercis littorea (Solanaceae) in Western Australia. Archives of virology 159, 791-795.

Wylie, S.J., Luo, H., Li, H., Jones, M.G., 2012. Multiple polyadenylated RNA viruses detected in pooled cultivated and wild plant samples. Arch Virol 157, 271-284.

Yanagisawa, H., Tomita, R., Katsu, K., Uehara, T., Atsumi, G., Tateda, C., Kobayashi, K., Sekine, K.-T., 2016. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses. Viruses 8, 70.

Zawar-Reza, P., Argüello-Astorga, G.R., Kraberger, S., Julian, L., Stainton, D., Broady, P.A., Varsani, A., 2014. Diverse small circular single-stranded DNA viruses identified in a freshwater pond on the McMurdo Ice Shelf (Antarctica). Infection, Genetics and Evolution 26, 132-138.

Zerbini, F.M., Briddon, R.W., Idris, A., Martin, D.P., Moriones, E., Navas-Castillo, J., Rivera-Bustamante, R., Roumagnac, P., Varsani, A., 2017. ICTV Virus Taxonomy Profile: Geminiviridae. Journal of General Virology 98, 131-133.

Zerbino, D.R., 2010. Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics Chapter 11, Unit 11 15.

Zerbino, D.R., Birney, E., 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome research 18, 821-829.

Zhang, T., Breitbart, M., Lee, W.H., Run, J.-Q., Wei, C.L., Soh, S.W.L., Hibberd, M.L., Liu, E.T., Rohwer, F., Ruan, Y., 2005. RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biol 4, e3.

Zheng, Y., Gao, S., Padmanabhan, C., Li, R., Galvez, M., Gutierrez, D., Fuentes, S., Ling, K.-s., Kreuze, J., Fei, Z., 2017. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs. Virology 500, 130-138.

BIOGRAPHICAL SKETCH

Norsazilawati Saad was born in 1984 in Pulau Pinang, Malaysia. She received her bachelor’s degree in science, BSc (Hons), with a major in microbiology from the Universiti Sains Malaysia in 2006. In 2008, she was hired as a tutor at the Universiti Putra Malaysia and received a scholarship from the Ministry of Higher Education to continue her postgraduate study. She then received her Master of Science degree specializing in plant virology from Universiti Putra Malaysia in 2012 before pursuing her doctoral degree at the University of Florida in the following year. In 2017, she received her Doctor of Philosophy in plant pathology from the University of Florida.
