University of South Florida Scholar Commons

Graduate Theses and Dissertations Graduate School

12-31-2010 Discovery of Novel From Animals, Plants, and Insect Vectors Using Viral Metagenomics Terry Fei Fan Ng University of South Florida

Follow this and additional works at: http://scholarcommons.usf.edu/etd Part of the American Studies Commons, and the Other Oceanography and Atmospheric Sciences and Meteorology Commons

Scholar Commons Citation Ng, Terry Fei Fan, "Discovery of Novel Viruses From Animals, Plants, and Insect Vectors Using Viral Metagenomics" (2010). Graduate Theses and Dissertations. http://scholarcommons.usf.edu/etd/3506

This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected].

Discovery of Novel Viruses from Animals, Plants, and Insect Vectors

Using Viral Metagenomics

by

Terry Fei Fan Ng

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy College of Marine Science University of South Florida

Major Professor: Mya Breitbart, Ph.D. John Paul, Ph.D. David Mann, Ph.D. John Cannon, Ph.D. James Casey, Ph.D

Date of Approval: October 26, 2010

Keywords: Animal Viruses, Plant Viruses, Insect Vectors, Metagenomics, Sequencing

Copyright © 2010, Terry Fei Fan Ng

ACKNOWLEDGEMENTS

This dissertation work was performed through collaborations with colleagues as stated in each chapter and appendix.

The author would like to thank his Ph.D. supervisor Dr. Mya Breitbart, the

College of Marine Science, his family Wei Ming Ng, Fung Yee Hung, Tze Yan Ng, and the colleagues that have contributed towards this work.

TABLE OF CONTENTS

LIST OF TABLES ...... ii

LIST OF FIGURES ...... iii

ABSTRACT ...... iv

INTRODUCTION ...... 1

DISCUSSION ...... 10

REFERENCES ...... 22

APPENDIX 1: DISCOVERY OF A NOVEL SINGLE-STRANDED DNA FROM A SEA TURTLE FIBROPAPILLOMA USING VIRAL METAGENOMICS ...... 29

APPENDIX 2: NOVEL ANELLOVIRUS DISCOVERED FROM A MORTALITY EVENT OF CAPTIVE CALIFORNIA SEA LIONS...... 40

APPENDIX 3: VECTOR-ENABLED METAGENOMICS REVEALS HIGHLY DIVERSE AND UNCHARACTERIZED VERTEBRATE, PLANT, INSECT, AND BACTERIAL VIRUSES IN MOSQUITOES ...... 47

APPENDIX 4: EXPLORING THE DIVERSITY OF PLANT VIRUSES AND THEIR SATELLITES USING VECTOR-ENABLED METAGENOMICS ON WHITEFLIES ...... 82

i

LIST OF TABLES

Table 1: Classification of sequences in the insect vector viromes, showing the most significant matches based on tBLASTx analysis to the Genbank non-redundant database, the amino acid identities between the sequence queries and the Genbank matches, as well as the families and hosts of the matches ...... 20

ii

LIST OF FIGURES

Figure 1: Conceptual diagram showing the dynamics among viral reservoirs, such as animals, humans, native plants, crops, and insect vectors ...... 9

Figure 2: Taxonomic classification of the metagenomic sequences from the three mosquito DNA viromes and two whitefly DNA viromes ...... 18

Figure 3: Conceptual diagram showing how viral metagenomics performed on diseased organisms and insect vectors facilitates surveillance of viruses ...... 19

iii

ABSTRACT

Understanding emerging viruses is critical for disease monitoring and prediction; however, surveys of novel viruses are hindered by the lack of a universal assay for viruses. Viral metagenomics, consisting of viral particle purification and shotgun sequencing, is a powerful technique for discovering viruses in a wide variety of sample types. However, current protocols are not effective on tissue samples (e.g., lungs, livers and tumors), where they are hindered by the high amount of host nucleic acids which limits the percentage of sequences that originate from viruses. In this dissertation, a modified viral metagenomics protocol was developed and utilized to effectively purify viruses from tissues, enabling the sequencing of novel viruses from animals, plants, and insect vectors.

Viral metagenomics performed directly on tissue samples enabled the discovery of novel vertebrate, plant, insect and bacterial viruses. From a sea turtle fibropapilloma, viral metagenomics revealed a novel tornovirus STTV1, which is only the second single-stranded DNA virus known in reptiles and is extremely different from any previously described viruses. Similarly, from the lung of a sea lion involved in a mortality event, viral metagenomics identified a novel sea lion anellovirus (ZcAV), which is the first anellovirus characterized from a

iv marine animal. The STTV1 and ZcAV genomes were highly divergent from known viruses, to a degree that they could not have been detected by degenerate PCR assays or microarrays, demonstrating viral metagenomics as an effective method for characterizing novel viruses.

In addition to discovery of viruses in individual diseased animals, this dissertation pioneered a technique called vector-enabled metagenomics (VEM) to examine viruses present in insect vectors. VEM combines the power of metagenomics to sequence novel viruses with the ability of insect vectors to integrate viral diversity over space, time, and many host individuals and species.

VEM allows for the investigation of viral diversity among the broad range of hosts that the insects feed on, providing an unprecedented snapshot of the viral diversity in natural reservoirs. This dissertation describes the first viral metagenome performed on mosquitoes and whiteflies, providing significant insights to the viral diversity in animal and plant reservoirs. Both animal and plant viruses were represented in the mosquito viromes, which likely originate from animal blood and plant nectar that the mosquitoes feed on. Mosquito viromes contained a diverse range of viruses, including vertebrate, insect, plant, and bacterial viruses, and almost all the viral sequences were novel, suggesting the pan-animal virome is largely uncharacterized. In contrast, only plant viruses were observed in the whitefly viromes because whiteflies feed solely on plants.

Whitefly viromes contained known and novel viral sequences infecting crops, novel viral sequences infecting native plants, as well as novel satellites that were the first viral satellites to be documented in North America. Distinct viromes were

v found amongst the three mosquito samples as well as between the two whitefly samples, demonstrating the diverse and dynamic nature of the viruses in plant and animal reservoirs.

By enabling the discovery of virus in diseased organisms and in insect vectors, viral metagenomics is a powerful technique that will significantly enhance our fundamental scientific understanding of the diversity, transmission, biogeography, and emergence of viruses. The viral metagenomic approach described here has implications for surveillance of emerging viruses, prediction of viral epidemics, and proactive control of diseases.

vi

CHAPTER 1: INTRODUCTION

Emerging viral diseases cause epidemics in humans, animals and plants.

Emerging animal viruses often originate through interactions between animals and humans, either created by inter-specific transmission, or mediated by an insect vector (Fig. 1). Although animal viruses are the source of most human viral epidemics (Cleaveland et al., 2001; Wolfe et al., 2007), such as severe acute respiratory syndrome coronavirus (SARS-CoV) , human immunodeficiency virus

(HIV) and avian flu H5N1, these emerging viruses were not known in the animals until they became infectious to humans (Gao et al., 1999; Guan et al., 2002;

Guan et al., 2003). Inversely, a number of human viruses, such as metapneumovirus and influenza virus, cause infection in animals (de Jong et al.,

2007; Kaur et al., 2008; Yu et al., 2009), in a process called reverse zoonosis

(Fig. 1). Certain viruses (arboviruses) can be transmitted by insects (Mellor,

2000), causing severe diseases in humans (e.g. yellow fever, dengue and various encephalitis), in animals (e.g. bluetongue disease), or in both (e.g. Rift

Valley fever, West Nile encephalomyelitis). As of 2008, 188 human viruses have been described (Woolhouse et al., 2008). Of the viruses known to infect humans,

75% are zoonotic, which accounts for 80% of the emerging viruses (Taylor et al.,

1

2001). Considering the vast number of animal species, and the fact that each animal species likely hosts a broad range of viruses, the total reservoir of animal virus diversity is enormous. However, our knowledge of the diversity of viruses infecting non-human animals is severely limited. Understanding virus diversity and the dynamics amongst human, animal, and insect vector reservoirs is of critical importance for protecting public health.

Similarly, emerging plant viruses post serious threats to agriculture around the world, but our understanding of viral diversity in plants is extremely limited.

Plant viruses (such as begomoviruses) are limiting factors in the production of tomato, pepper, squash, melon, cotton in the subtropics and tropics, and have caused widespread famines in the developing world (Legg & Fauquet, 2004).

Surveillance of plant viruses is biased towards agents of visible and economically important diseases in crops. Little is known about the potentially emergent viruses in crops that have not yet made their presence known, or the viruses that are circulating in native plants (Fig. 1). However, these undiscovered viruses can provide a reservoir of biodiversity for recombination and reassortment with existing pathogenic plant viruses, and evolve into highly virulent pathogens

(Brown, 2000; Zhou et al., 1997; Zhou et al., 1998). Most known plant viruses are exclusively vector-transmitted (Blanc, 2004), so virus diversity in insect vectors such as whiteflies encompasses plant viruses that are potentially mobile amongst plants. Understanding viral diversity and dynamics between crops, native plants and insect vectors is important for monitoring emerging plant diseases (Fig.1).

2

One of the barriers to monitoring emerging viruses in different reservoirs is the lack of an effective method to characterize novel viruses. Current viral identification techniques have limited capability for characterizing novel viruses due to their specificity. Diagnostic methods such as ELISA with antibodies for a specific virus, PCR or microarrays with primers designed for a specific virus, or

PCR with degenerate primers that amplify a closely related group of viruses are effective methods for detecting close relatives of known viruses (e.g. Symonds et al., 2009; Wang et al., 2002). However, these methods are not capable of identifying novel viruses for which specific assays do not exist. Viral particle purification and shotgun sequencing (viral metagenomics) has recently emerged as an effective method for describing novel viruses, including those with limited homology to previously described viral families (reviewed in Delwart, 2007;

Edwards & Rohwer, 2005). In viral metagenomics, viral particles are purified from host cells, bacteria, and free nucleic acids. The purified viral DNA is then amplified using sequence-independent methods and shotgun sequenced. The proportion of virus sequences resulting from the total sequencing effort is dependent on the purity of the viruses. Therefore, effective purification of viruses is critical for viral metagenomics. Although viral metagenomics has the potential to characterize novel viruses in various animal and plant samples, most applications to date have been limited to samples that contain low host nucleic acids, such as nasal samples (Allander et al., 2007; Allander et al., 2005;

Nakamura et al., 2009; Willner et al., 2009), fecal samples (Blinkova et al., 2009;

Blinkova et al., 2010; Breitbart et al., 2008; Cann et al., 2005; Kapoor et al.,

3

2008; Nakamura et al., 2009; Victoria et al., 2009; Zhang et al., 2006) and blood

(Breitbart & Rohwer, 2005; Jones et al., 2005). Performing viral metagenomics directly on tissue samples (e.g., lungs, livers and tumors) has been hindered by the high amount of host nucleic acids, which limits the percentage of resulting sequences that originate from viruses. This dissertation describes a modification of current viral metagenomics protocols in order to effectively purify viruses from tissues, enabling the sequencing of novel viruses.

Although identifying etiological agents actively infecting individual animals is important, the best defense against emerging infectious diseases is to proactively understand the source of these viruses in animal and plant reservoirs

(Fig.1). By understanding natural viral diversity, we can better prepare for emerging viral pathogens before they reach epidemic proportions. However, broad surveys of natural viral diversity are technically challenging due to the inability to sample a sufficient number of individuals from different host species and the difficulty of characterizing previously undescribed viruses. An effective strategy must be capable of simultaneously identifying a wide range of viral types in a large number of individuals. Since whiteflies and female mosquitoes feed on a wide range of animals and plants (Fig.1), they effectively sample numerous important viral reservoirs. In this dissertation, I have developed the use of vector- enabled metagenomics (VEM) to assess the diversity of viruses found in insect vectors and the hosts they feed upon. This method combines the power of metagenomics for discovering novel viruses with the natural ability of insect vectors to integrate viral diversity over space, time, and many hosts.

4

This Ph.D. dissertation describes the use of viral metagenomics to discover novel viruses in

 A sea turtle fibropapilloma (Chapter 2). Fibropapillomatosis (FP) is a

debilitating neoplastic disease affecting sea turtles worldwide (Greenblatt et

al., 2005). As all species of marine turtles are either endangered or

threatened, the additional stress of FP poses a potential threat to the future

survival of sea turtles. Although current evidence strongly supports the

involvement of an alphaherpesvirus, fibropapillomatosis-associated turtle

herpesvirus (FPTHV) (Lackovich et al., 1999; Lu et al., 2000; Quackenbush

et al., 2001; Quackenbush et al., 1998; Yu et al., 2001), the etiological agent

responsible for fibropapillomatosis is still unconfirmed. Viral metagenomics

was performed to determine if any other DNA viruses (in addition to FPTHV)

were found in the fibropapilloma of a Florida green sea turtle (Chelonia

mydas). A novel single-stranded DNA virus, sea turtle tornovirus 1 (STTV1),

was discovered. The single-stranded, circular genome of STTV1 was

approximately 1800 nucleotides in length. STTV1 only has weak amino acid

level identities (25%) to Chicken Anemia Virus (CAV) in short regions of its

genome, hence STTV1 may represent the first member of a novel virus

family. A total of 35 healthy turtles and 27 turtles with FP were tested for

STTV1 using PCR, and only two turtles severely afflicted with FP were

positive. Since STTV1 was not found in the majority of the FP turtles, it does

not appear to be the causative agent of FP. However, the biological effects

of this virus need further examination, since the affected turtles were

5

systemically infected with STTV1. STTV1 exists as a quasispecies, with

several genome variants identified in the fibropapilloma of each positive

turtle, suggesting rapid evolution of this virus.

 A sea lion lung (Chapter 3). Mortality events amongst marine mammals are

common, yet the etiological agents often remain unidentified. To investigate

potential viral pathogens associated with a mortality event of three captive

California sea lions (Zalophus californianus), viral metagenomics was

performed on a lung sample from a post-mortem captive California sea lion.

This study identified a novel California sea lion anellovirus (ZcAV), with 35%

amino acid identity to feline anelloviruses in the ORF1 region. The double-

stranded replicative form of ZcAV was detected in lung tissue, suggesting

that ZcAV replicates in sea lion lungs. Specific PCR revealed the presence

of ZcAV in the lung tissue of all three sea lions involved in the mortality

event, but not in three other sea lions from the same zoo. In addition, ZcAV

was detected at low frequency (11%) in the lungs of free-living sea lions.

The higher prevalence of ZcAV and presence of the double-stranded

replicative form in the lungs of sea lions from the mortality event suggest that

ZcAV was associated with the death of these animals.

 Animal insect vector - mosquitoes (Chapter 4). Understanding viral

diversity is critical for disease monitoring and prediction; however, broad

surveys are hindered by the lack of a universal assay for viruses and the

inability to sample a sufficient number of individual hosts. This chapter

describes the development of a new approach, called vector-enabled

6 metagenomics (VEM) to explore the diversity of viruses present in animal and plant reservoirs. This method combines the power of metagenomics for discovering novel viruses with the natural ability of mosquitoes to integrate viral diversity over space, time, and many hosts. Viruses were purified from three mosquito samples collected from San Diego, California and approximately half a million sequences were generated for each mosquito

DNA virome. The majority of the sequences were novel, suggesting that the viral community in mosquitoes, as well as the animal and plant hosts they feed on, is highly diverse and largely uncharacterized. The mosquito viromes contained a broad range of sequences related to animal, plant, insect and bacterial viruses. Animal viruses identified included representatives from the

Anelloviridae, , , and families, which mosquitoes may have obtained from viremic vertebrate hosts during blood feeding. One virome contained sequences closely related to human papillomavirus 23, as well as a novel papillomavirus. Sequences related to plant viruses ( / ) were identified in all three viromes, which were potentially acquired when the mosquitoes fed on plant nectar. Diverse bacteriophages and insect viruses were identified, including a novel densovirus likely infecting Culex erythrothorax. The three mosquito viromes had distinct virus profiles, demonstrating heterogeneity in local virus reservoirs and the advantages of this technique for sampling the total viral community.

7

 Plant insect vector - whiteflies (Chapter 5). Current knowledge of plant

virus diversity is biased towards agents of visible and economically important

diseases. Little is known about viruses that have not caused major diseases

in crops, or viruses from native vegetation, which are a reservoir of

biodiversity that can contribute to viral emergence. Since the majority of

known plant viruses are exclusively vector-transmitted (Blanc, 2004),

examination of insect vectors presents an excellent opportunity to explore

the diversity of plant viruses. The whitefly Bemisia tabaci species complex is

an important vector of many plant viruses. Vector-enabled metagenomics

was employed to examine the DNA viruses circulating in whiteflies collected

from two important agricultural regions in Florida, USA. VEM successfully

characterized the active and abundant viruses that produce disease

symptoms in crops, as well as the less abundant viruses infecting adjacent

native vegetation. PCR assays designed from the metagenomic sequences

enabled the complete sequencing of four novel begomovirus genome

components, as well as the first discovery of plant virus satellites in North

America. One of the novel begomoviruses was subsequently identified in

symptomatic Chenopodium ambrosiodes from the same field site, validating

VEM as an effective method for proactive monitoring of plant viruses without

a priori knowledge of the pathogens.

8

Figure 1. Conceptual diagram showing the dynamics among viral reservoirs, such as animals, humans, native plants, crops, and insect vectors. Mosquitoes and whiteflies are examples of the insect vectors for animal and plant viruses respectively.

9

CHAPTER 2: DISCUSSIONS

This dissertation developed methods for performing viral metagenomics directly from tissue samples, which was applied to several animals and insect vectors, resulting in the discovery of a diverse range of novel vertebrate, plant, insect and bacterial viruses. Current viral identification techniques detect viruses using assays designed based on known sequences. While they can successfully detect close relatives of previously described viruses (e.g. Symonds et al., 2009;

Wang et al., 2002), these methods are ineffective for discovery of novel viruses with divergent genomes. As a sequence-independent method, viral metagenomics overcomes the limitations of current diagnostic methods for describing novel viruses, and is not restricted by host organism, tissue type, or prior knowledge of the virus types present.

The two marine animal samples examined in this study using viral metagenomics emphasize the advantage of this method for discovery of divergent viruses. From a sea turtle fibropapilloma, viral metagenomics revealed a novel tornovirus STTV1 (Chapter 2). Sea turtle tornovirus 1 is only the second single-stranded DNA virus known in reptiles, and the genome of STTV1 is extremely different from any previously described viruses. Similarly, a novel sea lion anellovirus (ZcAV) was characterized using metagenomics from the lung of a

10 sea lion involved in a mortality event (Chapter 3). ZcAV is the first anellovirus described from a marine animal, and the virus was subsequently detected in other post-mortem sea lions in the same mortality event and other wild California sea lions. The STTV1 and ZcAV genomes were highly divergent from any current

Genbank viral sequences, to a degree where they would not have been detected by degenerate PCRs or microarrays, validating viral metagenomics as an effective method for characterizing novel viruses.

Although viral metagenomics is a powerful technique for describing novel viruses, one key difficulty in applying this method to tissues is to effectively remove the background host nucleic acids while preserving viral nucleic acids.

Previously published viral metagenomic techniques worked with samples containing low level of host nucleic acids, such as nasal samples (Allander et al.,

2007; Allander et al., 2005; Nakamura et al., 2009; Willner et al., 2009), fecal samples (Blinkova et al., 2009; Blinkova et al., 2010; Breitbart et al., 2008; Cann et al., 2005; Kapoor et al., 2008; Nakamura et al., 2009; Victoria et al., 2009;

Zhang et al., 2006) and blood (Breitbart & Rohwer, 2005; Jones et al., 2005).

This dissertation described an effective method to purify viruses directly from tissues without contamination from the high amount of background host nucleic acid by combining steps of centrifugation, filtration, and chloroform and nuclease treatments. With such methodological advances, we were able to describe novel viruses directly from tissues, such as tumors (Chapter 2) and lungs (Chapter 3).

Since metagenomics can be performed on pooled samples from many individuals, this method enables the exploration of viruses infecting a population

11 or species in a given region, which will be important for understanding viruses affecting ecologically or economically important species, including humans, endangered species, and crops.

Most previous metagenomic studies have characterized viruses from one host species at a time; therefore, the total virus diversity in animal and plant reservoirs remains largely unknown. Although understanding the complement of viruses present in natural reservoirs is critical for monitoring emerging viruses

(Fig. 1), achieving such a goal is challenging due to the difficulty of sampling a sufficient number of individuals and host species. In order to investigate the viral diversity in animal and plant reservoirs, this dissertation pioneered a technique called vector-enabled metagenomics (VEM), which combines the power of metagenomics to sequence novel viruses and the ability of insect vectors to integrate viral diversity over space, time, and many host individuals and species.

VEM allows for the investigation of viral diversity among the broad range of hosts that the insects feed on, providing an unprecedented snapshot of the viral diversity in natural reservoirs. Mosquito viromes (Chapter 4) contained a diverse range of viruses, including vertebrate, insect, plant, and bacterial viruses. This was the first viral metagenomic study performed on mosquitoes, and almost all the viral sequences were novel, suggesting the pan-animal virome is largely uncharacterized. Whitefly viromes (Chapter 5) contained known and novel viral sequences infecting crops, novel viral sequences infecting native plants, and novel satellites that were the first viral satellites to be documented in North

America. As the first viral metagenome performed on whiteflies, this study

12 demonstrated a diverse range of geminiviruses and satellites present in whiteflies. Both studies validate the effectiveness of VEM in characterizing viral diversity in insect vectors, providing a snapshot of the viral diversity in animal and plant reservoirs.

VEM revealed fundamental differences in the viral diversity represented in mosquitoes and whiteflies (Table 1, Fig. 2). The majority of the mosquito virome sequences (48%-80%) had no significant similarities to the Genbank non- redundant database, representing uncharacterized viral genomes too divergent to be identified (Fig. 2). This is consistent with other DNA viromes from animals

(e.g., Breitbart & Rohwer, 2005) or environmental samples (e.g., Breitbart et al.,

2002), which are dominated by „unknown‟ sequences with no significant homology to characterized genes. Most identified viral sequences in the mosquito viromes had only low levels of amino acid identities to known viruses

(Table 1), suggesting a highly novel and diverse viral community sampled by the mosquitoes. In contrast to the mosquito viromes, the whitefly viromes had an extremely high percentage of identifiable viral sequences. More than 79% of the sequences in each virome had both high amino acid (Fig. 2) and nucleotide identities to known plant viruses. It should be noted that the ability to identify a virus depends on the available sequences in the database. The whitefly- transmitted geminiviruses have been studied extensively (Brown, 1994; Brown &

Bird, 1992; Mansoor et al., 2003; Polston & Anderson, 1997), with more than two hundred whitefly-transmitted viruses described (Fauquet et al., 2008). On the other hand, most mosquito virus studies have focused on detecting specific, well-

13 described RNA arboviruses (such as Huang et al., 2001; Kuno, 1998), and relatively little is known about DNA viruses in mosquitoes, except for a few animal papillomavirus and poxviruses (Brody, 1936; Chihota et al., 2001; Dalmat,

1958; Fenner et al., 1952; Kilham & Dalmat, 1955; Tripathy et al., 1981). The discrepancy between the mosquito and whitefly viromes can be explained by the differences in current knowledge of DNA viruses in whiteflies versus mosquitoes.

Although the mosquito viromes were dominated by unknown sequences

(Fig. 2), ongoing advancement of virus discovery can help elucidate the identities of these sequences. For example, some of the sequences identified in the mosquito viromes only had significant similarities to HPV112, and were unrelated to other HPV types (Chapter 5). Prior to the recent discovery of HPV112

(Ekström et al., 2010), these sequences had no significant BLAST similarities, and were considered “unknown”. This demonstrates that increasing knowledge of virus diversity has the potential to reveal the identities and hosts of many of the unknown sequences identified through metagenomics in the future.

Both animal and plant viruses were represented in the mosquito viromes

(Fig. 2, Table 1A). These viruses likely originate from animal blood and plant nectar that the mosquitoes feed on. In contrast, only plant viruses were observed in the whitefly viromes (Fig. 2, Table 1B) because whiteflies feed solely on plants.

Since the Genbank database is biased towards viral sequences from humans and important crops, recognition of these viruses is more likely than recognition of viruses from other animals and native plants. For example, although the mosquitoes feed on both humans and animals, the only vertebrate viruses

14 identified with high sequence identities were the human papillomaviruses, and the other animal viruses identified only had weak amino acid level identities.

Each insect vector virome investigated had a distinct virus profile.

Distinct viromes were found amongst the three mosquito samples (Chapter 4 and

Table 1), as well as between the two whitefly samples (Chapter 5 and Table 1).

Differences in virus profiles may be due to changes in the viral community over space or time, driven by interactions of the insect vectors with hosts in their local environments. Regardless of the factors driving the differences amongst viromes, these data demonstrate the dynamic nature of the viruses in the natural reservoirs, and that the insect vectors effectively sample the heterogeneity in the circulating viral community.

Virus discovery is the first step and often the limiting step in disease surveillance and preventive medicine (Fig. 3). Current diagnostic tests, such as

PCR, qPCR and microarrays, excel at monitoring for known viruses, but have limited capacity for characterizing novel viruses. Viral metagenomics overcomes many difficulties of virus discovery, and once the virus is characterized, this enables the development of specific tests for the novel viruses, allowing for further investigation (Fig. 3). Because metagenomics provides sequence information, numerous tests (PCR, qPCR and microarrays) can be designed and performed downstream to monitor the viruses and determine their prevalence in healthy and diseased populations. The presence of a viral sequence does not establish a link between the virus and disease; however, knowing the sequence from a putative pathogen can facilitate diagnostic tests such as in situ

15 hybridization and replicative form analysis, to connect viral infection with disease symptoms. According to the , group II ssDNA viruses, group IV (+) ssRNA viruses, group VI (+) ssRNA viruses and group VII dsDNA viruses all have replicative form intermediates different from the packaged viral nucleic acids (Fauquet et al., 2005). Analysis of the replicative forms using molecular techniques can reveal whether a virus is actively replicating in the diseased tissue. Furthermore, in situ hybridization enables the visualization of the viruses in microscopy using a probe specific to the virus, and when combined with histological analysis, can link a virus to disease symptoms (Weissenböck et al., 2004). The unprecedented capability of viral metagenomics for viral discovery and its power to facilitate other studies (Fig. 3) will significantly advance our understanding of the diversity, reservoirs, and transmission mechanisms of viruses.

Multiple viruses (tens to hundreds; Chapter 4) can be sequenced in a single experiment using metagenomics, rather than tests that assay for a single virus at a time and without a priori knowledge of the expected virus types. To circumvent problems of sampling a sufficient number of individual hosts, examining insects using VEM enables the investigation of viral diversity in animal, plant and insect vector reservoirs (Fig. 3 bottom, Fig.1). Such investigations reveal the viruses that are potentially circulating between these important reservoirs (Fig. 1). For example, HPV23 and a novel papillomavirus were characterized in the mosquito virome (Chapter 4), suggesting that human papillomaviruses can be acquired by mosquitoes. Understanding the viral

16 diversity in different reservoirs enables us to foresee potential pathogenic viruses before they reach epidemic proportions. For example, viral metagenomics enabled the discovery of a novel begomovirus from whiteflies in Florida, which was subsequently identified in the symptomatic weed Chenopodium ambrosiodes from the same field site (Chapter 5). Viral metagenomic approaches present a significant advancement from current reactive approaches to viral epidemics that don‟t identify a pathogen until it causes major detrimental effects.

By enabling the simultaneous discovery of a wide range of viruses, viral metagenomics is a powerful technique that will significantly enhance our fundamental scientific understanding of the diversity, transmission, biogeography, and emergence of viruses (Fig. 1). Viral metagenomics performed on individual animals will identify novel viruses potentially involved in diseases

(Chapter 2 and 3), while VEM performed on insect vectors will provide a snapshot of the pan-animal and pan-plant viromes (Chapter 4 and 5). By understanding the viruses in different systems and monitoring the movement between them (Fig. 1), viral metagenomics has the potential to transform the field of virus surveillance, leading to a paradigm shift from reactive to proactive biodefense (Fig. 3).

17

Figure 2. Taxonomic classification of the metagenomic sequences from the three mosquito DNA viromes and two whitefly DNA viromes. Classification is based on tBLASTx (E-value < 0.001) against the Genbank non-redundant database. Mosquito samples were obtained from 3 sites in San Diego: Buena Vista Lagoon (Mosquito SD-BVL), River Bank (Mosquito SD-RB), and Wild Animal Park (Mosquito SD-WAP). Approximately half a million 454 sequences were generated for each of the mosquito viromes. Whitefly samples were obtained from 2 sites in Florida: Homestead and Citra. Less than 200 Sanger sequences were generated for each of the whitefly viromes.

18

Figure 3. Conceptual diagram showing how viral metagenomics performed on diseased organisms and insect vectors facilitates surveillance of viruses.

19

Table 1A Most significant matches from Genbank Maximum a.a. % similarity Family Hos t Virus Mos SD-BVL Mos SD-RB Mos SD-WAP

Human papillomavirus type 23 91 Papillomaviridae Human Human papillomavirus type 112 76 SEN virus 32 Human Torque teno virus 48 Torque teno douroucouli virus 48 Primate Anellovirirdae Torque teno tupaia virus 40 Bovine Torque teno virus 69 Other Mammal California sea lion anellovirus 58 Torque teno canis virus 45 Beak and feather disease virus 63 62 61 Canary circovirus 61 Columbid circovirus 66 Duck circovirus 43 Bird Goose circovirus 51 Gull circovirus 49 52 43 Circoviridae Muscovy duck circovirus 50 Raven circovirus 43 68 Cyclovirus Chimp11 50 Mammal feces Cyclovirus NG14 56 Cyclovirus PK5006 50 Environmental Circovirus-like genome 45 65 Poxviridae Bird Canarypox virus 48 Human Human herpesvirus 1 70 Herpesviridae Mammal Bovine herpesvirus 34 Ageratum yellow vein virus 54 Bean golden mosaic virus 69 Maize streak virus 53 Miscanthus streak virus 46 Pepper yellow leaf curl Indonesia virus 41 Geminiviridae Plant Soybean crinkle leaf virus 46 Squash leaf curl China virus 59 Tomato leaf curl New Delhi virus 54 Tomato leaf curl New Delhi virus 54 Blainvillea yellow spot virus 50 Banana bunchy top virus 53 Nanoviridae Plant Coconut foliar decay virus 51 Milk vetch dwarf virus 53 Animal/ Plant? Ageratum yellow vein virus 60 Other Plant/ Fungi? SsHADV-1 62 Table 1B Most significant matches from Genbank Maximum a.a. % similarity Family Hos t Virus Citra Homes tead

Anoda geminivirus 96 Cabbage leaf curl Jamaica virus 91 Calopogonium mucunoides begomovirus 100 Cucurbit leaf crumple virus 97 Cucurbit leaf curl virus 100 Macroptilium golden mosaic geminivirus 95 Macroptilium golden mosaic virus 86 Malvastrum yellow mosaic Helshire virus 99 Malvastrum yellow mosaic Jamaica virus 99 Merremia mosaic virus 100 Geminiviridae Plant Papaya leaf curl Guangdong virus 82 Sida golden mosaic Florida virus 96 Sida golden mosaic virus 98 100 Sida golden yellow vein virus 98 Sida yellow mosaic Yucatan virus 75 Tobacco leaf curl Cuba virus 96 Tobacco leaf rugose virus 86 Tomato severe rugose virus 77 Tomato yellow leaf curl Sardinia virus 100 Tomato yellow leaf curl virus 100 Wissadula golden mosaic St Thomas virus 88 Wissadula golden mosaic virus 88 Table 1. Classification of sequences in the insect vector viromes, showing the most significant matches based on tBLASTx analysis to the Genbank non-redundant database, the amino acid identities between the sequence queries and the Genbank matches, as well as the families and hosts of the 20 matches. Sequence identities were color coded based on the level of similarity. A) Classification of vertebrate and plant virus sequences present in the three San Diego mosquito viromes (Appendix 3). B) Classification of virus sequences present in the two whitefly viromes (Appendix 4).

21

REFERENCES

Allander, T., Andreasson, K., Gupta, S., Bjerkner, A., Bogdanovic, G., Persson, M. A. A., Dalianis, T., Ramqvist, T. & Andersson, B. (2007). Identification of a third human polyomavirus. J Virol 81, 4130-4136.

Allander, T., Tammi, M. T., Eriksson, M., Bjerkner, A., Tiveljung-Lindell, A. & Andersson, B. (2005). Cloning of a human parvovirus by molecular screening of respiratory tract samples. Proc Natl Acad Sci U S A 102, 12891-12896.

Blanc, S. (2004). Insect transmission of viruses. In Microbe-vector interactions in vector-borne diseases, p. 103. Edited by S. H. Gillespie, G. L. Smith & A. Osbourn. Cambridge: Cambridge University Press.

Blinkova, O., Kapoor, A., Victoria, J., Jones, M., Wolfe, N., Naeem, A., Shaukat, S., Sharif, S., Alam, M. M., Angez, M., Zaidi, S. & Delwart, E. L. (2009). Cardioviruses Are Genetically Diverse and Cause Common Enteric Infections in South Asian Children. J Virol 83, 4631-4641.

Blinkova, O., Victoria, J., Li, Y., Keele, B. F., Sanz, C., Ndjango, J.-B. N., Peeters, M., Travis, D., Lonsdorf, E. V., Wilson, M. L., Pusey, A. E., Hahn, B. H. & Delwart, E. L. (2010). Novel circular DNA viruses in stool samples of wild-living chimpanzees. J Gen Virol 91, 74-86.

Breitbart, M., Haynes, M., Kelley, S., Angly, F., Edwards, R. A., Felts, B., Mahaffy, J. M., Mueller, J., Nulton, J., Rayhawk, S., Rodriguez-Brito, B., Salamon, P. & Rohwer, F. (2008). Viral diversity and dynamics in an infant gut. Res Microbiol 159, 367-373.

22

Breitbart, M. & Rohwer, F. (2005). Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. BioTechniques 39, 729-736.

Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J. M., Segall, A. M., Mead, D., Azam, F. & Rohwer, F. (2002). Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci U S A 99, 14250-14255.

Brody, A. L. (1936). The transmission of fowl-pox. Cornell Univ Agric Exp Station Mem.

Brown, J. (1994). Current status of Bemisia tabaci as a pest and virus vector in world agroecosystems. FAO Plant Protection Bulletin 42, 3-32.

Brown, J. & Bird, J. (1992). Whitefly-transmitted diseases in the Americas and the Caribbean Basin: past and present. Plant Disease 76, 220-225.

Brown, J. K. (2000). Molecular markers for the identification and global tracking of whitefly vector-Begomovirus complexes. Virus Res 71, 233-260.

Cann, A. J., Fandrich, S. E. & Heaphy, S. (2005). Analysis of the virus population present in equine faeces indicates the presence of hundreds of uncharacterized virus genomes. Virus Genes 30, 151-156.

Chihota, C. M., Rennie, L. F., Kitching, R. P. & Mellor, P. S. (2001). Mechanical Transmission of Lumpy Skin Disease Virus by Aedes aegypti (Diptera: Culicidae). Epidemiol Infect 126, 317-321.

Cleaveland, S., Laurenson, M. K. & Taylor, L. H. (2001). Diseases of humans and their domestic mammals: pathogen characteristics, host range and the risk of emergence. Philos Trans R Soc B-Biol Sci 356, 991-999.

Dalmat, H. T. (1958). Arthropod transmission of rabbit papillomatosis. J Exp Med 108, 9-&. de Jong, J. C., Smith, D. J., Lapedes, A. S., Donatelli, I., Campitelli, L., Barigazzi, G., Van Reeth, K., Jones, T. C., Rimmelzwaan, G. F., Osterhaus, A. D. M. E. & Fouchier, R. A. M. (2007). Antigenic and

23

genetic evolution of swine influenza A (H3N2) viruses in Europe. J Virol 81, 4315-4322.

Delwart, E. L. (2007). Viral metagenomics. Rev Med Virol 17, 115-131.

Edwards, R. A. & Rohwer, F. (2005). Viral metagenomics. Nat Rev Microbiol 3, 504-510.

Ekström, J., Forslund, O. & Dillner, J. (2010). Three novel papillomaviruses (HPV109, HPV112 and HPV114) and their presence in cutaneous and mucosal samples. Virology 397, 331-336.

Fauquet, C., Briddon, R., Brown, J., Moriones, E., Stanley, J., Zerbini, M. & Zhou, X. (2008). Geminivirus strain demarcation and nomenclature. Archives of Virology 153, 783-821.

Fauquet, C. M., Mayo, M. A., Maniloff, J., Desselberger, U. & Ball, L. A., (eds) (2005). Virus taxonomy: classification and nomenclature of viruses. San Diego: Elsevier Academic Press.

Fenner, F., Day, M. & Woodroofe, G. (1952). The mechanism of the transmission of myxomatosis in the European rabbit (Oryctolagus cuniculus) by the mosquito Aedes aegypti. Aust J Exp Biol Med Sci 30, 139-152.

Gao, F., Bailes, E., Robertson, D. L., Chen, Y. L., Rodenburg, C. M., Michael, S. F., Cummins, L. B., Arthur, L. O., Peeters, M., Shaw, G. M., Sharp, P. M. & Hahn, B. H. (1999). Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397, 436-441.

Greenblatt, R. J., Quackenbush, S. L., Casey, R. N., Rovnak, J., Balazs, G. H., Work, T. M., Casey, J. W. & Sutton, C. A. (2005). Genomic variation of the fibropapilloma-associated marine turtle herpesvirus across seven geographic areas and three host species. J Virol 79, 1125-1132.

Guan, Y., Peiris, J. S. M., Lipatov, A. S., Ellis, T. M., Dyrting, K. C., Krauss, S., Zhang, L. J., Webster, R. G. & Shortridge, K. F. (2002). Emergence of multiple genotypes of H5N1 avian influenza viruses in Hong Kong SAR. Proc Natl Acad Sci U S A 99, 8950-8955. 24

Guan, Y., Zheng, B. J., He, Y. Q., Liu, X. L., Zhuang, Z. X., Cheung, C. L., Luo, S. W., Li, P. H., Zhang, L. J., Guan, Y. J., Butt, K. M., Wong, K. L., Chan, K. W., Lim, W., Shortridge, K. F., Yuen, K. Y., Peiris, J. S. M. & Poon, L. L. M. (2003). Isolation and Characterization of Viruses Related to the SARS Coronavirus from Animals in Southern China. Science 302, 276-278.

Huang, C., Slater, B., Campbell, W., Howard, J. & White, D. (2001). Detection of arboviral RNA directly from mosquito homogenates by reverse- transcription-polymerase chain reaction. J Virol Methods 94, 121-128.

Jones, M. S., Kapoor, A., Lukashov, V. V., Simmonds, P., Hecht, F. & Delwart, E. (2005). New DNA viruses identified in patients with acute viral infection syndrome. J Virol 79, 8230-8236.

Kapoor, A., Victoria, J., Simmonds, P., Slikas, E., Chieochansin, T., Naeem, A., Shaukat, S., Sharif, S., Alam, M. M., Angez, M., Wang, C. L., Shafer, R. W., Zaidi, S. & Delwart, E. (2008). A highly prevalent and genetically diversified Picornaviridae genus in South Asian children. Proc Natl Acad Sci U S A 105, 20482-20487.

Kaur, T., Singh, J., Tong, S., Humphrey, C., Clevenger, D., Tan, W., Szekely, B., Wang, Y., Li, Y., Alex Muse, E., Kiyono, M., Hanamura, S., Inoue, E., Nakamura, M., Huffman, M. A., Jiang, B. & Nishida, T. (2008). Descriptive epidemiology of fatal respiratory outbreaks and detection of a human-related metapneumovirus in wild chimpanzees (Pan troglodytes) at Mahale Mountains National Park, Western Tanzania. Am J Primatol 70, 755-765.

Kilham, L. & Dalmat, H. (1955). Host-virus-mosquito relations to Shope fribromas in cotton tail rabbits. Am J Hyg 61, 45-54.

Kuno, G. (1998). Universal diagnostic RT-PCR protocol for arboviruses. J Virol Methods 72, 27-41.

Lackovich, J. K., Brown, D. R., Homer, B. L., Garber, R. L., Mader, D. R., Moretti, R. H., Patterson, A. D., Herbst, L. H., Oros, J., Jacobson, E. R., Curry, S. S. & Klein, A. P. (1999). Association of herpesvirus with fibropapillomatosis of the green turtle Chelonia mydas and the loggerhead turtle Caretta caretta in Florida. Dis Aquat Org 37, 89-97.

25

Legg, J. P. & Fauquet, C. M. (2004). Cassava mosaic geminiviruses in Africa. Plant Mol Biol 56, 585-599.

Lu, Y., Wang, Y., Yu, Q., Aguirre, A. A., Balazs, G. H., Nerurkar, V. R. & Yanagihara, R. (2000). Detection of herpesviral sequences in tissues of green turtles with fibropapilloma by polymerase chain reaction. Arch Virol 145, 1885-1893.

Mansoor, S., Briddon, R., Zafar, Y. & Stanley, J. (2003). Geminivirus disease complexes: An emerging threat. Trends in Plant Science 8, 128-134.

Mellor, P. S. (2000). Replication of arboviruses in insect vectors. J Comp Pathol 123, 231-247.

Nakamura, S., Yang, C. S., Sakon, N., Ueda, M., Tougan, T., Yamashita, A., Goto, N., Takahashi, K., Yasunaga, T., Ikuta, K., Mizutani, T., Okamoto, Y., Tagami, M., Morita, R., Maeda, N., Kawai, J., Hayashizaki, Y., Nagai, Y., Horii, T., Iida, T. & Nakaya, T. (2009). Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PLoS ONE 4, 8.

Polston, J. & Anderson, P. (1997). The emergence of whitefly-transmitted geminiviruses of tomato in the Western Hemisphere. Plant Disease 81, 1358-1369.

Quackenbush, S. L., Casey, R. N., Murcek, R. J., Paul, T. A., Work, T. M., Limpus, C. J., Chaves, A., duToit, L., Perez, J. V., Aguirre, A. A., Spraker, T. R., Horrocks, J. A., Vermeer, L. A., Balazs, G. H. & Casey, J. V. (2001). Quantitative analysis of herpesvirus sequences from normal tissue and fibropapillomas of marine turtles with real-time PCR. Virology 287, 105-111.

Quackenbush, S. L., Work, T. M., Balazs, G. H., Casey, R. N., Rovnak, J., Chaves, A., duToit, L., Baines, J. D., Parrish, C. R., Bowser, P. R. & Casey, J. W. (1998). Three closely related herpesviruses are associated with fibropapillomatosis in marine turtles. Virology 246, 392-399.

Symonds, E. M., Griffin, D. W. & Breitbart, M. (2009). Eukaryotic viruses in wastewater samples from the United States. Appl Environ Microbiol 75, 1402-1409. 26

Taylor, L. H., Latham, S. M. & woolhouse, M. E. J. (2001). Risk factors for human disease emergence. Philos Trans R Soc B-Biol Sci 356, 983-989.

Tripathy, D., Hanson, L. & Crandell, R. (1981). Poxviruses of veterinary importance: diagnosis of infections. In Comparative diagnosis of virus diseases. Edited by E. Kurstak & C. Kurstak. London: Academic Press.

Victoria, J. G., Kapoor, A., Li, L. L., Blinkova, O., Slikas, B., Wang, C. L., Naeem, A., Zaidi, S. & Delwart, E. (2009). Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis. J Virol 83, 4642-4651.

Wang, D., Coscoy, L., Zylberberg, M., Avila, P. C., Boushey, H. A., Ganem, D. & DeRisi, J. L. (2002). Microarray-based detection and genotyping of viral pathogens. Proc Natl Acad Sci U S A 99, 15687-15692.

Weissenböck, H., Bakonyi, T., Chvala, S. & Nowotny, N. (2004). Experimental Usutu virus infection of suckling mice causes neuronal and glial cell apoptosis and demyelination. Acta Neuropathol 108, 453-460.

Willner, D., Furlan, M., Haynes, M., Schmieder, R., Angly, F. E., Silva, J., Tammadoni, S., Nosrat, B., Conrad, D. & Rohwer, F. (2009). Metagenomic analysis of respiratory tract DNA viral communities in cystic ribrosis and non-cystic fibrosis individuals. PLoS ONE 4.

Wolfe, N. D., Dunavan, C. P. & Diamond, J. (2007). Origins of major human infectious diseases. Nature 447, 279-283.

Woolhouse, M. E. J., Howey, R., Gaunt, E., Reilly, L., Chase-Topping, M. & Savill, N. (2008). Temporal trends in the discovery of human viruses. Proceedings of the Royal Society B: Biological Sciences 275, 2111-2115.

Yu, H., Zhou, Y.-J., Li, G.-X., Zhang, G.-H., Liu, H.-L., Yan, L.-P., Liao, M. & Tong, G.-Z. (2009). Further evidence for infection of pigs with human-like H1N1 influenza viruses in China. Virus Res 140, 85-90.

Yu, Q. G., Hu, N. J., Lu, Y. N., Nerurkar, V. R. & Yanagihara, R. (2001). Rapid acquisition of entire DNA polymerase gene of a novel herpesvirus from

27

green turtle fibropapilloma by a genomic walking technique. J Virol Methods 91, 183-195.

Zhang, T., Breitbart, M., Lee, W. H., Run, J. Q., Wei, C. L., Soh, S. W. L., Hibberd, M. L., Liu, E. T., Rohwer, F. & Ruan, Y. J. (2006). RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biol 4, 108-118.

Zhou, X., Liu, Y., Calvert, L., Munoz, C., Otim-Nape, G. W., Robinson, D. J. & Harrison, B. D. (1997). Evidence that DNA-A of a geminivirus associated with severe cassava mosaic disease in Uganda has arisen by interspecific recombination. J Gen Virol 78, 2101-2111.

Zhou, X., Liu, Y., Robinson, D. J. & Harrison, B. D. (1998). Four DNA-A variants among Pakistani isolates of cotton leaf curl virus and their affinities to DNA-A of geminivirus isolates from okra. J Gen Virol 79, 915- 923.

28

Appendix 1: Discovery of a novel single-stranded DNA virus from a sea

turtle fibropapilloma using viral metagenomics

This appendix was published in:

Terry Fei Fan Ng, Charles Manire, Kelly Borrowman, Tammy Langer, Llewellyn

Ehrhart, Mya Breitbart. 2009. Discovery of a novel single-stranded DNA

virus from a sea turtle fibropapilloma using viral metagenomics. Journal of

Virology, 83: 2500-2509

29

30

31

32

33

34

35

36

37

38

39

Appendix 2: Novel anellovirus discovered from a mortality event of captive

California sea lions

This appendix was published in:

Terry Fei Fan Ng, William Suedmeyer, Frances Gulland, Elizabeth Wheeler, and

Mya Breitbart. 2009. Novel anellovirus discovered from a mortality event

of captive California sea lions. Journal of General Virology, 90:1256 -1261

40

41

42

43

44

45

46

Appendix 3: Vector-enabled metagenomics reveals highly diverse and

uncharacterized vertebrate, plant, insect, and bacterial viruses in

mosquitoes

This appendix is in review:

Terry Fei Fan Ng, Dana Willner, Yan Wei Lim, Robert Schmieder, Betty Chau,

Christina Nilsson, Simon Anthony, Yijun Ruan, Forest Rohwer, Mya

Breitbart. 2010. Vector-enabled metagenomics reveals highly diverse and

uncharacterized vertebrate, plant, insect, and bacterial viruses in

mosquitoes.

47

Highly Diverse Vertebrate, Plant, Insect, and Bacterial Viruses in Mosquitoes Revealed by Vector-enabled Metagenomics

Terry Fei Fan Ng1, Dana L. Willner2, Yan Wei Lim2,4, Robert Schmieder3, Betty Chau2, Christina Nilsson4, Simon Anthony5, Yijun Ruan4, Forest Rohwer2, Mya Breitbart1

1 College of Marine Science, University of South Florida, St Petersburg, FL, USA 2 Department of Biology, San Diego State University, San Diego, CA, USA 3 Computational Science Research Center, San Diego State University, San Diego, CA, USA 4 Genome Institute of Singapore, Singapore 5 Wildlife Disease Labs, San Diego Zoo‟s Institute for Conservation Research, San Diego, CA, USA

Corresponding author: Mya Breitbart College of Marine Science 140 7th Ave S, Saint Petersburg, FL, USA, 33701 [email protected]

Running title: Vector-enabled metagenomics of mosquitoes

48

Abstract Understanding viral diversity is critical for disease monitoring and prediction; however, broad surveys are hindered by the lack of a universal assay for viruses and the inability to sample a sufficient number of individual hosts. Here we report the use of vector-enabled metagenomics (VEM) to provide a snapshot of total viral diversity, by combining the power of metagenomics for discovering novel viruses with the natural ability of mosquitoes to integrate viral diversity over space, time, and multiple hosts. DNA viromes were sequenced from three mosquito samples from San Diego, California. The majority of the sequences were novel, suggesting that the viral community in mosquitoes, as well as the animal and plant hosts they feed on, is highly diverse and largely uncharacterized. The mosquito viromes contained sequences related to a broad range of animal, plant, insect and bacterial viruses. Animal viruses identified included anelloviruses, circoviruses, herpesviruses, poxviruses, and papillomaviruses, which mosquitoes may have obtained from vertebrate hosts during blood feeding. Notably, human papillomavirus sequences were present in one of the mosquito samples. Sequences similar to plant viruses were identified in all mosquito viromes, which were potentially acquired through feeding on plant nectar. Numerous bacteriophages and insect viruses were also detected, including a novel densovirus likely infecting Culex erythrothorax. The distinct virus profiles of the three mosquito viromes demonstrated significant heterogeneity in local virus reservoirs. By accessing virus diversity through sampling insect vectors, VEM can transform current approaches for emerging virus surveillance, prediction of viral epidemics, and proactive control of vector- borne diseases.

49

Author Summary Understanding virus diversity is important for protecting animal and human health, but characterizing the viral community is challenging due to the vast number of species and individuals to sample. In this paper, we pioneer the vector-enabled metagenomics (VEM) approach, which involves purification of viral particles from insect vectors and metagenomic sequencing of these viruses. Because female mosquitoes feed on animal blood and plant nectar, they effectively integrate viral diversity over space, time, and many hosts. Here we use VEM to describe DNA viruses in three mosquito samples from San Diego, California. The majority of the sequences were novel, suggesting that the viral community in mosquitoes, as well as the animal and plant hosts they feed on, is highly diverse and largely uncharacterized. New animal viruses identified included members from the , Circoviridae, Herpesviridae, Poxviridae and Papillomaviridae families, which mosquitoes may have obtained from viremic vertebrate hosts during blood feeding. In addition to animal viruses, the mosquito samples also contained a broad range of previously described and novel plant, insect, and bacterial viruses. This is the first study to access virus diversity using metagenomics of insect vectors, demonstrating the potential of VEM for transforming the field of virus surveillance.

50

Introduction Understanding the total diversity of animal viruses is critical for disease surveillance; however, broad surveys of natural viral diversity are technically challenging due to the inability to sample a sufficient number of individuals from different host species and the difficulty of characterizing previously undescribed viruses. An effective strategy for exploring the pan-animal virome would need to simultaneously identify a wide range of viral types in a large number of individuals. Here we describe the use of vector-enabled metagenomics (VEM) to assess the diversity of viruses found in insect vectors and the hosts they feed upon. This method combines the power of metagenomics for discovering novel viruses with the natural ability of insect vectors to integrate viral diversity over space, time, and many hosts. Since female mosquitoes draw blood from a wide range of vertebrate hosts including humans, non-human primates, other mammals and birds [1], and also feed on plant nectar, they effectively sample numerous important viral reservoirs.

Viruses present in mosquitoes can include viruses that are biologically or mechanically transmitted by these vectors, as well as other viruses that are not transmitted by mosquitoes but are drawn indiscriminately from host reservoirs. To date, the majority of mosquito virus studies have focused on the detection of specific, well-described RNA arboviruses [i.e., 2,3]. Characterizing new viruses is difficult due to limitations of current detection methods [4]. Many viruses cannot be cultured in the laboratory, and methods such as degenerate PCR and pan- viral microarrays are ineffective for discovering novel viruses that share limited homology to known viruses in the database [5]. To circumvent these issues, recent studies have demonstrated the effectiveness of viral particle purification and shotgun sequencing (viral metagenomics) for describing novel viruses [reviewed in 4,6]. Using viral metagenomics, novel viruses have been characterized from nasopharyngeal aspirates [e.g., 7], fecal samples [e.g., 8,9], blood [10,11], and tissue samples such as lungs [12,13] and tumors [14]. However, to date, no published studies have applied metagenomic sequencing to explore the diversity of viruses present in mosquitoes.

In this study, we developed a unique method for characterizing viral diversity that capitalizes upon the ability of insect vectors to feed on a variety of hosts and the power of metagenomic sequencing for discovering novel viruses. This approach, called vector-enabled metagenomics (VEM), provides a snapshot of the viral diversity circulating in a region, including viruses that infect the mosquitoes themselves, as well as viruses infecting the many animals and plants that the mosquitoes feed upon. Sequencing of the DNA viral communities (viromes) in three different mosquito samples from San Diego, California resulted in the discovery of numerous novel animal, plant, insect, and bacterial virus sequences, demonstrating the effectiveness of the VEM approach for virus surveillance.

51

Results and Discussion Novel and largely unexplored virus sequences in mosquitoes For each of the three mosquito viromes, approximately half a million sequences were generated from purified viral DNA (Table S1). Based on the most significant tBLASTx similarities, the mosquito viromes contained sequences related to a wide range of animal, insect, plant, and bacterial viruses (Fig. 1B). Sequences with nucleotide-level identity to previously described viruses were limited to mosquito densoviruses, human papillomavirus 23 (HPV23), and a few phages (95-100% identities, Tables S3 and S4). The majority of the virome sequences were completely unknown and most recognizable viral sequences had only low levels of similarity to known viruses (32-70% amino acid identity), suggesting a highly novel and diverse viral community sampled by the mosquitoes.

Unclassified sequences likely represent novel viruses The majority of the sequences in all mosquito viromes were completely unidentifiable based on sequence similarity (>48%, Fig. 1A), which is consistent with other viromes [11,13,14,15]. This suggests that the reservoir of viruses present in the mosquitoes is novel and largely unexplored. Since sequencing was performed on purified virus particles, these divergent sequences likely originated from uncharacterized viral genomes. However, ongoing advancement of animal virus discovery can help elucidate the identities of these virus sequences found in mosquitoes. For example, Mosquito VEM Papillomavirus - SDRB AE – AG only have significant similarities to HPV112, and are unrelated to other HPV types (see below). Prior to the discovery of HPV112 from human skin within the past few months [16], these contigs had no significant BLAST similarities, and were considered “unknown”. This example demonstrates that many of the unidentifiable sequences likely represent novel viruses that are too divergent from known viruses to be recognized by sequence similarity searches. Increasing knowledge of animal and plant virus diversity has the potential to reveal the identities and hosts of these unknown viral sequences in the future.

Distinct viromes of the three mosquito samples Each mosquito virome contained BLAST similarities to a different complement of viruses. The distinct viromes were confirmed by specific PCR showing that viruses could generally only be amplified from the sample where they were originally identified (Table S2). Two exceptions to this trend were the densoviruses, which were present in all samples, and Mosquito VEM Anellovirus – SDWAP B, which was detected in both of the samples examined by PCR.

In addition, cross-BLASTn analysis was used to determine the percent of sequences shared between the three metagenomes (Table S5). Mosquito SD- RB and SD-WAP had few common sequences (5%), while each shared slightly more sequences with SD-BVL (12% and 11% respectively). Most sequences shared between samples were related to mosquito densoviruses, while unknown sequences and sequences similar to other viral genomes were less likely to be shared. This suggests that each mosquito virome shared a small core 52 component, largely composed of insect viruses infecting mosquitoes. The larger component of the virome, consisting of animal, plant and bacterial viruses, is more variable, and thus distinct between samples. This reflects the dynamic nature of the viral community sampled by the mosquitoes.

The distance between mosquito sampling sites is 30-50 km (Fig. S1), which is far greater than the average flight range for a host-seeking Culex erythrothorax mosquito [17,18]. The SD-WAP sample consisted exclusively of C. erythrothorax mosquitoes collected from an inland region of San Diego County in 2009, while both the SD-RB and SD-BVL mosquito samples consisted of mixed mosquito species drawn from coastal regions in 2006. Since mosquitoes draw blood within a radius of a few hundred meters, each mosquito sample should reflect a brief snapshot of the local animal virus diversity on that scale. The three mosquito samples were of different species composition and were collected in different locations at different times. Distinct viral profiles may be due to changes in the viral community over space or time, driven by interactions of the mosquitoes with hosts in their local environments. Regardless of the factors driving the differences between mosquito viromes, these data demonstrate the diverse and dynamic nature of the viral community sampled by the mosquitoes.

Animal viruses identified in mosquitoes Contiguous sequences (contigs) assembled from the mosquito viromes had similarities to five families of animal viruses, namely Anelloviridae, Circoviridae, Herpesviridae, Poxviridae and Papillomaviridae (Fig. 2), that infect a wide range of hosts including humans, primates, other mammals, and birds. Although other mosquito species can be specific in the hosts they feed on, Culex erythrothorax feed on a variety of mammals and birds opportunistically [18], allowing them to sample viruses from many different animal hosts. Although this is not an exhaustive investigation of total animal virus diversity, metagenomics performed on insect vectors with broad host ranges provides a way to elucidate a portion of the pan-animal virome.

Papillomaviruses Sequences related to novel and previously described papillomaviruses were identified in the Mosquito SD-RB virome. Numerous sequences had >95% nucleotide identity to human papillomavirus type 23 (HPV23) (Table S4). Comparison with the HPV23 genome revealed near-complete coverage from the metagenomic sequences (Fig. 3). Additionally, sequences related to human papillomavirus type 112 (HPV112) were identified. Mosquito VEM Papillomavirus - SDRB AE shared 91% nucleotide identity to the E1 gene of HPV112. Mosquito VEM Papillomavirus SDRB AF and AG (MosVemPapAG) shared only 76% and 71% nucleotide identities respectively to the minor capsid protein L2 gene of HPV112, and did not have any significant nucleotide identities to any other HPV types. Phylogenetic analysis based on this partial minor capsid protein region showed that MosVemPapAG is most closely related to HVP112, and belongs to the cutaneous gamma-papillomavirus genus (Fig. 4). Although we cannot confirm 53 the host of MosVemPapAG, it groups phylogenetically with other human papillomaviruses. All papillomavirus sequences identified in the mosquitoes belonged to the cutaneous groups (beta and gamma groups), suggesting that mosquitoes may acquire papillomaviruses from the host‟s skin during feeding.

Although more than 80% of normal human skin harbors papillomaviruses [19], human papillomaviruses have not previously been described in mosquitoes. This study is the first demonstration of a human papillomavirus (HVP23) in mosquitoes, and also provides evidence that mosquitoes can harbor novel papillomaviruses. Papillomaviruses were only identified in one of the mosquito samples, suggesting that these viruses may only be present in mosquitoes sporadically. It was previously noticed that mosquitoes can transmit rabbit papillomavirus [20]; however, further research is needed to determine the prevalence and transmission potential of different types of human papillomaviruses in mosquitoes.

Anelloviruses and Circoviruses All anellovirus sequences identified in the viromes were novel (<70% amino acid identity to known anelloviruses, Fig. 2 and Table S2), suggesting that the animal hosts the mosquitoes feed on contain largely uncharacterized anellovirus diversity. The complete genomes of four putative viral genomes were sequenced (Fig. 3). Phylogenetic analysis based on the complete nucleotide sequence of ORF1 placed SD-BVL and SD-RB anelloviruses into the Torque teno virus (TTV) group, but forming a distinct genetic lineage from other TTV sequences (Fig. 5). Anellovirus genomes from an individual sample were closely related to each other, but unique genomes were found in different mosquito samples. In addition to the complete genomes, partial contigs with similarity to bovine TTV, human TTV, and human SEN virus were identified (Fig. 2 and Table S2).

A diverse range of circoviruses was identified in the mosquito samples (Fig. 2 and Table S2). No significant pairwise nucleotide identity was shared between the replication genes of any of the circoviruses (data not shown), suggesting that these contigs represent distinct viral genomes. PCR assays for SD-BVL circovirus sequences were positive in the sample they originated from, but negative in the SD-WAP sample (Table S2), suggesting distinct circoviruses were present in each virome.

These results demonstrate that the pan-animal virome contains diverse and largely uncharacterized circoviruses and anelloviruses, which mosquitoes may routinely obtain from viremic hosts during blood feeding. Viruses belonging to the Circoviridae and Anelloviridae families contain small, circular, single-stranded DNA genomes, and are usually identified in blood [21,22]. Circoviruses are known to infect birds and pigs [reviewed in 23], and diverse circoviruses have been identified in aquatic environments [15,24,25], as well as in human, chimpanzee and bat feces [26,27,28,29].The identification of sequences similar to avian circoviruses in mosquitoes is interesting because birds are the reservoir 54 and secondary amplifying hosts of many mosquito-transmitted arboviruses, such as Eastern, Western, Japanese and St Louis equine encephalitis virus and West Nile virus [reviewed in 30]. Anelloviruses are known to infect humans, non- human primates, domestic animals and marine mammals [12,31,32,33], but the pathology of anelloviruses remains unknown [32,34].

Herpesvirus-like and Poxvirus-like sequences In sample SD-RB, four contigs with amino-acid-level sequence similarities to herpesviruses and poxviruses were identified (Fig. 2 and Table S2). However, these contigs are only short portions of the genomes so it is impossible to determine more details about their identities or hosts.

Plant viruses identified in mosquitoes Geminiviruses and Nanoviruses Sequences with similarities to plant viruses were consistently identified in the mosquito viromes (Fig. 2 and Table S2). All three viromes contained sequences related to geminiviruses, and sample SD-WAP had sequences related to nanoviruses. Mosquitoes are known to feed on plant nectar, indicating a potential source of these viruses. However, no plant viruses have been previously described in mosquitoes, so the ability of mosquitoes to transmit plant viruses still needs to be investigated through laboratory and field-based transmission studies. Other insect vectors that feed on plants, such as whiteflies, are known to transmit a diversity of plant viruses [35]. In a related study using VEM to examine the viral community in whiteflies, almost all of the viral sequences shared high levels of nucleotide identity with previously described plant geminiviruses (Ng et al. in review). In contrast, the plant virus sequences in the mosquito viromes showed only weak amino acid level identities to known viruses (46%-53%; Fig. 2). These sequences from mosquitoes may represent extremely novel plant viruses, or could be part of recombinant genomes infecting other hosts.

Insect viruses identified in mosquitoes and Poxviridae A diverse range of insect viruses was identified in the mosquito viromes (Table S3 and S4). The majority of the sequences were similar to mosquito densoviruses (DNVs), specifically Aedes albopictus densovirus (AalDNV) and Haemogogus equinus densovirus (HeDNV). Since H. equinus and A. albopictus mosquitoes are not indigenous to San Diego, these sequences most likely represent densoviruses that infect the sampled mosquito species, primarily C. erythrothorax. Using PCR targeting the NS1 gene region, we further investigated the presence of densoviruses in the SD-WAP sample, which contained exclusively C. erythrothorax mosquitoes. The 720 base pair sequence of the PCR product (Accession #GU810839) was closely related (96% nucleotide identity) to HeDNV (Fig. 6). This sequence (VEM Culex erythrothorax densovirus; VEMCeDNV) most likely represents a new mosquito densovirus that infects C. erythrothorax mosquitoes.

55

Densoviruses have been detected frequently in mosquito cell lines, and more rarely in wild-caught mosquitoes, where they are perpetuated through both horizontal and vertical transmission [36,37,38]. Densovirus infection is highly lethal in cell lines and early stage larvae, however, infection at the late stages of larval development generally leads to a persistent and transmissible viremia [36,38]. Mosquito densoviruses are stable vectors for transformation of mosquitoes [39,40,41], which has created interest in using these viruses for mosquito and malaria control, either directly as lethal agents or as possible carriers of transgenes [37]. Viral paratransgenesis takes advantage of the densoviruses to introduce genes that are lethal to mosquitoes or the pathogens that they carry. Viral paratransgenesis efforts can greatly benefit from the discovery of new densoviruses, such as those identified in this study. C. erythrothorax is the most common mosquito in San Diego County, and is suspected to be an emergent vector of West Nile Virus [42]. Further studies of VEMCeDNV will be necessary to determine its efficacy as a biocontrol agent for C. erythrothorax.

Many other insect viruses were also identified, but none were found in all three samples (Table S3). Sample SD-BVL had the highest diversity of densoviruses, followed by SD-RB, then by SD-WAP (Table S3). Many sequences showed less than 85% amino acid identities to known densoviruses and unclassified insect viruses (Table S3), suggesting the presence of many novel insect viruses in mosquitoes.

Phages identified in mosquitoes The mosquito viromes contained a large diversity of phage sequences (Table S3), including members from , , and , as well as unclassified phages. Most of the phage sequences found in the mosquitoes only shared amino acid identities to known phages. However, in sample SD-RB, numerous sequences had 100% nucleotide identities to Propionibacterium phage PAD42 and PA6, Acyrthosiphon pisum secondary endosymbiont phage (APSE) 1-6, and Enterobacteria phage lambda (Table S3), suggesting that these known phages (or closely related phages) were present in the mosquito SD-RB virome. The three mosquito viromes differed in terms of the types of phages with BLAST similarities (Table S3), suggesting that each sample had a distinct phage content.

Phages identified in the mosquito viromes may infect the bacterial flora of the mosquito or that of the hosts they have fed upon. Propionibacterium acnes, the host for Propionibacterium phage PAD42 and PA6, is a commensal bacterium of human skin, so it is possible that mosquitoes acquire this bacterium and its phages during blood feeding [42]. On the other hand, sequences with identities to phage APSE might infect endosymbiotic bacteria of mosquitoes. Phage APSE-1 through 6 infect Hamiltonella defensa, an endosymbiont of aphids and other sap- feeding insects that protects the aphids from wasp attack by killing the developing wasp larvae [43]. Phage APSE-3 carries a toxin-encoding gene that provides the endosymbiont with defense against wasp larvae [44], and other 56

APSE phages are also known to encode toxin genes [45,46]. Mosquitoes are not known to be hosts for parasitic wasps, but endosymbiotic bacteria such as Wolbachia are known to infect mosquitoes and interfere with the reproductive biology of their host through cytoplasmic incompatibility [47]. Identification of sequences with high nucleotide identity to Phage APSE in this study suggests that mosquitoes potentially harbor other endosymbiotic bacteria and their phages, possibly to increase mosquito survivorship, or that of their eggs, through deterring predation.

This is the first study demonstrating the presence of a broad range of phages in mosquitoes, and distinct phage profiles for each mosquito virome. A diversity of phages with different inferred bacterial hosts was observed in the mosquitoes. Surveys based on the 16S rRNA gene support the notion of high bacterial diversity in mosquitoes [48], but to date, no metagenomic studies have examined the bacterial communities associated with mosquitoes. Future investigation of the role of bacteria and phages in mosquitoes is important, as they likely affect the host‟s physiology and fitness.

Viruses with unique genome organizations identified in mosquitoes Two complete genomes were identified that contained features from different virus families. In sample SD-RB, genome Mosquito VEM CircoNanoGeminivirus - SDRB AJ showed a combination of features from the Circoviridae (animal virus), Nanoviridae (plant virus) and Geminiviridae (plant virus) families (Fig. 3). ORF2 of this virus showed similarity to the pfam viral replication protein 02407 superfamily, and had BLASTp hits to the replication protein from both the Circoviridae (29% amino acid identity to Porcine circovirus 2) and Nanoviridae (44% amino acid identity to Faba bean necrotic yellows virus) families. ORF1 of this virus showed similarity to the pfam geminivirus coat protein 00844 superfamily, and had 27% amino acid identity to the geminivirus Eragrostis curvula streak virus.

In sample SD-BVL, genome Mosquito VEM GeminiFungivirus - SDBVL G (Fig. 3) shared features from both the single-stranded DNA (ssDNA) plant virus Geminiviridae family and the single-stranded DNA (ssDNA) fungal virus, Sclerotinia sclerotiorum hypovirulence associated DNA virus (SSHADV1) [49]. Protein conserved domain searches on ORF2 revealed similarity to the geminivirus replication catalytic domain pfam 00799, and a BLASTp search showed 28% amino acid identity to the geminivirus Tomato mottle leaf curl virus replication protein and 34% amino acid identity to the fungal virus SSHADV1. ORF1 showed 32% amino acid identity to the SSHADV1 coat protein. ORF3 of this virus had 41% amino acid identity to the replication-associated protein from the geminivirus Eragrostis curvula streak virus, and 62% amino acid identity to the replication-associated protein from SSHADV1.

VEM as an effective method for exploring viral diversity 57

Understanding viral diversity in animal and plant hosts is vital for disease surveillance. However, monitoring viral diversity is difficult due to the large number of individuals to sample, as well as the methodological limitations in characterizing novel viruses. The unique approach described here circumvents these issues by allowing the discovery of multiple novel viruses from many hosts in a single experiment. Through the use of VEM, this study created a baseline of the DNA virus community present in mosquitoes, shedding light on the high diversity of animal, plant, insect, and bacterial viruses that are present in this important vector. Although the discovery of a viral sequence does not always indicate active infection, the initial characterization will enable future studies to investigate viral prevalence and link viruses to hosts and disease symptoms. The application of this technique to mosquitoes from other regions, as well as other types of vectors, will greatly enhance our understanding of viral diversity.

VEM is a versatile technique that can be further refined to answer specific questions. Instead of purifying viruses from whole mosquitoes, VEM can be performed on dissected blood meals, surveying viruses specifically from the animal blood and plant nectar that the mosquitoes feed on and excluding viruses that may be present on the outside of the mosquitoes. Similarly, performing metagenomics on viruses purified from the dissected mosquito salivary glands, or mosquito saliva emitted during sugar feeding [50] can identify arboviruses with potential for transmission to animals. VEM can also be performed to characterize RNA viruses through the addition of a random-primed reverse transcription step [51]. Finally, the multiple displacement amplification used in this study is known to preferentially amplify small ssDNA circular genomes [52,53,54], and the identifiable sequences in this study were dominated by mosquito densoviruses. To identify more double-stranded DNA (dsDNA) viruses, alternate amplification methods without this bias could be used, or ssDNA could be selectively removed from the mosquito samples by Mung Bean nuclease treatment.

In conclusion, this study utilized VEM to demonstrate the presence of a highly novel and diverse reservoir of animal, plant, insect, and bacterial viruses present in mosquitoes. The three different mosquito viromes contained distinct virus profiles, showing heterogeneity in the circulating viral community. The VEM approach described here will be transformative for our understanding of the ecology of animal, plant, insect, and bacterial viruses, and has implications for surveillance of emerging viruses, prediction of viral epidemics, and proactive control of vector-borne diseases.

58

Materials and Methods Sample collection Three mosquito samples (Table S1) were collected from San Diego County, CA, USA (Fig. S1) using an EVS CO2 trap baited with dry ice (BioQuip Products, Inc., Rancho Dominguez, CA, USA). SD-BVL and SD-RB mosquito samples were killed by freezing at -80oC, while the SD-WAP mosquitoes were anesthetized with triethylamine and stored at 4oC. Mosquitoes were homogenized in 5 ml of suspension medium (SM) buffer using a Tissumizer (Tekmar Control Systems, Inc., Vernon, Canada) at 5,000-8,000 rpm. Mosquito debris was pelleted by two rounds of centrifugation at 1,500 xg at 4oC for 30 min.

Viral particle purification and metagenomic sequencing The protocol for viral particle concentration and purification was modified from previous studies [11,12,55], and an overview is shown in Table S6. Supernatants from the mosquito homogenates were filtered through a 0.45 µm syringe filter unit (Millipore, Billerica, MA) and viral particles were purified from the filtrate using a cesium chloride (CsCl) step gradient. The purified viral concentrate was examined by epifluorescence microscopy to verify the presence of viral particles, and ensure the absence of contaminating bacterial and eukaryotic cells [56]. The viral fraction was concentrated and washed twice with sterile SM buffer on a Microcon 30 column (Millipore), followed by treatment with 0.2 volumes of chloroform for 10 minutes, then incubation with 2.5 U DNase I per μl for 3 hours at 37oC.

Total DNA was extracted using a CTAB/Formamide protocol [57]. Extracted viral DNA was amplified using Genomiphi for 1.5 hours (GE Healthcare, Piscataway, NJ) based on the manufacturer's instructions. Following amplification, samples SD-BVL and SD-RB were sequenced with 454 GS20 pyrosequencing and sample SD-WAP was sequenced using 454 GS FLX technology. Longer read length in the SD-WAP virome resulted in an increased proportion of sequences with known identities compared to the other two viromes. The NCBI genome project numbers for the three viromes are 28413, 28467, and 49713.

Bioinformatics Metagenomic sequences were compared to the GenBank non-redundant database using BLASTn and tBLASTx [58,59]. Each sequence was assigned top-level taxonomy (Eukarya, Bacteria, Archaea, or Virus) based on its closest BLAST similarity (E-value < 10-3 (Fig. 1A). Sequences with a best BLAST similarity to cellular genomes over ≥50 nt with an E-value <10-3 were removed prior to further classification. Remaining sequences were further classified based on best BLAST similarities to viral genomes (Fig. 1B). A cross-BLAST approach [13] was used to evaluate the similarity between mosquito metagenomes, where all sequences with BLASTn E-value <10-5 and ≥98% identity were considered shared.

59

Metagenomic sequences were assembled into contigs using the SeqMan Pro- assembler (DNASTAR, Madison, WI) with match size = 35, minimum match percentage = 95%, match spacing = 15, maximum mismatch end bases = 0. Contigs were compared to the non-redundant database using tBLASTx (E-value <0.001 [58,59] and contigs representing complete genomes were manually analyzed using SeqBuilder (DNASTAR). For complete genomes, open reading frames (ORFs) were analyzed and annotated using Artemis [60] and BLASTn and BLASTp were performed to determine identity. The HPV sequences were also assembled to the HPV23 genome (Genbank Accession # U31781) as a reference using Sequencher 4.7 (Gene Codes, Ann Arbor, MI). Densovirus sequences were assembled into contigs using the 454 GS De Novo Assembler (Branford, CT). Viral contigs were deposited to Genbank under the accession number HQ335010-HQ335087.

PCR screening Nucleic acids from Mosquito SD-BVL and SD-WAP were amplified by Genomiphi (GE Healthcare), then subjected to PCR with primers designed based on selected contigs from the viromes (Table S7). Sample Mosquito SD-RB was unavailable for PCR testing.

Phylogenetic analysis Alignments were performed using ClustalW multiple alignment in Bioedit [61], and MEGA4 was used for phylogenetic analysis with a neighbor joining method and bootstrap with 1000 replications [62]. Phylogenetic analysis for the papillomaviruses was based on the partial alignment of the minor capsid protein L2 sequence with representative HPVs. The phylogenetic analysis of the anelloviruses was based on alignment of the ORF1 nucleotide sequence with major anellovirus groups [12,31]. For the densoviruses, the phylogenetic analysis was based on the nucleotide alignment of the partial NS1 gene PCR sequence with representative densoviruses [63].

60

Acknowledgments This work was funded through a grant to FR from the Canadian Institute for Advanced Research (CIFAR) and National Science Foundation grant DEB- 1025915 to MB. TN was funded by the William and Elsie Knight Oceanographic Fellowship. We thank the DNA sequencing team of Genome Institute of Singapore for providing DNA sequencing support, as well as Bhakti Dwivedi and Hong Liu for assistance with bioinformatics.

61

References 1. Molaei G, Andreadis TA, Armstrong PM, Anderson JF, Vossbrinck CR (2006) Host feeding patterns of Culex mosquitoes and West Nile virus transmission, northeastern United States. Emerging Infectious Diseases 12: 468-474. 2. Huang C, Slater B, Campbell W, Howard J, White D (2001) Detection of arboviral RNA directly from mosquito homogenates by reverse- transcription-polymerase chain reaction. Journal of Virological Methods 94: 121-128. 3. Kuno G (1998) Universal diagnostic RT-PCR protocol for arboviruses. Journal of Virological Methods 72: 27-41. 4. Delwart EL (2007) Viral metagenomics. Reviews in Medical Virology 17: 115- 131. 5. Wang D, Coscoy L, Zylberberg M, Avila PC, Boushey HA, et al. (2002) Microarray-based detection and genotyping of viral pathogens. Proceedings of the National Academy of Sciences of the United States of America 99: 15687-15692. 6. Edwards RA, Rohwer F (2005) Viral metagenomics. Nature Reviews Microbiology 3: 504-510. 7. Allander T, Tammi MT, Eriksson M, Bjerkner A, Tiveljung-Lindell A, et al. (2005) Cloning of a human parvovirus by molecular screening of respiratory tract samples. Proceedings of the National Academy of Sciences of the United States of America 102: 12891-12896. 8. Breitbart M, Hewson I, Felts B, Mahaffy JM, Nulton J, et al. (2003) Metagenomic analyses of an uncultured viral community from human feces. Journal of Bacteriology 185: 6220-6223. 9. Victoria JG, Kapoor A, Li LL, Blinkova O, Slikas B, et al. (2009) Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis. Journal of Virology 83: 4642-4651. 10. Jones MS, Kapoor A, Lukashov VV, Simmonds P, Hecht F, et al. (2005) New DNA viruses identified in patients with acute viral infection syndrome. Journal of Virology 79: 8230-8236. 11. Breitbart M, Rohwer F (2005) Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. Biotechniques 39: 729-736. 12. Ng TFF, Suedmeyer WK, Gulland F, Wheeler E, Breitbart M (2009) Novel anellovirus discovered from a mortality event of captive California sea lions. Journal of General Virology 90: 1256 -1261. 13. Willner D, Furlan M, Haynes M, Schmieder R, Angly FE, et al. (2009) Metagenomic analysis of respiratory tract DNA viral communities in cystic ribrosis and non-cystic fibrosis individuals. Plos One 4. 14. Ng TFF, Manire C, Borrowman K, Langer T, Ehrhart L, et al. (2009) Discovery of a novel single-stranded DNA virus from a sea turtle fibropapilloma using viral metagenomics. Journal of Virology 83: 2500- 2509.

62

15. Rosario K, Nilsson C, Lim YW, Ruan YJ, Breitbart M (2009) Metagenomic analysis of viruses in reclaimed water. Environmental Microbiology 11: 2806-2820. 16. Ekström J, Forslund O, Dillner J (2010) Three novel papillomaviruses (HPV109, HPV112 and HPV114) and their presence in cutaneous and mucosal samples. Virology 397: 331-336. 17. Tietze NS, Stephenson MF, Sidhom NT, Binding PL (2003) Mark-recapture of Culex erythrothorax in Santa Cruz County, California. Journal of the American Mosquito Control Association 19: 134-138. 18. Walton WE, Workman PD, Tempelis CH (1999) Dispersal, survivorship, and host selection of Culex erythrothorax (Diptera : Culicidae) associated with a constructed wetland in southern California. Journal of Medical Entomology 36: 30-40. 19. Antonsson A, Forslund O, Ekberg H, Sterner G, Hansson BG (2000) The ubiquity and impressive genomic diversity of human skin papillomaviruses suggest a commensalic nature of these viruses. Journal of Virology 74: 11636-11641. 20. Dalmat HT (1958) Arthropod transmission of rabbit papillomatosis. Journal of Experimental Medicine 108: 9-&. 21. Biagini P, Pierre G, Cantaloube JF, Attoui H, de Micco P, et al. (2006) Distribution and genetic analysis of TTV and TTMV major phylogenetic groups in French blood donors. Journal of Medical Virology 78: 298-304. 22. Shibata I, Okuda Y, Yazawa S, Ono M, Sasaki T, et al. (2003) PCR detection of Porcine circovirus type 2 DNA in whole blood, serum, oropharyngeal swab, nasal swab, and feces from experimentally infected pigs and field cases. Journal of Veterinary Medical Science 65: 405-408. 23. Todd D (2000) Circoviruses: immunosuppressive threats to avian species: a review. Avian Pathology 29: 373-394. 24. Lopez-Bueno A, Tamames J, Velazquez D, Moya A, Quesada A, et al. (2009) High diversity of the viral community from an Antarctic lake. Science 326: 858-861. 25. Rosario K, Duffy S, Breitbart M (2009) Diverse circovirus-like genome architectures revealed by environmental metagenomics. Journal of General Virology 90: 2418-2424. 26. Li L, Victoria JG, Wang C, Jones M, Fellers GM, et al. (2010) Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses. Journal of Virology 84: 6955-6965. 27. Li L, Kapoor A, Slikas B, Oderinde BS, Wang C, et al. (2010) Multiple diverse circoviruses infect farm animals and are commonly found in human and chimpanzee feces. Journal of virology 91: 74-86. 28. Blinkova O, Rosario K, Li L, Kapoor A, Slikas B, et al. (2009) Frequent detection of highly diverse variants of cardiovirus, , bocavirus, and circovirus in sewage samples collected in the United States. Journal of Clinical Microbiology 47: 3507-3513.

63

29. Blinkova O, Victoria J, Li Y, Keele BF, Sanz C, et al. (2010) Novel circular DNA viruses in stool samples of wild-living chimpanzees. J Gen Virol 91: 74-86. 30. Weaver SC, Barrett ADT (2004) Transmission cycles, host range, evolution and emergence of arboviral disease. Nature Reviews Microbiology 2: 789- 801. 31. Biagini P, Uch R, Belhouchet M, Attoui H, Cantaloube JF, et al. (2007) Circular genomes related to anelloviruses identified in human and animal samples by using a combined rolling-circle amplification/sequence- independent single primer amplification approach. Journal of General Virology 88: 2696-2701. 32. Hino S, Miyata H (2007) Torque teno virus (TTV): current status. Reviews in Medical Virology 17: 45-57. 33. Leary TP, Erker JC, Chalmers ML, Desai SM, Mushahwar IK (1999) Improved detection systems for TT virus reveal high prevalence in humans, non-human primates and farm animals. Journal of General Virology 80: 2115-2120. 34. Davidson I, Shulman LM (2008) Unraveling the puzzle of human anellovirus infections by comparison with avian infections with the chicken anemia virus. Virus Research 137: 1-15. 35. Jones DR (2003) Plant viruses transmitted by whiteflies. European Journal of Plant Pathology 109: 195-219. 36. Oneill SL, Kittayapong P, Braig HR, Andreadis TG, Gonzalez JP, et al. (1995) Insect densoviruses may be widespread in mosquito cell-lines. Journal of General Virology 76: 2067-2074. 37. Ren XX, Hoiczyk E, Rasgon JL (2008) Viral paratransgenesis in the malaria vector Anopheles gambiae. Plos Pathogens 4. 38. Barreau C, Jousset FX, Bergoin M (1997) Venereal and vertical transmission of the Aedes albopictus parvovirus in Aedes aegypti mosquitoes. American Journal of Tropical Medicine and Hygiene 57: 126-131. 39. Carlson J, Suchman E, Buchatsky L (2006) Densoviruses for control and genetic manipulation of mosquitoes. Insect Viruses: Biotechnological Applications 68: 361-392. 40. Rwegoshora RT, Kittayapong P (2004) Pathogenicity and infectivity of the Thai-strain densovirus (AThDNV) in Anopheles minimus s.l. Southeast Asian Journal of Tropical Medicine and Public Health 35: 630-634. 41. Cheng LP, Chen SX, Zhou ZH, Zhang JQ (2007) Structure comparisons of Aedes albopictus densovirus with other parvoviruses. Science in China Series C-Life Sciences 50: 70-74. 42. Lood R, Morgelin M, Holmberg A, Rasmussen M, Collin M (2008) Inducible Siphoviruses in superficial and deep tissue isolates of Propionibacterium acnes. Bmc Microbiology 8. 43. Van der Wilk F, Dullemans AM, Verbeek M, van den Heuvel J (1999) Isolation and characterization of APSE-1, a bacteriophage infecting the secondary endosymbiont of Acyrthosiphon pisum. Virology 262: 104-113.

64

44. Oliver KM, Degnan PH, Hunter MS, Moran NA (2009) Bacteriophages encode factors required for protection in a symbiotic mutualism. Science 325: 992-994. 45. Moran NA, Degnan PH, Santos SR, Dunbar HE, Ochman H (2005) The players in a mutualistic symbiosis: Insects, bacteria, viruses, and virulence genes. Proceedings of the National Academy of Sciences of the United States of America 102: 16919-16926. 46. Degnan PH, Moran NA (2008) Diverse phage-encoded toxins in a protective insect endosymbiont. Applied and Environmental Microbiology 74: 6782- 6791. 47. Yen JH, Barr AR (1971) New hypothesis of the cause of cytoplasmic incompatibility in Culex pipiens L. Nature 232: 657-658. 48. Pidiyar VJ, Jangid K, Patole MS, Shouche YS (2004) Studies on cultured and uncultured microbiota of wild Culex quinquefasciatus mosquito midgut based on 16s ribosomal RNA gene analysis. American Journal of Tropical Medicine and Hygiene 70: 597-603. 49. Yu X, Li B, Fu Y, Jiang D, Ghabrial SA, et al. (2010) A geminivirus-related DNA mycovirus that confers hypovirulence to a plant pathogenic fungus. Proceedings of the National Academy of Sciences of the United States of America 107: 8387-8392. 50. Hall-Mendelin S, Ritchie SA, Johansen CA, Zborowski P, Cortis G, et al. (2010) Exploiting mosquito sugar feeding to detect mosquito-borne pathogens. Proceedings of the National Academy of Sciences of the United States of America 107: 11255-11259. 51. Zhang T, Breitbart M, Lee WH, Run JQ, Wei CL, et al. (2006) RNA viral community in human feces: prevalence of plant pathogenic viruses. Plos Biology 4: 108-118. 52. Pinard R, de Winter A, Sarkis GJ, Gerstein MB, Tartaro KR, et al. (2006) Assessment of whole genome amplification-induced bias through high- throughput, massively parallel whole genome sequencing. Bmc Genomics 7: 21. 53. Kim K-H, Chang H-W, Nam Y-D, Roh SW, Kim M-S, et al. (2008) Amplification of uncultured single-stranded DNA viruses from rice paddy soil. Appl Environ Microbiol 74: 5975-5985. 54. Haible D, Kober S, Jeske H (2006) Rolling circle amplification revolutionizes diagnosis and genomics of geminiviruses. Journal of Virological Methods 135: 9-16. 55. Thurber RV, Haynes M, Breitbart M, Wegley L, Rohwer F (2009) Laboratory procedures to generate viral metagenomes. Nature Protocols 4: 470-483. 56. Patel A, Noble RT, Steele JA, Schwalbach MS, Hewson I, et al. (2007) Virus and prokaryote enumeration from planktonic aquatic environments by epifluorescence microscopy with SYBR Green I. Nature Protocols 2: 269- 276. 57. Sambrook J, Russell DW (2001) Molecular cloning: A laboratory manual. New York: Cold Spring Harbor Laboratory Press.

65

58. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215: 403-410. 59. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402. 60. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, et al. (2000) Artemis: sequence visualization and annotation. Bioinformatics 16: 944-945. 61. Hall TA (1999) BioEdit:a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41: 95-98. 62. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596-1599. 63. Zhai YG, Lv XJ, Sun XH, Fu SH, Gong ZD, et al. (2008) Isolation and characterization of the full coding sequence of a novel densovirus from the mosquito Culex pipiens pallens. Journal of General Virology 89: 195-199.

66

Figure Legends

Fig. 1. Taxonomic classification of the metagenomic sequences from the three mosquito viromes. A) Classification based on tBLASTx (E-value < 0.001) against the Genbank non-redundant database. B) Breakdown of the viral sequences into four major categories: animal, plant, insect viruses (densoviruses and other insect viruses) and bacteriophages. Samples were obtained from 3 sites in San Diego: Buena Vista Lagoon (SD-BVL), River Bank (SD-RB), and Wild Animal Park (SD-WAP).

67

Fig. 2. Classification of vertebrate and plant virus sequences present in the three San Diego mosquito viromes. The family, host, and name of the most significant tBLASTx similarities in the Genbank non-redundant database are shown, with the colors representing the level of amino acid identity.

68

Fig. 3. Genome organization and coverage of several putative virus genomes discovered in mosquito viromes. A) Human papillomavirus 23 (HPV23), B) Novel anelloviruses, C) Novel viruses with unique genome organization. Open reading frames are highlighted on the genome map and the amount of coverage from the metagenomic reads of the sample the virus was identified in is shown in the center.

69

Fig. 4. Neighbor joining tree based on the amino acid alignment of Mosquito VEM Papillomaviruses with the partial capsid protein L2 of representative HPV types. Mosquito VEM Papillomavirus – SDRB AF shared similarity to a different region of capsid protein L2, but produced identical tree topography (data not shown). Papillomavirus sequences identified in mosquito virome SD-RB are indicated by arrows, all of which belong to the cutaneous groups.

70

Fig. 5. Neighbor joining phylogenetic tree of anelloviruses constructed using the entire nucleotide sequence of ORF1. Genbank accession numbers are shown in parentheses, and the hosts are indicated for any non-human sequences. The newly discovered anelloviruses from the mosquito viromes are indicated by arrows.

71

Fig. 6. Neighbor joining phylogenetic tree of VEMCeDNV and other densoviruses based on alignment of the 720-bp partial NS1 gene nucleotide sequences. The VEMCeDNV from the C. erythrothorax mosquitoes in sample SD-WAP is indicated with an arrow.

72

73

74

75

76

77

78

79

80

81

Appendix 4: Exploring the diversity of plant viruses and their satellites

using vector-enabled metagenomics on whiteflies

This appendix is in review:

Terry Fei Fan Ng, Siobain Duffy, Elise Bixby, Gary E Vallad, Jane E Polston, Mya

Breitbart. 2010. Exploring the diversity of plant viruses and their satellites

using vector-enabled metagenomics on whiteflies.

82

Title: Exploring the Diversity of Plant Viruses and Their Satellites Using Vector-

Enabled Metagenomics on Whiteflies

Running title: Vector-Enabled Metagenomics on Whiteflies

Terry Fei Fan Ng1, Siobain Duffy2, Elise Bixby1, Gary E Vallad3, Jane E Polston4,

Mya Breitbart1

1College of Marine Science, University of South Florida, FL, USA

2Department of Ecology, Evolution and Natural Resources, School of

Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ,

USA

3Gulf Coast Research and Education Center, University of Florida, FL, USA

4Department of Plant Pathology, University of Florida, FL, USA

Corresponding author: Mya Breitbart

College of Marine Science, University of South Florida

140 7th Ave S,

Saint Petersburg, FL 33701

(727) 553-3520 [email protected]

83

Abstract

Current knowledge of plant virus diversity is biased towards agents of visible and economically important diseases. Little is known about viruses that have not caused major diseases in crops, or viruses from native vegetation, which are a reservoir of biodiversity that can contribute to viral emergence. Discovery of these plant viruses is hindered by the traditional approach of sampling individual symptomatic plants. Since many damaging plant viruses are transmitted by insect vectors, we have developed “vector-enabled metagenomics” (VEM) as a new approach for plant virus surveillance and proactive agricultural biodefense.

VEM involves sampling of insect vectors (i.e., whiteflies) from plants, followed by purification of viral particles and metagenomic sequencing. The VEM approach exploits the natural ability of highly mobile adult whiteflies to integrate viruses from many plants over time and space, and leverages the capability of metagenomics for discovering novel viruses. This study utilized VEM to describe the DNA viral community from whiteflies (Bemisia tabaci) collected from two important agricultural regions in Florida, USA. VEM successfully characterized the active and abundant viruses that produce disease symptoms in crops, as well as the less abundant viruses infecting adjacent native vegetation. PCR assays designed from the metagenomic sequences enabled the complete sequencing of four novel begomovirus genome components, as well as the first discovery of plant virus satellites in North America. One of the novel begomoviruses was subsequently identified in symptomatic Chenopodium ambrosiodes from the same field site, validating VEM as an effective method for proactive monitoring of

84 plant viruses without a priori knowledge of the pathogens. This study demonstrates the power of VEM for describing the circulating viral community in a given region, which will enhance our understanding of plant viral diversity, and facilitate emerging plant virus surveillance and management of viral diseases.

85

Author summary

Current knowledge of plant virus diversity is biased towards agents of visible and economically important diseases, and little is known about potentially emergent viruses that have not yet made their presence known on crops.

Discovery of new plant viruses is limited by the traditional approach of sampling individual symptomatic plants. Here we demonstrate a novel approach that involves sampling of insect vectors from plants, followed by purification of viral particles and metagenomic sequencing. The vector-enabled metagenomics

(VEM) approach exploits the natural ability of whiteflies to concentrate viruses from many plants over time and space, and leverages the capability of metagenomics for discovering novel viruses. From two whitefly (Bemisia tabaci) samples collected in Florida, VEM successfully characterized the active and abundant viruses that produce disease symptoms in crops, as well as less abundant viruses infecting native vegetation which can serve as a reservoir of genetic diversity contributing to the emergence of novel begomoviruses. This study completely sequenced four novel begomovirus genome components, as well as the first plant virus satellites from North America. By characterizing the circulating viral community in a given region, VEM will facilitate surveillance for emerging plant viruses and enable proactive agricultural biodefense.

86

Introduction

Current knowledge of plant virus diversity is heavily biased towards agents of visible and economically important diseases, and little is known about the potentially emergent viruses that have not yet made their presence known on crops. However, these undiscovered viruses can provide a reservoir of biodiversity for recombination and reassortment with existing pathogenic plant viruses, and are an essential component of the complex microbial ecology of plants [e.g., 1,2]. Much of this diversity may exist in plants without symptoms of disease; for instance, plants infected with viruses that are not yet adept at exploiting the new host, some of which have historically evolved into highly virulent pathogens [3,4,5].

While viruses and virus-like elements can be readily isolated from ~60% of plants [1], even visibly diseased plants often have low viral titers [6,7].

Additionally, not all infections result in the production of visible symptoms, and viruses can be limited to only certain tissues of the plant, such as actively growing tissues or the phloem [8]. This presents a sampling problem for plant virus discovery: either a researcher needs advance knowledge of which plant tissues to sample, or they „dilute‟ the viral titer when homogenizing infected cells with uninfected parts of the plant. These challenges make the discovery of viruses from individual plants truly difficult.

Since the majority of known plant viruses are exclusively vector-transmitted

[9], examination of insect vectors presents an excellent opportunity to explore the diversity of plant viruses. The whitefly Bemisia tabaci species complex is an

87 important vector of many plant viruses. Whiteflies often occur in high populations on many crops [10,11], especially at the end of the production cycle when insecticide applications have been withheld due to harvest. B. tabaci feeds on a very wide range of plants [10], and is highly mobile, being able to fly short distances and capable of traveling up to a several kilometers when assisted by the wind [12,13]. In addition, begomoviruses can be retained for up to the lifespan of the adult whitefly [14]. Therefore, whitefly vectors are natural “flying syringes” that can sample viruses from many individual plants and different plant species over space and time.

Once the proper samples are obtained, an effective method of viral discovery is needed. Most viral identification techniques have limited capability for characterizing novel viruses due to their specificity. Common methods such as ELISA with antibodies for a specific virus, PCR or microarrays with primers designed for a specific virus, or PCR with degenerate primers that amplify a closely related group of viruses are effective methods for detecting close relatives of known viruses [e.g., 15,16,17]. However; these methods do not allow for the discovery of novel viruses with low levels of similarity to previously described viral groups, and thus prevent a thorough exploration of viral biodiversity. The amplification of viral DNA by rolling circle amplification (RCA) followed by restriction digestion and cloning is a more recent approach that has facilitated begomovirus discovery from individual plants [18,19]. However, this approach is largely restricted to circular ssDNA viruses, and the use of restriction analysis limits the types and numbers of viruses identified.

88

Viral metagenomics, a molecular technique that involves purifying viral particles from samples and shotgun sequencing the viral nucleic acids [reviewed in 20,21], circumvents the biases of classic viral identification methods, and has revolutionized the exploration of viral diversity. Viral metagenomics has been used to characterize viral communities present in the environment [e.g., 22,23], as well as individual plants [e.g., 24] and animals [e.g., 25]. The advantage of metagenomics for viral identification is that it allows for characterization of the complete viral community, including viruses that are too divergent to be detected by PCR assays based on known viral sequences.

The vector-enabled metagenomics (VEM) approach presented here takes advantage of the highly polyphagous and mobile nature of the whitefly vector, combined with the capability of metagenomics to discover novel viruses without relying on sequence similarity to known viruses. This study utilized VEM to explore the diversity of DNA viruses in whiteflies collected from different crops in two agriculturally important sites in Florida.

89

Results

Summary of viromes

The VEM method, consisting of viral purification and metagenomic sequencing, was used to obtain sequence from encapsulated viral DNA present in the whitefly vector, Bemisia tabaci. BLASTn analysis revealed that 79% of the sequences in the Citra virome and 93% of the sequences in the Homestead virome showed significant similarity to known plant viruses, with sequence identities between 70% and 100% to previously described begomoviruses (Table

1). Several sequences from the Homestead virome had short BLASTn alignments to the stem-loop and nonanucleotide sequence of begomoviruses, but further analysis of these sequences revealed that they belonged to novel satellite genomes. Each virome had a small number of ”unknown” sequences that did not have significant similarity to any available sequences in Genbank.

The Citra virome also contained a small number of bacterial sequences that were likely the result of imperfect virus purification.

Further analysis of the viral sequences demonstrated that the two viromes contained sequences with a range of nucleotide identities to known begomoviruses (Table 1). According to Fauquet [26], entire begomovirus DNA-A components with >93% nucleotide identity are considered the same strain, DNA-

A components with 88%-93% nucleotide identity are considered different strains of the same species, and DNA-A components with <88% nucleotide identity are considered new species. The results shown in Table 1 are based on analysis of individual metagenomic sequence reads ranging from 100 to 700 nt in length; the

90 entire DNA-A would need to be sequenced before these viruses could be classified as novel strains and/or species.

Citra whitefly virome

The Citra whitefly virome was dominated by Cucurbit leaf crumple virus

(CuLCrV), a begomovirus known to infect watermelon plants at this sampling site

(Table 1). Alignment of the metagenomic sequences against a CuLCrV reference genome from Genbank demonstrated that VEM resulted in near-complete coverage of the DNA-A and partial coverage of the DNA-B (Figure 1 A&B). The segments of both the DNA-A and DNA-B assembled from the Citra virome shared >97% nucleotide identities to the CuLCrV reference genome.

Additionally, two sequences were identified that shared 96-97% nucleotide identities to the DNA-B of Sida golden mosaic virus (SiGMV). Based on one of these metagenomic sequences, PCR primers were designed to amplify, clone, and sequence the entire DNA-B using DNA extracted from the whiteflies. The resulting sequence had characteristics consistent with the DNA-B of SiGMV

(Figure 2C), but shared only 90% nucleotide identity to Sida golden mosaic virus by whole component pairwise comparison. Therefore, this genome represents a novel SiGMV DNA-B (Figure 2B), which we have named Sida golden mosaic virus - whitefly VEM Citra Florida (SiGMV – VEM [USA:Flo:Citra]).

Strikingly, one partial sequence of 243 nt was identified in the virome which had only 84% nucleotide identity to the DNA-A of Tobacco leaf rugose virus

(Table 1). PCR primers were designed to amplify, clone, and completely

91 sequence the DNA-A of this virus using DNA extracted from the whiteflies. The complete sequence of the DNA-A (2628 nt) of this virus only shared 81% nucleotide identity with its closest relative, a Mexican isolate of Desmodium leaf distortion virus (GenBank DQ875870)[27] (Figure 2A). Despite its low level of nucleotide identity to known begomoviruses, this viral sequence contains all the essential genome features of a begomovirus DNA-A (Figure 2C). The replication initiator protein, transcription activator, replication enhancer, and coat protein are present in an organization consistent with known begomoviruses, and the genome has a stem-loop containing the geminivirus-signature nonanucleotide sequence TAATATT/AC [28].

To investigate the host plant of this novel virus, a survey of 203 wild and cultivated plants was conducted at the Citra site where the whiteflies were originally collected. Degenerate PCR [15] for begomoviruses was positive in symptomatic Chenopodium ambrosiodes. A full length DNA-A clone (2626 nt) was obtained and sequenced from this plant, which was 96% identical to the complete DNA-A sequence identified from the whiteflies using VEM. This novel virus identified in C. ambrosiodes is tentatively named Chenopodium leaf curl virus (ChLCV), while the novel virus identified in the whiteflies through VEM is tentatively named Chenopodium leaf curl virus [VEM] (ChLCV [VEM]). Both viruses appear to be most closely related to Desmodium leaf distortion virus, but group with other New World begomoviruses such as Cotton leaf crumple virus,

Sida yellow leaf curl virus and Tomato severe leaf curl virus (Figure 2A). Since

92 none of these viruses have been previously reported in Florida, studies are underway to more fully characterize ChLCV.

Homestead whitefly virome

For the Homestead sample, a number of metagenomic sequences had high (93-100%) nucleotide identities to known begomoviruses (Table 1). The virome was dominated by sequences from Tomato yellow leaf curl virus (TYLCV) and Sida golden mosaic virus (SiGMV) (Table 1). Complete coverage of the

TYLCV genome was obtained, demonstrating that the TYLCV strain in these samples had a 29 base deletion compared to the reference genome (Figure 1D), which has also previously been detected in Texas, Arizona and Mexico [29].

Partial coverage of the SiGMV DNA-A was also obtained from the virome (Figure

1C).

Numerous metagenomic sequences shared between 88%-93% nucleotide identities to seven known begomoviruses (Table 1). Furthermore, 24 metagenomic sequences were found to have less than 88% nucleotide identities to these same seven begomoviruses (Table 1). PCR assays targeting several of these metagenomic sequences confirmed that they represented novel begomoviruses (see below), although it is important to note that the top BLASTn hit for the complete genome components was not always the same as the top

BLASTn hit of the original metagenomic sequence.

PCR primers were designed to further explore two metagenomic sequences with low levels of identity to known begomoviruses. The complete

93

DNA-A was sequenced of a virus, tentatively named Whitefly VEM 1

Begomovirus (WfVEM1Bv), which shared only 88% to its closest relative

Malvastrum yellow mosaic Jamaica virus by whole component pairwise analysis and phylogenetic analysis (Figure 2A). In addition, the complete DNA-B was sequenced of a virus, tentatively named Whitefly VEM 2 Begomovirus

(WfVEM2Bv), which shared 76% nucleotide identity to its closest relatives,

Malvastrum yellow mosaic Jamaica virus and Sida golden mosaic virus (Figure

2B). The intergenic region between WfVEM1Bv and WfVEM2Bv shared less than

90% nucleotide identity, thus they are likely to be genome components of two distinct viruses.

Eight metagenomic sequences were identified that had very short BLASTn similarities to the origin of replication and nonanucleotide sequence of begomoviruses, but otherwise shared no significant similarities to any known sequences. Using specific PCR assays, eight full circular genomes (675 to 694 nt) were cloned and sequenced. The fact that these genomes were encapsulated in the whiteflies, have an adenosine-rich region, and do not have an open reading frame that codes for a replication-associated protein, suggests that these sequences are novel begomovirus satellites, which are tentatively named

Whitefly VEM Satellite (WfVEM-Sat). The WfVEM-Sat genomes are similar to the

Tomato leaf curl virus satellite (ToLCV-sat, Genbank U74627[30,31]) in size and genome organization; however, unlike ToLCV-Sat, the WfVEM-Sat genomes do not contain a second stem-loop and putative Rep binding motif (GGTGTCT).

Pairwise comparisons among the eight satellite genomes ranged from 85-94%

94 nucleotide identity (Table S1). Since all eight clones that were sequenced were unique, it is likely that continued sequencing would yield many more satellite genomes.

95

Discussion

Using VEM, this study characterized whitefly-transmitted plant viruses circulating in whiteflies from two important agricultural regions in Florida. With the exception of Sida golden mosaic virus, the virus profiles of the two viromes were completely distinct from each other (Table 1). In addition to identifying pathogens that produce disease symptoms in crops, VEM enables the identification of viruses infecting native vegetation, newly emerging viruses that are not yet widespread in crops, and viruses that have more mutualistic interactions with their hosts [32]. The highly polyphagous and mobile nature of whiteflies integrates viral diversity over space and time, making these insect vectors a perfect tool for sampling the viral community circulating in plants in a given region, including understudied weed hosts [33,34,35]. Performing viral metagenomics on whitefly vectors has several advantages over surveys performed on individual plants. Diseased plants often have low titers of encapsulated viruses [6,7] and infection can be limited to only certain tissues of the plant [8]. In comparison, viruses in whiteflies are likely to be encapsulated and present in high titers [36,37], making them ideal for the VEM approach which targets only intact (and thus potentially transmissible) viral particles.

Metagenomic sequencing of viruses purified directly from whiteflies provides an additional advantage over traditional PCR-based approaches since it allows for the discovery of divergent viruses that might not be captured with existing degenerate PCR assays. A few recent studies have used metagenomic sequencing to examine RNA plant virus diversity [24,38,39]; however, to date, no

96 published studies have applied metagenomic techniques to explore the diversity of plant DNA viruses. In addition, this is the first study to discover viruses infecting plant hosts by directly examining insect vectors that feed upon them, a technique that is directly applicable to other host-vector systems (e.g., mosquitoes and human viruses; Ng et al., in preparation).

The viromes from both field sites have an extremely high percentage of identifiable viral sequences, with more than 79% of the sequences in each virome similar on the nucleotide level to known plant viruses . This is striking in comparison to DNA viruses from animals [e.g., 40] or environmental samples

[e.g., 22], which are dominated by „unknown‟ sequences with no significant homology to characterized genes. In addition, the sequences from environmental or animal viromes that have similarities to known viruses are typically only distantly related on the amino acid level, and rarely demonstrate nucleotide level identities. The fact that the majority of the whitefly virome sequences showed nucleotide level identities (BLASTn) to known plant viruses suggests that a more thorough understanding exists for whitefly-transmitted plant DNA virus diversity than for viruses infecting other hosts (e.g. animal viruses and bacteriophages).

This finding also suggests that the majority of undiscovered DNA plant virus diversity consists of variations on viral themes that have already been described.

Interestingly, the vast majority of the sequences identified in this study were similar to begomoviruses. The single-stranded DNA (ssDNA) begomoviruses belonging to the family Geminiviridae are some of the most damaging and emergent viruses transmitted by whiteflies. Begomoviruses are often the limiting

97 factor in the production of tomato, pepper, squash, melon, and cotton in the subtropics and tropics [41,42], and periodic begomovirus epidemics in staple crops, such as cassava, have caused widespread famines in the developing world [43]. Some begomoviruses are associated with satellites, which can play a role in disease, and are dependent upon the begomovirus for replication and encapsulation [44]. Most begomovirus satellites fall into one of several classes:

α-satellites, β-satellites, and defective-interfering DNAs (DI-DNA) [2,44]. All satellites are encapsulated by the helper virus coat protein, and share an origin of replication sequence, the conserved nonanucleotide sequence, with their helper virus. Beyond that short conserved sequence, α-satellites and β-satellites show no significant homology with their helper viruses, while DI-DNAs show extensive homology with their helper viruses.

In this study, VEM led to the identification of multiple viruses from each site, exemplifying the advantages of this approach for describing the viral community circulating amongst plants in a given region, without a priori knowledge of what viral types are present. First, each virome was dominated by sequences with a high level of similarity to known viruses infecting the crop plants from which the whiteflies were collected. This demonstrates that VEM is capable of detecting and obtaining high levels of sequence coverage from active and abundant viruses in the primary collection host. Second, both viromes contained sequences similar to viruses that infect Sida spp., which are weeds commonly found near many agricultural fields in Florida. This demonstrates that discovery of viruses through VEM is not constrained by the collection host and

98 that VEM is capable of identifying viruses from neighboring crops or weeds.

Characterization of viruses from indigenous plants is especially interesting since this is a largely unexplored reservoir of genetic diversity that may contribute to emergence of novel begomovirus strains by host shift or recombination [34,45].

Third, VEM is a powerful technique for discovering novel viruses, as evidenced by the recovery of sequences with either low levels of identity (<88%) or no significant similarity to previously described viruses. The discovery of novel satellite DNA sequences in this study is notable since these sequences are so divergent that they would not have been identified through degenerate PCR assays. Fourth, based only on single sequence fragments from the virome, the complete DNA-A or DNA-B of many novel viruses, as well as numerous satellite genomes were obtained by PCR. This demonstrates that individual metagenomic sequences can be used as “hooks” to recover complete viral genomes. Finally, the novel virus ChLCV[VEM] discovered in whiteflies was subsequently confirmed in a symptomatic weed collected from the study site two years later, successfully linking this sequence to a specific host plant and disease symptoms, and demonstrating that these viruses are persistent in the system.

This proof-of-concept study has significantly expanded our knowledge of begomoviruses in Florida. Vegetable crops in Florida have experienced significant losses due to whitefly-transmitted viruses over the past 15 years. The dominant viruses identified in this study were Tomato yellow leaf curl virus

(TYLCV) and Cucurbit leaf crumple virus (CuLCrV), which are both whitefly- transmitted begomoviruses that have been introduced to Florida since the arrival

99 of the silverleaf whitefly in the late 1980‟s [46,47,48]. CuLCrV is an emergent virus that infects cucurbits, which was not known in Florida until the virus impacted crop production in 2006 [48]. TYLCV was identified in Florida in 1997

[49] and has negatively impacted tomato production since that time. Identification of these two viruses provides an excellent example of how VEM can be used for early surveillance of emerging plant viruses. Sufficient depth of coverage for

CuLCrV and TYLCV DNA-A was obtained such that had these viruses not already been described, the complete or near-complete genomes could have been assembled from the viromes (Figure 1). This is particularly noteworthy considering the small number of sequences obtained in this study. If the VEM method were performed using next-generation sequencing technologies, coverage of the dominant viruses would be much higher, and many more viruses present at lower abundances would be identified.

The small amount of sequencing performed in this study was sufficient to uncover numerous metagenomic sequence fragments related to viruses that had not been previously documented in Florida (Table 1). Some of these sequences are known to be present in neighboring Caribbean countries, and should now be monitored for in Florida since they potentially represent recently introduced viruses. Furthermore, this study discovered numerous metagenomic sequences with <88% nucleotide identity to different genes of known begomoviruses (Table

S2), which likely represent novel begomovirus strains or species. In fact, we have confirmed that several of these sequences belong to novel begomoviruses by designing PCR assays to obtain their full component sequences (Figure 2). The

100 metagenomic sequences produced in this study will enable future studies to characterize these new viruses‟ genomes and explore their origins, host ranges, and the timing of their introduction to Florida.

In addition to the discovery of begomoviruses, this study demonstrated the power of the VEM approach for the discovery of novel virus-associated entities such as satellites. This study is the first to demonstrate the presence of begomovirus satellites in the North America. Satellites can play important roles in plant disease, but often remain unrecognized due to limitations of common virus identification methods, especially when there is a lack of sequence homology with the helper virus. Until recently, both α-satellites and β-satellites were believed to be restricted to the Old World [2,44], but the presence of several α- satellites have now been recognized in Brazil [50]. The Tomato leaf curl virus satellite (ToLCV-sat), was first discovered in Australia in 1997 [30], and prior to this study, no similar satellites had been discovered. Both WfVEM-Sat and

ToLCV-Sat are distinct from α-satellites and β-satellites in terms of genome organization, lack of ORFs, and small genome size. The initial discovery of satellites through VEM will enable future studies to identify their helper virus(es) and associations with disease.

By enabling the discovery of a wide range of viruses and satellites, VEM is a powerful technique that will significantly enhance our fundamental scientific understanding of plant viral diversity, biogeography, and emergence. Typically, new plant viruses are not identified until an outbreak causing significant crop loss occurs. This has put agricultural biodefense into a reactive mindset – waiting until

101 a new disease becomes a problem before trying to understand and combat the causative agent. The VEM approach circumvents that traditional process by obtaining a comprehensive sample of the viruses actively circulating in an insect vector population within a region, effectively integrating over individual plants, space, and time. Thus, VEM is a proactive molecular surveillance tool that allows rapid identification of emerging viruses before noticeable crop loss occurs, providing precious time for implementation of preventative measures and significantly improving agricultural biosecurity.

Future studies using the VEM approach can incorporate high-throughput sequencing, expand the geographical range and frequency of sampling, and investigate both DNA and RNA viruses. The VEM approach can be extended to other insect vectors, such as aphids and leafhoppers, to understand the diversity of plant viruses that is being transmitted through each of these different vectors.

In addition, further refinement of this method for begomovirus discovery should focus on recovering whole-genome components in order to allow for accurate analyses of recombination and evolution. When applied on a larger scale, VEM can describe global diversity of plant viruses, which will help refine existing plant virus phylogenies, aid in the development of more inclusive assays for monitoring introduced and emerging viruses, and increase our understanding of plant virus biogeography and evolution.

102

Methods and Materials

Sample Collection

Adult whiteflies (Bemisia tabaci) were collected using battery-operated vacuums [51] from two crop fields in Florida: Citra (29o24‟N 82o06‟W) and

Homestead (25o28‟N 80o30‟W). The distance between the two collection sites was 460 km. The Citra sample contained whiteflies collected from soybean and watermelon plants in August 2007, while the Homestead sample was collected in

April 2009 and contained whiteflies from tomato and squash plants in the vicinity of mixed cucurbit crops such as cantaloupe, pumpkin, cucumber, and watermelon. The two samples differed in terms of collection host, amount of sequencing performed, and the time and location of sampling. Upon collection, whiteflies were chilled at 4 ºC for 1 hr, plant debris and other insects were removed manually, and the whiteflies were stored at -80 ºC until processing.

Virus Purification and Metagenomic Sequencing

Viruses were purified from the whiteflies using a modification of previously described methods [40,52]. Briefly, the whiteflies were homogenized in sterile SM buffer (50 mM Tris, 10 mM MgSO4, 0.1 M NaCl, pH 7.5) and host cells were removed through centrifugation at 10,000×g for 10 minutes, followed by filtration of the supernatant through a 0.22 μm Sterivex filter (Millipore, Billerica, MA). The filtrate was treated with 0.2 volumes of chloroform for 10 minutes, then incubated with 2.5 U DNase I per μl sample for 3 hours at 37oC. DNA was extracted from the purified viral particles using the QIAamp MinElute Virus Spin Kit (Qiagen,

103

Valencia CA) and amplified with the GenomiPhi V2 DNA Amplification Kit (GE

Healthcare, Piscataway, NJ) according to the manufacturer‟s instructions. The

GenomePlex Whole Genome Amplification Kit (Sigma-Aldrich, St. Louis, MO) was then used to fragment and amplify the viral DNA, which was subsequently cloned into the pCR4 vector using TOPO TA cloning (Invitrogen, Carlsbad, CA) and sequenced with the M13F forward primer by Beckman Coulter Genomics

(Danvers, MA).

The resulting sequences were trimmed for vector and read quality using

Sequencher 4.7 (Gene Codes, Ann Arbor, MI). A total of 58 and 158 sequences with >100 nt of good read quality after trimming were obtained from the Citra and

Homestead viral metagenomes (viromes), respectively (Accession numbers

HN153414-HN153629). Metagenomic sequences were analyzed using BLASTn against the Genbank non-redundant database with an E-value cut-off of 1e-5

[53,54]. Individual sequence reads were aligned to reference genomes from

Genbank, and assembled into contiguous sequences (>95% identity over 30 nt) using Sequencher.

PCR to complete genome components

To further characterize selected metagenomic sequences that likely represented novel virus species or strains, PCR primers were designed to amplify the entire DNA-A, DNA-B, and/or satellite DNA sequences (Table S3).

The resulting whole genome PCR products were cloned and completely sequenced (Accession numbers HM626515-HM626517, HM859902-HM859911).

104

The complete genomes were analyzed and annotated using Seqbuilder

(DNASTAR, Madison, WI). Pairwise comparison of the genomes to reference genomes to determine percent identity was performed using BioEdit [55].

Identifying the host for VEM generated viral sequences

To characterize the potential host of one of the novel virus sequences identified through the VEM approach, a survey of wild and cultivated plants was conducted at the Citra site in September and November 2009. Young leaves were collected from 203 plants belonging to 11 families and 15 genera and DNA was extracted using the Gentra Purgene Tissue Kit (Qiagen). Degenerate primers were used to screen for the presence of begomoviruses [15] and amplicons were sequenced and compared to the viromes by BLASTn. Once an amplicon sequence with high homology to selected metagenomic sequences was identified, rolling circle amplification [19] was used to obtain a full length

DNA-A clone from the original plant DNA extract, which was completely sequenced by primer walking.

Phylogenetic Analysis

The completely sequenced genome components were aligned with their closest relatives from the GenBank non-redundant database using MUSCLE

[56], and adjusted by eye in Se-Al (http://tree.bio.ed.ac.uk/software/seal/).

Maximum likelihood phylogenetic trees were estimated with PAUP* 4.0 [57] using the general time reversible nucleotide substitution model specifying the

105 proportion of invariant sites and a gamma distributed rate variation, which had been selected by MODELTEST [58], and tree-bisection-recombination branch- swapping. Bootstrap analyses (1000 replicate neighbor-joining trees) were used to assess the support for individual nodes on the phylogenetic trees.

106

Acknowledgments

This study was funded through the USDA – Tropical-Subtropical Agriculture

Research (T-STAR) program and the National Science Foundation Biodiversity

Inventories program (DEB-1025915). TFFN was supported by the Elsie Knight

Oceanographic Fellowship. SD was supported by funds from the New Jersey

Agricultural Experiment Station and Rutgers School of Environmental and

Biological Sciences. Thanks to Heather Capobianco and Bhakti Dwivedi for assistance.

107

References 1. Wren JD, Roossinck MJ, Nelson RS, Scheets K, Palmer MW, et al. (2006) Plant virus biodiversity and ecology. PLoS Biol 4: e80. 2. Nawaz-Ul-Rehman MS, Fauquet CM (2009) Evolution of geminiviruses and their satellites. FEBS Lett 583: 1825-1832. 3. Zhou X, Liu Y, Calvert L, Munoz C, Otim-Nape GW, et al. (1997) Evidence that DNA-A of a geminivirus associated with severe cassava mosaic disease in Uganda has arisen by interspecific recombination. J Gen Virol 78: 2101- 2111. 4. Zhou X, Liu Y, Robinson DJ, Harrison BD (1998) Four DNA-A variants among Pakistani isolates of cotton leaf curl virus and their affinities to DNA-A of geminivirus isolates from okra. J Gen Virol 79: 915-923. 5. Brown JK (2000) Molecular markers for the identification and global tracking of whitefly vector-Begomovirus complexes. Virus Res 71: 233-260. 6. Lapidot M, Friedmann M, Lachman O, Yehezkel A, Nahon S, et al. (1997) Comparison of resistance level to Tomato yellow leaf curl virus among commercial cultivars and breeding lines. Plant Dis 81: 1425-1428. 7. Lapidot M, Friedmann M, Pilowsky M, Ben-Joseph R, Cohen S (2001) Effect of host plant resistance to Tomato yellow leaf curl virus (TYLCV) on virus acquisition and transmission by its whitefly vector. Phytopathology 91: 1209-1213. 8. Lopez M, Bertolini E, Olmos A, Caruso P, Gorris M, et al. (2003) Innovative tools for detection of plant pathogenic viruses and bacteria. International Microbiology 6: 233-243. 9. Blanc S (2004) Insect transmission of viruses. In: Gillespie SH, Smith GL, Osbourn A, editors. Microbe-vector interactions in vector-borne diseases pp. 103. 10. De Barro P (1995) Bemisia tabaci biotype B, a review of its biology, distribution and control. CSIRO Division Entomology Technical Paper 36: 1-58. 11. Riley D, Nava-Camberos U, Allen J (1995) Population dynamics of Bemisia in agricultural systems. In: Gerling D, Mayer R, editors. Bemisia 1995: Taxonomy, Damage Control and Management. Andover, UK: Intercept Ltd. pp. 93-110. 12. Byrne D, Rathman R, Orum T, Palumbo J (1996) Localized migration and dispersal by the sweet potato whitefly, Bemisia tabaci. Oecologia 105: 320-328. 13. Byrne D (1999) Migration and dispersal by the sweet potato whitefly, Bemisia tabaci. Agricultural and Forest Meterology 97: 309-316. 14. Cohen S (1990) Epidemiology of whitefly-transmitted viruses. Whiteflies: their bionomics, pest status and management. Andover, UK: Intercept Ltd. pp. 211-225. 15. Rojas MR, Gilbertson RJ, Russell DR, Maxwell DP (1993) Use of degenerate primers in the polymerase chain reaction to detect whitefly-transmitted geminiviruses. Plant Dis 77: 340-347.

108

16. Symonds EM, Griffin DW, Breitbart M (2009) Eukaryotic viruses in wastewater samples from the United States. Appl Environ Microbiol 75: 1402-1409. 17. Wang D, Coscoy L, Zylberberg M, Avila PC, Boushey HA, et al. (2002) Microarray-based detection and genotyping of viral pathogens. Proc Natl Acad Sci U S A 99: 15687-15692. 18. Inoue-Nagata AK, Albuquerque LC, Rocha WB, Nagata T (2004) A simple method for cloning the complete begomovirus genome using the bacteriophage phi 29 DNA polymerase. J Virol Methods 116: 209-211. 19. Haible D, Kober S, Jeske H (2006) Rolling circle amplification revolutionizes diagnosis and genomics of geminiviruses. J Virol Methods 135: 9-16. 20. Delwart EL (2007) Viral metagenomics. Rev Med Virol 17: 115-131. 21. Edwards RA, Rohwer F (2005) Viral metagenomics. Nat Rev Microbiol 3: 504-510. 22. Breitbart M, Salamon P, Andresen B, Mahaffy JM, Segall AM, et al. (2002) Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci U S A 99: 14250-14255. 23. Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, et al. (2008) Functional metagenomic profiling of nine biomes. Nature 452: 629-632. 24. Muthukumar V, Melcher U, Pierce M, Wiley GB, Roe BA, et al. (2009) Non- cultivated plants of the Tallgrass Prairie Preserve of northeastern Oklahoma frequently contain virus-like sequences in particulate fractions. Virus Res 141: 169-173. 25. Ng TFF, Manire C, Borrowman K, Langer T, Ehrhart L, et al. (2009) Discovery of a novel single-stranded DNA virus from a sea turtle fibropapilloma using viral metagenomics. J Virol 83: 2500-2509. 26. Fauquet CM, Briddon RW, Brown JK, Moriones E, Stanley J, et al. (2008) Geminivirus strain demarcation and nomenclature. Arch Virol 153: 783- 821. 27. Hernandez-Zepeda C, Idris AM, Carnevali G, Brown JK, Moreno-Valenzuela OA (2007) Preliminary identification and coat protein gene phylogenetic relationships of begomoviruses associated with native flora and cultivated plants from the Yucatan Peninsula of Mexico. Virus Genes 35: 825-833. 28. Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA, editors (2005) Virus taxonomy: classification and nomenclature of viruses. San Diego: Elsevier Academic Press. 29. Idris AM, Guerrero JC, Brown JK (2007) Two distinct isolates of Tomato yellow leaf curl virus threaten tomato production in Arizona and Sonora, Mexico. Plant Dis 91: 910-910. 30. Dry IB, Krake LR, Rigden JE, Rezaian MA (1997) A novel subviral agent associated with a geminivirus: The first report of a DNA satellite. Proc Natl Acad Sci U S A 94: 7088-7093. 31. Li D, Behjatnia SAA, Dry IB, Randles JW, Eini O, et al. (2007) Genomic regions of tomato leaf curl virus DNA satellite required for replication and for satellite-mediated delivery of heterologous DNAs. J Gen Virol 88: 2073-2077. 109

32. Márquez L, Redman R, Rodriguez R, Roossinck M (2007) A virus in a fungus in a plant: three-way symbiosis required for thermal tolerance. Science 315: 513-515. 33. Castillo-Urquiza G, Beserra J, Bruckner F, Lima A, Varsani A, et al. (2008) Six novel begomoviruses infecting tomato and associated weeds in Southeastern Brazil. Arch Virol 153: 1985-1989. 34. Jovel J, Reski G, Rothenstein D, Ringel M, Frischmuth T, et al. (2004) Sida micrantha mosaic is associated with a complex infection of begomoviruses different from Abutilon mosaic virus. Arch Virol 149: 829-841. 35. McGovern RJ, Polston JE, Danyluk GM, Hiebert E, Abouzid AM, et al. (1994) Identification of a natural weed host of Tomato mottle geminivirus in Florida. Plant Dis 78: 1102-1106. 36. Polston JE, Almusa A, Perring TM, Dodds JA (1990) Assocation of the nucleic acid of Squash Leaf Curl Geminivirus with the whitefly Bemisia tabaci. Phytopathology 80: 850-856. 37. Caciagli P, Bosco D (1997) Quantitation over time of tomato yellow leaf curl geminivirus DNA in its whitefly vector. Phytopathology 87: 610-613. 38. Adams IP, Glover RH, Monger WA, Mumford R, Jackeviciene E, et al. (2009) Next-generation sequencing and metagenomic analysis: a universal diagnostic tool in plant virology. Mol Plant Pathol 10: 537-545. 39. Roossinck MJ, Saha P, Wiley GB, Quan J, White JD, et al. (2010) Ecogenomics: using massively parallel pyrosequencing to understand virus ecology. Mol Ecol 19: 81-88. 40. Breitbart M, Rohwer F (2005) Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. BioTechniques 39: 729-736. 41. Polston JE, Anderson PK (1997) The emergence of whitefly-transmitted geminiviruses in tomato in the Western hemisphere. Plant Dis 81: 1358- 1369. 42. Mansoor S, Briddon RW, Zafar Y, Stanley J (2003) Geminivirus disease complexes: an emerging threat. Trends Plant Sci 8: 128-134. 43. Legg JP, Fauquet CM (2004) Cassava mosaic geminiviruses in Africa. Plant Mol Biol 56: 585-599. 44. Briddon RW, Stanley J (2006) Subviral agents associated with plant single- stranded DNA viruses. Virology 344: 198-210. 45. Ribeiro S, Martin D, Lacorte C, Simoes I, Orlandini D, et al. (2007) Molecular and biological characterization of Tomato chlorotic mottle virus suggests that recombination underlies the evolution and diversity of Brazilian tomato begomoviruses. Phytopathology 97: 702-711. 46. Polston JE, McGovern RJ, Brown LG (1999) Introduction of Tomato yellow leaf curl virus in Florida and implications for the spread of this and other geminiviruses of tomato. St. Paul, MN: American Phytopathological Society. 47. Abouzid AM, Polston JE, Hiebert E (1992) The nucleotide sequence of Tomato mottle virus, a new geminivirus isolated from tomatoes in Florida. J Gen Virol 73: 3225-3229. 110

48. Akad F, Webb S, Nyoike TW, Liburd OE, Turechek W, et al. (2008) Detection of Cucurbit leaf crumple virus in Florida cucurbits. Plant Dis 92: 648-648. 49. Polston JE, McGovern RJ, Brown LG (1999) Introduction of tomato yellow leaf curl virus in Florida and implications for the spread of this and other geminiviruses of tomato. Plant Dis 83: 984-988. 50. Paprotka T, Metzler V, Jeske H (2010) The first DNA 1-like [alpha] satellites in association with New World begomoviruses in natural infections. Virology 404: 148-157. 51. Natwick E, Tosacano N, Yates L (1995) Correlations of adult Bemisia sampling techniques in cotton to whole plant samples. In: Gerling D, Mayer R, editors. Bemisia 1995: Taxonomy, Damage Control and Management. Andover, UK: Intercept Ltd. pp. 247-252. 52. Ng TFF, Suedmeyer WK, Gulland F, Wheeler E, Breitbart M (2009) Novel anellovirus discovered from a mortality event of captive California sea lions. J Gen Virol 90: 1256 -1261. 53. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410. 54. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. 55. Hall TA (1999) BioEdit:a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41: 95-98. 56. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797. 57. Swofford DL (2003) PAUP*: phylogenetic analysis using parsimony (* and other methods). Sunderland, MA: Sinauer Associates. 58. Posada D, Crandall KA (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817-818. 59. Adkins S, Polston JE, Turechek WW (2009) Cucurbit leaf crumple virus Identified in Common Bean in Florida. Plant Dis 93. 60. Idris AM, Hiebert E, Bird J, Brown JK (2003) Two newly described begomoviruses of Macroptilium lathyroides and common bean. Phytopathology 93: 774-783.

111

Figure Legends

Figure 1. Coverage of known virus genomes obtained from the whitefly viromes.

The reference genome from Genbank is shown in the top row of each panel, and the consensus represents the regions of the reference genome with coverage in the viromes. A & B) Near-complete coverage of Cucurbit leaf crumple virus DNA-

A and DNA-B obtained from the Citra metagenomic sequences. C) Partial coverage of Sida golden mosaic virus DNA-A obtained from Homestead metagenomic sequences. D) Complete coverage of the Tomato yellow leaf curl virus genome obtained from the Homestead metagenomic sequences.

112

Figure 2. Genomic and phylogenetic analysis of the virus genomes discovered by

VEM. The tree is midpoint rooted for clarity, all nodes with more than 75% bootstrap support are labeled, and new sequences are indicated by arrows. A)

Maximum likelihood phylogeny of the entire DNA-A of the novel Chenopodium leaf curl virus (ChLCV), Chenopodium leaf curl virus [VEM] (ChLCV [VEM]), and

Whitefly VEM 1 Begomovirus (WfVEM1Bv) with related begomovirus genome sequences. B) Maximum likelihood phylogeny of the entire DNA-B of the Sida golden mosaic virus - whitefly VEM Citra Florida (SiGMV – VEM [US:FLO:Citra]) and Whitefly VEM 2 Begomovirus (WfVEM2Bv). Begomovirus names are abbreviated according to convention. C & D) Genome characteristics and organization of the DNA-A, DNA-B and satellite DNA identified in the Citra and

Homestead samples. The ORFs encode for: CP/AV1, coat protein; Rep/AC1, replication initiator protein; TrAP/AC2, transcription activator and/or silencing suppressor; REn/AC3, replication enhancer; MP/AV2, movement protein; AC4,

AC4 protein; AC5, AC5 protein; MP/BC1, movement protein; NSP/BV1, nuclear shuttle protein. The positions of sequence fragments recovered in the

113 metagenome, the conserved stem loop and nonanucleotide sequence are indicated.

114

BLASTn Identities to Known Viruses >93% 88%-93% <88% Previously Characterized Component (DNA-A or DNA-B) A B A B A B in Florida (Reference) Closest Sequence in Genbank

Citra

Cucurbit leaf crumple virus 16 a 18 a 1 c 7 c 1 c Yes (59)

Sida golden mosaic virus 2 a + Yes *

Tobacco leaf rugose virus 1 c # No

Homestead

Anoda geminivirus 1 b No

Macroptilium golden mosaic virus 1 a 4 c 3 c ^ Yes (60)

Malvastrum yellow mosaic Helshire 3 b 7 c 1 c No virus Malvastrum yellow mosaic Jamaica 2 c 1 c @ No virus Sida golden mosaic virus 5 a 29 a 2 c 4 c 2 c 12 c Yes * Sida golden yellow vein 9 b 1 c No virus (monopartite) Tomato yellow leaf curl virus 50 a 10 c 1 c Yes (46) (monopartite) Wissadula golden mosaic virus 1 c 1 c No

Table 1. Summary of BLASTn results for the metagenomic sequences sharing nucleotide identities to the viruses in Genbank, showing the percent identity, and whether the reference viruses have been previously characterized in Florida.

Sequences range from 100 to 700 nt in length, representing partial viral genomes. The whitefly viromes contain a diverse range of viral sequences, including a sequences similar to viruses previously described in the area, b sequences similar to viruses not previously identified in the area, c Sequences that likely belong to new virus strains and virus species. * Jones,L.A. direct

GenBank submission. + Based on the full DNA-B sequence, this fragment belonged to a new strain of a novel Sida golden mosaic virus DNA-B (SiGMV –

VEM [US:FLO:Citra]). # Based on the full DNA-A sequence, this fragment belonged to a new virus, tentatively named Chenopodium leaf curl virus [VEM]

(ChLCV[VEM]). ^ Based on the full DNA-A sequence, one of the fragments belonged to a new virus, tentatively named Whitefly VEM 1 Begomovirus

115

(WfVEM1Bv). @ Based on the full DNA-B sequence, this fragment belonged to a novel DNA-B sequence, tentatively named Whitefly VEM 2 Begomovirus

(WfVEM2Bv).

116

117

118

119