Journal of General Virology (2015), 96, 2442–2452 DOI 10.1099/vir.0.000158

Discovery of novel sequences in an isolated and threatened species, the New Zealand lesser short-tailed bat (Mystacina tuberculata) Jing Wang,1 Nicole E. Moore,1 Zak L. Murray,1 Kate McInnes,2 Daniel J. White,3 Daniel M. Tompkins3 and Richard J. Hall1

Correspondence 1Institute of Environmental Science & Research (ESR), at the National Centre for Biosecurity & Richard J. Hall Infectious Disease, PO Box 40158, Upper Hutt 5140, New Zealand [email protected] 2Department of Conservation, 18–32 Manners Street, PO Box 6011, Wellington, New Zealand 3Landcare Research, Private Bag 1930, Dunedin, New Zealand

Bats harbour a diverse array of , including significant human pathogens. Extensive metagenomic studies of material from , in particular guano, have revealed a large number of novel or divergent viral taxa that were previously unknown. New Zealand has only two extant indigenous terrestrial mammals, which are both bats, Mystacina tuberculata (the lesser short- tailed bat) and Chalinolobus tuberculatus (the long-tailed bat). Until the human introduction of exotic mammals, these species had been isolated from all other terrestrial mammals for over 1 million years (potentially over 16 million years for M. tuberculata). Four bat guano samples were collected from M. tuberculata roosts on the isolated offshore island of Whenua hou (Codfish Island) in New Zealand. Metagenomic analysis revealed that this species still hosts a plethora of divergent viruses. Whilst the majority of viruses detected were likely to be of dietary origin, some putative vertebrate virus sequences were identified. Papillomavirus, polyomavirus, calicivirus and hepevirus were found in the metagenomic data and subsequently confirmed using independent PCR assays and sequencing. The new hepevirus and calicivirus sequences may represent new Received 30 November 2014 genera within these viral families. Our findings may provide an insight into the origins of viral Accepted 20 April 2015 families, given their detection in an isolated host species.

INTRODUCTION Rhinolophus) are thought to be the source of severe acute res- piratory syndrome coronavirus, which emerged in South Bats (order Chiroptera) are the second most diverse group of China via an intermediate host (caged civets) and then mammals with over 1200 species, accounting for more than caused significant mortality in humans (Ge et al., 2013; Lau 20 % of mammals (Simmons, 2005). They occur throughout et al., 2005; Li et al., 2005; Wang et al., 2006). Evidence simi- most of the world except for the two polar regions. In the last larly points to the Egyptian tomb bat (Taphozous perforatus) decade, it has become increasingly apparent that bats are as a potential source of Middle East respiratory syndrome important natural reservoirs for emerging and re-emerging coronavirus (Chan et al., 2013b; Memish et al., 2013; zoonotic viruses, due at least in part to roosting habitats, Milne-Price et al., 2014). Other examples of pathogenic the formation of large colonies, adaptive immune systems, human viruses that may have emerged from bats (O’Shea a long life span, and long-distance flying capability (Calisher et al., 2014) include Ebola virus (Leroy et al., 2005), Nipah et al., 2006; Luis et al., 2013). Bat-associated viruses are virus (Chua et al., 2000), Hendra virus (Murray et al., known to cause severe disease in humans, and rabies has 1995; Selvey et al., 1995) and Australian bat been recognized for centuries in this regard. Many zoonotic (Warrilow, 2005). Bats are often asymptomatic during infec- viruses that have emerged recently are thought to have their tion, but the disease can be severe for a new host species when origins in bats. For example, Chinese horseshoe bats (genus cross-species transmission occurs (Chan et al., 2013a; Smith & Wang, 2013). Surveys of the bat virome and monitoring of The GenBank/EMBL/DDBJ accession numbers for the bat virus virus dynamics in bat populations can inform our funda- sequences determined in this study are KM204378–KM204385. mental understanding on how viral pathogens emerge and One supplementary figure and two tables are available with the online evolve, enabling strategies for the prevention of future out- Supplementary Material. breaks or pandemics.

000158 G 2015 Institute of Environmental Science and Research Ltd. Printed in Great Britain 2442 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Viruses in an isolated New Zealand bat species

The global diversity of viruses found in bats is still largely unknown, and it is thought that a great number of Papillomaviruses are non-enveloped, small dsDNA viruses virus taxa are yet to be discovered (Anthony et al., 2013). that are capable of infecting all amniotes. In total, 105 Viruses in native New Zealand bats are poorly studied, sequence reads were identified as having similarity to papillo- perhaps in part due to the absence of any significant associ- maviruses within the DNA metagenome, which assembled ation with disease, either in the bats themselves or in other into 39 contigs matching E1, E2, E6, L1 and L2 genes (Fig. species. Three bat species represent the entire indigenous ter- 2a, b). The sequences of the two L1 contigs, 518 and 523 bp, restrial mammalian fauna of New Zealand, the long-tailed were confirmed by PCR and Sanger sequencing (GenBank bat (Chalinolobus tuberculatus), the lesser short-tailed bat accession nos. KM204378 and KM204379), and each was (Mystacina tuberculata), and the extinct greater short- found in a different single guano sample. These sequences tailed bat (Mystacina robusta) (King, 2005). M. tuberculata is thought to have lived in isolation for over 16 million were then used for phylogenetic analysis. Based on these years until the arrival of C. tuberculatus over 1 million data, the two novel papillomavirus sequences (here named years before the present (BP) (O’Donnell, 2001). Other ter- PV1 and PV2) grouped with deltapapillomaviruses (Fig. 2c). restrial mammals have been introduced within the last 800 The L1 gene of PV1 (KM204378) shared 54.2 % amino acid years by Polynesian explorers and Europeans (King, 2005). sequence identity to its closest relative, bovine papillomavirus Given the relative isolation of these bats, it may be possible (GenBank accession no. NC001522). The L1 gene of PV2 to make inferences on the origins of virus groups discovered (KM204379) showed 54.3 % similarity to its closest relative, within this species. For example, we have previously deer papillomavirus (GenBank accession no. NC001523). reported the discovery of an alphacoronavirus in The two New Zealand bat papillomaviruses had 86.1 % M. tuberculata guano from a remote offshore island of amino acid sequence identity for the partial L1 gene sequence. New Zealand (Hall et al., 2014a). Alphacoronaviruses exclu- sively infect mammals, so this virus either has been circulat- ing in native bats for more than 1 million years or has arrived via other mammals that were introduced later by humans. Polyomaviruses (PyVs) are small, circular dsDNA viruses. The former hypothesis would be congruent with a recent Their natural hosts are mammals and birds. In total, 493 study contending that coronaviruses have an ancient sequence reads were identified in the DNA metagenome as origin (Wertheim et al., 2013), whereas previous molecular having similarity to PyVs, which assembled into 29 contigs clocks estimate an origin for all coronaviruses of only covering VP1, VP2, VP3, and the small T antigen and large 10 000 years BP (Woo et al., 2012). T antigen genes (Fig. 3a, b). The presence of one of the VP1 contigs, 609 bp (GenBank accession no. KM204380), This study provides an analysis of metagenomic data in was confirmed in two out of the four samples using PCR M. tuberculata from two bat guano samples collected from and Sanger sequencing for a 187 bp subfragment of the Codfish Island (Whenua hou) in New Zealand, revealing contig. Phylogenetic analysis (Fig. 3c) showed that this the presence of a large number of , plant and vertebrate novel bat PyV sequence grouped with a PyV found pre- viruses. Our aim was to determine whether this bat species viously in South American bats (Fagrouch et al., 2012). was a host to potential bat pathogens, or zoonotic patho- The VP1 gene (GenBank accession no. KM204380) shared gens, that are relevant to human and wildlife health. 76.8 % identity at the amino acid sequence level with the clo- sest relative, bat polyomavirus 4 (GenBank accession no. JQ958887). By mapping these putative PyV sequences to RESULTS the reference genome of GenBank accession no. JQ958886, For the two guano samples analysed by metagenomics, the consensus contig was shown to cover 92 % of the 25 314 920 and 23 100 574 sequence reads were generated whole genome and included some minor gaps (Fig. 3a). in total for the DNA and RNA metagenomes, respectively. In the DNA metagenome, 19.55, 0.28 and 6.92 % of reads were assigned to bacteria, eukaryotes and viruses, respect- ively (Fig. 1a), with 79.65 % of virus reads assigned to bac- A total of 68 sequence reads were identified in the RNA meta- teriophages (Fig. 1b). In the RNA metagenome, 44.99, 5.44 genome as having similarity to caliciviruses, which were then and 0.38 % of reads were assigned to bacteria, eukaryotes assembled into three contigs, two matching the pro- and viruses, respectively (Fig. 1c), with 82.63 % of virus tein and one matching the helicase protein (Fig. 4a). Two reads assigned to the family (Fig. 1d), a virus capsid protein contigs of 687 and 359 bp in length (GenBank group known to infect and thus consistent with accession nos KM204381 and KM204382, respectively) and a the insectivorous diet of M. tuberculata. Less than 2 % of contig containing a helicase gene with a length of 498 bp all virus sequence reads showed similarity to vertebrate (GenBank accession no. KM204383) were identified. The viruses, with significant RAPSearch2 matches to conserved presence of the larger contig (KM204381) could only be con- gene regions used for taxonomic assignment (i.e. capsid firmed in one of the four guano samples, achieved using and/or helicase genes) of papillomavirus, , Sanger sequencing and PCR for a 302 bp subfragment of polyomavirus, calicivirus and hepevirus. the metagenomic contig. Phylogenetic analysis of the larger http://vir.sgmjournals.org 2443 J. Wang and others

(a) (c) Bacteria 2900812 Archaea Bacteria 12.56 % 4948954 1256844 19.55 % Archaea 5.44 % 2712 Eukaryote 0.01 % 1443 Eukaryote 0.01 % No hits 70029 Viral reads 10392553 0.28 % 86713 44.99 % 0.38 % Viral reads 1752078 6.92 % Others 2900061 Others 12.55 % No hits 156716 16781361 0.62 % 66.29 % Not assigned 1603070 6.33 % Not assigned 5562148 24.08 % DNA metagenome RNA metagenome

(b) DNA viruses (d) RNA viruses

1 600 000 80 000 1 395 590 71 649 1 400 000 70 000

1 200 000 60 000

1 000 000 50 000

800 000 40 000

600 000 30 000 Number of reads

Number of reads 400 000 20 000 10 000 200 000 57 001 112 522 19 742 20 204 20 753 21 909 23 342 28 895 880 904 1 014 1 615 2 166 2 320 2 536 0 0 e dae idae viridae viridae virida rovir Ifla phoviridae ic enophage Leviviridae Myoviri d Partiti PodoviridaeParvoviridae Circo Si M a B

Unclassified viruses ssified ssDNA viruses Virus taxa a Uncl Virus taxa

Fig. 1. Metagenomic data summary from RAPSearch2 results. Overall summary of the taxonomic assignment for individual sequence reads from the metagenomic data of two New Zealand short-tailed bat guano samples for the combined DNA metagenome (a), DNA viruses (b), combined RNA metagenome (c) and RNA viruses (d). capsid contig suggested that this virus sequence may be from Sanger sequencing and PCR for a 234 bp subfragment of a divergent calicivirus (Fig. 4b). This contig shared 30 % the metagenomic contig (KM204383). amino acid sequence identity with the closest calicivirus rela- tive , GII.1/Hu/Hawaii/71/US (GenBank accession no. U07611). In total, 156 reads from the RNA metagenome were assembled Phylogenetic analysis of the helicase protein showed it to be into eight contigs that showed similarity to the helicase gene of so distinct from any other calicivirus that it always formed an outgroup, even with the inclusion of a distantly related hepeviruses. Phylogenetic analysis of the longest contig of picornavirus helicase (data not shown). It is clear that a 673 bp (GenBank accession no. KM204384) determined viral helicase protein (or prophage remnant) is encoded that the closest relative was the cut-throat trout virus (Fig. in this sequence, as it belongs to the helicase superfamily 5), a proposed new genus within the family Hepeviridae 3 (SF-3). The protein appears to contain a Walker A (Batts et al., 2011). The New Zealand hepevirus amino acid (AIILTGPPGCGKTT) domain (shown in bold type), sequence shared 30.3 % identity with its closest relative, the Walker B domain (IVVWDD) and motif 3 (FIIICSNF) cut-throat trout virus (GenBank accession no. NC_015521). that are characteristic of viral helicases (Hickman & A 211 bp subfragment of this contig was confirmed in the Dyda, 2005). A subfragment of the metagenomic contig RNA of all four original guano samples by using PCR and was confirmed in two of the four guano samples using Sanger sequencing (GenBank accession no. KM204385).

2444 Journal of General Virology 96 Viruses in an isolated New Zealand bat species

L2, 9 % (a) (b) E1, 44 % E6 E7 1 L1

DeltaPV ~8 000 bp E1 6 000 2 000 L1, 32 % L3 E3 L2 E4 4 000 E9 E2 E5 E6, 6 % E2, 9 %

(c) 69 Cougar PV (AY904723) 63 Bobcat PV (AY904722) 100 Cat PV (NC_004765) 0.2 70 Panthera leo PV (AY904724) 54 Canine oral PV (NC_001619) 100 Spotted hyena PV(NC_018575) Racoon PV(NC_007150) 59 99 HPV63 (NC_001458) HPV1 (NC_001356)

73 94 Rabbit oral PV (AF227240) 100 Sylvilagi PV (AJ404003) 100 Cottontail rabbit PV Hershey strain (JF303889) North American porcupine PV (NC_006951) 78 98 HPV41 (NC_001354)

100 Putative Mystacina tuberculata PV2 (KM204379) Putative Mystacina tuberculata PV1 (KM204378) 99 Deer PV (NC_001523) 76 European elk PV (NC_001524) 100 Yak PV (JX174437) 100 Bovine PV1 (NC_001522) HPV 106 (DQ080082) HPV 10 (NC001576) 100 Rhesus monkey PV (NC001678) HPV 83 (AF151983) Egyptian fruit bat (NC008298)

98 Crab-eating macaque PV (NC_015691) 76 50 HPV 120 (GQ845442) European hedgehog PV (FJ379293) 68 Bottlenose dolphin PV (EU240894) Ferret PV (NC_022253) 99 Canine PV2 (NC_006564)

Fig. 2. Identification of putative papillomavirus sequences. (a) Schematic representation of the location of contigs matching the papillomavirus genome, shown as the outer dark black line. Genome positions (bp) are shown inside. (b) Proportions of gene segments in the metagenomic data matching papillomavirus sequences. (c) Phylogenetic tree showing the relationship of the New Zealand bat papillomavirus L1 directly from the metagenomic data with that of other papillomaviruses. The New Zealand bat papillomavirus sequences were 172 aa (GenBank accession no. KM204379) and 173 aa (GenBank accession no. KM204378) respectively. The tree was compiled using the maximum-likelihood method and Lee and Gascuel model with 1000 bootstrap replicates. Bootstrap values ,50 % are not shown. Bar, number of amino acid substitutions per site. Aster- isks (*) denote unclassified caliciviruses which have been suggested as new genera. http://vir.sgmjournals.org 2445 J. Wang and others

(a) (b) VP2, 17 % Large T 1

VP2 Small T VP4 VP3 Polyomavirus ~5 200 bp

VP1, 24 % Large T 2 500 VP1

Large T, 52 % Small T, 7 %

Mastomys rodent PyV (AB588640) (c) 77 0.5 92 Murine pneumotropic PyV (M55904) Mouse-eared bat PyV (FJ188392) 70 100 Mouse-eared bat PyVlsolate2 (NC_011310) 51 Squirrel monkey PyV (AM748741) Bat PyV2 (JQ958892) 55 Bovine PyV (D13942) 77 California PyV1 (GQ331138) Orangutan PyV (FN356900) Simian agent 12 PyV (AY614708) 100 African green monkey PyV (NC_004763) Human Py9 (HQ696595) 99 Bat PyV3 (JQ958886) 84 Putative Mystacina tuberculata PyV (KM204380) 81 99 Bat PyV4 (JQ958887) Gorilla PyV (HQ385752) 93 Merkel cell PyV (EU375803) Finch PyV (DQ192571) Crow PyV (DQ192570) 95 Goose haemorrhagic PyV (HQ681905) Canary PyV (GU345044) KI PyV (EF127906) 100 Human PyV6 (NC_014406) 96 Human PyV7 (NC_014407)

Fig. 3. Identification of a putative PyV sequence. (a) Schematic representation of the location of contigs matching the PyV genome, shown as the outer dark black line. Genome positions (bp) are shown inside. (b) Proportions of gene segments in the metagenomic data matching PyV. (c) Phylogenetic tree showing the relationship of the New Zealand bat polyomavirus partial VP1 translated sequence of 203 aa (GenBank accession no. KM204380) directly from the metagenomic data with that of other polyomaviruses. The tree was compiled using the maximum-likelihood method and the Lee and Gascuel model with 1000 bootstrap replicates. Bootstrap values ,50 % are not shown. Bar, number of amino acid substitutions per site.

Other viruses Of the 139 sequences assigned to poxvirus, the majority of RAPSearch2 matches were also obtained for poxvirus, par- the contigs and singleton reads were most closely related vovirus, adenovirus and picornavirus (Table S1, available to molluscum contagiosum virus (MCV), a pathogen in in the online Supplementary Material). No conserved humans. The MCV-like contigs and singleton reads genetic elements were identified for adenovirus or poxvirus. showed between 55.6 and 75.4 % identity with another bat

2446 Journal of General Virology 96 Viruses in an isolated New Zealand bat species

(a) 5′ UTR 5 200 6 700 7 400 Helicase VPg Pro POL Capsid protein VP2 687 bp 498 bp 359 bp

(b)

93 Gll. 1/Hu/Hawaii/71/US (U07611) 0.5 Hu/Toronto/91/CA (U02030) 96 Gll.11/Sw/Sw43/97/JP (AB126320) Gll/Hu/DSK/2013/ KF199166) 99 GVI.3//Viseu-C33/07/PT (GQ443611) GVI.1/Hu/Alphatron/98/NL (AF195847) GVI.2/Dog/Bari-170/07/IT (EU224456) 99 88 100 GVI.2/Fe/10/US (JF81268) Norovirus 97 GVI.2/Lion/Pistoia-387/06/IT (EF450827) 100 Glll.1/Bo/Jena/98/DE (AJ011099) Glll.2/Bo/Newberry2/76/UK (AF097917) 99 97 Gl/Hu/DSV/90/SA (U04469) 99 Gl/Hu/Norwalk/93/US (M87661) 67 Gl.2/Hu/Southampton/91/UK (L07418) 89 80 Gl.6/Hu/Hesse/97/DE (AF093797) GV/Murine1/03/US (AY228235) Tulane virus (EU391643) Recovirus* 94 Bangladesh/Recovirus (AFM93995) 100 calicivirus strain AB 104 (FJ355930) Valovirus* 100 Pig calicivirus strain F1510 (FJ355929) Putative Mystacina tuberculata calicivirus 1 (KM204381) Newberry-1 (DQ228160) Nebovirus 99 RHDV-FRG (M67473) 100 RHDV-lowa (AF258618) European brown hare syndrome virus (Z69620) 100 Primate calicivirus 1 (AF091736) 100 Vesicular exanthema of swine-A48 (U76874) San Miguel sea lion virus-1 (AF181081) 66 100 99 Walrus calicivirus (AF321298) FCV-F4 (D31836) 100 FCV-F9 (M86739) 53 Chicken calicivirus (HQ010042) Nacovirus* Turkey calicivirus (JQ347522) Bat (NC_017936) 100 HuCV/Lond/29845/92/UK (U95645) 99 HuCV/Mexico/340/90/MX (AF435812) HuCV/Houston/7-1181/US (AF435814) 100 Sapovirus HuCV/Parklille/94 (U73124) 95 HuCV/Stockholm/317/97/SE (AF194182) 100 HuCV/Manchester/93/UK (X86560) 100 HuCV/Sapporo/82/JP (U65427)

Fig. 4. Identification of putative calicivirus sequences. (a) Schematic representation of the location of contigs matching the calicivirus genome, shown as the lower black bars. (b) Phylogenetic tree showing the relationship of the New Zealand bat calicivirus translated capsid sequence (GenBank accession no. KM204381) directly from the metagenomic data with that of other caliciviruses. The tree was compiled using the maximum-likelihood method and the Lee and Gascuel model with 1000 bootstrap replicates, including the New Zealand bat calicivirus partial capsid sequence of 228 aa (GenBank accession no. KM204381). Bootstrap values ,50 % are not shown. Bar, number of amino acid substitutions per site. Asterisks (*) denote unclassified caliciviruses which have been suggested as new genera. http://vir.sgmjournals.org 2447 J. Wang and others

MCV-like sequence (GenBank accession no. NC_001731) at (Baker et al., 2013). The mean pairwise identity of all the protein level. MCV-like amino acid sequences from M. tuberculata com- pared with E. helvum was 65.5 %, suggesting that the New Members of the family are known to infect a Zealand bats may host a different species of an MCV-like diverse range of hosts, including insects, and can be present virus. Full-genome sequencing, or at least sequencing of as endogenous elements in the host genome. For this the conserved major capsid protein, will be necessary reason, parvoviruses were not subject to further investi- before any provisional statement on species assignment of gation (Liu et al., 2011). the MCV-like virus from M. tuberculata can be made. Six contigs matching members of the family Picornaviridae were identified, but only one contig contained a viable Despite the large number of virus taxa indicated in the meta- ORF. This contig only encoded 116 aa (349 bp), matching genomic data in the present study, a conservative approach the 2C gene. It was therefore excluded from further investi- was taken where: (i) only conserved genetic elements were gation given that it encoded fewer than 150 aa, the threshold considered, such as the capsid protein or helicase, and (ii) set as a requirement for phylogenetic analysis and PCR con- independent confirmation of the metagenomic data was firmation (see Methods). required by specific PCR assay and Sanger sequencing of the amplicon. Given the proliferation of virus metagenomic studies, and high-profile instances of erroneous reporting of DISCUSSION novel viruses (Naccache et al., 2014), the use of conserved genes with an independent PCR assay and Sanger sequencing This study provides the first report of the virus diversity in as well as appropriate controls should be considered mini- the New Zealand lesser short-tailed bat, M. tuberculata. mum requirements before reporting the discovery of new Despite extensive biogeographical isolation from other ter- virus sequences (Rosseel et al., 2014). The most distinctive restrial mammals for more than 16 million years (Hand virus sequences found in this study were similar to those et al., 2013), the viral sequences observed in the metage- of caliciviruses and hepeviruses. This study is, to the best nomic data were generally consistent with those observed of our knowledge, only the second report of caliciviruses in other metagenomic studies of bat guano. Sequences in bats (the first being the detection of a novel sapovirus; with similarity to known vertebrate viruses were identified, Tse et al., 2012a). including poxvirus, parvovirus, papillomavirus, calicivirus, hepevirus, polyomavirus and picornavirus (and also coro- A major review of the family Caliciviridae removed the hepa- navirus from a previous study; Hall et al., 2014a). It is poss- titis E viruses from this family, placing them as the sole genus ible that these viral sequences could have arisen as a within the family Hepeviridae (Berke et al., 1997; Berke & consequence of dietary origin or subsequent to the depo- Matson, 2000; Green et al., 2000). The discovery of a hepa- sition of guano by the bats, and this must be taken into titis E virus in cut-throat trout (Batts et al., 2011) has consideration when interpreting these results. In addition, raised the possibility of an aquatic lineage of hepatitis E the polyomavirus and papillomavirus sequences were viruses, with the possible suggestion that this is a new found only in the DNA virus metagenomes, so there is genus (Batts et al., 2011; Smith et al., 2013). The putative the possibility that these sequences were detected as hepevirus reported for M. tuberculata formed an alternate endogenous/integrated elements from the host genome. clade alongside the cut-throat trout hepevirus, separate from all other hepeviruses (Fig. 5b). The current classifi- Metagenomic studies in bats from other countries have cation for hepeviruses includes separate clades for rodent, identified adenovirus (Baker et al., 2013; Drexler et al., bat, human and avian viruses, and it is postulated that co- 2011; He et al., 2013; Li et al., 2010a, b; Sonntag et al., divergence with the host has led to this (Drexler et al., 2009; Wu et al., 2012), papillomavirus (Ge et al., 2012; 2012). Therefore, the divergent hepevirus in M. tuberculata Tse et al., 2012b; Wu et al., 2012), parvovirus (Ge et al., could be accounted for by co-speciation of all other hepe- 2012; Li et al., 2010a) and a calicivirus (sapovirus, reported viruses subsequent to the isolation of M. tuberculata on the by Tse et al., 2012a). It is difficult to make specific or direct Zealandia subcontinent over 16 million years ago. comparisons with the present study due to significant methodological variation among the studies including: M. tuberculata is listed as a vulnerable species on the IUCN sample type (i.e. fresh guano, urine, roost guano), virus Red List (http://www.iucnredlist.org/) and has an important enrichment procedures (i.e. nuclease-treatment, centrifu- status for the indigenous Ma¯ori population of New Zealand. gation, filtration), and amplification methods and/or It is not possible to examine any of the viruses within an exper- high-throughput sequencing platforms (Daly et al., 2011; imental colony. For any of the viruses identified, culture could Hall et al., 2014b). Perhaps the most comparable study is be attempted either in specific bat cell lines or in other per- that of Baker et al. (2013), who used a metagenomic missive cell lines such as Vero. Furthermore, complete approach to examine virus diversity in African straw- genome sequencing will allow a greater degree of certainty coloured fruit bats (Eidolon helvum) (Baker et al., 2013). in taxonomic assignment and will provide additional insights Similar to the present study, E. helvum samples (lung into the evolutionary history of viruses in M. tuberculata. tissue, urine and throat swabs) contained novel adenovirus, Given the more recent arrival of the related native bat species polyomavirus, papillomavirus and MCV-like sequences C. tuberculatus (the long-tailed bat) approximately 1 million

2448 Journal of General Virology 96 Viruses in an isolated New Zealand bat species

(a) 5'UTR 3 000 3 770 5 300 7 120 Polyprotein Helicase RDRP Capsid protein ORF3 673 bp

HEV-3 (b)

HEV/Trout/USA/1988/Onc cla

Putative HEV/bat/NZ/2013/Mys

P

(NC_015521) Rabbit hepeviruses Rodent (KM204384) hepeviruses 2001/SusS

pan/

HEV/Rat/Germany/2 (AB089824) ina/2009

(AB443624)

Gt3/Ja

(GU3 U937805) HEV/Gt3/Japan/1993/HomSP G HEV/ ( 45043)009/Rat nor 0.5 EV/Rabbit/Ch HEV-2 H

73 66 HEV/ (FJ906895) m sp Rat/Germany/2009/Rat nor HEV/Rabbit/China/2009 Ho 91 /1992/ (GU345042) 99 61 HEV/Gt2/Mexico(M74506) 99 96 HEV/Ferret/Netherlands/2010 58 HEV/Gt1/India/2000/Hom sp (JN998607) 99 (DQ459342) 99 99 HEV/G t1/China/1987/Hom sp (D11092) HEV/JBOR135-Shiz09/Ja erret/Netherlands/2010 99 52 HEV/F (JN998606) HEV/wbJOY 0 HEV-1 90 HEV/Gt4/Ja 99 (AB573435)

100 HEV/Gt4/Japan/2009/Sus sp pan/2009/SusSP (AB602441)

(AB097811) 6/2006/suc scr (JQ001749)

(G p a n/2002/Sus sp HEV/Bat/Germany/2009/Ept ser U Bat 119960) hepevirus

(EF206691)

(AY535004) Unassigned HEV/Avian Gt2/USA/2004/Gal gal wild boar hepeviruses

HEV/Avian Gt2/USA/2001/Gal gal HEV-4 Avian hepeviruses

Fig. 5. Identification of a putative hepevirus sequence. (a) Schematic representation of the location of contigs matching the hepevirus genome, shown as the lower black bar. (b) Phylogenetic tree showing the relationship of other hepeviruses with that of the New Zealand bat hepevirus helicase sequence of 224 aa (GenBank accession no. KM204384) obtained directly from the metagenomic data. The tree was compiled using the maximum-likelihood method and the JTT model with 1000 boot- strap replicates. Bootstrap values ,50 % are not shown. Bar, number of amino acid substitutions per site.

http://vir.sgmjournals.org 2449 J. Wang and others years ago, it will be interesting to explore the viral metagen- Bioinformatics workflow. FastQC (Andrews, 2010) was used to ome within this species to search for evidence of these viruses. check sequence data quality, with reads trimmed using the FASTX- An expansion in the survey of viruses in M. tuberculata is also Toolkit with a base pair quality score threshold of 28 (http://hannonlab. cshl.edu/fastx_toolkit/index.html). Trimmed reads were then com- warranted, as the present study relied upon just four guano pared with the NCBI non-redundant protein sequence database samples collected from one island at one time point. Indeed, (downloaded in April 2013) using RAPSsearch2 (Ye et al., 2011; Zhao given that a broad survey of viruses in both rodent and chir- et al., 2012), an alternative to BLASTX (Altschul et al., 1990). Taxonomic opteran species in New Zealand has never been conducted assignment of the RAPSearch2 output was performed using MEGAN 5.2 (there are three introduced rat species in New Zealand: (Huson et al., 2007; Huson & Mitra, 2012) with a threshold for w w Rattus rattus, Rattus exulans and Rattus norvegicus), and that assignment of bit scores 50 and read complexity 0.44. Paired-end read data of interest was then extracted and assembled into contigs in similar surveys in other countries have found an abundance Geneious 6.1.7 (Biomatters; http://www.geneious.com). of virus diversity (Li et al., 2010a; Phan et al., 2011; Tse et al., 2012a), there is clearly a need to look more closely. Confirmatory PCR. For conserved genes of interest from the meta- This recommendation holds for any similarly remote or iso- genomic data that would allow further taxonomic assignment and lated area of the world. Whilst it is clear that emerging ‘hot- phylogenetic analysis (i.e. vertebrate virus capsid, helicase or RNA- spots’ of infectious disease require comprehensive surveys dependent RNA-polymerase gene regions), specific primers were designed using the Primer3 plug-in for Geneious 6.1.7. A 25 ml reaction for virus diversity (Anthony et al., 2013; Jones et al., 2008), volume of Invitrogen Platinum SuperMix (Life Technologies) was used there may be additional value in examining low-diversity, as per the manufacturer’s instructions, including 5 ml of DNA or cDNA. remote ecosystems, as virus data could inform both the origins Primer sequences are detailed in Table S2. PCR products were visual- of virus taxa and provide a simplified model for the study of ized by agarose gel electrophoresis, followed by purification using evolutionary processes such as co-divergence and cross- USB ExoSAP-IT (Affymetrix) and Sanger sequencing on a capillary species transmission. sequencer (model 3100 Avant; Applied Biosystems). Phylogenetic analysis. The nucleotide sequences produced by the confirmatory PCR assays were translated into protein sequences and METHODS compared with reference virus genomes downloaded from GenBank. Only sequences with an ORF of w150 aa were considered for Sample collection. The collection protocols have been described phylogenetic analysis. Sequences were aligned in MEGA6 (6.06) using previously (Hall et al., 2014a). For this study, four bat guano samples CLUSTALW with a gap opening penalty of 10 and gap extension penalty were collected from each of four separate bat roosts on Codfish Island of 0.2 (Tamura et al., 2011). Maximum-likelihood phylogenetic trees u 9 u 9 (Whenua hou) (Fig. S1, 46 47 S 167 38 E). Samples were kept at 4 were reconstructed in MEGA6 using all available amino acid sites with u C and nucleic acids were extracted within 48 h. M. tuberculata is 1000 bootstrap replicates and the nearest-neighbour-interchange found only in New Zealand and is listed as ‘nationally endangered’. method. Relatively small populations of this bat are present only in remote areas of unmodified old forest. The bats are highly secretive and difficult to locate, as they frequently change roost sites and also roost in trees upwards of 15 m above the forest floor. Codfish Island is an ACKNOWLEDGEMENTS active conservation study site where movements of this bat species are well known, and thus allowed the collection of fresh guano. Access to We would like to thank Malcolm Rutherford and Peter McClelland Codfish Island is restricted to scientists and conservation staff. from the New Zealand Department of Conservation for the collection of the guano samples. We would also like to acknowledge the high- Sample preparation and metagenomic sequencing. Bat guano throughput sequencing provided by New Zealand Genomics Ltd was resuspended in 2 ml of sterile PBS, followed by centrifugation at (NZGL), and especially to Lorraine Berry and Patrick Biggs from the 6000 g for 5 min. Nucleic acid was extracted from 400 ml of super- Massey Genome Service. High-performance computing was provided natant using an iPrep Purelink Virus kit (Life Technologies), with by the New Zealand e-Science Infrastructure (NeSI). Our thanks also elution into 50 ml of RT-PCR molecular-grade water (Ambion). DNA go to ESR IMSG, especially Steve Pyne, for providing computing and RNA were co-purified during extraction. infrastructure and support. This work was funded by the Crown Research Institute Core Fund from the New Zealand Ministry for Given resource constraints, only two of the guano samples were Business, Innovation and Employment. selected for metagenomic sequencing, and the remaining two samples were used only for confirmatory PCR assays. For the purpose of finding RNA viruses, DNA was removed using DNase treatment by REFERENCES Ambion DNA-free (Life Technologies), and 8 ml of DNA-free RNA was incorporated into first-strand cDNA synthesis primed by random Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. hexamers (Life Technologies), including an RNase H digestion. This (1990). Basic local alignment search tool. J Mol Biol 215, 403–410. step was not required for the complementary approach that sought to find DNA viruses. The cDNA and DNA from previous steps were then Andrews, S. (2010). FastQC: A quality control tool for high throughput amplified by multiple displacement amplification using a Whole sequence data. http://www.bioinformatics.babraham.ac.uk/projects/ Transcriptome Amplification kit (Qiagen) to ensure that w1 mgof fastqc/ DNA was produced. This is the minimum amount required for Anthony, S. J., Epstein, J. H., Murray, K. A., Navarrete-Macias, I., library preparation using an Illumina TruSeq DNA Library Prep- Zambrana-Torrelio, C. M., Solovyov, A., Ojeda-Flores, R., Arrigo, aration kit. DNA libraries were prepared and sequenced on an Illu- N. C., Islam, A. & other authors (2013). A strategy to estimate mina MiSeq2000 instrument producing 250 bp paired-end reads 4 (New Zealand Genomics Limited). Samples were indexed using bar- unknown viral diversity in mammals. MBio , e00598–e00e13. codes so that each metagenome could be linked specifically to the Baker, K. S., Leggett, R. M., Bexfield, N. H., Alston, M., Daly, G., Todd, original DNA and RNA. S., Tachedjian, M., Holmes, C. E. G., Crameri, S. & other authors

2450 Journal of General Virology 96 Viruses in an isolated New Zealand bat species

(2013). Metagenomic study of the viruses of African straw-coloured Hand, S. J., Worthy, T. H., Archer, M., Worthy, J. P., Tennyson, A. J. D. fruit bats: detection of a chiropteran poxvirus and isolation of a & Schofield, R. P. (2013). Miocene Mystacinids (Chiroptera, novel adenovirus. Virology 441, 95–106. Noctilionoidea) indicate a long history for endemic bats in New 33 Batts, W., Yun, S., Hedrick, R. & Winton, J. (2011). A novel member of Zealand. J Vertebr Paleontol , 1442–1448. the family Hepeviridae from cutthroat trout (Oncorhynchus clarkii). He, B., Li, Z., Yang, F., Zheng, J., Feng, Y., Guo, H., Li, Y., Wang, Y., Su, Virus Res 158, 116–123. N. & other authors (2013). Virome profiling of bats from Myanmar Berke, T. & Matson, D. O. (2000). Reclassification of the Caliciviridae by metagenomic analysis of tissue samples reveals more novel 8 into distinct genera and exclusion of hepatitis E virus from the family mammalian viruses. PLOS ONE , e61950. on the basis of comparative phylogenetic analysis. Arch Virol 145, Hickman, A. B. & Dyda, F. (2005). Binding and unwinding: SF3 viral 1421–1436. helicases. Curr Opin Struct Biol 15, 77–85. Berke, T., Golding, B., Jiang, X., Cubitt, D. W., Wolfaardt, M., Huson, D. H. & Mitra, S. (2012). Introduction to the analysis of Smith, A. W. & Matson, D. O. (1997). Phylogenetic analysis of the environmental sequences: metagenomics with megan. Methods Mol caliciviruses. J Med Virol 52, 419–424. Biol 856, 415–429. Calisher, C. H., Childs, J. E., Field, H. E., Holmes, K. V. & Schountz, T. Huson, D. H., Auch, A. F., Qi, J. & Schuster, S. C. (2007). MEGAN (2006). Bats: important reservoir hosts of emerging viruses. Clin analysis of metagenomic data. Genome Res 17, 377–386. Microbiol Rev 19, 531–545. Jones, K. E., Patel, N. G., Levy, M. A., Storeygard, A., Balk, D., Chan, J. F.-W., To, K. K.-W., Tse, H., Jin, D.-Y. & Yuen, K.-Y. (2013a). Gittleman, J. L. & Daszak, P. (2008). Global trends in emerging Interspecies transmission and emergence of novel viruses: lessons infectious diseases. Nature 451, 990–993. from bats and birds. Trends Microbiol 21, 544–555. King, C. M. (2005). The Handbook of New Zealand Mammals. Chan, J. F. W., Lau, S. K. P. & Woo, P. C. Y. (2013b). The emerging Melbourne: Oxford University Press. novel Middle East respiratory syndrome coronavirus: the ‘‘knowns’’ Lau, S. K. P., Woo, P. C. Y., Li, K. S. M., Huang, Y., Tsoi, H.-W., Wong, and ‘‘unknowns’’. J Formos Med Assoc 112, 372–381. B. H. L., Wong, S. S. Y., Leung, S.-Y., Chan, K.-H. & Yuen, K.-Y. Chua, K. B., Bellini, W. J., Rota, P. A., Harcourt, B. H., Tamin, A., Lam, (2005). Severe acute respiratory syndrome coronavirus-like virus in S. K., Ksiazek, T. G., Rollin, P. E., Zaki, S. R. & other authors (2000). Chinese horseshoe bats. Proc Natl Acad Sci U S A 102, 14040–14045. Nipah virus: a recently emergent deadly paramyxovirus. Science 288, Leroy, E. M., Kumulungui, B., Pourrut, X., Rouquet, P., Hassanin, A., 1432–1435. Yaba, P., De´ licat, A., Paweska, J. T., Gonzalez, J. -P. & Swanepoel, R. Daly, G. M., Bexfield, N., Heaney, J., Stubbs, S., Mayer, A. P., Palser, (2005). Fruit bats as reservoirs of Ebola virus. Nature 438, 575–576. A., Kellam, P., Drou, N., Caccamo, M. & other authors (2011). A viral Li, W., Shi, Z., Yu, M., Ren, W., Smith, C., Epstein, J. H., Wang, H., discovery methodology for clinical biopsy samples utilising massively Crameri, G., Hu, Z. & other authors (2005). Bats are natural parallel next generation sequencing. PLoS One 6, e28879. reservoirs of SARS-like coronaviruses. Science 310, 676–679. Drexler, J. F., Corman, V. M., Wegner, T., Tateno, A. F., Zerbinati, Li, L., Victoria, J. G., Wang, C., Jones, M., Fellers, G. M., Kunz, T. H. & R. M., Gloza-Rausch, F., Seebens, A., Mu¨ ller, M. A. & Drosten, C. Delwart, E. (2010a). Bat guano virome: predominance of dietary (2011). Amplification of emerging viruses in a bat colony. Emerg viruses from insects and plants plus novel mammalian viruses. Infect Dis 17, 449–456. J Virol 84, 6955–6965. Drexler, J. F., Seelen, A., Corman, V. M., Fumie Tateno, A., Cottontail, Li, Y., Ge, X., Zhang, H., Zhou, P., Zhu, Y., Zhang, Y., Yuan, J., Wang, V., Melim Zerbinati, R., Gloza-Rausch, F., Klose, S. M., Adu- L.-F. & Shi, Z. (2010b). Host range, prevalence, and genetic diversity Sarkodie, Y. & other authors (2012). Bats worldwide carry hepatitis of adenoviruses in bats. J Virol 84, 3889–3897. e virus-related viruses that form a putative novel genus within the Liu, H., Fu, Y., Xie, J., Cheng, J., Ghabrial, S. A., Li, G., Peng, Y., Yi, X. & family Hepeviridae. J Virol 86, 9134–9147. Jiang, D. (2011). Widespread endogenization of densoviruses and Fagrouch, Z., Sarwari, R., Lavergne, A., Delaval, M., de Thoisy, B., parvoviruses in animal and human genomes. J Virol 85, 9863–9876. Lacoste, V. & Verschoor, E. J. (2012). Novel polyomaviruses in Luis, A. D., Hayman, D. T. S., O’Shea, T. J., Cryan, P. M., Gilbert, A. T., South American bats and their relationship to other members of Pulliam, J. R. C., Mills, J. N., Timonin, M. E., Willis, C. K. R. & other the Polyomaviridae. J Gen Virol 93, 2652–2657. authors (2013). A comparison of bats and rodents as reservoirs of Ge, X., Li, Y., Yang, X., Zhang, H., Zhou, P., Zhang, Y. & Shi, Z. (2012). zoonotic viruses: are bats special? Proc Biol Sci 280, 20122753. Metagenomic analysis of viruses from bat fecal samples reveals many Memish, Z. A., Mishra, N., Olival, K. J., Fagbo, S. F., Kapoor, V., novel viruses in insectivorous bats in China. J Virol 86, 4620–4630. Epstein, J. H., Alhakeem, R., Durosinloun, A., Al Asmari, M. & Ge, X. Y., Li, J. L., Yang, X. L., Chmura, A. A., Zhu, G., Epstein, J. H., other authors (2013). Middle East respiratory syndrome Mazet, J. K., Hu, B., Zhang, W. & other authors (2013). Isolation coronavirus in bats, Saudi Arabia. Emerg Infect Dis 19, 1819–1823. and characterization of a bat SARS-like coronavirus that uses the Milne-Price, S., Miazgowicz, K. L. & Munster, V. J. (2014). The ACE2 receptor. Nature 503, 535–538. emergence of the Middle East respiratory syndrome coronavirus. Green, K. Y., Ando, T., Balayan, M. S., Berke, T., Clarke, I. N., Estes, Pathog Dis 71, 121–136. M. K., Matson, D. O., Nakata, S., Neill, J. D. & other authors (2000). Murray, K., Selleck, P., Hooper, P., Hyatt, A., Gould, A., Gleeson, Taxonomy of the caliciviruses. J Infect Dis 181 (Suppl. 2), S322–S330. L., Westbury, H., Hiley, L., Selvey, L. & other authors (1995). A Hall, R. J., Wang, J., Peacey, M., Moore, N. E., McInnes, K. & that caused fatal disease in horses and humans. 268 Tompkins, D. M. (2014a). New alphacoronavirus in Mystacina Science , 94–97. tuberculata bats, New Zealand. Emerg Infect Dis 20, 697–700. Naccache, S. N., Hackett, J. Jr, Delwart, E. L. & Chiu, C. Y. (2014). Hall, R. J., Wang, J., Todd, A. K., Bissielo, A. B., Yen, S., Strydom, H., Concerns over the origin of NIH-CQV, a novel virus discovered in Moore, N. E., Ren, X., Huang, Q. S. & other authors (2014b). Chinese patients with seronegative hepatitis. Proc Natl Acad Sci 111 Evaluation of rapid and simple techniques for the enrichment of USA , E976. viruses prior to metagenomic virus discovery. J Virol Methods 195, O’Donnell, C. F. J. (2001). Advances in New Zealand mammalogy 194–204. 1990–2000: long-tailed bat. J Roy Soc NZ 31, 43–57. http://vir.sgmjournals.org 2451 J. Wang and others

O’Shea, T. J., Cryan, P. M., Cunningham, A. A., Fooks, A. R., Hayman, Tse, H., Chan, W. M., Li, K. S. M., Lau, S. K. P., Woo, P. C. Y. & Yuen, D. T., Luis, A. D., Peel, A. J., Plowright, R. K. & Wood, J. L. (2014). Bat K. Y. (2012a). Discovery and genomic characterization of a novel bat flight and zoonotic viruses. Emerg Infect Dis 20, 741–745. sapovirus with unusual genomic features and phylogenetic position. 7 Phan, T. G., Kapusinszky, B., Wang, C., Rose, R. K., Lipton, H. L. & PLOS ONE , e34987. Delwart, E. L. (2011). The fecal viral flora of wild rodents. PLoS Tse, H., Tsang, A. K. L., Tsoi, H. W., Leung, A. S. P., Ho, C. C., Lau, Pathog 7, e1002218. S. K. P., Woo, P. C. Y. & Yuen, K. Y. (2012b). Identification of a novel bat papillomavirus by metagenomics. PLOS ONE 7, e43986. Rosseel, T., Pardon, B., De Clercq, K., Ozhelvaci, O. & Van Borm, S. (2014). False-positive results in metagenomic virus discovery: a strong Wang, L.-F., Shi, Z., Zhang, S., Field, H., Daszak, P. & Eaton, B. T. 12 case for follow-up diagnosis. Transbound Emerg Dis 61, 293–299. (2006). Review of bats and SARS. Emerg Infect Dis , 1834–1840. Selvey, L. A., Wells, R. M., McCormack, J. G., Ansford, A. J., Murray, Warrilow, D. (2005). Australian bat lyssavirus: a recently discovered 292 K., Rogers, R. J., Lavercombe, P. S., Selleck, P. & Sheridan, J. W. new rhabdovirus. Curr Top Microbiol Immunol , 25–44. (1995). Infection of humans and horses by a newly described Wertheim, J. O., Chu, D. K. W., Peiris, J. S. M., Kosakovsky Pond, morbillivirus. Med J Aust 162, 642–645. S. L. & Poon, L. L. M. (2013). A case for the ancient origin of coronaviruses. J Virol 87, 7039–7045. Simmons, N. B. (2005). Order Chiroptera. In Mammal Species of the World: a Taxonomic and Geographic Reference, pp. 312–529. Edited by Woo, P. C., Lau, S. K., Lam, C. S., Lau, C. C., Tsang, A. K., Lau, J. H., D. E. Wilson. Reeder. Baltimore, MD: Johns Hopkins University Bai, R., Teng, J. L., Tsang, C. C. & other authors (2012). Discovery of Press. seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of Smith, I. & Wang, L. F. (2013). Bats and their virome: an important alphacoronavirus and betacoronavirus and avian coronaviruses as source of emerging viruses capable of infecting humans. Curr Opin the gene source of gammacoronavirus and deltacoronavirus. J Virol 3 Virol , 84–91. 86, 3995–4008. Smith, D. B., Purdy, M. A. & Simmonds, P. (2013). Genetic variability Wu, Z., Ren, X., Yang, L., Hu, Y., Yang, J., He, G., Zhang, J., Dong, J., 87 and the classification of hepatitis E virus. J Virol , 4161–4169. Sun, L. & other authors (2012). Virome analysis for identification of Sonntag, M., Mu¨ hldorfer, K., Speck, S., Wibbelt, G. & Kurth, A. novel mammalian viruses in bat species from Chinese provinces. (2009). New adenovirus in bats, Germany. Emerg Infect Dis 15, J Virol 86, 10999–11012. 2052–2055. Ye, Y., Choi, J.-H. & Tang, H. (2011). RAPSearch: a fast protein Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. & Kumar, similarity search tool for short reads. BMC Bioinformatics 12, 159. S. (2011). MEGA5: molecular evolutionary genetics analysis using Zhao, Y., Tang, H. & Ye, Y. (2012). RAPSearch2: a fast and memory- maximum likelihood, evolutionary distance, and maximum efficient protein similarity search tool for next-generation sequencing parsimony methods. Mol Biol Evol 28, 2731–2739. data. Bioinformatics 28, 125–126.

2452 Journal of General Virology 96