Metagenomic Approaches for Detecting Viral Diversity in Water Environments

Camille McCall, M.ASCE1; and Irene Xagoraraki, Ph.D., M.ASCE2

Abstract: Methods for detecting and monitoring known and emerging viral pathogens in the environment are imperative for understanding risk and establishing regulatory standards in environmental and public health sectors. Next-generation (NGS) has uncovered the diversity of entire microbial populations, enabled discovery of novel organisms, and allowed pathogen surveillance. Metagenomics, the sequencing and analysis of all genetic material in a sample, is a detection method that circumvents the need for cell culturing and prior understanding of microbial assemblies, which are necessary in traditional detection methods. Advancements in NGS technologies have led to subsequent advancements in data analysis methodologies and practices to increase specificity, and accuracy of metagenomic studies. This paper highlights applications of metagenomics in viral pathogen detection, discusses suggested best practices for detecting the diversity of viruses in environmental systems (specifically water environments), and addresses the limitations of virus detection using NGS methods. Information presented in this paper will assist researchers in selecting an appropriate metagenomics approach for obtaining a comprehensive view of viruses in water systems. DOI: 10.1061/(ASCE)EE.1943-7870.0001548. © 2019 American Society of Civil Engineers.

Introduction monitoring the presence of viral pathogens in environmental - ples (Bibby and Peccia 2013; Ramírez-Castillo et al. 2015). The increase in viral-related diseases, lack of medications to In the early 1950s, Dulbecco and Vogt (1954) demonstrated that treat viral infections, low infectious dose, and high mortality rates it was possible to perform plaque assays on human viruses as with among children, elderly, and the immunocompromised is of global bacteriophages, viruses that infect bacteria. Briefly, viruses of vary- concern. Sewage-treatment plants serve as a significant exposure ing dilutions are applied to a susceptible cell monolayer. A semi- pathway for viral pathogens. Viral pathogens are known to resist solid media (e.g., agar) is then added to the monolayer to increase treatment and are released into receiving water bodies, posing a viscosity. This restricts the spread of the virus to only neighboring major threat to human health (Xagoraraki et al. 2014; Wigginton cells within the monolayer. Infected cells lyse and form holes et al. 2015). Therefore, detection and surveillance of viruses in (i.e., plaques) in the monolayer, which can be seen with the naked water environments is of growing importance. Conventional virus eye or by staining to increase visibility (Taylor 2014). Each plaque detection methods include indirect methods involving indicator is known as a plaque-forming unit (PFU) and can be counted and organisms, traditional cell-culture techniques like plaque assays expressed as a concentration. Furthermore, observing CPEs in in- and viral-induced cytopathic effects (CPEs), and molecular tech- fected cells is a widely used culture-based method for virus identi- niques such as polymerase chain reaction (PCR). This paper briefly fication in clinical diagnostics. Changes in cell morphology on a cell summarizes traditional methods for virus detection and addresses monolayer are observed microscopically, and the viral agent is iden- modern techniques, namely, next-generation sequencing, in detail. tified based on its known CPE. CPEs can appear in various forms, Indirect detection methods have been established for sug- including cell shrinking, swelling, and cell clustering (Leland and gesting the presence of pathogens in environmental systems. In par- Ginocchio 2007). ticular, measuring the concentration of indicator organisms such Traditional culture-based methods allow for the isolation of as coliforms and coliphages is a standard water quality practice purified virus stock and are effective at determining infectivity for monitoring the presence of enteric pathogens. The survivability and concentration. However, these methods are limited by long of indicator organisms depends on a number of environmental incubation periods and the inability to culture most viruses. PCR factors, and they are unable to determine the identity of a specific has become a standard molecular method for detecting and quan- pathogen. Although the concentration of these organisms are suit- tifying viruses in environmental media. This method bypasses the able for suggesting the occurrence of fecal contamination, they do need for cell culturing, provides rapid detection, and is more effi- not always correlate with the presence or absence of viral patho- cient for analyzing environmental samples (Xagoraraki et al. 2014). gens. Direct detection methods are required for classifying and Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. Quantitative PCR (qPCR) enables DNA to be simultaneously quan- tified and the concentration calculated given a standard curve. 1Ph.D. Student, Dept. of Civil and Environmental Engineering, Additionally, sequence-specific probes can be used to increase the Michigan State Univ., East Lansing, MI 48824. ORCID: https://orcid.org specificity of DNA amplification. Quantitative reverse transcription /0000-0001-7516-8391. Email: [email protected] (RT-qPCR) involves a collective set of tools to amplify and quantify 2 Associate Professor, Dept. of Civil and Environmental Engineering, ribonucleic acid (RNA) viruses such as influenza, rotavirus, and Michigan State Univ., East Lansing, MI 48824 (corresponding author). others. In RT-qPCR, RNA is converted into complementary deoxy- Email: [email protected] Note. This manuscript was submitted on July 14, 2018; approved on ribonucleic acid (DNA) (cDNA) using an enzyme, reverse tran- December 7, 2018; published online on May 22, 2019. Discussion period scriptase. The cDNA can then be used as a template for qPCR. open until October 22, 2019; separate discussions must be submitted Despite the advances of molecular methods, many of these for individual papers. This paper is part of the Journal of Environmental techniques rely on well-known primers and a priori knowledge Engineering, © ASCE, ISSN 0733-9372. of the sample’s microbiome. This limits the ability to establish a

© ASCE 04019039-1 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 comprehensive view of viral populations or the discovery of novel during an outbreak in western Uganda in 2007. The Bundibugyo pathogens in water and other environmental systems (Fancello et al. ebolavirus was found to be 35%–45% divergent from previously 2012). The need for direct human pathogen detection and viral di- characterized Ebola virus genomes and was therefore difficult to versity assessments in various ecosystems has led to the develop- detect using molecular methods. NGS assisted in the identification ment of a new generation of technologies. of this new Ebola virus species with a fatality rate of approximately Next-generation sequencing (NGS), or high-throughput sequenc- 36% within 10 days of RNA extraction. The new genome was ing (HTS), circumvents the limitations of traditional methods by us- constructed from the identified novel sequence, and molecular ing advanced technologies to sequence genes or genomes of entire techniques were used to confirm the presence of Bundibugyo organisms or microbial communities without the need for culturing ebolavirus in isolates of infected patients, thus facilitating rapid iso- or prior knowledge of the microbial structure of a sample. Several lation and control of the disease (Towner et al. 2008). DNA sequencing methods were introduced in the 1970s (Wu and Moreover, metagenomics has provided a comprehensive look Taylor 1971; Sanger et al. 1973; Maxam and Gilbert 1977). Building into viral diversity in environmental samples and has identified on previous methods, in 1977, Frederick Sanger developed a method novel and newly classified viral pathogens in several water and for DNA sequencing using chain-terminating inhibitors (Sanger et al. water-related habitats. Bibby and Peccia (2013) found high occur- 1977). Limitations of the earlier Sanger method included sequencing rences of herpesvirus and papillomavirus, along with newly discov- only up to 1,000 base pairs (bps) of DNA, or short reads (Heather ered picornaviruses, in Class B sewage sludge samples using and Chain 2016). In order to combat this limitation, larger DNA metagenomic analysis. Parechovirus and coronavirus were also de- sequences were sheared into smaller fragments, which were then tected in biosolids (Bibby et al. 2011), which highlights the risk of sequenced individually and reassembled through computational biosolids as a means for pathogen transport due to their use in land methods, a process known as shotgun sequencing (Anderson 1981; applications. Metagenomic analysis also identified human viral Gardner et al. 1981). Improvements and variations of the Sanger pathogens rhinovirus, enterovirus, parechovirus, and aichi virus shotgun method have since been introduced into the NGS methods in reclaimed water discharged from public sprinklers, fountains, widely used today, thus establishing Sanger as one of the founding and spigots at a plant nursery (Rosario et al. 2009). However, these pioneers of genomic sequencing. viruses obtained a low percent identity to reference protein sequen- Whole-genome shotgun (WGS) sequencing is widely used in ces, so therefore, careful consideration must be taken when inter- viral diversity and detection studies. WGS sequencing paired with preting results. computational tools and methods () provide a better Metagenomics has also been used to assess viral populations understanding of microbial compositions, shifts, and functions. in dairy lagoons (Alhamlan et al. 2013), sewage and surface waters Metagenomics, the sequencing and analysis of all genetic material in developed and developing communities (Ng et al. 2012; Aw in a sample (Fancello et al. 2012), has granted unprecedented ac- et al. 2014; O’Brien et al. 2017a, b; Fernandez-Cassi et al. 2018), cess into viral communities. Initially, many of the tools used in freshwater lakes (Djikeng et al. 2009), desert ponds (Fancello et al. metagenomics were designed for analyzing bacterial metagenomes. 2013), coastal (Miranda et al. 2016) and ballast waters (Kim The increase in viral metagenomic studies has led to advancements et al. 2015), and in different stages of a conventional wastewater- in established bioinformatics tools as well as new tools and stan- treatment facility (Tamaki et al. 2012). Additionally, a global study dard strategies for analyzing viral metagenomes (viromes). In this on the diversity of viruses in sewage revealed the presence of newly paper, the authors aim to (1) discuss applications of viral metage- nomics in terms of viral pathogen discovery and virus diversity in classified klassevirus and a host of novel viruses with at least water environments, (2) review standard processes and best prac- five different genomes closely related to sequences representing tices in viral metagenomic analysis, and (3) discuss limitations of picobirnaviruses (Cantalupo et al. 2011). Table 1 highlights virus NGS in viral diversity studies. concentration techniques, major bioinformatics approaches, and significant findings in viral metagenomic studies in water environ- ments published within the last 10 years. The percentage of human Viral Pathogen Discovery and Diversity of Viruses in viruses and phages reported from each study represents the percent- Water Environments Using Metagenomics ages of human host viruses and bacteriophages, respectively, in viral-affiliated sequences. Metagenomics has proven to be an effective approach in public health and environmental monitoring fields, including disease out- break analysis, novel pathogen discoveries, and viral diversity stud- Methodology in Viral Metagenomics ies. Nanopore (Oxford Nanopore Technologies, Oxford, UK) and Illumina (San Diego) sequencing, along with a Nanopore computa- A standard workflow containing presequencing and postsequenc- tional pipeline for pathogen detection has been used to identify ing processes for viral metagenomic studies in water environments Ebola, hepatitis C, and chikungunya viruses in human blood sam- is illustrated in Fig. 1. Presequencing practices such as virus con-

Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. ples with high accuracy in as little as 6 h (Greninger et al. 2015). centration and extraction have a significant impact on the diversity Another recent study used metagenomic deep sequencing on intra- of sequenced viral metagenomes. Hall et al. (2014) investigated ocular fluid samples from patients and determined rubella virus to the impact of commonly used enrichment methods for RNA viruses an etiological agent in a patient with bilateral uveitis (Thuy et al. on the abundance of influenza A and human enterovirus in a syn- 2016). Metagenomic analysis revealed common viral agents of hu- thetic clinical sample. The combination of centrifugation, syringe man respiratory tract infections (i.e., human respiratory syncytial filtration, and nuclease treatment increased the relative abundance virus, human metapneumovirus, human parainfluenza virus, influ- of influenza and enterovirus in the metagenomic data set by 10-fold enza virus, and human rhinovirus) comprising nearly 90% of viral and 20-fold, respectively. Hjelmso et al. (2017) evaluated several sequences in nasopharyngeal aspirates samples of infected individ- virus concentration methods and extraction kits as they related to uals (Lysholm et al. 2012). viral metagenomics and found both concentration and extraction to Pyrosequencing was used to identity a novel Ebola virus, have a significant impact on viral richness and an interdependent Bundibugyo ebolavirus, from blood samples of infected patients impact on viral specificity.

© ASCE 04019039-2 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved.

SE0093- .Evrn Eng. Environ. J. 04019039-3 ASCE © Table 1. Summary of methods for taxonomic classification along with significant findings of viral metagenomic studies in water environments Environmental Sequencing Human viruses Phages Significant media Virus concentration method platform Assembler Aligner/annotator Database (%) (%) human virusa Reference Wastewater Flocculation with skim milk Illumina CLC Genomics BLAST NCBI RefSeq (viral); NR NR Mamastrovirus Fernandez-Cassi solution; ultracentrifugation; NCBI Genbank (viral); et al. (2018) filtration (0.45 μm); UniProt (viral) nuclease treatment Wastewater Electropositive filtration Illumina IDBA-UD MetaVir2 NCBI RefSeq (viral) 0.53–1.21 64.55–86.15 Herpesvirus O’Brien et al. (NanoCeram); flocculation (BLAST); (2017a) with beef extract; Bowtie2 filtration (0.22 μm) Surface water; Electropositive filtration Illumina IDBA-UD BLAST NCBI RefSeq (viral) 1.18–5.40b 19–78 Rotavirus O’Brien et al. wastewater (NanoCeram); flocculation (2017b) with beef extract; filtration (0.22 μm) Seawater Polyethersulfone membrane 454/Roche CLC Genomics BLAST NCBI nonredundant protein NR NR NR Miranda et al.

J. Environ. Eng., 2019, 145(8): 04019039 prefiltration; TTF; centrifugal (non-viral-specific) (2016) ultrafiltration; CsCl density gradient ultracentrifugation Ballast water; TFF; PEG; chloroform treatment; Illumina IDBA-UD BLAST NCBI RefSeq (viral) >12.6 62.1 Parvoviridae Kim et al. harbor water filtration (0.45 and 0.22 μm); (2015) nuclease treatment Wastewater PEG; chloroform treatment; Illumina Velvet BLAST NCBI RefSeq (viral) 3 67.2 Adenoviridae Aw et al. (2014) filtration (0.45 and 0.2 μm); nuclease treatment Dairy wastewater Filtration (0.22 μm) 454/Roche NR BLAST NCBI Genbank (viral) NR 13.2–23.1 Circoviruses Alhamlan et al. (2013) Sewage sludge Flocculation with glycine; Illumina Velvet MG-RAST NCBI RefSeq 0.58 NR Herpesvirus Bibby and filtration (0.45 μm); PEG (BLAST); (viral) – amended Peccia (2013) BLAST Surface water Filtration (0.45 μm); PEG; 454/Roche Genome MG-RAST SEED (non-viral-specific) NR >92% NR Fancello et al. CsCl density gradient sequence (BLAST) (2013) ultracentrifugation; nuclease de novo treatment Wastewater; Glass fiber prefiltration; 454/Roche Newbler MG-RAST SEED (non-viral-specific) NR 16.2–45.2 NR Tamaki et al. activated sludge nitrocellulose filtration (1.2 μm); (BLAST) (2012) TFF; CsCl density gradient ultracentrifugation; nuclease treatment Wastewater TFF; sucrose density gradient 454/Roche NR BLAST Genbank nonredundant 0.66 13.5 Aichi virus Ng et al. (2012) centrifugation; nuclease treatment protein and nucleotide (non-viral-specific) Class B biosolids ASTM method D4994-89; 454/Roche Newbler BLAST NCBI RefSeq (viral) <0.1 66.2 Parechovirus Bibby et al. filtration (0.45 mm); nuclease (2011) treatment Standard postsequencing practices for processing raw reads for viral metagenomic analysis include quality filtering, assembling of raw reads into longer contiguous sequences, aligning viromes ) ) )

Reference against reference databases, and taxonomic annotation of aligned 2009 2011 2009 ( ( ( sequences. Sequencing platform, assembler, alignment approach, and reference databases are driving factors that influence annota- a tion results. This makes diversity studies difficult and oftentimes incomplete. Implementing best practices during sample processing and analysis can increase the accuracy of virus detection in meta- Significant Rhinovirus Rosario et al.

human virus genomic studies.

c Virus Concentration and Enrichment 26 (%) Phages Typically, viruses have genomes significantly shorter than some of their host prokaryotic and eukaryotic counterparts. This creates challenges for modern computational tools in the processing of viral LAST searches (i.e., BLASTx, BLASTn, and tBLASTx)

c sequences because viral signals can be masked by genomes of 4 (%) other organisms. For these reasons, virus concentration is a standard practice in viral metagenomic studies. Ultimately, virus concentra- Human viruses tion should be performed during sample collection to remove bac- teria and other nontargeted material. The virus adsorption-elution (VIRADEL) method is a standard practice for concentrating viruses from environmental waters (USEPA 2001). This method allows vi- ruses to be captured in large volumes of water and then concentrated in a solution of lesser volume, called eluate. Primary concentration is typically carried out by electropositive or electronegative cartridge filters. These filters make use of the hydrophobic and electrostatic nature of most enteric viruses. Because of the electronegative charge of most viral particles, electronegative filters need to be precondi- protein (non-viral-specific) tioned where sampling is performed at a lower pH (3.5) (Ruhanya et al. 2016) to encourage virus adsorption to filters. Electropositive filters do not need conditioning and operate at near neutral pH (Shi et al. 2017). Large volumes of water are passed through the filter, allowing viruses to adsorb to the filter media. Viruses are then eluted from the filters by an eluting solution. The most widely used eluting solution for viruses is beef extract. Other solutions include glycine and skim milk (Cantalupo et al. 2011; Shi et al. 2017). Viruses from sludge and biosolid samples are eluted according to ASTM D4994-89 (ASTM 2014). Common secondary concentration methods for environmental water samples include organic flocculation, membrane filtration, tangential flow filtration (TFF), ultracentrifugation, density-dependent centrifuga- tion, polyethylene glycol (PEG) precipitation, and nuclease treat- ment (Table 1). Several other concentration methods for virus detection have also been explored (Hjelmso et al. 2017). Addition- platform Assembler Aligner/annotator Database 454/Roche SeqMan BLAST Genbank nonredundant 454/Roche Newbler (Hybrid) BLAST CAMERA (non-viral-specific) NR NR Banna virus Djikeng et al. 454/Roche Phrap BLAST NCBI RefSeq (viral) 5.8 80 Klassevirus Cantalupo et al. Sequencing ally, concentration techniques may vary depending on the targeted virus and water characteristics (Katayama et al. 2002; Deboosere et al. 2011; Ahmed et al. 2015). In addition to laboratory concentration methods, researchers

m); may choose to implement in silico enrichment techniques post μ metagenomic sequencing. Removing nontargeted sequences from

Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. a metagenome can improve the accuracy of downstream virus classification processes. Decontamination of nontargeted sequences increases the likelihood of detecting less-dominant viruses in meta- genomes and reduces computational resources and ambiguity dur- ing metagenomic assembly. The use of genetic markers such as the .) TFF; filtration (0.22 PEG; chloroform treatment; CsCl density gradient ultracentrifugation; nuclease treatment nuclease treatment solution; ultracentrifugation; nuclease treatment 16S rRNA gene in bacteria, and mapping sequences to nontargeted organism databases (i.e., human or prokaryotic, among others) have been employed in recent virome studies (Djikeng et al. 2009; Continued ( Cantalupo et al. 2011; Li et al. 2015; Motlagh et al. 2017). Despite the benefits of in silico enrichment, metagenomic data sets can be reduced up to 90% (Rose et al. 2016), resulting Percentage represents vertebrate host (may not all be of human origin). Studies may have detected more human virusesPercentages than reflect listed results in in this reclaimed table. water. Viruses listed signify significant finding. Table 1. a b c Potable water; reclaimed water Environmental media Virus concentration method Surface water TFF; ultracentrifugation; Note: NR = not reported; TFFare = denoted tangential flow as filtration; BLAST. and The PEG = percent polyethylene glycol. of Phage-specific human methods viruses are excluded and from phages summary. Various B are relative to viral affiliated sequences. Wastewater Flocculation with skim milk in a loss of crucial biological information and a reduction in the

© ASCE 04019039-4 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 Fig. 1. Suggested WGS workflow for viral metagenomics in water environments.

comprehensiveness of the study. Generally, annotated metage- affects the assembly of these fragments into longer sequences or the nomes contain a significant portion of prokaryotic genomes due direct mapping of sequence reads to reference databases. Therefore, to the increased likelihood of their larger genomes being sequenced prior to assembling, it is common practice to trim low-quality bases over smaller viral genomes, the presence of prophages annotated as and sequencing artifacts as well as remove duplicate and short reads their host genomes, and the limited number of viral reference se- (Kunin et al. 2008). quences available in public databases (Fancello et al. 2012; Tamaki The quality of a sequenced nucleotide base is generally evalu- et al. 2012). Therefore, laboratory concentration techniques for vi- ated by its Phred quality score, which is determined by the sequenc- ral diversity studies are favored over in silico enrichment. A recent ing platform used to generate reads. A Phred quality score is a study on the diversity of viruses in polluted waters in Uganda found quality parameter adopted by most quality control and analysis virus-related sequences to comprise as much as 99.79% of affiliated tools for read trimming and filtering (Ruffalo et al. 2011). Quality

Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. sequences when using the EPA-approved VIRADEL method trimming may be applied over the entire read or within a specified (O’Brien et al. 2017b). This further suggests that laboratory con- window of bases. Phred score thresholds ranging from 15 to 30 are centration techniques dictate the quality and richness of postse- commonly set in viral metagenomic studies (i.e., only bases or quenced viromes. reads ≥ Q are used in the analysis) (Bibby and Peccia 2013; Aw et al. 2014; Kim et al. 2015; Miranda et al. 2016; Fernandez- Cassi et al. 2018), where a Phred score of 20 is considered good Quality Analysis of Viral-Related Sequence Reads quality (Lapidus 2009). In addition to quality trimming, removal of sequencing artifacts Sequencing platforms produce massive amounts of short DNA is another critical step to ensure good-quality reads for assembly fragments (reads). The first goal for many viral metagenomic stud- and alignment. Adaptors, synthetic oligonucleotides, are added ies is to accurately assemble and combine DNA fragments, thus to a sample’s DNA to condition the DNA for sequencing. Removal reconstructing the whole genome. The quality of raw reads directly of adaptors postsequencing is necessary to reduce contamination

© ASCE 04019039-5 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 from these synthetic sequences. Adaptor trimming is typically car- Roux et al. 2017). Contig length thresholds can be influenced ried out at the 3′ end of a sequence (Bibby and Peccia 2013; Aw by the application of the study, assembler properties, and length et al. 2014; O’Brien et al. 2017b). Because sequencing is carried of reads. Discarding contigs less than 100 bp (Fernandez-Cassi out from the 5′ to 3′ ends of the DNA fragment, fragments shorter et al. 2018) and 200 bp (Aw et al. 2014) have been observed in than the sequencing length can cause the 3′ adaptor to lie within the viral metegenomic studies using Illumina data sets. Removing sequencing region (Bolger et al. 2014). Adaptor contamination may contigs with lengths less than 200 bp (Rosario et al. 2009)upto be low depending on the type of data sequenced, sequencing plat- 300bp(Fancello et al. 2013) have been observed in studies using form, and sequencing length. However, the presence of adaptors in 454 data sets. viromes can significantly increase the number of unassigned reads during read alignment and annotation, and therefore, is it best prac- tice to remove adaptors prior to assembly. Alignment and Taxonomic Classification of Viromes Another quality control practice includes dereplication, the re- moval of redundant reads (Bibby et al. 2011; Cantalupo et al. 2011; There are two main approaches to taxonomic classification in meta- Tamaki et al. 2012; Bzhalava and Dillner 2013; Miranda et al. genomic studies: (1) composition-based approaches, and (2) simi- 2016; Fernandez-Cassi et al. 2018). Replicate filtering depends larity-based approaches. The latter is most widely used in viral on the sequencing platform used to generate reads. Illumina and metagenomic studies because few tools exist that are trained on 454 (Roche, Basel, Switzerland) pyrosequencing platforms are viral genomes. Similarity-based searches involve aligning as- widely used in viral metagenomic studies (Table 1) with the latter sembled contigs or raw reads against reference genomes or genes. being recently discontinued. The generation of artificial replicates In viral studies, aligning contigs is considered best practice consid- in 454 sequencing technologies facilitates the need for replicate fil- ering the presence of highly divergent sequences and shorter ge- tering, as opposed to sequences generated from Illumina platforms nomes. However, O’Brien et al. (2017a) aligned short reads, (Gomez-Alvarez et al. 2009). In-depth comparisons between 454 also known as fragment alignment (Sharpton 2014), against the and Illumina sequencing technologies have been previously re- RefSeq viral database as a corroboratory detection method for viewed (Liu et al 2012; Luo et al. 2012). Additionally, removal assessing virus diversity in three sewage-treatment plants. Frag- of reads significantly shorter than the specified read length is a typ- ment alignment detected herpesvirus and adenovirus in sewage ical quality filtering approach in viral meteganomic studies. These samples; this was similar to results obtained from aligning contigs reads are often less informative, increase computing time, and can using Metavir annotation platform. Various short-read aligners such results in assembly errors (White et al. 2017). Read length thresh- as Burrows-Wheeler alignment tool (BWA), and Bowtie2 are de- olds depend on read length and application of the study. signed for aligning a large number of short reads to large reference data sets. Generally, in viral metagenomic studies, this approach is most commonly used for evaluating contig abundance and in silico Assembly of Viromes decontamination of nontargeted organisms (Cantalupo et al. 2011; Bibby and Peccia 2013; Li et al. 2015). Assembly is the process of assembling sequence reads into longer Despite the significant number of similarity-based aligners contiguous sequences (contigs). Although not required for taxo- available, the basic local alignment search tool (BLAST) is the nomic classification, it results in a decrease in data volume, which most widely used alignment tool in viral metagenomic studies. reduces time and computational resources. Assembling reads also The translated nucleotide query against translated nucleotide data- improves the accuracy of alignment processes because longer se- base BLAST (tBLASTx) algorithm with an E-value of 10−3 is said quences contain more genetic information. For these reasons, to be more accurate when identifying viral pathogens compared to assembly is a standard practice in viral metagenomic studies. the translated nucleotide query against protein database BLAST The process of assembling reads is highly dependent on the type (BLASTx) and nucleotide BLAST (BLASTn) searches (Bibby of sequencing platform used and reads generated. There are two et al. 2011). Hence, a number of studies have adopted this approach main approaches to assembly, reference-based and de novo for detecting less-abundant human pathogens in viromes (Bibby assembly. Reference-based assembly involves the use of a back- et al. 2011; Cantalupo et al. 2011; Alhamlan et al. 2013; Bibby bone genome in which unknown sequences can be referenced to and Peccia 2013; Aw et al. 2014). However, a number of viral di- and assembled. De novo assembly uses similarities between reads versity studies have performed BLASTx alignment with an E-value to obtain a consensus sequence or suggested whole genome. De of 10−5 (Djikeng et al. 2009; Kim et al. 2015; Miranda et al. 2016; novo assembly is most commonly used in metagenomic analyses Fernandez-Cassi et al. 2018). Regardless, Kunin et al. (2008) sug- because assigning backbone genomes to an unknown complex gested that protein searches increase the sensitivity of alignment, community is nearly impossible. Optimized de novo assembly therefore increasing the accuracy of annotated sequences. This practices for viral genome reconstruction include combined and may be especially true for viruses because protein searches are iterative exhaustive assembly approaches. These approaches in- more likely to identify divergent sequences based on conserved re-

Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. volve combining contigs produced from multiple assembly plat- gions (Fancello et al. 2012). forms or repeated assembly of the same set of contigs and The type of reference database also plays a role in the accuracy unassembled reads until no other contigs can be formed. These of alignment. Bibby et al. (2011) found that viral specific databases methods have been known to improve accuracy and recover a large can improve detection compared with reference databases contain- portion of the metagenome (Djikeng et al. 2009; Smits et al. 2014; ing organisms across multiple groups. Aligning genomes against White et al. 2017). target-specific databases are less computationally demanding due The accuracy of assembly in viral metagenomic studies can be to the reduction in data and decreases ambiguity during alignment. distorted by sequencing artifacts, volume of data, presence of The National Center for Biotechnology Information (NCBI) Viral nontargeted organisms, and length and quality of read (Lapidus RefSeq database contains validated nonredundant sequences of vi- 2009; Rose et al. 2016). Short contigs may misrepresent virus ral origin (O’Leary et al. 2016) and has become a standard database abundance in metagenomes because they can create bias when in viral studies (Table 1). NCBI RefSeq has made strides to stand- assessing population estimates (Vazquez-Castellanos et al. 2014; ardize reference sequences among public databases to allow for

© ASCE 04019039-6 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 more conclusive comparative studies, less confusion between vari- are prone to a number of errors, and therefore molecular methods ous annotation platforms, and increased accuracy when annotating such as qPCR are needed to validate findings from NGS analyses. viral sequences (O’Leary et al. 2016). Other databases include the The poor quantitative nature of NGS and difficulty in determining NCBI nonredundant database, the Community Cyberinfrastructure pathogen infectivity further promotes the need for confirmatory for Advanced Marine Microbial Ecology Research and Analysis methods tailored to the application of the study (Goldberg et al. (CAMERA) database (Sun et al. 2010), and SEED (Overbeek et al. 2015). Regardless of its limitations, NGS has greatly advanced the 2013). These databases consist of publicly available sequences be- current state of knowledge in viral diversity studies with powerful longing to organisms from multiple taxonomic groups. bioinformatics tools to combat sequencing limitations for improved Several factors influence the alignment of sequences against accuracy. reference databases. In particular, contamination can distort viral diversity when assessing whole communities. The presence of non- targeted organisms and dominant populations within viromes create Summary and Conclusions difficulties when detecting viral pathogens in metagenomes. Bacteriophages are the most abundant entities on Earth and usually Traditional molecular techniques are limited to identifying well- constitute a large portion of affiliated viruses in metagenomic stud- known viruses and assessing microbial environments only through ies (Table 1). This imposes further complications when aiming to the lens of targeted organisms. Next-generation sequencing enables detect human pathogens in viromes and even more so when viruses researchers to obtain a rapid, unbiased view of viral communities are contaminated with bacterial genomes. Tamaki et al. (2012) with the possibility of discovering novel pathogens, thus contrib- studied the diversity and functions of DNA viruses in major stages uting a great deal of knowledge in the field of environmental virol- of a domestic sewage-treatment plant in Singapore using metage- ogy. Sequencing and computational errors in viral metagenomics, nomic analysis. Despite performing virus enrichment prior to as well as the poor quantitative nature of NGS, calls for traditional sequencing, affiliated sequences contained 79%–84% of bacteria- confirmatory methods such as cell cultures and qPCR. Despite the related sequences, and 13%–18.5% of viral sequences of which limitations of viral sequencing, bioinformatics tools are continu- none were affiliated with human viruses. High numbers of bacteria ously developing to provide a deeper understanding of the fate affiliated sequences in viromes may be related to inadequate virus and dynamics of viruses in water environments. With that, standard enrichment, horizontal gene transfer by bacteriophages, and pro- approaches such as virus enrichment, quality control, assembly, phages annotated as their host genomes (Cantalupo et al. 2011; and alignment approaches have been adopted across multiple Tamaki et al. 2012). The low abundance and relatively short ge- applications to facilitate a clear understanding of viromes across nomes of human viruses are of particular concern in metagenomics different ecosystems. Here, the authors have reviewed standard ap- studies because they may fail to assemble or go undiscovered dur- proaches as well as suggested best practices for viral diversity stud- ing taxonomic classification. ies in water environments using metagenomics. These techniques, paired with confirmatory methods, make NGS an effective method of detection in environmental studies with the potential to influence Limitations of Next-Generation Sequencing for Viral environmental infrastructures and policies in the aim of reducing Diversity and Pathogen Detection the risk of human exposure to viral pathogens.

Despite its achievements, NGS is not without limitations. The chance of a virus’ genome being sequenced in a metagenome is References a result of its size (Bibby et al. 2011; Fancello et al. 2012). Viral pathogen genomes are generally small in nature and less abundant, Ahmed, W., V. J. Harwood, P. Gyawali, J. P. S. Sidhu, and S. Toze. 2015. “Concentration methods comparison for quantitative detection of especially RNA viruses, which creates a universal challenge when sewage-associated viral markers in environmental waters.” Appl. Envi- detecting these organisms and determining their abundance in ron. Microbiol. 81 (6): 2042–2049. https://doi.org/10.1128/AEM metagenomes (Marz et al. 2014). Sequencing platforms such as .03851-14. Illumina try to overcome this limitation by producing short reads Alhamlan, F. S., M. M. Ederer, C. J. Brown, E. R. Coats, and R. L. at greater sequencing depths to increase the chance of detecting Crawford. 2013. “Metagenomics-based analysis of viral communities less-abundant genomes (Nieuwenhuijse and Koopmans 2017). in dairy lagoon wastewater.” J. Microbiol. Methods 92 (2): 183–188. Consequently, the application of the study should be taken into https://doi.org/10.1016/j.mimet.2012.11.016. consideration when selecting the appropriate sequencing platform. Anderson, S. 1981. “Shotgun DNA sequencing using cloned DNASE ” – Similarly, web-based annotation platforms like Viral Informatics I-generated fragments. Nucleic Acids Res. 9 (13): 3015 3027. https://doi.org/10.1093/nar/9.13.3015. Resource for Metagenome Exploration (VIROME) (Wommack ASTM. 2014. Standard practice for recovery of viruses from wastewater et al. 2012) and Metavir (Roux et al. 2014) are designed to increase sludges. ASTM D4994-89. West Conshohocken, PA: ASTM. virus annotation and improve comparison of viral communities Aw, T. G., A. Howe, and J. B. Rose. 2014. “Metagenomic approaches Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. across different environments. A comprehensive review of bioin- for direct and cell culture evaluation of the virological quality of waste- formatics tools used for analyzing viral sequences has been dis- water.” J. Virol. Methods 210: 15–21. https://doi.org/10.1016/j.jviromet cussed elsewhere (Sharma et al. 2015; Nooij et al. 2018). .2014.09.017. Despite methods to overcome sequencing bias, the limited num- Bibby, K., and J. Peccia. 2013. “Identification of viral pathogen diversity in ber of available reference sequences for viral genomes remains a sewage sludge by metagenome analysis.” Environ. Sci. Technol. 47 (4): – challenge across all annotation platforms. Because viruses lack a 1945 1951. https://doi.org/10.1021/es305181x. “ universal marker, the detection of viruses in metagenomes is typ- Bibby, K., E. Viau, and J. Peccia. 2011. Viral metagenome analysis to guide human pathogen monitoring in environmental samples.” Lett. ically done through assembling of contigs and comparisons against Appl. Microbiol. 52 (4): 386–392. https://doi.org/10.1111/j.1472-765X previously identified sequences. The highly divergent nature of vi- .2011.03014.x. ruses makes it difficult to maintain well-updated reference data- Bolger, A. M., M. Lohse, and B. Usadel. 2014. “Trimmomatic: A flexible bases, resulting in a large number of unaffiliated sequences in trimmer for Illumina sequence data.” Bioinformatics 30 (15): 2114– viral studies. Moreover, NGS sequencing and bioinformatics tools 2120. https://doi.org/10.1093/bioinformatics/btu170.

© ASCE 04019039-7 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 Bzhalava, D., and J. Dillner. 2013. “Bioinformatics for viral metagenom- Leland, D. S., and C. C. Ginocchio. 2007. “Role of cell culture for virus ics.” J. Data Min. Genomics Proteomics 4 (3): 2153. detection in the age of technology.” Clin. Microbiol. Rev. 20 (1): 49–78. Cantalupo, P. G., et al. 2011. “Raw sewage harbors diverse viral popula- https://doi.org/10.1128/CMR.00002-06. tions.” MBio 2 (5): e00810. https://doi.org/10.1128/mBio.00180-11. Li, L., X. Deng, E. T. Mee, S. Collot-Teixeira, R. Anderson, S. Deboosere, N., S. V. Horm, A. Pinon, J. Gachet, C. Coldefy, P. Buchy, and Schepelmann, P. D. Minor, and E. Delwart. 2015. “Comparing viral M. Vialette. 2011. “Development and validation of a concentration metagenomics methods using a highly multiplexed human viral patho- method for the detection of influenza A viruses from large volumes gens reagent.” J. Virol. Methods 213: 139–146. https://doi.org/10.1016 of surface water.” Appl. Environ. Microbiol. 77 (11): 3802–3808. /j.jviromet.2014.12.002. https://doi.org/10.1128/AEM.02484-10. Liu, L., Y. Li, S. Li, N. Hu, Y. He, R. Pong, D. Lin, L. Lu, and M. Law. Djikeng, A., R. Kuzmickas, N. G. Anderson, and D. J. Spiro. 2009. 2012. “Comparison of next-generation sequencing systems.” J. Biomed. “Metagenomic analysis of RNA viruses in a fresh water lake.” PLoS Biotechnol. 2012: 251364. https://doi.org/10.1155/2012/251364. ONE 4 (9): e7264. https://doi.org/10.1371/journal.pone.0007264. Luo, C., D. Tsementzi, N. Kyrpides, T. Read, and K. T. Konstantinidis. Dulbecco, R., and M. Vogt. 1954. “Plaque formation and isolation of pure 2012. “Direct comparisons of Illumina vs. Roche 454 sequencing lines with poliomyelitis viruses.” J. Exp. Med. 99 (2): 167–182. https:// technologies on the same microbial community DNA sample.” PLoS doi.org/10.1084/jem.99.2.167. ONE 7 (2): e30087. https://doi.org/10.1371/journal.pone.0030087. Fancello, L., D. Raoult, and C. Desnues. 2012. “Computational tools for Lysholm, F., et al. 2012. “Characterization of the viral microbiome in viral metagenomics and their application in clinical research.” Virology patients with severe lower respiratory tract infections, using metage- ” 434 (2): 162–174. https://doi.org/10.1016/j.virol.2012.09.025. nomic sequencing. PLoS ONE 7 (2): e30875. https://doi.org/10 Fancello, L., S. Trape, C. Robert, M. Boyer, N. Popgeorgiev, D. Raoult, and .1371/journal.pone.0030875. “ ” C. Desnues. 2013. “Viruses in the desert: A metagenomic survey of Marz, M., et al. 2014. Challenges in RNA virus bioinformatics. Bioin- – viral communities in four perennial ponds of the Mauritanian Sahara.” formatics 30 (13): 1793 1799. https://doi.org/10.1093/bioinformatics ISME J. 7 (2): 359–369. https://doi.org/10.1038/ismej.2012.101. /btu105. “ Fernandez-Cassi, X., N. Timoneda, S. Martinez-Puchol, M. Rusinol, Maxam, A. M., and W. Gilbert. 1977. A new method for sequencing ” – J. Rodriguez-Manzano, N. Figuerola, S. Bofill-Mas, J. F. Abril, and DNA. Proc. Natl. Acad. Sci. 74 (2): 560 564. https://doi.org/10 R. Girones. 2018. “Metagenomics for the study of viruses in urban sew- .1073/pnas.74.2.560. age as a tool for public health surveillance.” Sci. Total Environ. Miranda, J. A., A. I. Culley, C. R. Schvarcz, and G. F. Steward. 2016. “ ” 618: 870–880. https://doi.org/10.1016/j.scitotenv.2017.08.249. RNA viruses as major contributors to Antarctic virioplankton. Envi- – Gardner, R. C., A. J. Howarth, P. Hahn, M. Brownluedi, R. J. Shepherd, and ron. Microbiol. 18 (11): 3714 3727. https://doi.org/10.1111/1462-2920 .13291. J. Messing. 1981. “The complete nucleotide-sequence of an infectious clone of cauliflower mosaic-virus by M13mp7 shotgun sequencing.” Motlagh, A. M., A. S. Bhattacharjee, F. H. Coutinho, B. E. Dutilh, S. R. Casjens, and R. K. Goel. 2017. “Insights of phage-host interaction in Nucleic Acids Res. 9 (12): 2871–2888. https://doi.org/10.1093/nar/9 hypersaline ecosystem through metagenomics analyses.” Front. Micro- .12.2871. biol. 8: 352. https://doi.org/10.3389/fmicb.2017.00352. Goldberg, B., H. Sichtig, C. Geyer, N. Ledeboer, and G. M. Weinstock. 2015. Ng, T. F. F., R. Marine, C. Wang, P. Simmonds, B. Kapusinszky, L. “Making the leap from research laboratory to clinic: Challenges and op- Bodhidatta, B. S. Oderinde, K. E. Wommack, and E. Delwart. 2012. portunities for next-generation sequencing in infectious disease diagnos- “High variety of known and new RNA and DNA viruses of diverse ori- tics.” Mbio 6 (6): e01888. https://doi.org/10.1128/mBio.01888-15. gins in untreated sewage.” J. Virol. 86 (22): 12161–12175. https://doi Gomez-Alvarez, V., T. K. Teal, and T. M. Schmidt. 2009. “Systematic ar- .org/10.1128/JVI.00869-12. tifacts in metagenomes from complex microbial communities.” ISME J. Nieuwenhuijse, D. F., and M. P. G. Koopmans. 2017. “Metagenomic 3 (11): 1314–1317. https://doi.org/10.1038/ismej.2009.72. sequencing for surveillance of food- and waterborne viral diseases.” Greninger, A. L., S. N. Naccache, S. Federman, G. Yu, and P. Mbala. 2015. Front. Microbiol. 8: 230. https://doi.org/10.3389/fmicb.2017.00230. “Rapid metagenomic identification of viral pathogens in clinical sam- ” Nooij, S., D. Schmitz, H. Vennema, A. Kroneman, and M. P. Koopmans. ples by real-time nanopore sequencing analysis. Genome Med. 7 (1): 2018. “Overview of virus metagenomic classification methods and their 99. https://doi.org/10.1186/s13073-015-0220-9. biological applications.” Front. Microbiol. 9: 749. https://doi.org/10 “ Hall, R. J., et al. 2014. Evaluation of rapid and simple techniques for .3389/fmicb.2018.00749. ” the enrichment of viruses prior to metagenomic virus discovery. O’Brien, E., M. Munir, T. Marsh, M. Heran, G. Lesage, V. V. Tarabara, and – J. Virol. Methods 195: 194 204. https://doi.org/10.1016/j.jviromet I. Xagoraraki. 2017a. “Diversity of DNA viruses in effluents of mem- .2013.08.035. brane bioreactors in Traverse City, MI (USA) and La Grande Motte “ Heather, J. M., and B. Chain. 2016. The sequence of sequencers: The his- (France).” Water Res. 111: 338–345. https://doi.org/10.1016/j.watres ” – tory of sequencing DNA. Genomics 107 (1): 1 8. https://doi.org/10 .2017.01.014. .1016/j.ygeno.2015.11.003. O’Brien, E., J. Nakyazze, H. Wu, N. Kiwanuka, W. Cunningham, J. B. “ Hjelmso, M. H., et al. 2017. Evaluation of methods for the concentration Kaneene, and I. Xagoraraki. 2017b. “Viral diversity and abundance and extraction of viruses from sewage in the context of metagenomic in polluted waters in Kampala, Uganda.” Water Res. 127: 41–49. sequencing.” PLoS ONE 12 (1): e0170199. https://doi.org/10.1371 https://doi.org/10.1016/j.watres.2017.09.063. /journal.pone.0170199. O’Leary, N. A., et al. 2016. “Reference sequence (RefSeq) database at Katayama, H., A. Shimasaki, and S. Ohgaki. 2002. “Development of a NCBI: Current status, taxonomic expansion, and functional annota-

Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved. virus concentration method and its application to detection of entero- tion.” Nucleic Acids Res. 44 (D1): D733–745. https://doi.org/10.1093 virus and Norwalk virus from coastal seawater.” Appl. Environ. Micro- /nar/gkv1189. biol. 68 (3): 1033–1039. https://doi.org/10.1128/AEM.68.3.1033-1039 Overbeek, R., et al. 2013. “The SEED and the rapid annotation of microbial .2002. genomes using subsystems technology (RAST).” Nucleic Acids Res. Kim, Y., T. G. Aw, T. K. Teal, and J. B. Rose. 2015. “Metagenomic inves- 42 (D1): D206–D214. https://doi.org/10.1093/nar/gkt1226. tigation of viral communities in ballast water.” Environ. Sci. Technol. Ramírez-Castillo, F. Y., A. Loera-Muro, M. Jacques, P. Garneau, F. J. 49 (14): 8396–8407. https://doi.org/10.1021/acs.est.5b01633. Avelar-González, J. Harel, and A. L. Guerrero-Barrera. 2015. “Water- Kunin, V., A. Copeland, A. Lapidus, K. Mavromatis, and P. Hugenholtz. borne pathogens: Detection methods and challenges.” Pathogens 4 (2): 2008. “A bioinformatician’s guide to metagenomics.” Microbiol. Mol. 307–334. https://doi.org/10.3390/pathogens4020307. Biol. Rev. 72 (4): 557–578. https://doi.org/10.1128/MMBR.00009-08. Rosario, K., C. Nilsson, Y. W. Lim, Y. Ruan, and M. Breitbart. 2009. Lapidus, A. L. 2009. Genome sequence databases (overview): Sequencing “Metagenomic analysis of viruses in reclaimed water.” Environ. Micro- and assembly. Rep. No. LBNL-2095E. Berkeley, CA: Lawrence biol. 11 (11): 2806–2820. https://doi.org/10.1111/j.1462-2920.2009 Berkeley National Laboratory. .01964.x.

© ASCE 04019039-8 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039 Rose, R., B. Constantinides, A. Tapinos, D. L. Robertson, and M. Prosperi. Supplement, Nucleic Acids Res. 39 (S1): D546–D551. https://doi.org 2016. “Challenges in the analysis of viral metagenomes.” Virus Evol. /10.1093/nar/gkq1102. 2 (2): vew022. https://doi.org/10.1093/ve/vew022. Tamaki, H., R. Zhang, F. E. Angly, S. Nakamura, P.-Y. Hong, T. Yasunaga, Roux, S., J. B. Emerson, E. A. Eloe-Fadrosh, and M. B. Sullivan. 2017. Y. Kamagata, and W.-T. Liu. 2012. “Metagenomic analysis of DNA “Benchmarking viromics: An in silico evaluation of metagenome- viruses in a wastewater treatment plant in tropical climate.” Environ. enabled estimates of viral community composition and diversity.” PeerJ Microbiol. 14 (2): 441–452. https://doi.org/10.1111/j.1462-2920.2011 5: e3817. https://doi.org/10.7717/peerj.3817. .02630.x. Roux, S., J. Tournayre, A. Mahul, D. Debroas, and F. Enault. 2014. Taylor, M. W. 2014. “A history of cell culture.” In Viruses and man: “Metavir 2: New tools for viral metagenome comparison and assembled A history of interactions,41–52. Cham, Switzerland: Springer. ” virome analysis. BMC Bioinf. 15 (1): 76. https://doi.org/10.1186/1471 Thuy, D., et al. 2016. “Illuminating Uveitis: Metagenomic deep sequencing -2105-15-76. identifies common and rare pathogens.” Genome Med. 8 (1): 90. https:// “ Ruffalo, M., T. LaFramboise, and M. Koyuturk. 2011. Comparative doi.org/10.1186/s13073-016-0344-6. analysis of algorithms for next-generation sequencing read align- Towner, J. S., et al. 2008. “Newly discovered Ebola virus associated with ” – ment. Bioinformatics 27 (20): 2790 2796. https://doi.org/10.1093 hemorrhagic fever outbreak in Uganda.” PLoS Pathog. 4 (11): /bioinformatics/btr477. e1000212. https://doi.org/10.1371/journal.ppat.1000212. “ Ruhanya, V., L. Kabego, and J. O. Gichana. 2016. Adsorption-elution USEPA. 2001. Manual of methods for virology. Washington, DC: USEPA. techniques and molecular detection of enteric viruses from water.” Vazquez-Castellanos, J. F., R. Garcia-Lopez, V. Perez-Brocal, M. J. Hum. Virol. Retrovirol. 3 (6): 00112. https://doi.org/10.15406/jhvrv Pignatelli, and A. Moya. 2014. “Comparison of different assembly .2016.03.00112. and annotation tools on analysis of simulated viral metagenomic com- Sanger, F., J. E. Donelson, A. R. Coulson, H. Kössel, and D. Fischer. 1973. munities in the gut.” BMC Genomics 15 (1): 37. https://doi.org/10.1186 “Use of DNA polymerase I primed by a synthetic oligonucleotide to /1471-2164-15-37. determine a nucleotide sequence in phage F1 DNA.” Proc. Natl. Acad. White, D. J., J. Wang, and R. J. Hall. 2017. “Assessing the impact of Sci. 70 (4): 1209–1213. https://doi.org/10.1073/pnas.70.4.1209. “ assemblers on virus detection in a de novo metagenomic analysis pipe- Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing ” – with chain-terminating inhibitors.” Proc. Natl. Acad. Sci. 74 (12): line. J. Comput. Biol. 24 (9): 874 881. https://doi.org/10.1089/cmb 5463–5467. https://doi.org/10.1073/pnas.74.12.5463. .2017.0008. “ Sharma, D., P. Priyadarshini, and S. Vrati. 2015. “Unraveling the web of Wigginton, K. R., Y. Ye, and R. M. Ellenberg. 2015. Emerging investi- viroinformatics: Computational tools and databases in virus research.” gators series: The source and fate of pandemic viruses in the urban ” – J. Virol. 89 (3): 1489–1501. https://doi.org/10.1128/JVI.02027-14. water cycle. Environ. Sci. Water Res. Technol. 1 (6): 735 746. Sharpton, T. J. 2014. “An introduction to the analysis of shotgun metage- https://doi.org/10.1039/C5EW00125K. nomic data.” Front. Plant Sci. 5: 209. https://doi.org/10.3389/fpls.2014 Wommack, K. E., J. Bhavsar, S. W. Polson, J. Chen, M. Dumas, S. .00209. Srinivasiah, M. Furman, S. Jamindar, and D. J. Nasko. 2012. Shi, H., E. V. Pasco, and V. V. Tarabara. 2017. “Membrane-based methods “VIROME: A standard operating procedure for analysis of viral meta- of virus concentration from water: A review of process parameters and genome sequences.” Stand. Genomic Sci. 6 (3): 427–439. https://doi.org their effects on virus recovery.” Environ. Sci. Water Res. Technol. 3 (5): /10.4056/sigs.2945050. 778–792. https://doi.org/10.1039/C7EW00016B. Wu, R., and E. Taylor. 1971. “Nucleotide sequence analysis of DNA: II. Smits, S. L., R. Bodewes, A. Ruiz-Gonzalez, W. Baumgartner, M. P. Complete nucleotide sequence of the cohesive ends of bacteriophage Koopmans, A. D. M. E. Osterhaus, and A. C. Schurch. 2014. λ DNA.” J. Mol. Biol. 57 (3): 491–511. https://doi.org/10.1016/0022 “Assembly of viral genomes from metagenomes.” Front. Microbiol. -2836(71)90105-7. 5: 714. https://doi.org/10.3389/fmicb.2014.00714. Xagoraraki, I., Z. Yin, and Z. Svambayev. 2014. “Fate of viruses in water Sun, S., et al. 2010. “Community cyberinfrastructure for advanced micro- systems.” J. Environ. Eng. 140 (7): 04014020. https://doi.org/10.1061 bial ecology research and analysis: The CAMERA resource.” /(ASCE)EE.1943-7870.0000827. Downloaded from ascelibrary.org by Michigan State University on 06/04/19. Copyright ASCE. For personal use only; all rights reserved.

© ASCE 04019039-9 J. Environ. Eng.

J. Environ. Eng., 2019, 145(8): 04019039