PERSPECTIVES

NEW TECHNOLOGIES: METHODS AND APPLICATIONS

OPINION

Clinical and biological insights from viral genome sequencing

Charlotte J. Houldcroft, Mathew A. Beale and Judith Breuer

Abstract | Whole-genome sequencing (WGS) of pathogens is becoming increasingly important not only for basic research but also for clinical science and practice. In virology, WGS is important for the development of novel treatments and vaccines, and for increasing the power of molecular epidemiology and evolutionary genomics. In this Opinion article, we suggest that WGS of viruses in a clinical setting will become increasingly important for patient care. We give an overview of different WGS methods that are used in virology and summarize their advantages and disadvantages. Although there are only partially addressed technical, financial and ethical issues in regard to the clinical application of viral WGS, this technique provides important insights into virus transmission, evolution and pathogenesis.

Since the publication of the first shotgun-sequenced genome (cauliflower mosaic virus1), the draft human genome2 and the first bacterial genomes (Haemophilus influenzae3 and Mycoplasma genitalium4), and enabled by the rapidly decreasing cost of high-throughput sequencing5, genomics has changed our understanding of human and pathogen biology. Several large projects that aim to systematically analyse microbial genomes have recently been completed or are ongoing (for example, sequencing thousands of microbiomes6 and fungal genomes6,7); these projects are shaping our knowledge of the genetic variation that is present in pathogen populations, the genetic changes that underlie disease and the diversity of microorganisms with which we share our environment.

The methods and data from whole-genome sequencing (WGS), which have been developed through basic scientific research, are increasingly being applied to clinical medicine, involving both humans8 and pathogens. For example, WGS has been used to identify new routes of transmission of Mycobacterium abscessus9 in healthcare facilities (nosocomial transmission) and to understand Neisseria meningitidis epidemics in Africa10, whereas the sequencing of partial genomes has been used to detect drug resistance in RNA viruses, such as influenza virus11, and DNA viruses, such as human cytomegalovirus (HCMV)12. Viral genome sequencing is becoming ever more important, especially in clinical research and epidemiology. WGS of pathogens has the advantage of detecting all known drug-resistant variants in a single test, whereas deep sequencing (that is, sequencing at high coverage) can identify low levels of drug-resistant variants to enable intervention before resistance becomes clinically apparent13,14. Whole genomes also provide good data with which to identify linked infections for public health and infection control purposes15,16. However, progress in using viral WGS for clinical practice has been slow. By contrast, WGS of bacteria is now well accepted, particularly for tracking outbreaks and for the management of nosocomial transmission of antimicrobial-resistant bacteria17,18.

In this Opinion article, we will address the challenges and opportunities for making WGS, using modern next-generation sequencing (NGS) methods, standard practice in clinical virology. We will discuss the strengths, weaknesses and technical challenges of different viral WGS methods (TABLE 1), and the importance of deeply sequencing some viral pathogens. We will also explore two areas in which viral WGS has recently proven its clinical utility: metagenomic sequencing to identify viruses that cause encephalitis (BOX 1); and the role of WGS in molecular epidemiology and management of the Pan-American Zika outbreak (BOX 2). Finally, we will briefly consider the ethical and data analysis challenges that clinical viral WGS presents.

Why sequence viruses in the clinic?

For small viruses, such as HIV, influenza virus, hepatitis B virus (HBV) and hepatitis C virus (HCV), the sequencing of partial genomes has been widely used for research, but it also has important clinical applications. One of the main applications and reasons for sequencing viruses is the detection of drug resistance. For example, the management of highly active antiretroviral therapy (HAART) for HIV relies on viral sequencing for the detection of drug-resistant variants. HAART has substantially improved the survival of patients who have HIV, but successful therapy requires long-term suppression of viral replication with antiretroviral drugs, which may be prevented by impaired host immunity, suboptimal drug penetration in certain tissue compartments and incomplete adherence to therapy19. When viral replication continues despite treatment, the high mutation rate of HIV enables resistant variants to develop. It has become standard practice in many parts of the world to sequence the HIV pol gene, which encodes the main viral enzymes, to detect variants that confer resistance to inhibitors of reverse transcriptase, integrase or protease20, particularly when patients are first diagnosed and when viral loads indicate treatment failure. Sequencing resistant variants has enabled targeted changes in treatment, which has resulted in greater reductions in viral loads than with standard care (undetectable HIV load in 32% versus 14% of patients after six months)21,22. Thus, sequencing resistant variants to guide HIV treatment improves disease outcomes. Similar approaches have been used to identify resistant variants of HCV23, HBV24 and influenza virus25.
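The introduction notes that deep sequencing (sequencing at high coverage) can identify low levels of drug-resistant variants. The sketch below shows, under a deliberately simple binomial model, how the chance of observing a low-frequency variant depends on read depth. The function name, the depths and the minimum read count are illustrative assumptions, not values from the article, and the model ignores sequencing and PCR error, which matter in practice.

```python
from math import comb


def prob_detect(depth, freq, min_reads=1):
    """Probability of seeing at least `min_reads` variant-supporting reads
    at a site covered by `depth` reads, given a true variant frequency `freq`.

    Assumes reads sample the virus population independently (binomial model)
    and ignores sequencing and amplification errors.
    """
    p_fewer = sum(
        comb(depth, i) * freq**i * (1 - freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical scenario: a resistant variant at 1% frequency, and a
    # pipeline that requires at least 5 supporting reads to report it.
    for depth in (100, 1000, 10000):
        print(depth, round(prob_detect(depth, 0.01, min_reads=5), 4))
```

Under these assumptions, detection at low depth is unlikely and becomes close to certain only at depths in the thousands, which is one way to read the phrase "sequencing at high coverage".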

NATURE REVIEWS | MICROBIOLOGY VOLUME 15 | MARCH 2017 | 183 ©2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.


Table 1 | Advantages and disadvantages of different viral sequencing methods

Method: Metagenomic sequencing
Advantages:
• Simple, cost-effective sample preparation
• Can sequence novel or poorly characterized genomes
• Effective in ‘fishing’ approaches to identify a potential underlying pathogen
• Lower required number of PCR cycles causes few amplification mutations
• Preservation of minor variant frequencies reflects in vivo variation
• No primer or probe design required, which enables a rapid response to novel pathogens or sequence variants
Disadvantages:
• High sequencing cost to obtain sufficient data
• Relatively low sensitivity to target pathogen
• Coverage is proportional to viral load
• High proportion of non-pathogen reads increases computational challenges
• Incidental sequencing of human and off-target pathogens raises ethical and diagnostic issues

Method: PCR amplification sequencing
Advantages:
• Tried and trusted well-established methods and trained staff
• Highly specific; most sequencing reads will be pathogen-specific, which decreases sequencing costs
• Highly sensitive, with good coverage even at low pathogen load
• Relatively straightforward design and application of new primers for novel sequences
Disadvantages:
• Labour-intensive and difficult to scale for large genomes
• Iterating standard PCRs across large genomes requires high sample volume
• PCR reactions are subject to primer mismatch, particularly in poorly characterized or highly diverse pathogens, or those with novel variants
• Limited ability to sequence novel pathogens
• High number of PCR cycles may introduce amplification mutations
• Uneven amplification of different PCR amplicons may influence minor variant and haplotype reconstruction

Method: Target enrichment sequencing
Advantages:
• Single tube sample preparation that is suited to high-throughput automation and the sequencing of large genomes
• Higher specificity than metagenomics decreases sequencing costs
• Overlapping probes increases tolerance for individual primer mismatches
• Fewer PCR cycles (than PCR amplification) limits the introduction of amplification mutations
• Preservation of minor variant frequencies reflects in vivo variation
Disadvantages:
• High cost and technical expertise for sample preparation
• Unable to sequence novel pathogens and requires well-characterized reference genomes for probe design
• Sensitivity is comparable to PCR, but coverage is proportional to pathogen load; low pathogen load yields low or incomplete coverage
• Cost and time to generate new probe sets limit a rapid response to emerging and novel viruses

Why sequence whole genomes?

Limited sequencing of the small number of genes that encode targets of antiviral agents, such as HIV pol, has been the norm in clinical practice. For the detection of a limited number of antiviral-resistant variants, WGS has been too costly and labour-intensive to use compared with sequencing only the specific genes that are targeted by the drugs. However, the increasing number of resistance genes that are located across viral genomes, together with decreasing costs of sequencing and the use of sequence data for transmission studies, are driving a reappraisal of the need for WGS. For example, antiviral treatment for HCV now targets four gene products (NS3, NS4A, NS5A and NS5B), and these genes encompass more than 50% of the viral genome26. Individually sequencing each of these genes can be as expensive and time-consuming as WGS27. Partial-genome sequencing is particularly problematic for large viral genomes, in particular those of the herpesviruses HCMV12, varicella zoster virus (VZV)28, herpes simplex virus 1 (HSV‑1)29 and HSV‑2 (REF. 30). These viruses have traditionally been treated with drugs that target the viral thymidine or serine/threonine protein kinases and DNA polymerase. However, the increasing number of drugs in development that interact with different proteins that are encoded by viral genes scattered across the genome means that targeted sequencing for resistance testing is costly, involves more PCR reactions (which increases the chance of failure), requires more starting material, is more labour intensive and generally less tractable for diagnostic use31. Sequencing the whole genome simultaneously captures all resistant variants and removes the need to design and optimize PCR assays for the detection of resistance to new drugs. A good example of this is HCMV, for which WGS can simultaneously capture the genes that encode targets of licensed therapies, such as UL27 (unknown function), UL54 (DNA polymerase) and UL97 (serine/threonine protein kinase), and of newer drugs, such as letermovir, which targets UL56 (terminase complex). This enables comprehensive antiviral-resistance testing in a single test12. In addition, WGS can provide information on antigenic epitopes, virus evolution in a patient over time12, and evidence of recombination between HCMV strains32. WGS can also detect putative novel drug-resistant variants and predict changes to epitopes, although phenotypic testing of variants is required to confirm clinical resistance33 and to map epitope changes34.

As pre-existing resistance to antiviral drugs increases (for example, HCV that is resistant to protease inhibitors35 and HBV that is resistant to nucleoside analogue reverse-transcriptase inhibitors36), WGS will provide the comprehensive resistance data that are required for selecting appropriate treatment. The complete knowledge of all resistant variants can also support novel decisions in clinical management; for example, the identification of extensive genome-wide HCMV drug resistance in a patient supported the decision to treat the individual with autologous cytomegalovirus-specific T cells instead of antiviral drugs37.

WGS may also better identify transmission events and outbreaks, which is not always possible with sequences of subgenomic fragments. For example, WGS of respiratory syncytial virus (RSV) identified variation outside of the gene that is traditionally used for genotyping, and such information could be used to track outbreaks in households when the genetic variability in single genes is too low for transmission studies38. The numerous phylogenetically informative variant sites that can be obtained from full-length or near full-length genomes remove the need for high-quality sequences, which enabled the robust linking of cases of Ebola virus infection and public health interventions in real time during the 2015 epidemic39.
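The statement in this section that the four HCV drug-target gene products (NS3, NS4A, NS5A and NS5B) encompass more than 50% of the viral genome can be checked with simple arithmetic. The sketch below uses approximate protein lengths for a genotype 1a reference strain and a genome length of about 9.6 kb; these numbers and all names in the code are assumptions introduced for illustration and do not come from the article.

```python
# Approximate protein lengths in amino acids for an HCV genotype 1a
# reference strain (illustrative assumption, not taken from the article).
TARGET_LENGTHS_AA = {"NS3": 631, "NS4A": 54, "NS5A": 448, "NS5B": 591}

# Approximate HCV genome length in nucleotides (about 9.6 kb).
GENOME_LENGTH_NT = 9600


def target_fraction(lengths_aa, genome_nt):
    """Fraction of the genome occupied by the listed coding regions,
    counting three nucleotides per amino acid."""
    target_nt = sum(3 * aa for aa in lengths_aa.values())
    return target_nt / genome_nt


if __name__ == "__main__":
    fraction = target_fraction(TARGET_LENGTHS_AA, GENOME_LENGTH_NT)
    print(f"Drug-target genes cover about {fraction:.0%} of the genome")
```

With these assumed lengths the targets add up to a little over half of the genome, which is consistent with the argument that sequencing each target separately approaches the effort of WGS.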

184 | MARCH 2017 | VOLUME 15 www.nature.com/nrmicro ©2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.


Another example of WGS supporting public health efforts is with the recent outbreak of Zika virus (BOX 2).

The routine use of pathogen WGS for diagnostic purposes40 is likely to have wider clinical and research benefits. For example, Zika virus sequences that were generated for epidemiological purposes inform public health decisions41. In addition, HIV genomes that were sequenced to identify antiviral-resistant variants have also been used to study virus evolution42 and viral genetic association with disease, including genotype–phenotype association studies and genome-to‑genome association studies, which look for associations between viral genetic variants, host genetic variants and outcomes of infection, such as set point viral load in HIV infection43,44.

Box 1 | RNA-seq and metagenomic diagnostics

For cases of encephalitis of unknown origin, metagenomic techniques are promising diagnostic tools. There are various protocols in use, but the main methods that are used are RNA sequencing (RNA-seq) and metagenomics. For RNA-seq, the total RNA, or a subset of RNA, is extracted from a sample (for example, cerebrospinal fluid or a brain biopsy), converted to complementary DNA (cDNA) and sequenced. Metagenomics generally describes the same procedure for DNA, but may also include simultaneous sequencing of DNA and RNA through the incorporation of a cDNA-synthesis step. RNA-seq may improve the detection of pathogenic viruses, as many viruses have RNA genomes and viral mRNAs in the cerebrospinal fluid (CSF) or brain indicate both the presence of the virus and which viral genes are being transcribed. However, DNA viruses, which experience low-level transcription, may be poorly detected using RNA-seq, and read numbers for DNA viruses may be higher in metagenomic datasets63.

Both methods have successfully identified new or known viral pathogens in cases of encephalitis of unknown origin. Metagenomics has been used to aid the diagnosis and characterization of enterovirus D68 in cases of acute flaccid paralysis83. Metagenomics identified herpesviruses in the CSF of four patients who had suspected viral meningoencephalitis136. RNA-seq also identified herpes simplex virus 1 (HSV‑1) in a patient with encephalitis, although the use of a DNase I digestion (which was intended to decrease the amount of host nucleic acid) decreased the number of HSV‑1 reads63. Mumps vaccine virus has also been detected in a patient with chronic encephalitis using RNA-seq137.

RNA-seq has been very successful in the identification of encephalitis caused by astroviruses138,139 and coronaviruses65. The deaths of three squirrel breeders from encephalitis were linked to a novel squirrel bornavirus, which was identified by separate metagenomic sequencing of DNA and RNA62. Metagenomics provides more information about the virus in a sample than PCR alone, which may be important for molecular epidemiology, whereas RNA-seq can identify viral sequences and viral gene expression.

Why do we need deep sequencing?

Modern methods, which use massively parallel sequencing, enable better examination of diversity and the analysis of virus populations that contain nucleotide variants or haplotypes at low frequencies (less than 50% of the consensus sequence). Minority variant analysis is particularly powerful for RNA viruses, reverse-transcribing DNA viruses and retroviruses, because they typically show high diversity, even in a single host. HIV is the classic example; the reverse transcriptase of HIV is error-prone and introduces mutations at an extremely high rate (4.1 ± 1.7 × 10⁻³ per base per cell)45. Many closely related, but subtly different, viral variants exist in a single patient. These variants are sometimes described as a quasispecies or a cloud of intra-host virus diversity. The presence of a mixed population of viruses introduces problems for the determination of the true consensus ‘majority’ sequence, but these minority (non-consensus) variants may also change the clinical phenotype of the virus, and can be used to predict changes in genotype, tropism or drug resistance. For example, a minor variant that confers drug resistance in HIV that is present in only 2.1% of sequencing reads in a patient at baseline can rapidly become the majority (consensus) variant under the selective pressure of drug treatment46. Similar changes in the frequency of resistance-associated alleles during treatment have been observed for HBV47, HCV48, HCMV12 and influenza virus49.

Deep sequencing of viruses is not only required to detect drug resistance, it is also key for genotypically predicting the receptor tropism of HIV, which has treatment implications. HIV can be grouped by its use of cellular co‑receptor into R5 (uses CC-chemokine receptor 5 (CCR5)), X4 (uses CXC chemokine receptor 4 (CXCR4)) or R5X4 (dual tropism). Maraviroc is a CCR5 antagonist that blocks infection of R5‑tropic HIV, but not of X4‑tropic and R5X4‑tropic HIV. Just a 2% frequency of X4 or R5X4 genotypes is predictive of maraviroc treatment failure50. Sub-consensus frequencies of X4‑tropic or R5X4‑tropic HIV are also important for the success51 and failure52 of bone marrow transplants from CCR5‑deficient (CCR5‑Δ32) donors, and this information may influence the decision to stop antiviral therapy in these patients51.

Minority variants and the identification of haplotypes can also be used to detect mixed infections. Infections with different HCMV genotypes or super-infections53 are associated with poor clinical outcomes, and the detection of such mixed infections by WGS might justify more aggressive treatment.

Sanger sequencing of a virus population can detect minority variants at frequencies between 10% and 40%54, whereas NGS can sequence those same PCR amplicons to a much greater depth55, and, consequently, capture more of the variability that is present. Sensitivity and specificity are specific for the analysed virus and the sequencing method. Many studies of drug resistance in HIV that use deep-sequencing of PCR amplicons require minority variants to be present at >1% to decrease the possibility of false-positive results56,57. This may miss drug-resistance mutations at frequencies of 0.1–1% and lead to poor treatment outcome57. Although a 1–2% frequency threshold (or lower) may be clinically relevant for the detection of drug resistance in HIV, it is less clear whether the same degree of sensitivity is required for monitoring vaccine escape in HBV or drug resistance in herpesviruses (discussed below). Large cohorts of patients need to be tested before, during and after treatment46,50 to establish thresholds for minority drug-resistant variants12 and vaccine-escape variants that are clinically relevant for each virus.

Direct deep sequencing of clinical material, either by shotgun methods or RNA-seq methods (so-called metagenomic methods), also enables the unbiased detection and diagnosis of pathogens, and provides an alternative to culture, electron microscopy and quantitative PCR (qPCR; see below).

Practical considerations

Sequencing viral nucleic acids, whether from cultures or directly from clinical specimens, is complicated by the presence of contaminating host DNA58. By contrast, most bacterial sequencing is currently carried out on clinical isolates that are cultured; thus, sample preparation is comparatively straightforward (TABLE 2 and



reviewed in REF. 59). Currently, genome sequencing of viruses can be achieved by ultra-deep sequencing or through the enrichment for viral nucleic acids before sequencing, either directly or by concentrating virus particles. All of these approaches have their own costs and complexities.

Three main methods are currently used for viral genome sequencing: metagenomic sequencing, PCR amplicon sequencing and target enrichment sequencing (FIG. 1).

Metagenomics. Metagenomic approaches have been used extensively for pathogen discovery and for the characterization of microbial diversity in environmental and clinical samples60,61. Total DNA and/or RNA, including from the host, bacteria, viruses, fungi and other pathogens, are extracted from a sample, and a library is prepared and sequenced by shotgun sequencing or RNA sequencing (RNA-seq). BOX 1 explores the diagnostic applications for metagenomics and RNA-seq; for example, in encephalitis of unknown aetiology62–64, for which conventional methods such as PCR are often not diagnostic, metagenomics and RNA-seq have detected viral infections65–67 and other causes68 of encephalitis. In addition, these methods have been used to sequence the whole genome of some viruses, including Epstein–Barr virus (EBV)69 and HCV27. However, in clinical specimens, the presence of contaminating nucleic acids from the host and commensal microorganisms58 (TABLE 2) decreases sensitivity. The proportion of reads that match the target virus genome from metagenomic WGS is often low; for example, 0.008% for EBV in the blood of a healthy adult70, 0.0003% for Lassa virus in clinical samples71 and 0.3% for Zika virus in a sample that was enriched for virus particles through filtration and centrifugation72. The read depth is often inadequate to detect resistance27 and the cost is high. Thus, metagenomic sequencing has typically only been carried out on a small number of samples for research purposes72,73. The concentration of virus particles (see the Zika virus example above72), depletion of host material and/or sequencing to high read depth can increase the amount of virus sequence, but all of these methods add to the cost. The concentration of virus particles from clinical specimens by antibody-mediated pull-down (for example, virus discovery based on complementary DNA (cDNA)–amplified fragment length polymorphism (AFLP), abbreviated to VIDISCA), filtration, ultracentrifugation and the depletion of free nucleic acids, which mostly come from the host, have all been tried74–77; however, these methods may also decrease the total amount of viral nucleic acids so that it is insufficient for preparing a sequencing library. Non-specific amplification methods, such as multiple displacement amplification (MDA), which make use of random primers and Φ29 polymerases, can increase the DNA yield. However, these approaches are time-consuming, costly, and may increase the risk of biases, errors and contamination, without necessarily improving sensitivity78,79. Moreover, the proportion of host reads often remains high80.

When metagenomic methods are used for pathogen discovery or diagnosis, it is crucial to use appropriate bioinformatic tools and databases that can evaluate whether detected pathogen sequences are likely to be the cause of infection, incidental findings or contaminants. Bioinformatic analyses of large metagenomic datasets require high-performance computational resources.

The fact that metagenomics requires no prior knowledge of the viral genome can be considered an advantage27 as it enables novel viruses to be sequenced without the need for primer or probe design and synthesis. This is particularly relevant for rapid responses to emerging threats, such as Zika virus81. For virus-associated cancers, metagenomics can inform clinical care, provide information on cancer evolution and generate high-coverage data of integrated virus genomes69. However, incidental findings, both in host and microbial sequences, may also present ethical and even diagnostic dilemmas for clinical metagenomics82 (see below). A recent example involved a cluster of cases of acute flaccid myelitis that were associated with enterovirus D68 (REF. 83). The metagenomic data from samples taken from patients showed the presence of alternative pathogens, some of which are treatable, and was debated in formal84 and informal scientific channels (see Omicsomics blogspot article). Regulation and reporting frameworks will be important to resolve future issues of this kind.

Box 2 | Whole-genome sequencing of Zika virus

Whole-genome sequencing (WGS) of Zika virus can help to understand the epidemiology of the recent outbreak in South America, including the origin and spread of the virus, and the connection between the virus and microcephaly. It also informs control measures, such as stopping importation of Zika cases or disrupting transmission from a reservoir, and blood safety measures in hospitals.

For flaviviruses, such as Zika virus, WGS, or at least near whole-genome sequencing, is required to provide molecular epidemiology studies sufficient power41. WGS, phylogenetic analysis and molecular clock dating, combined with other epidemiological data, were useful to study the introduction of Zika virus to South America41. For example, the most recent common ancestor of strains that are circulating in Brazil pre-dates the 2014 football World Cup, which makes it highly unlikely that this event was responsible for the introduction of the Asian-lineage Zika virus to South America41.

WGS is also central for understanding the pathogenesis of Zika virus; for example, by trying to identify sequence changes that are associated with microcephaly, as it is currently unclear which genome regions determine pathogenesis. It is likely that numerous whole-genome sequences of Zika virus from around the world and from individuals with microcephaly and asymptomatic infection are required to link particular mutations to birth defects. So far, no changes in the Zika virus genome have been unambiguously associated with microcephaly41,72,81.

WGS and fragment sequencing were used to identify a case of Zika virus transmission through platelet transfusion140. This case suggested that asymptomatic donors can transmit the virus to immunocompromised individuals. PCR-based testing had already established the presence of Zika virus in the blood supply in a previous outbreak, but no infection was detected in recipients of blood products141. Based on this new evidence, blood products may need to be screened routinely for Zika virus140.

Finally, WGS of Zika virus isolates has identified sequence polymorphisms in primer binding sites142, which may make PCR-based diagnosis and the quantification of viral load more difficult. This highlights the need to characterize population-level diversity, especially in epidemics in which the locally circulating virus may have diverged from viruses from other locations or time periods. Several projects are underway to determine population-level diversity, including the Zika in Brazil real time analysis (ZIBRA) mobile laboratory project143, which uses portable metagenomic sequencing of Zika virus and real-time reporting of results107.

PCR amplicon enrichment. An alternative to metagenomic approaches is to enrich the specific viral genome before sequencing. PCR amplification of viral genetic material using primers that are complementary to



a known nucleotide sequence has been the most common approach for enriching small viral genomes, such as HIV and influenza virus. Recent examples of PCR amplicon enrichment followed by WGS include the phylogenetic analysis of a measles virus outbreak at the 2010 Winter Olympics85 and tracking the recent Ebola virus39 and Zika virus (BOX 2) epidemics. PCR amplicon WGS of norovirus, which has a genome size of 7.5 kb, has been used to understand virus transmission in community86 and hospital87 settings, which revealed both independent introductions of the pathogen to the hospital and nosocomial transmission despite measures to control infection87. Other PCR-based deep-sequencing studies have generated several whole genomes of influenza virus88 (~13.5 kb), dengue virus89 (~11 kb) and HCV90 (9.6 kb). This was feasible because these viruses all have relatively small genomes that require only a few PCR amplicons to assemble whole-genome sequences. However, the heterogeneity of RNA viruses, such as HCV27, norovirus86, rabies virus91 and RSV38, may necessitate the use of multiple overlapping sets of primers to ensure amplification of all genotypes. PCR amplicon sequencing is more successful for WGS from samples that have low virus concentrations than metagenomic methods27, although other methods such as target enrichment of viral sequences may work equally well in such samples, as shown for norovirus samples92.

Overlapping PCRs combined with NGS have been used to sequence the whole genomes of larger viruses, such as HCMV93, but this method has limited scalability, as many primers and a relatively large amount of starting DNA are required93. This limits the number of suitable samples that are available and also the genomes that can be studied using this method. For example, 8–19 PCR products were required to amplify the genome of Ebola virus39, and two studies of norovirus needed 14 and 22 PCR products, respectively86,87. For clinical applications this is problematic because of the high laboratory workload that is associated with numerous discrete PCR reactions, the necessity for individually normalizing concentrations of different PCR amplicons before pooling, the increasing probability of reaction failure due to primer mismatch (particularly for highly variable viruses), and the high costs of labour and consumables94. Therefore, although PCR-based sequencing of viruses as large as 250 kb is technically possible, the proportional relationship between genome size and technical complexity makes PCR-based sequencing of viral genomes that are more than 20–50 kb impractical with current technologies, particularly for large multi-sample studies or routine diagnostics. Another consideration is that increasing numbers of PCR reactions require a corresponding increase in sample amount, and this is not always possible as clinical specimens are limited. Improvements in microfluidic technologies may help to overcome some of these barriers; for example, Fluidigm, RainDance and other ‘droplet’ sequencing technologies. Droplet-based PCR and the pooling of multiple amplicons have been used successfully to sequence several antimicrobial-resistance loci (for example, from the microbiome of pigs)95 and can also be used for viral genomes, potentially down to the single-genome level96.

Highly variable pathogens, particularly those that have widely divergent genetic lineages or genotypes, such as HCV97, cause problems for PCR amplification, such as primer amplification27,92 and primer mismatches86. Careful primer design may help to mitigate these problems, but novel variants remain problematic.

Table 2 | Limitations of viral sequencing compared with bacterial sequencing

Feature: Genome
• Bacteria: dsDNA
• Viruses: dsDNA, ssDNA, partially dsDNA, ssRNA or dsRNA
• Challenges: Different extraction protocols for different viruses. RNA viruses require cDNA synthesis and ssDNA second strand synthesis

Feature: Gene conservation
• Bacteria: Highly conserved, essential genes (for example, 16S rRNA) enabling broad studies and surveys of taxa
• Viruses: No homologous genes between viruses of different phyla
• Challenges: Lack of conserved homology between viral phyla prevents universal primer-based surveys of viruses

Feature: Culture
• Bacteria: Often straightforward to culture and obtain pure, highly enriched bacterial DNA and RNA
• Viruses: Challenging to culture, and require a host cell for replication
• Challenges: Cultured viruses are heavily contaminated with host cell nucleic acids, which decreases viral sequencing output

Feature: Clinical specimens
• Bacteria: Hardy bacterial cells with cell walls can often be separated from human cells in clinical specimens using differential lysis methods or flow cytometry144 prior to extraction
• Viruses: Viruses are intracellular pathogens, and although separation from the host is possible (for example, by filtration or antibody pull-down), viruses cannot easily be separated from clinical samples prior to extraction
• Challenges: Clinical specimens are heavily contaminated with host nucleic acids, which decreases viral sequencing output

Feature: Methylation patterns
• Bacteria: Bacteria use different methylation patterns from eukaryotes; host DNA can be depleted post-extraction using restriction endonucleases that are directed against CpG methylation145
• Viruses: DNA viruses are often methylated by the host intracellular machinery, and may have similar methylation patterns
• Challenges: DNA digestion according to methylation patterns is less effective as a means of host depletion for viral sequencing

cDNA, complementary DNA; dsDNA, double-stranded DNA; dsRNA, double-stranded RNA; rRNA, ribosomal RNA; ssDNA, single-stranded DNA; ssRNA, single-stranded RNA.

Target enrichment. Methods of target enrichment (also known as pull-down, capture or specific enrichment methods) can be used to sequence whole viral genomes directly from clinical samples without the need for prior culture or PCR98–100. These methods typically involve small RNA or DNA probes that are complementary to the pathogen reference sequence (or a panel of reference sequences). Unlike specific PCR amplicon-based methods, the reaction can be carried out in a single tube that contains overlapping probes that cover the whole genome. In a hybridization reaction, the probes, which are bound to a solid phase (for example, streptavidin-labelled


magnetic beads), capture or ‘pull down’ complementary DNA sequences from the total nucleic acids that are present in a sample. Capture is followed by sequencer-specific adaptor ligation and a small number of PCR cycles to enrich for successfully ligated fragments. This has been used successfully to characterize small and large, clinically relevant viruses, such as HCV27, HSV-1 (REF. 101), VZV100, EBV102, HCMV69, human herpesvirus 6 (HHV6)103 and HHV7 (REF. 104). The reaction is carried out in a single well and, similarly to microfluidics-based PCR, is amenable to high-throughput automation102. The lack of a culture step means that the sequences that are obtained are more representative of original virus rather than cultured virus isolates, and there are fewer mutations than in PCR-amplified templates69,100. The success of this method depends on the available reference sequences for the virus of interest; specificity increases when probes are designed against a larger panel of reference sequences, as this leads to better capture of the diversity in and between samples. Target enrichment is possible despite small mismatches between template and probe; however, whereas PCR amplification requires only knowledge of flanking regions of a target region, target enrichment requires knowledge of the internal sequence to design probes. However, if one probe fails, internal and overlapping regions may still be captured by other probes69,100. Target enrichment is not suitable for the characterization of novel viruses that have low homology to known viruses for which metagenomics, and, in some cases, PCR using degenerate primers, which are a mix of similar but variable primers, may be more appropriate.

As with all methods, the technique is constrained by the starting virus concentration. Although viruses could be sequenced from samples with viral loads as low as 2,000 International Units (IU) ml–1 (for HCV) or 2,500 IU ml–1 (for HCMV), there was a reduced depth of coverage in sequencing data at lower virus concentrations27,69. With metagenomics, the proportion of sequencing data that map to the pathogen from unenriched clinical samples is small. Target enrichment can increase the percentage of on-target viral reads from 0.01% to 80% or more69. The improvement in quality and depth of sequence that results enables more samples to be sequenced per run than unenriched metagenomic libraries for equivalent on-target sequencing performance. This improvement also decreases the price of sequencing, although the cost of library preparation is increased.

There are alternative approaches for the enrichment of viral reads, including pulsed-field gel electrophoresis (PFGE)105, which separates large viral genomes from smaller fragments of host DNA. Enrichment techniques that use degenerate RNA or DNA probes to capture hundreds of viral species have also been developed; for example, virome capture sequencing (VirCapSeq)106. This method is designed for the detection of both known and novel viruses, although its performance remains to be evaluated.

Comparison of methods. To date, there has been very little direct comparison between the three methods for viral genome sequencing in clinical practice, with only one paper evaluating relative performance for the sequencing of HCV27. Results from this study, in which three different enrichment protocols, two metagenomic methods and one overlapping PCR method were evaluated, showed that metagenomic methods were the least sensitive, yielded the lowest genome coverage for comparable sequencing effort and were more prone to result in incomplete genome assemblies. The PCR method required repeated amplification and was the most likely to miss mixed infections, but when reactions were successful it resulted in the most consistent read depth, whereas read depth was proportional to virus copy number in metagenomics and target enrichment. PCR generated more incomplete sequences for some HCV genotypes (particularly genotype 2) than metagenomics and target enrichment. Target enrichment was the most consistent method to result in full genomes and identical consensus sequences. The ease of library preparation for metagenomic and target enrichment sequencing of HCV was considered a major advantage for clinical applications, but PCR may still be appropriate for samples that have very low viral loads.

Figure 1 | Methods for sequencing viral genomes from clinical specimens. All specimens originally comprise a mix of host (in blue) and pathogen (in red) DNA sequences. For pathogens that have RNA genomes, RNA in the sample is converted into complementary DNA (cDNA) before PCR and library preparation. Direct metagenomic sequencing provides an accurate representation of the sequences in the sample, although at high sequencing and data analysis and storage costs. PCR amplicon sequencing uses many discrete PCR reactions to enrich the viral genome, which increases the workload for large genomes substantially but decreases the costs. Target enrichment sequencing uses virus-specific nucleotide probes that are bound to a solid phase, such as beads, to enrich the viral genome in a single reaction, which reduces workload but increases the cost of library preparation compared with PCR.
Panels: Direct metagenomic sequencing; PCR amplicon sequencing; Target enrichment sequencing.
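The effect of the on-target fraction on sequencing effort can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the 150-base read length, the 1,000-fold target depth and the 25-million-read run are assumptions chosen for the example and are not taken from the studies cited here. Only the 9.6 kb HCV genome size and the 0.01% and 80% on-target fractions are figures quoted in the text.

```python
import math
from fractions import Fraction


def reads_required(genome_bp, mean_depth, read_len, on_target_fraction):
    """Total reads needed so that on-target reads cover the genome at mean_depth.

    Simplifying assumptions: every on-target read maps in full and coverage is
    even across the genome (real libraries are less uniform).
    """
    # Exact rational arithmetic avoids floating-point surprises at the ceiling.
    on_target_reads = Fraction(genome_bp * mean_depth, read_len)
    return math.ceil(on_target_reads / Fraction(str(on_target_fraction)))


def samples_per_run(run_reads, reads_per_sample):
    """Number of samples that fit in one run at the required read count."""
    return run_reads // reads_per_sample


HCV_GENOME_BP = 9_600      # 9.6 kb, as quoted in the text
TARGET_DEPTH = 1_000       # assumed mean depth
READ_LEN = 150             # assumed read length in bases
RUN_READS = 25_000_000     # assumed output of one sequencing run

unenriched = reads_required(HCV_GENOME_BP, TARGET_DEPTH, READ_LEN, 0.0001)  # 0.01%
enriched = reads_required(HCV_GENOME_BP, TARGET_DEPTH, READ_LEN, 0.8)       # 80%

print(unenriched, enriched, unenriched // enriched)
print(samples_per_run(RUN_READS, unenriched), samples_per_run(RUN_READS, enriched))
```

Under these assumptions the unenriched library needs 8,000 times as many reads per sample as the enriched one, which is simply the ratio of the two on-target fractions. Uneven coverage, duplicate reads and partial mapping would all increase the real requirement.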


Similar results were achieved in a study that compared PCR amplicon sequencing and target enrichment sequencing of norovirus92. With target enrichment sequencing, the whole viral genome could be sequenced in all 164 samples, whereas PCR-based sequencing was only possible in 158 out of the 164 samples, owing to low virus titres and PCR primer mismatches, which suggests that target enrichment is more sensitive than PCR for sequencing norovirus and better accommodates between-strain sequence heterogeneity92. Target enrichment has also been used for samples that have low viral loads and incomplete genome coverage in metagenomic sequencing107. Both metagenomic and target enrichment sequencing can be used for pathogen genomes of all sizes, whereas PCR-based methods are less suitable for large viral genomes or for non-viral (that is, bacterial, fungal and parasite) genomes.

Direct comparisons of different methods27,92 will be important for determining when each method should be used, based on sensitivity and specificity, as well as factors such as cost, scalability and turn-around time, which are particularly important in clinical applications (TABLE 1).

Analysis and interpretation challenges
Beyond the technical challenges of viral WGS that are mentioned above, there are several other roadblocks that may slow the advance of WGS in the clinic. They may be considered in three groups: ethical issues, including incidental host and microbiological findings; regulatory issues, such as the establishment of standards, good laboratory practice, and sensitivity and specificity thresholds for sequencing; and analytical issues regarding data interpretation and the numerous choices of analysis options.

Ethical issues and incidental findings. In many clinical tests (for example, magnetic resonance imaging (MRI) scans and sequencing of the genomes of patients), there is a risk of detecting a disease association that is not part of the original investigation but might be of clinical importance for the individual or their family. These incidental findings remain a topic of intense medical ethical debate108. The risk of incidental findings in pathogen sequencing (for example, the discovery of HIV infection during metagenomic sequencing for other pathogens) is not novel and the solution in clinical virology laboratories that use multiplex PCRs is to suppress results that have not been requested (J.B., unpublished observations). In the United Kingdom, the clinical virologist who interprets the test results is part of the team that manages the patient, and, as such, may decide to discuss an unexpected result with the physician in charge. Incidental host genetic findings (for example, the detection of variants that predispose to cancer development) in a pathogen metagenomic analysis are not reported to the individual in the United Kingdom, because this is only permissible with the consent of the patient. In regard to both host and virus incidental findings, target enrichment and PCR have the advantage of only providing results about the pathogen of interest. The ethical and privacy concerns that are associated with the presence of host genetic data in publicly available metagenomic datasets have been well reviewed82 and represent a separate challenge.

Regulatory challenges. Regulation, as well as helping to address some of the ethical concerns, is also important in standardizing WGS of viruses. The framework that is required to make viral WGS sufficiently robust and reproducible in clinical practice will come from several areas.

The framework of laboratory accreditation and benchmark testing that are already available (for example, Clinical Laboratory Improvement Amendments of 1988 (CLIA) regulations in the USA, or accreditation according to medical laboratory quality and competence standardization criteria for ISO 15189) will support the development of viral WGS standards, provided that there is sufficient need and pressure to implement clinical viral WGS.

Lessons learned from the use of PCR in diagnostics may be useful here, starting with ensuring good clinical laboratory and molecular practices109,110. This will mean including negative samples in every sequencing run to assess contamination thresholds, spiking samples with a known virus to provide a sensitivity threshold, and including positive controls and controls for batch-to-batch variation111, all of which will increase sequencing costs and are likely to deter the adoption of pathogen genome sequencing by laboratories that are sequencing only small batches of samples. The centralization of virus WGS can help to ensure the maintenance of adequate standards, the processing of large batches of samples and reducing costs.

The issues of sensitivity and contamination are especially important in WGS, because of the risk of both false-negative and false-positive detection of pathogens. Highly sensitive sequencing (whether metagenomic, PCR-based or target enrichment-based) may detect low-level contaminating viral nucleic acids112,113. For example, murine leukaemia virus114,115 and parvovirus-like sequences116,117 are just two of many contaminants that can come from common laboratory reagents, such as nucleic acid extraction columns118. As with other highly sensitive technologies, robust laboratory practices and protocols are required to minimize contamination.

It is also important to remember that the detection of viral nucleic acid does not necessarily identify the cause of illness, and it is good practice when using NGS methods for the diagnosis of viral infections to confirm the findings with alternative independent methods that do not rely on testing for nucleic acids. For example, in cases of encephalitis of unknown origin, positive NGS findings can be confirmed through immunohistochemical analysis of the affected tissue65,119, or identification of the virus by electron microscopy or tissue culture82.

The standardization of methods, including bioinformatics, will be key to the success of NGS and WGS in clinical virology. Software packages that use a graphical user interface (GUI) are preferable to tools that require command-line expertise. Strict version control of software and analysis pipelines is required to ensure that results are reproducible, to make best practices easily shareable, and to enable the accreditation of analysis software. However, best-practice analysis methods are continually evolving and the premature standardization of best practices in an overly rigid manner may inhibit innovation. Commercialization and regulation may help, as they provide financial and regulatory incentives to ensure that analysis tools and technologies meet clinical needs.

Finally, the development of well-curated databases that show which variants are truly indicative of drug resistance will be crucial for accurate clinical interpretation. Such databases have already been created for HIV120, HBV121,122 and HCV123, but without recognition of their value by funding agencies and corresponding centralized funding to ensure their continued maintenance and upkeep, these databases and associated tools may become swiftly outdated or unusable.
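The practice of including negative samples in every run to set a contamination threshold can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a validated clinical rule: the function names, the tenfold margin over background, the three-read floor and the example counts are all hypothetical, and a laboratory would need to establish its own thresholds through validation.

```python
def reads_per_million(viral_reads, total_reads):
    """Normalise a viral read count to reads per million sequenced reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1_000_000 * viral_reads / total_reads


def contamination_threshold(negative_controls, fold=10.0):
    """Run-specific threshold: `fold` times the highest negative-control signal.

    negative_controls is a list of (viral_reads, total_reads) tuples from the
    negative samples included in the same sequencing run.
    """
    background = max(reads_per_million(v, t) for v, t in negative_controls)
    return fold * background


def is_reportable(sample, negative_controls, fold=10.0, min_reads=3):
    """True if the sample signal clears both the background and a read floor."""
    viral_reads, total_reads = sample
    signal = reads_per_million(viral_reads, total_reads)
    threshold = contamination_threshold(negative_controls, fold)
    return viral_reads >= min_reads and signal > threshold


negatives = [(2, 1_000_000), (0, 1_500_000)]          # hypothetical controls
print(is_reportable((500, 2_000_000), negatives))     # well above background
print(is_reportable((30, 2_000_000), negatives))      # close to background
```

Tying the threshold to controls from the same run reflects the point made above that contaminants can come from reagents, which vary between batches. A result that clears such a threshold would still need confirmation by an independent method before it is interpreted as the cause of illness.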


Financial barriers to the clinical use of viral WGS. Although there are good reasons for sequencing whole genomes and, in general, for using NGS, if diagnostic or hospital-based laboratories are to be persuaded to transition away from sequencing subgenomic fragments, they need to see the benefit of the additional information for patient care and the practical feasibility of WGS. This includes WGS workflows that are as scalable and automatable as subgenomic fragment sequencing, a suitable regulatory framework and a price for sequencing whole genomes that is competitive with sequencing fragments.

Currently, the cost of sequencing viral genomes, despite their small size, remains higher than the cost of sequencing subgenomic resistance genes. The cost difference between sequencing a target region and the whole virus genome is largely governed by the size of the genome versus the size and number of target loci. In addition, whole-genome information may provide important additional knowledge, as discussed above.

What does the future hold?
Current NGS technologies that are based around Illumina, 454, Ion Torrent or Sanger methodologies all generate short-read data, which presents challenges for haplotype phasing; that is, determining whether genetic variants (whether inter-host or intra-host) occur on the same genetic background (single viral genome or clonal) or on related, highly similar but different genetic backgrounds in the same population (sometimes called a viral swarm or cloud). Furthermore, repetitive regions and recombination are more difficult to resolve using short reads owing to problems such as mapping ambiguities. The clinical implications of understanding whether, for example, multidrug-resistant variants occur together on a single viral genome or are distributed between a mixed population of viruses, each with different drug-resistance profiles, are currently unclear.

Although there are computational tools124 to help resolve these issues, new technologies can generate longer reads. Newer, single-molecule sequencers, such as PacBio (Pacific Biosciences) and MinION (Oxford Nanopore), are capable of extremely long-read sequencing, and whole viral genomes (for example, viruses that have genomes less than 20 kb in size, such as Ebola virus, norovirus and influenza A virus) could theoretically be obtained from single reads. MinION also has the advantage of being very fast, taking as little as four hours to go from sample receipt to reporting of analysed data125. So far, viral read lengths achieved by MinION sequencing have been relatively modest; examples of mean read lengths include 751 bp for modified vaccinia Ankara virus, 758 bp for cowpox virus126, 455 bp (range 126–1,477 bp) for chikungunya virus, 358 bp (range 220–672 bp) for Ebola virus, 1,576 bp127 (M.A.B., unpublished observations: 6,895 bp) for HCMV and 572 bp (range 318–792 bp) for HCV128. Results from the better-established PacBio technology are more promising, including a recent report of a mean read length of 12,777 bp for pseudorabies virus129, which has a double-stranded DNA genome that is around 142 kb in length. 9.2 kb reads have been achieved using PacBio for HCV, although 9.2 kb of the 9.6 kb genome had been pre-amplified by PCR130.

A drawback of both NGS and single-molecule sequencing is the need for high coverage to minimize the effect of sequencing errors. This is particularly problematic for studies of drug resistance, as drug resistance most frequently results from single-nucleotide mutations or small deletions (1–3 bases), especially in lower-fidelity RNA viruses131. Achieving the high coverage that is necessary to ensure accurate variant typing is challenging when there is a lot of host DNA compared with viral sequences, and when the error profile of a technology makes point mutations particularly hard to detect125. At the time of writing, MinION sequencing (R9 pore chemistry) has raw high quality (so called ‘2D reads’) read error rates of around 5% (J. Quick, personal communication), which compares unfavourably with the error rates of other technologies (Illumina (<0.1%), Ion Torrent (~1%), but not PacBio (13% single pass))132, although accuracy can be improved using circular consensus read sequencing133,134.

However, combining these long-read technologies with target enrichment provides a potential way forward127,135, as ambiguities can be resolved if sufficient depth of sequence is achieved for the target pathogen, and error rates for all methodologies may be decreased by further technological and analytical improvements. Depleting the nucleic acids of the host is an alternative solution, as a higher proportion of virus reads would be recovered from each sequencing run. Although there are already solutions in place to achieve this for bacterial sequencing (for example, depletion of human ribosomal RNA or mitochondria, and selective depletion of DNA with a certain methylation pattern), no similar methods exist, so far, for viral sequencing.

Conclusions
Viral WGS is of increasing clinical importance for diagnosis, disease management, molecular epidemiology and infection control. There are several methods that are available to achieve WGS of viruses from clinical samples: amplicon sequencing, target enrichment or metagenomics. Currently, the choice of method is specific to both the virus and the clinical question. Metagenomic sequencing is most appropriate for diagnostic sequencing of unknown or poorly characterized viruses, PCR amplicon sequencing works well for short viral genomes and low diversity in primer binding sites, and target enrichment works for all pathogen sizes but is particularly advantageous for large viruses and for viruses that have diverse but well-characterized genomes. Two obvious areas of innovation currently exist: methods that can effectively deplete host DNA without affecting viral DNA, and the further development of long-read technologies to achieve the flexibility and competitive pricing of short-read technologies. New technologies are required to unite the strengths of these different methods and enable healthcare providers to invest in a single technology that is suitable for all viral WGS applications.

Charlotte J. Houldcroft is at the Department of Infection, Immunity and Inflammation, Great Ormond Street Institute of Child Health, University College London, London WC1N 1EH, UK; and the Division of Biological Anthropology, University of Cambridge, Cambridge CB2 3QG, UK.
Mathew A. Beale is at the Division of Infection and Immunity, University College London, London WC1E 6BT, UK; and The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Judith Breuer is at the Division of Infection and Immunity, University College London, London WC1E 6BT, UK; and at Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3JH, UK.
Correspondence to J.B.

doi:10.1038/nrmicro.2016.182
Published online 16 Jan 2017

1. Gardner, R. C. et al. The complete nucleotide sequence of an infectious clone of cauliflower mosaic virus by M13mp7 shotgun sequencing. Nucleic Acids Res. 9, 2871–2888 (1981).
2. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
3. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995).


4. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
5. Hayden, E. C. Technology: the $1,000 genome. Nature 507, 294–295 (2014).
6. Turnbaugh, P. J. et al. The human microbiome project. Nature 449, 804–810 (2007).
7. Grigoriev, I. V. et al. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Res. 42, D699–D704 (2014).
8. Worthey, E. A. et al. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet. Med. 13, 255–262 (2011).
9. Bryant, J. M. et al. Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study. Lancet 381, 1551–1560 (2013).
10. Lamelas, A. et al. Emergence of a new epidemic Neisseria meningitidis serogroup A clone in the African meningitis belt: high-resolution picture of genomic changes that mediate immune evasion. mBio 5, e01974-14 (2014).
11. Zaraket, H. et al. Genetic makeup of amantadine-resistant and oseltamivir-resistant human influenza A/H1N1 viruses. J. Clin. Microbiol. 48, 1085–1092 (2010).
12. Houldcroft, C. J. et al. Detection of low frequency multi-drug resistance and novel putative maribavir resistance in immunocompromised pediatric patients with cytomegalovirus. Front. Microbiol. 7, 1317 (2016).
13. Witney, A. A. et al. Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases. J. Clin. Microbiol. 53, 1473–1483 (2015).
14. Simen, B. B. et al. Low-abundance drug-resistant viral variants in chronically HIV-infected, antiretroviral treatment-naive patients significantly impact treatment outcomes. J. Infect. Dis. 199, 693–701 (2009).
15. Smith, G. J. et al. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459, 1122–1125 (2009).
16. Gire, S. K. et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345, 1369–1372 (2014).
17. Koser, C. U. et al. Routine use of microbial whole-genome sequencing in diagnostic and public health microbiology. PLoS Pathog. 8, e1002824 (2012).
18. Cartwright, E. J., Koser, C. U. & Peacock, S. J. Microbial sequences benefit health now. Nature 471, 578 (2011).
19. Paredes, R. & Clotet, B. Clinical management of HIV-1 resistance. Antiviral Res. 85, 245–265 (2010).
20. Van Laethem, K., Theys, K. & Vandamme, A. M. HIV-1 genotypic drug resistance testing: digging deep, reaching wide? Curr. Opin. Virol. 14, 16–23 (2015).
21. Durant, J. et al. Drug-resistance genotyping in HIV-1 therapy: the VIRADAPT randomised controlled trial. Lancet 353, 2195–2199 (1999).
22. Clevenbergh, P. et al. Persisting long-term benefit of genotype-guided treatment for HIV-infected patients failing HAART. The Viradapt Study: week 48 follow-up. Antivir. Ther. 5, 65–70 (2000).
23. Khudyakov, Y. Molecular surveillance of hepatitis C. Antivir. Ther. 17, 1465–1470 (2012).
24. Kim, J. H., Park, Y. K., Park, E. S. & Kim, K. H. Molecular diagnosis and treatment of drug-resistant hepatitis B virus. World J. Gastroenterol. 20, 5708–5720 (2014).
25. McGinnis, J., Laplante, J., Shudt, M. & George, K. S. Next generation sequencing for whole genome analysis and surveillance of influenza A viruses. J. Clin. Virol. 79, 44–50 (2016).
26. Pawlotsky, J. M. Hepatitis C virus resistance to direct-acting antiviral drugs in interferon-free regimens. Gastroenterology 151, 70–86 (2016).
27. Thomson, E. et al. Comparison of next generation sequencing technologies for the comprehensive assessment of full-length hepatitis C viral genomes. J. Clin. Microbiol. 54, 2470–2484 (2016).
28. Brunnemann, A. K. et al. Drug resistance of clinical varicella-zoster virus strains confirmed by recombinant thymidine kinase expression and by targeted resistance mutagenesis of a cloned wild-type isolate. Antimicrob. Agents Chemother. 59, 2726–2734 (2015).
29. Karamitros, T. et al. De novo assembly of human herpes virus type 1 (HHV-1) genome, mining of non-canonical structures and detection of novel drug-resistance mutations using short- and long-read next generation sequencing technologies. PLoS ONE 11, e0157600 (2016).
30. Piret, J. & Boivin, G. Antiviral drug resistance in herpesviruses other than cytomegalovirus. Rev. Med. Virol. 24, 186–218 (2014).
31. Melendez, D. P. & Razonable, R. R. Letermovir and inhibitors of the terminase complex: a promising new class of investigational antiviral drugs against human cytomegalovirus. Infect. Drug Resist. 8, 269–277 (2015).
32. Lassalle, F. et al. Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes. Virus Evol. 2, vew017 (2016).
33. Lanier, E. R. et al. Analysis of mutations in the gene encoding cytomegalovirus DNA polymerase in a phase 2 clinical trial of brincidofovir prophylaxis. J. Infect. Dis. 214, 32–35 (2016).
34. Kaverin, N. V. et al. Epitope mapping of the hemagglutinin molecule of a highly pathogenic H5N1 influenza virus by using monoclonal antibodies. J. Virol. 81, 12911–12917 (2007).
35. Franco, S. et al. Detection of a sexually transmitted hepatitis C virus protease inhibitor-resistance variant in a human immunodeficiency virus-infected homosexual man. Gastroenterology 147, 599–601.e1 (2014).
36. Fujisaki, S. et al. Outbreak of infections by hepatitis B virus genotype A and transmission of genetic drug resistance in patients coinfected with HIV-1 in Japan. J. Clin. Microbiol. 49, 1017–1024 (2011).
37. Pierucci, P. et al. Novel autologous T-cell therapy for drug-resistant cytomegalovirus disease after lung transplantation. J. Heart Lung Transplant. 35, 685–687 (2016).
38. Agoti, C. N. et al. Local evolutionary patterns of human respiratory syncytial virus derived from whole-genome sequencing. J. Virol. 89, 3444–3454 (2015).
39. Quick, J. et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232 (2016).
40. Aanensen, D. M. et al. Whole-genome sequencing for routine pathogen surveillance in public health: a population snapshot of invasive Staphylococcus aureus in Europe. mBio 7, e00444-16 (2016).
41. Faria, N. R. et al. Zika virus in the Americas: early epidemiological and genetic findings. Science 352, 345–349 (2016).
42. Mbisa, J. L. et al. Evidence of self-sustaining drug resistant HIV-1 lineages among untreated patients in the United Kingdom. Clin. Infect. Dis. 61, 829–836 (2015).
43. Bartha, I. et al. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control. eLife 2, e01123 (2013).
44. Power, R. A. et al. Genome-wide association study of HIV whole genome sequences validated using drug resistance. PLoS ONE 11, e0163746 (2016).
45. Cuevas, J. M., Geller, R., Garijo, R., Lopez-Aldeguer, J. & Sanjuan, R. Extremely high mutation rate of HIV-1 in vivo. PLoS Biol. 13, e1002251 (2015).
46. Vandenhende, M. A. et al. Prevalence and evolution of low frequency HIV drug resistance mutations detected by ultra deep sequencing in patients experiencing first line antiretroviral therapy failure. PLoS ONE 9, e86771 (2014).
47. Zhou, B. et al. Composition and interactions of hepatitis B virus quasispecies defined the virological response during telbivudine therapy. Sci. Rep. 5, 17123 (2015).
48. Itakura, J. et al. Resistance-associated NS5A variants of hepatitis C virus are susceptible to interferon-based therapy. PLoS ONE 10, e0138060 (2015).
49. Rogers, M. B. et al. Intrahost dynamics of antiviral resistance in influenza A virus reflect complex patterns of segment linkage, reassortment, and natural selection. mBio 6, e02464-14 (2015).
50. Swenson, L. C., Daumer, M. & Paredes, R. Next-generation sequencing to assess HIV tropism. Curr. Opin. HIV AIDS 7, 478–485 (2012).
51. Hutter, G. et al. Long-term control of HIV by CCR5 delta32/delta32 stem-cell transplantation. N. Engl. J. Med. 360, 692–698 (2009).
52. Kordelas, L. et al. Shift of HIV tropism in stem-cell transplantation with CCR5 delta32 mutation. N. Engl. J. Med. 371, 880–882 (2014).
53. Coaquette, A. et al. Mixed cytomegalovirus glycoprotein B genotypes in immunocompromised patients. Clin. Infect. Dis. 39, 155–161 (2004).
54. Solmone, M. et al. Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J. Virol. 83, 1718–1726 (2009).
55. Chou, S. et al. Improved detection of emerging drug-resistant mutant cytomegalovirus subpopulations by deep sequencing. Antimicrob. Agents Chemother. 58, 4697–4702 (2014).
56. Fonager, J. et al. Identification of minority resistance mutations in the HIV-1 integrase coding region using next generation sequencing. J. Clin. Virol. 73, 95–100 (2015).
57. Kyeyune, F. et al. Low-frequency drug resistance in HIV-infected Ugandans on antiretroviral treatment is associated with regimen failure. Antimicrob. Agents Chemother. 60, 3380–3397 (2016).
58. Liu, P. et al. Direct sequencing and characterization of a clinical isolate of Epstein–Barr virus from nasopharyngeal carcinoma tissue by using next-generation sequencing technology. J. Virol. 85, 11291–11299 (2011).
59. Loman, N. J. & Pallen, M. J. Twenty years of bacterial genome sequencing. Nat. Rev. Microbiol. 13, 787–794 (2015).
60. Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74 (2004).
61. Mulcahy-O’Grady, H. & Workentine, M. L. The challenge and potential of metagenomics in the clinic. Front. Immunol. 7, 29 (2016).
62. Hoffmann, B. et al. A variegated squirrel bornavirus associated with fatal human encephalitis. N. Engl. J. Med. 373, 154–162 (2015).
63. Perlejewski, K. et al. Next-generation sequencing (NGS) in the identification of encephalitis-causing viruses: unexpected detection of human herpesvirus 1 while searching for RNA pathogens. J. Virol. Methods 226, 1–6 (2015).
64. Duncan, C. J. et al. Human IFNAR2 deficiency: lessons for antiviral immunity. Sci. Transl Med. 7, 307ra154 (2015).
65. Morfopoulou, S. et al. Human coronavirus OC43 associated with fatal encephalitis. N. Engl. J. Med. 375, 497–498 (2016).
66. Naccache, S. N. et al. Diagnosis of neuroinvasive astrovirus infection in an immunocompromised adult with encephalitis by unbiased next-generation sequencing. Clin. Infect. Dis. 60, 919–923 (2015).
67. Huang, W. et al. Whole-genome sequence analysis reveals the enterovirus D68 isolates during the United States 2014 outbreak mainly belong to a novel clade. Sci. Rep. 5, 15223 (2015).
68. Wilson, M. R. et al. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N. Engl. J. Med. 370, 2408–2417 (2014).
69. Depledge, D. P. et al. Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS ONE 6, e27805 (2011).
70. Allen, U. D. et al. The genetic diversity of Epstein–Barr virus in the setting of transplantation relative to non-transplant settings: a feasibility study. Pediatr. Transplant. 20, 124–129 (2016).
71. Matranga, C. B. et al. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples. Genome Biol. 15, 519 (2014).
72. Calvet, G. et al. Detection and sequencing of Zika virus from amniotic fluid of fetuses with microcephaly in Brazil: a case study. Lancet Infect. Dis. 16, 653–660 (2016).
73. Lei, H. et al. Epstein–Barr virus from Burkitt Lymphoma biopsies from Africa and South America share novel LMP-1 promoter and gene variations. Sci. Rep. 5, 16706 (2015).
74. Kohl, C. et al. Protocol for metagenomic virus detection in clinical specimens. Emerg. Infect. Dis. 21, 48–57 (2015).
75. Sauvage, V. & Eloit, M. Viral metagenomics and blood safety. Transfus. Clin. Biol. 23, 28–38 (2016).
76. Lecuit, M. & Eloit, M. The diagnosis of infectious diseases by whole genome next generation sequencing: a new era is opening. Front. Cell. Infect. Microbiol. 4, 25 (2014).
77. Oude Munnink, B. B. et al. Autologous antibody capture to enrich immunogenic viruses for viral discovery. PLoS ONE 8, e78454 (2013).
78. Sabina, J. & Leamon, J. H. Bias in whole genome amplification: causes and considerations. Methods Mol. Biol. 1347, 15–41 (2015).

NATURE REVIEWS | MICROBIOLOGY VOLUME 15 | MARCH 2017 | 191 ©2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.


Acknowledgements
The authors thank J. Brown and K. Gilmour (Great Ormond Street Hospital (GOSH), London, UK) and R. Doyle (University College London (UCL), UK) for their helpful discussions, and J. Quick (University of Birmingham, UK) for sharing unpublished MinION statistics. C.J.H. was funded by Action Medical Research (grant GN2424). M.A.B. was funded through the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement number 304875 held by J.B. This work was supported by the UK National Institute for Health Research Biomedical Research Centre at GOSH National Health Service (NHS) Foundation Trust and UCL. J.B. receives funding from the University College London Hospitals (UCLH)/UCL National Institute for Health Research Biomedical Research Centre. The authors acknowledge infrastructure support for the UCL Pathogen Genomics Unit, from the UCL UK Medical Research Council (MRC) Centre for Molecular Medical Virology and the UCLH/UCL National Institute for Health Research Biomedical Research Centre. The funders had no role in study design, data collection and interpretation, or the decision to submit work for publication.

Competing interests statement
The authors declare no competing interests.

FURTHER INFORMATION
Omicsomics blogspot article: http://omicsomics.blogspot.co.uk/2015/07/leaky-clinical-metagenomics-pipelines.html
