Temporal Dynamics of Uncultured Viruses: a New Dimension in Viral Diversity
The ISME Journal (2018) 12, 199–211
© 2018 International Society for Microbial Ecology. All rights reserved 1751-7362/18
www.nature.com/ismej

ORIGINAL ARTICLE

Ksenia Arkhipova (1,2), Timofey Skvortsov (1,3), John P Quinn (1), John W McGrath (1,3), Christopher CR Allen (1,3), Bas E Dutilh (2,4), Yvonne McElarney (5) and Leonid A Kulakov (1)

(1) School of Biological Sciences, The Queen’s University of Belfast, Belfast, UK
(2) Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, The Netherlands
(3) Institute for Global Food Security, The Queen’s University of Belfast, Belfast, UK
(4) Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
(5) Agri-Food and Biosciences Institute, Belfast, UK

Recent work has vastly expanded the known viral genomic sequence space, but the seasonal dynamics of viral populations at the genome level remain unexplored. Here we followed the viral community in a freshwater lake for 1 year using genome-resolved viral metagenomics, combined with detailed analyses of the viral community structure, associated bacterial populations and environmental variables. We reconstructed 8950 complete and partial viral genomes, the majority of which were not persistent in the lake throughout the year, but instead continuously succeeded each other. Temporal analysis of 732 viral genus-level clusters demonstrated that one-fifth were undetectable at specific periods of the year. Based on host predictions for a subset of reconstructed viral genomes, we for the first time reveal three distinct patterns of host–pathogen dynamics, where the viruses may peak before, during or after the peak in their host’s abundance, providing new possibilities for modelling of their interactions.
Time series metagenomics opens up a new dimension in viral profiling, which is essential to understand the full scale of viral diversity and evolution, and the ecological roles of these important factors in the global ecosystem.

The ISME Journal (2018) 12, 199–211; doi:10.1038/ismej.2017.157; published online 13 October 2017

Correspondence: LA Kulakov, School of Biological Sciences, The Queen's University of Belfast, 97 Lisburn Road, Belfast, Northern Ireland BT9 7BL, UK.
E-mail: [email protected]
Received 22 March 2017; revised 26 July 2017; accepted 22 August 2017; published online 13 October 2017

Introduction

One of the major challenges in studies of viral dynamics is the absence of a phylogenetically informative universal marker, analogous to the bacterial 16S or eukaryotic 18S ribosomal RNA (rRNA) genes. To analyse temporal changes of some viral subgroups (for example, marine T4-like myoviruses or freshwater cyanomyoviruses), recent studies have used sequencing of amplicons of viral conserved structural proteins, such as capsid proteins g23 or g20 (Chow and Fuhrman, 2012; Wang et al., 2015; Yeo and Gin, 2015). However, this approach does not allow assessment of the dynamics of the whole community. A shotgun metagenomics approach does not share this limitation and provides a means to study seasonal changes without any a priori assumptions about the structure of a viral community. Using shotgun metagenomics, some attempts have been made to study viral dynamics, for example, by tracking the temporal changes of 35 individual de novo assembled viral genomes (Emerson et al., 2012), or by binning sequencing reads into assemblages (possibly at a viral family level; Bolduc et al., 2015) to study their temporal stability and/or fluctuations (Emerson et al., 2013; Bolduc et al., 2015). Although these studies have provided much-needed insight into possible scenarios of viral dynamics, there is still no global picture available of seasonal changes of viral populations and their links to other factors in an ecosystem.

Owing to the mosaic nature of viral genome organisation, assessment of viral genetic similarity is a non-trivial task. To tackle this problem, Lima-Mendez et al. in 2008 proposed a method of reticulate classification of phage genetic relatedness. The method provides means to subdivide the whole sequence space of viral metagenomics data into groups approximately corresponding to genus level of taxonomical classification. Since that time the approach has been successfully used in several studies to gain deeper insight into phage biology and to connect newly assembled genomes with already known sequences (Roux et al., 2015, 2016). At the same time, it is well known that sequence relatedness within characterised viral genera can vary substantially (King et al., 2011), but in natural environments the genetic variation of newly assembled viral genomes within ‘genera’ resulting from reticulate clustering has not yet been analysed.

Along with the gaps in knowledge of global viral sequence diversity, there is a lack of information about the possible variants of bacteria–phage dynamic interactions. To date, a range of models describing behaviour of some host–pathogen relationships have been developed. First and foremost is the Kill-the-Winner model (Thingstad, 2000), which assesses population changes within the framework of the classic Lotka–Volterra model. Recently, Knowles et al. (2016) have noticed discrepancies between the predictions of the model and the experimentally measured virus and host abundances in natural environments, which poses a question about the possible existence of other dynamics of host–pathogen interactions in natural microbial communities.

Here we present a detailed exploration of the structure, seasonal dynamics and functional potential of the viral community in a temperate freshwater eutrophic lake (Lough Neagh, Northern Ireland). Our novel data include 12 viral shotgun metagenomes and 13 bacterial 16S rRNA-amplicon data sets collected over a period of 1 year (Supplementary Table 1, sheet 1). This unique collection of data allowed us to explore the range of interaction dynamics of viruses and their hosts in a natural ecosystem. We also investigate the possibility of functional manipulations of bacteria by phages by analysing auxiliary metabolic genes (AMGs), revealing that their functions are clearly different in winter compared with summer.

Materials and methods

Data availability

Raw reads from the Illumina sequencing and sequences of bacterial 16S rRNA gene amplicons are available for download from the Short Reads Archive (BioProject PRJNA350258 and PRJNA292054). Annotated viral reads and assembled sequences are also available on MetaVir and MG-RAST databases (for accession numbers see Supplementary Table 1, sheet 1).

Sample collection, processing and sequencing

Lough Neagh is a large eutrophic polymictic shallow freshwater lake located in Northern Ireland (UK). Water samples were collected from the deepest site in the lake (54°37′06″N, 6°23′43″W) at 12 time points over the period of a year (Supplementary Table 1, sheet 1) as described previously (Skvortsov et al., 2016). Some environmental parameters, [...] 2016). Briefly, water samples were filtered through 0.22 μm filters to obtain a ‘virus-like particle’ water fraction, which was concentrated using 100 kDa filters and treated with DNase I. Extracted and purified DNA was used for library preparation with Nextera DNA Sample Preparation kit (Illumina, San Diego, CA, USA) and sequenced from both ends with the 600-cycle MiSeq Reagent Kit v3 on MiSeq (Illumina) at the University of Cambridge DNA Sequencing facility.

Total DNA (particle sizes >0.22 μm) was extracted from 500 ml of water using a PowerWater DNA Isolation kit (MO BIO, Carlsbad, CA, USA). Partial bacterial 16S rRNA gene sequences were amplified with 909-F/1492-R primers and sequenced on a 454 GS Junior (Roche, Basel, Switzerland) with Lib-L Shotgun chemistry.

Sequencing library processing, assembly and annotation

The Illumina reads were processed with BBMap v33.54 (http://sourceforge.net/projects/bbmap/) software, and all reads with an average Q-score <15 or containing Ns were discarded. We applied a two-step assembly strategy. First, all 12 libraries were assembled separately using the graph-based assembler IDBA-UD (Peng et al., 2012) (kmer range 20–250, step 10). Next, all the libraries were combined and assembled collectively (kmer range 20–1500, step 10). This allowed us to use all available reads in the assembly to reconstruct even low-abundance viral genomes, as well as to maximise assembly effectiveness for genomes appearing only in individual libraries. After that, an additional attempt to elongate the contigs obtained in the two previous steps was made using an overlap-layout-consensus assembler with very strict parameters (CAP3; Huang and Madan, 1999; overlap >2000 bp, nucleotide identity 99%). This step also drastically reduced the number of duplicated sequences. To completely remove duplicates and leave only the longest assembled contigs, we used the cd-hit (Li and Godzik, 2006) program (-c 0.98 -n 11 -d 0). For subsequent analyses, only sequences longer than 7000 bp were retained. To estimate what part of the viral population this set of contigs represented, reads from all 12 libraries were mapped onto contigs using BBMap (70% of nucleotide identity).

Open reading frames (ORFs) in the assembled contigs were predicted with MetaGeneAnnotator (Noguchi et al., 2008). For functional annotation, the contigs assembled separately from 12 libraries were uploaded to the MG-RAST (Meyer et al., 2008)
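
As an illustration only, the two numeric filters quoted in the assembly workflow (discarding reads with an average Q-score below 15 or containing Ns, and retaining contigs longer than 7000 bp) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' pipeline or the BBMap implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and the "average" is taken as the arithmetic mean of Phred scores, whereas real tools may average error probabilities instead.

```python
from statistics import mean

MIN_MEAN_Q = 15         # read quality threshold quoted in the text
MIN_CONTIG_LEN = 7000   # contig length threshold quoted in the text


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def keep_read(sequence, quality_string):
    """Keep a read only if it has no N and its mean Q-score is at least 15."""
    if not sequence or not quality_string:
        return False  # nothing to assess
    if "N" in sequence.upper():
        return False  # reads containing Ns are discarded
    # Arithmetic mean of Phred scores (assumption, see lead-in).
    return mean(phred_scores(quality_string)) >= MIN_MEAN_Q


def keep_contig(sequence):
    """Keep only contigs strictly longer than 7000 bp."""
    return len(sequence) > MIN_CONTIG_LEN


if __name__ == "__main__":
    print(keep_read("ACGT", "IIII"))   # high-quality read without N
    print(keep_read("ACNT", "IIII"))   # contains N
    print(keep_contig("A" * 7001))     # just above the length cut-off
```

The thresholds are kept as module-level constants so that the cut-offs stated in the text are visible in one place and can be changed without touching the filter logic.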