Genomic Analysis of Uncultured Marine Viral Communities
Mya Breitbart*, Peter Salamon†, Bjarne Andresen†‡, Joseph M. Mahaffy†, Anca M. Segall*, David Mead§, Farooq Azam¶, and Forest Rohwer*‖

*Department of Biology, San Diego State University, San Diego, CA 92182-4614; †Department of Mathematical Sciences, San Diego State University, San Diego, CA 92182-7720; ‡Ørsted Laboratory, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark; §Lucigen, Middleton, WI 53562; and ¶Marine Biology Division, Scripps Institution of Oceanography, La Jolla, CA 92093

Communicated by Allan Campbell, Stanford University, Stanford, CA, August 14, 2002 (received for review February 22, 2002)

Viruses are the most common biological entities in the oceans by an order of magnitude. However, very little is known about their diversity. Here we report a genomic analysis of two uncultured marine viral communities. Over 65% of the sequences were not significantly similar to previously reported sequences, suggesting that much of the diversity is previously uncharacterized. The most common significant hits among the known sequences were to viruses. The viral hits included sequences from all of the major families of dsDNA tailed phages, as well as some algal viruses. Several independent mathematical models based on the observed number of contigs predicted that the most abundant viral genome comprised 2–3% of the total population in both communities, which was estimated to contain between 374 and 7,114 viral types. Overall, diversity of the viral communities was extremely high. The results also showed that it would be possible to sequence the entire genome of an uncultured marine viral community.

Abbreviations: SP, Scripps Pier; MB, Mission Bay; LASL, linker-amplified shotgun library; MM%, minimal mismatch percentage.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AY079522–AY080585 and BH898061–BH898933).

‖To whom correspondence should be addressed. E-mail: [email protected].

PNAS | October 29, 2002 | vol. 99 | no. 22 | 14250–14255
www.pnas.org/cgi/doi/10.1073/pnas.202488399

Marine viruses, the majority of which are phages, have enormous influences on global biogeochemical cycles (1), microbial diversity (2, 3), and genetic exchange (4). Despite their importance, virtually nothing is known about marine viral biodiversity or the evolutionary relationships of marine and nonmarine viruses (5–7). Addressing these issues is difficult because viruses must be cultured on hosts, the majority of which cannot be cultivated by using standard techniques (8). In addition, viruses do not have ubiquitously conserved genetic elements such as rDNA that can be used as diversity and evolutionary distance markers (9). To circumvent these limitations, we developed a method to shotgun clone and sequence uncultured aquatic viral communities.

Materials and Methods

Isolation of Viral Community DNA. Marine viruses were isolated from 200 liters of surface seawater from Scripps Pier (SP, La Jolla, CA; May 2001) and the channel side of Fiesta Island in Mission Bay (MB, San Diego; June 2001) by using a combination of differential filtration and density-dependent gradient centrifugation. The water at the MB site is exchanged with each tidal cycle. Both the SP and MB sites experience increased levels of pollution during the rainy season, because of runoff from the surrounding city. The MB site routinely has more eukaryotic algae than does the SP site. Once collected, the water samples were initially filtered through a 0.16-μm Centramate tangential flow filter (TFF; Pall) to remove bacteria, eukaryotes, and large particles. Approximately 90% of the viruses, as determined by epifluorescent microscopy (10), and most of the water, passed through the filter and were collected in a separate tank. Subsequently, the viruses in the filtrate were concentrated by using a 100-kDa TFF filter until the final sample volume was <100 ml (≈5,000× concentration). Recovery of viruses during this step was essentially 100%. After the TFF, the phage concentrate was loaded onto a cesium chloride (CsCl) step gradient, ultracentrifuged, and the 1.35–1.5 g/ml fraction was collected. This fraction contains the majority of the viral DNA as determined by pulse field gel electrophoresis (11); however, this method will not recover all viruses (e.g., large eukaryotic viruses and ssRNA phages). After CsCl purification, the viruses were lysed by using a formamide extraction, and the DNA was recovered by an isopropanol precipitation and a cetyltrimethylammonium bromide (CTAB) extraction (12).

Construction of the Shotgun Library. The amount of viral DNA in an environmental sample is very low (≈10 μg/100 liters). Viral genomes often contain modified nucleotides that cannot be directly cloned into Escherichia coli. Additionally, because viral genomes contain genes (e.g., holins, lysozyme) that must be disrupted before cloning, we have not been able to create a representative cosmid library from these communities. We have circumvented these problems by randomly shearing the total marine viral community DNA (HydroShear, GenMachine, San Carlos, CA), end-repairing, ligating dsDNA linkers to the ends, and amplifying the fragments by using the high-fidelity Vent DNA polymerase. The resulting fragments were ligated into the pSMART vector and electroporated into MC12 cells (Lucigen, Middleton, WI). We call these libraries LASLs for linker-amplified shotgun libraries. This method has been checked to ensure randomness as described (ref. 13, and our web site at www.sci.sdsu.edu/PHAGE/LASL/index.htm). A test library was constructed of coliphage DNA, and 100 fragments were sequenced without observing any chimeras. Additionally, we have recently sequenced two phage genomes, Vibriophage 16T and 16C, from a mixed lysate by using the LASL approach. No chimeric molecules were observed in this mixed library. Together, these three libraries represent >1,000 sequences. Therefore, it is highly unlikely that we are observing a significant number of chimeric sequences in our library (F.R. and A.M.S., unpublished results).

Analysis of Sequences: Composition Analyses. Clones from the SP library were sequenced with both forward and reverse primers, yielding a total of 1,061 sequences. Eight hundred and seventy-three clones from the MB library were sequenced only with the forward primer. These sequences were compared against GenBank by using TBLASTX (14, 15). A hit was considered significant if it had an E value of <0.001. Significant hits to GenBank entries were classified into the groups described in the text, based on sequence annotation. In cases where multiple significant hits were observed for a single query sequence, the sequence was preferentially classified as a phage or virus if these occurred within the top five hits. Mobile elements consisted of transposons, plasmids, insertion sequences, retrotransposons, unstable genetic elements, and pathogenicity islands. Bacterial hits were examined manually according to a list provided by Sherwood Casjens (University of Utah, Salt Lake City) to look for potential hits to prophage sequences. Significant hits to phages were further classified into phage families according to The International Committee on Taxonomy of Viruses (ICTV) classification (16), with the one exception being that we considered P22 a Siphovirus.

Analysis of Sequences: Contig Analyses. All sequences were initially analyzed by SEQUENCHER 3.0.6 (Gene Codes, Ann Arbor, MI) to identify contigs based on a minimum overlap of 20 bp with a 98% minimal mismatch percentage (MM%). Contigs were examined individually to ensure that any overlaps between fragments from the same clone were not considered contigs. Only contigs containing an overlapping sequence from two different clones were considered in the population analyses. To empirically determine the overlap size and MM% needed to differentiate between phage genotypes and groups during assembly, an in silico shotgun library was constructed and tested. The genomes used in this analysis were three Siphoviruses (coliphage λ, coliphage HK620, and coliphage N15), three Myoviruses (coliphage T4, coliphage P2, and coliphage Mu), four Podoviruses

[...] observed. Therefore, the average error in this area was 0.26%. The number of errors associated with the ends, where not all of the sequences overlapped, was determined based on comparisons with the published T7 sequence. At the 5′ end, there was one miscalled base in 273 positions (0.366%). At the 3′ end, there were 7 miscalled bases and 5 gaps in 739 positions (1.62%). Therefore, by using our cloning and bioinformatics protocols, the sequence data has an error rate of 0.26–1.62%. Given this limitation, two DNA sequences from the same phage would be expected to be at least 98–99% identical.

Results and Discussion

Identity of Uncultured Marine Viruses. Shotgun libraries were made from two near-shore marine viral communities (SP and MB). Of the sequence fragments obtained from both samples (1,061 from SP and 873 from MB), the majority showed no significant hits (E < 0.001) to previously reported sequences in GenBank (Fig. 1A). This finding suggested that much of the diversity in the viral communities was previously uncharacterized. Significant hits to GenBank entries were classified as phages, viruses, mobile elements, repeat elements, Bacteria, Archaea, or Eukarya as described in Materials