bioRxiv preprint doi: https://doi.org/10.1101/2020.11.01.363820; this version posted November 1, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

A New Comprehensive Catalog of the Human Virome Reveals Hidden Associations with Chronic Diseases

Michael J. Tisza and Christopher B. Buck*

Affiliation
Lab of Cellular Oncology, NCI, NIH, Bethesda, MD 20892-4263

*Corresponding author: [email protected]

Abstract
While there have been remarkable strides in microbiome research, the viral component of the microbiome has generally presented a more challenging target than the bacteriome. This is despite the fact that many thousands of shotgun sequencing runs from human metagenomic samples exist in public databases and all of them encompass large amounts of viral sequences. The lack of a comprehensive database for human-associated viruses, along with inadequate methods for high-throughput identification of highly divergent viruses in metagenomic data, has historically stymied efforts to characterize virus sequences in a comprehensive way. In this study, a new high-specificity and high-sensitivity bioinformatic tool, Cenote-Taker 2, was applied to thousands of human metagenome datasets, uncovering over 50,000 unique virus operational taxonomic units. Publicly available case-control studies were re-analyzed, and over 1,700 strong virus-disease associations were found.
Introduction
The human virome is the sum total of all viruses that are intimately associated with people. This includes viruses that directly infect human cells [1] but mostly consists of viruses infecting resident bacteria, i.e. phages [2]. While the large majority of microbiome studies have focused on the bacteriome, revealing numerous important functions for bacteria in human physiology [3], information about the human virome has lagged. However, a number of recent studies have begun making inroads into characterizing the virome [4-12].

Just as human-tropic viruses can have dramatic effects on people, phages are able to dramatically alter bacterial physiology and regulate host population size. A variety of evolutionary dynamics can be at play in the phage/bacterium arena, including Red Queen [10], arms-race [13], and piggyback-the-winner [14] relationships, to name just a few. In the gut, many phages enter a lysogenic or latent state and are retained as integrated or episomal prophages within the host bacterium [15]. In some instances, the prophage can buttress host fitness (at least temporarily) rather than destroying the host cell. Prophages often contain genes that can dramatically alter the phenotype of the bacteria, such as toxins [16], virulence factors [17], antibiotic resistance genes [18], photosystem components [19], other auxiliary metabolic genes [20], and CRISPR-Cas systems [21], along with countless genes of unknown function. Experimental evidence has shown that bacteria infected with particular phages - i.e. "virocells" - are physiologically distinct from cognate bacteria that lack those particular phages [20].

There have been a few documented cases where phages have been shown to be involved in increased bacterial virulence [16] or resistance to antibiotics [22], demonstrating complex roles for phages in human health.
In addition, several studies have conducted massively parallel sequencing on virus-enriched samples of human stool, finding differential abundance of some phages in disease conditions [5,23-25]. A major issue encountered by these studies is that there is not yet a comprehensive database of annotated virus genome sequences, and de novo prediction of virus sequences from metagenomic assemblies remains a daunting challenge [2]. In one study, only 31% of the assembled sequence data in virion-enriched virome surveys could be identified as recognizably viral [26]. Impressively, another study of twelve individuals was able to recruit over 80% of reads from virus-enriched samples to putative virus contigs [10]. Still, most of the potential viral contigs were unclassifiable sequences, and a large majority of contigs appeared to represent subgenomic fragments under 10 kb.

The current study sought to overcome the traditional challenges of sparse viral databases and poor detection of highly divergent viral sequences by using Cenote-Taker 2, a new virus discovery and annotation tool [27]. The pipeline was applied to sequencing data from nearly 6,000 human metagenome samples. Strict criteria identified over 180,000 viral contigs representing 52,570 specific taxa. In most cases, 75-99% of reads from virus-enriched stool datasets could be back-aligned to the Cenote-Taker 2-compiled Human Virome Database. Furthermore, the curated database allowed read-alignment-based abundance profiling of the virome in human metagenomic datasets, enabling the re-analysis of a panel of existing case-control studies.
The re-analysis revealed previously undetected associations between chronic diseases and the abundance of 1,789 specific virus taxa.

Results
Characteristics of the Human Virome
Read data were downloaded from NCBI's Sequence Read Archive (SRA), including data from the Human Microbiome Project [28] and several other studies (25 Bioprojects in total) involving deep sequencing of human metagenomic samples (Supplemental Table 1) [10,29-42]. These data spanned multiple body sites, including gut (stool), mouth, skin, and vagina. A subset of the projects performed enrichment for viral sequences [43-46]. Almost all of the projects pursued DNA sequencing, but a small number of metatranscriptomic (i.e., ribosome-depleted total RNA) samples were analyzed as well [47]. Read data were binned and assembled by Biosample rather than by individual run to combine read sets from the same individual. A total of 5,996 Biosamples were analyzed.

Cenote-Taker 2 [27] was used to check contigs for two common end-features of complete viral genomes: terminal direct repeats (suggesting a circular or LTR-bounded viral genome) or inverted terminal repeats (ITRs). Sequences with direct terminal repeats were arbitrarily assumed to represent circular DNA genomes. Circularized sequences >1,500 nucleotides (nt) with at least one viral hallmark gene and ITR-containing contigs >4,000 nt with at least one viral hallmark gene were designated as putative viruses. Linear (no discernable end features) contigs >12,000 nt with two or more viral hallmark genes were also kept as putative viruses. Since phages are often integrated into bacterial chromosomes, each linear contig was pruned with the Cenote-Taker 2 prophage pruning module to remove flanking host sequences. The analysis resulted in over 180,000 high-quality putative viral sequences.
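
The length and hallmark-gene thresholds stated above can be summarized as a small decision rule. The following Python sketch is illustrative only and is not the Cenote-Taker 2 implementation: the `Contig` class, the end-feature labels ("DTR", "ITR", "none"), and the function name are hypothetical, and the sketch assumes that end features and hallmark-gene counts have already been determined upstream.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record for an assembled contig (not a Cenote-Taker 2 type)."""
    name: str
    length: int          # contig length in nucleotides
    end_feature: str     # "DTR" (direct terminal repeats), "ITR", or "none"
    hallmark_genes: int  # number of viral hallmark genes detected


# (length cutoff in nt, minimum hallmark genes) per end-feature class,
# taken from the thresholds quoted in the text. Length cutoffs are strict ">".
THRESHOLDS = {
    "DTR": (1_500, 1),    # treated as circular genomes
    "ITR": (4_000, 1),    # ITR-bounded contigs
    "none": (12_000, 2),  # linear contigs with no discernible end features
}


def is_putative_virus(contig: Contig) -> bool:
    """Return True if the contig passes the thresholds for its end-feature class."""
    length_cutoff, min_hallmarks = THRESHOLDS[contig.end_feature]
    return contig.length > length_cutoff and contig.hallmark_genes >= min_hallmarks


if __name__ == "__main__":
    examples = [
        Contig("circular_small", 1_800, "DTR", 1),
        Contig("itr_short", 3_500, "ITR", 2),
        Contig("linear_one_hallmark", 20_000, "none", 1),
        Contig("linear_two_hallmarks", 20_000, "none", 2),
    ]
    for c in examples:
        print(c.name, is_putative_virus(c))
```

Keeping the cutoffs in one table makes the asymmetry explicit: contigs without recognizable genome ends must be both longer and carry more hallmark genes before they are accepted. Prophage pruning of linear contigs is a separate step and is not modeled here.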
The sequences were classified into operational taxonomic units (OTUs) by clustering at <0.05 intersequence distance (a proxy for >95% average nucleotide identity) using Mash [48] (see Methods). A final database of 52,570 sequences representing nonredundant virus OTUs was generated, and this database will hereafter be referred to as the Cenote Human Virome Database (CHVD, download available at https://zenodo.org/record/4069377).

A total of 8,625 virus OTUs consisted of contigs with terminal direct repeats and 113 were ITR-bounded. An additional 2,070 virus OTUs were likely complete proviruses because they were flanked on both sides by host sequences. Since it can be difficult to detect many kinds of viral genome ends using short read assemblies, the absence of terminal repeats or flanking chromosomal regions does not necessarily imply that a given contig represents an incomplete viral genome (or genome segment) [49]. It is thus uncertain what proportion of the remaining 41,762 contigs in the CHVD represent complete genomes, as opposed to subgenomic fragments.

Although it is often challenging to obtain very long virus contigs from de novo assemblies, 133 virus OTUs over 200 kilobases (kb) were detected in the survey, with the largest being Siphoviridae species ctpHQ1, at 501 kb. A total of 38 family- or order-level taxa were observed, and 2,568 sequences could not be classified by Cenote-Taker 2 (Supplemental Fig. 1).
It is important to note that virus taxonomy, especially taxonomy of dsDNA phages, is currently in flux [50-53], and these taxonomic statistics will likely change as taxonomic groupings are revised. The vast majority of classified sequences represent dsDNA tailed phages in the order Caudovirales (including Siphoviridae, Podoviridae, Myoviridae, Ackermannviridae, Herelleviridae, and crAss-like viruses).
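
The OTU clustering step described earlier in this section (grouping sequences at <0.05 intersequence distance) can be illustrated with a short Python sketch. This is not the authors' procedure, which is deferred to the Methods: the sketch assumes single-linkage clustering over precomputed pairwise distances, and the function name `cluster_otus` and the input layout are hypothetical.

```python
from collections import defaultdict


def cluster_otus(names, distances, threshold=0.05):
    """Group sequences into OTUs by single-linkage clustering (an assumption).

    names: iterable of sequence identifiers.
    distances: iterable of (name_a, name_b, distance) tuples, e.g. pairwise
        Mash-style distances computed beforehand.
    Pairs with distance strictly below `threshold` end up in the same OTU.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, dist in distances:
        if dist < threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for name in parent:
        clusters[find(name)].append(name)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3", "c4"]
    pairwise = [("c1", "c2", 0.01), ("c2", "c3", 0.04), ("c3", "c4", 0.20)]
    print(cluster_otus(contigs, pairwise))
```

Single linkage is only one possible choice. It can chain together sequences that are each within the threshold of a neighbor but farther than 0.05 from one another, so a centroid-based or greedy scheme could yield a different number of OTUs from the same distances.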
