Composition of CRISPR-cassettes reflects the virome diversity in the human gut microbiome

Gogleva A.A.
Vavilov Institute of General Genetics RAS
[email protected]

Artamonova I.I.
Vavilov Institute of General Genetics RAS; Institute for Information Transmission Problems RAS
[email protected]

Abstract

CRISPR is a prokaryotic adaptive defence system that provides resistance against alien replicons such as viruses and plasmids. Cas proteins and CRISPR-cassettes together make up the CRISPR-system. A CRISPR-cassette includes a leader sequence at the 5'-end and an array of conserved short direct repeats with unique spacer sequences located between them. New spacers are inserted predominantly near the leader sequence. CRISPR-cassettes can therefore be treated as a footprint of phage-bacteria interactions within a given ecosystem. The human gut is a rich habitat populated by a vast number of microorganisms; however, a large proportion of them are uncultured, and little is known about their CRISPR-systems. We used human gut metagenomic data from three distinct projects to obtain information about the composition and dynamics of CRISPR-cassettes in the human-associated microbiota. This study shows that CRISPR-cassettes are highly variable and that the CRISPRome of an individual reflects its unique virome diversity, abundance and dynamics.

1. Introduction

The human is considered to be a superorganism. Prokaryotic cells that share the human body space outnumber our own eukaryotic cells at least ten to one, with the overwhelming majority of these microbes residing in the intestine. This complex community of symbiotic, pathogenic and commensal microorganisms is defined as a microbiome [1]. The microbiome has been shown to be indispensable for human life, as it is capable of many metabolic functions that are essential for digestion but are absent in the host. It also affects energy balance and controls cell proliferation and the development of the immune system [2]. Although prokaryotes are primarily responsible for all these vital functions, bacteriophages are thought to be responsible for balancing and controlling the abundance of microbes within the human microbiome. In response to ongoing phage pressure, bacteria and archaea have developed plenty of defence mechanisms to escape it. The CRISPR-system is one that can help to trace and reconstruct the history of interactions between phages and their prokaryotic hosts.

Although CRISPR-systems are highly diverse in different organisms, they are typically composed of numerous CRISPR-associated (cas) genes and a CRISPR-cassette containing a single leader sequence (300-500 bp) at the 5'-end and unique spacer sequences (30-70 bp) alternating with conserved short direct repeats. Cas genes encode proteins with various, only partially characterized, functions essential for the activity of the CRISPR-system [3]. Since new spacers accumulate primarily on one side of the cassette, near the leader sequence, and older spacers are shifted towards the 3'-end, CRISPR-arrays retain unique chronological information about the phage infections of the bacterial population.

Up to 60% of all microorganisms that inhabit the human body are considered to be uncultured; hence they are poorly studied, along with the CRISPR-systems their genomes bear. The culture-independent metagenomic approach is the most powerful way to study the composition and dynamics of complex microbial communities. Metagenomic data make it possible to obtain a complete snapshot of the coexisting
microorganisms, including bacteriophages and their prokaryotic prey.

Today considerable effort is directed to large-scale investigation of the human microbiome using the metagenomic approach. Several human microbiome [5-7] and virome [8,13] metagenomic datasets are currently available. We examined the composition and dynamics of CRISPR-cassettes in human gut metagenomic samples, as this particular sub-environment is known to be populated by the highest diversity of microorganisms.

2. Results

Three human gut metagenomic datasets were used in this study: 1) the Human Microbiome Project (HMP) [5]; 2) the gut metagenomes of 13 healthy individuals from Japan (13JAP) [7]; 3) the Distal gut biome project (DG) [6].

To construct a reliable set of CRISPR-cassettes for each metagenomic dataset, we applied three algorithms, PILER-CR [10], the CRISPR Recognition Tool (CRT) [9] and CRISPRFinder [11], followed by a previously designed filtering procedure [12]. The sets of reliable CRISPR-cassettes consisted of 296, 78 and 14 cassettes predicted in the 13JAP, HMP and DG metagenomes, respectively.

Identifying the source of the predicted non-redundant (NR) spacers for each metagenomic dataset was of particular interest.

In the RNA virome we found only one probable protospacer, for a single spacer sequence from the 13JAP NR-set. The spacer and protospacer differed by 4 mismatches. This observation is rather intriguing, since CRISPR-systems are known to preferentially target alien DNA, not RNA. However, in vitro experiments with the CRISPR-Cas system from the archaeon P. furiosus showed that the crRNA instead targets mRNA [14].

Comparison of the NR spacer sets with the Human Fecal Virome (HFV) metagenome yielded 1897 spacer-protospacer pairs within the required mismatch threshold, corresponding to only 34 unique spacer sequences. The identified protospacers belonged to 152 metagenomic reads. A considerable part of the matched reads (46%) resembled sequences of bacteriophage origin.

When the NR spacers were compared with known complete viral genome reference sequences, 11 spacer-protospacer pairs were obtained. The detected protospacers corresponded to 5 different spacers, all coming from the 13JAP metagenome. The protospacers matched regions in the complete genomes of Escherichia, Salmonella and Clostridium phages and three unspecified enterobacteria bacteriophages. Notably, NR_U_spacer_2662 matched 6 different genomes of enterobacteria phages: Enterobacteria phage VT2-Sakai, Enterobacteria phage Sf6, Stx1 converting phage, Stx2 converting phage II, Salmonella phage ST160 and Salmonella enterica bacteriophage SE1. In all cases this protospacer matched the most conserved parts of coding regions corresponding to Ea22 lambda phage-like proteins. This particular spacer might therefore be responsible for CRISPR-mediated multiphage resistance against a group of closely related bacteriophages. The low number of matches with complete phage genomes is most probably not accidental: it reflects the fact that only a few prokaryotic viruses are known, while a considerable proportion of the actual viral sequence space remains unexplored.

Finally, we compared the NR spacer sets against the human metagenomic datasets themselves, guided by the assumption that these data must contain contigs of viral or plasmid origin. In this case the outcome was completely different. We identified 1591 reliable spacer-protospacer pairs for the 13JAP NR-set, which comprises 2992 unique spacer sequences. The observed protospacers corresponded to 358 different spacers (~10% of the 13JAP NR spacer set). For 228 (65%) of the 352 NR HMP spacers we found 301 spacer-protospacer pairs, with protospacers coming from 109 different contigs. In the case of the DG NR spacers only self-matches were identified; therefore no reliable protospacers can be associated with these spacers. This might result from metagenomic assembly characteristics as well as from bias caused by sample treatment.

For the 13JAP metagenomic dataset, where the number of contigs and CRISPR-cassettes per individual was considerably high, it was interesting to find out whether the spacer and protospacer of a pair came from the same individual. It turned out that for individuals with a relatively high number of spacer-protospacer pairs (>10), spacers tended to match metagenomic contigs ascribed to the same individual. This was not true, however, for individuals F2X and F2Y, both of which had more protospacers coming from their counterpart than from themselves. It should be noted that the F2X and F2Y samples correspond to children belonging to the same family; hence the observed cross-match might be a result of close interactions between these particular individuals.

Thus, in this study we have demonstrated that spacers from human gut-associated CRISPR-cassettes tend to originate in the same samples as
protospacer-containing sequences. Therefore the CRISPR-cassette composition of the human gut microbiota reflects the phage diversity of a particular environment.

This is joint work with M.S. Gelfand.

3. References

[1] J. Lederberg, A.T. McCray. 'Ome Sweet 'Omics - a genealogical treasury of words. Scientist (2001) 15: 8.
[2] F. Bäckhed, R.E. Ley, J.L. Sonnenburg, D.A. Peterson, J.I. Gordon. Host-bacterial mutualism in the human intestine. Science. 2005 Mar 25;307(5717):1915-20.
[3] R. Barrangou, C. Fremaux, H. Deveau, et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315(5819):1709-12.
[4] P.J. Turnbaugh, M. Hamady, et al. A core gut microbiome in obese and lean twins. Nature (2009) 457, 480-484.
[5] NIH HMP Working Group. The NIH Human Microbiome Project. Genome Res. 2009 Dec;19(12):2317-23. Epub 2009 Oct 9.
[6] S.R. Gill, M. Pop, R.T. Deboy, et al. Metagenomic analysis of the human distal gut microbiome. Science. (2006) Jun 2;312(5778):1355-9.
[7] K. Kurokawa, T. Itoh, T. Kuwahara, K. Oshima, et al. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. (2007) Aug 31;14(4):169-81. Epub 2007 Oct 3.
[8] S. Minot, R. Sinha, J. Chen, H. Li, S.A. Keilbaugh, G.D. Wu, J.D. Lewis, F.D. Bushman. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 2011 Oct;21(10):1616-25. Epub 2011 Aug 31.
[9] C. Bland, T.L. Ramsey, F. Sabree, M. Lowe, K. Brown, N.C. Kyrpides, P. Hugenholtz. 2007. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8:209.
[10] R.C. Edgar. 2007. PILER-CR: fast and accurate identification of CRISPR repeats. BMC Bioinformatics 8:18.
[11] I. Grissa, G. Vergnaud, C. Pourcel. 2007. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 35:W52-W57.
[12] V.A. Sorokin, M.S. Gelfand, I.I. Artamonova. Evolutionary dynamics of clustered irregularly interspaced short palindromic repeat systems in the ocean metagenome. Appl Environ Microbiol. 2010 Apr;76(7):2136-44.
[13] T. Zhang, M. Breitbart, W.H. Lee, J.Q. Run, C.L. Wei, S.W. Soh, M.L. Hibberd, E.T. Liu, F. Rohwer, Y. Ruan. RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biol. 2006 Jan;4(1):e3.
[14] C.R. Hale, P. Zhao, S. Olson, M.O. Duff, B.R. Graveley, L. Wells, R.M. Terns, M.P. Terns. RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell. 2009 Nov 25;139(5):945-56.
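
As a supplementary illustration of the spacer-protospacer comparison described in the Results, the following is a minimal sketch, not the procedure actually used in this study. It assumes a brute-force sliding-window search on both strands with a Hamming-distance cutoff and no indels. The function names and the default `max_mismatches=2` are hypothetical; the text gives no numeric value for the mismatch threshold. The sketch also does not filter out self-matches to the CRISPR-cassettes themselves, which a real analysis would have to do.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_protospacers(spacer, contig, max_mismatches=2):
    """Yield (start, strand, mismatches) for every window of `contig`
    that matches `spacer` on either strand within `max_mismatches`.

    max_mismatches=2 is an illustrative default (an assumption),
    not the threshold used in the study.
    """
    spacer = spacer.upper()
    contig = contig.upper()
    k = len(spacer)
    if k == 0:
        return
    queries = [("+", spacer)]
    rc = reverse_complement(spacer)
    if rc != spacer:  # avoid reporting palindromic spacers twice
        queries.append(("-", rc))
    for start in range(len(contig) - k + 1):
        window = contig[start:start + k]
        for strand, query in queries:
            mm = hamming(query, window)
            if mm <= max_mismatches:
                yield start, strand, mm


def tally_pair_origin(spacers, contigs, max_mismatches=2):
    """Count spacer-protospacer pairs by whether the contig comes from
    the same individual as the spacer ('same') or another one ('other').

    spacers: iterable of (individual_id, spacer_sequence)
    contigs: iterable of (individual_id, contig_sequence)
    """
    contigs = list(contigs)
    counts = Counter()
    for spacer_owner, spacer in spacers:
        for contig_owner, contig in contigs:
            for _hit in find_protospacers(spacer, contig, max_mismatches):
                key = "same" if spacer_owner == contig_owner else "other"
                counts[key] += 1
    return counts
```

For example, `tally_pair_origin([("F2X", spacer)], [("F2X", contig_a), ("F2Y", contig_b)])` would return a counter with separate tallies for pairs found within the same individual and across individuals, which is the kind of comparison made for the 13JAP samples. For datasets of realistic size an indexed aligner would be needed instead of this quadratic scan.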
