Whole metagenome analysis with metagWGS

Joanna Fourquet, Céline Noirot, Christophe Klopp, Philippe Pinton, Sylvie Combes, Claire Hoede, Géraldine Pascal

To cite this version:

Joanna Fourquet, Céline Noirot, Christophe Klopp, Philippe Pinton, Sylvie Combes, et al. Whole metagenome analysis with metagWGS. JOBIM2020, Jun 2020, Montpellier, France. hal-03176836

HAL Id: hal-03176836
https://hal.archives-ouvertes.fr/hal-03176836
Submitted on 22 Mar 2021

Joanna FOURQUET 1, Céline NOIROT 2, Christophe KLOPP 2, Philippe PINTON 3, Sylvie COMBES 1, Claire HOEDE 2, Géraldine PASCAL 1

1 GenPhySE, Université de Toulouse, INRAE, ENVT, F-31326, Castanet-Tolosan, France
2 INRAE, UR875 MIAT, PF Bioinfo GenoToul, F-31326, Castanet-Tolosan, France
3 Toxalim, Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, F-31027 Toulouse, France

Corresponding Author: [email protected]

Acknowledgements
The ExpoMycoPig project and JF are funded by France Futur Elevage. We are grateful to the Genotoul bioinformatics platform Toulouse Occitanie (Bioinfo Genotoul, doi: 10.15454/1.5572369328961167E12) for providing computing and storage resources.

What bioinformatics solutions exist to process whole metagenome shotgun data?

nf-core/mag
→ Incomplete
  ✓ Metagenome assembly
  ✓ Taxonomic affiliation of reads and bins
  × Taxonomic affiliation of contigs
  × Functional annotation
→ Easy to use
  ✓ Automated with Nextflow [1]
→ Highly reproducible
  ✓ Comes with a Singularity container [2]
→ Modular
  ✓ Different parameters
  ✓ Possibility to skip certain steps
→ Available
  ✓ https://github.com/nf-core/mag

ATLAS [3]
→ Incomplete
  ✓ Metagenome assembly
  × Taxonomic affiliation of reads and contigs
  ✓ Taxonomic affiliation of bins
  ✓ Functional annotation
→ Issues during use
  ✓ Automated with Snakemake [4]
→ Reproducible
  ✓ Installation and run with (tool logo not recovered from the poster)
→ Modular
  ✓ Different parameters
  ✓ Possibility to skip certain steps
→ Available
  ✓ https://github.com/metagenome-atlas/atlas

Our solution: metagWGS
→ Complete
  ✓ Metagenome assembly
  ✓ Taxonomic affiliation of reads, contigs and bins
  ✓ Functional annotation
→ Easy to use
  ✓ Automated with Nextflow
→ Highly reproducible
  ✓ Comes with a Singularity container [2]
→ Modular
  ✓ Different parameters
  ✓ Possibility to skip certain steps
→ Available
  ✓ https://forgemia.inra.fr/genotoul-bioinfo/metagwgs

metagWGS: preprocessing, assembly and annotation

Workflow diagram (part 1), illustrated on three samples S1, S2 and S3. The steps recoverable from the figure are:

Raw data: reads of S1, S2 and S3
Quality control: FastQC [10] (.html reports), shown both before and after cleaning
Cleaning: adapter sequences (entire or truncated) removed with cutadapt [5], low quality reads removed with sickle [6], host reads removed with bwa mem [7]; high quality reads are kept
Taxonomic affiliation of reads: kaiju [8], kronaTools [9]
Assembly: metaspades [11] or megahit [12], giving contigs per sample (e.g. Contig 1.1, Contig 1.2)
Reads deduplication: bwa mem, samtools markdup [13]
Assembly filter: CPM filter with Filter_contig_per_cpm.py
Annotation of genes: prokka [14] (e.g. Gene 1 and Gene 2 on Contig 1.2, Gene 1 bis and Gene 3 on Contig 1.3)
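The assembly filter listed above relies on a CPM (counts per million) threshold. The poster does not detail how Filter_contig_per_cpm.py computes it, so the following is only a minimal sketch under a common assumption: the CPM of a contig is its number of mapped reads divided by the total number of mapped reads, times one million. The function names, the contig names, the read counts and the threshold are all hypothetical.

```python
from typing import Dict, List


def counts_per_million(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-contig mapped-read counts into counts per million (CPM).

    Assumption (not taken from the poster): CPM = count / total * 1e6.
    """
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads at all: every contig gets a CPM of zero.
        return {contig: 0.0 for contig in read_counts}
    return {
        contig: count * 1_000_000 / total
        for contig, count in read_counts.items()
    }


def filter_contigs_by_cpm(read_counts: Dict[str, int], min_cpm: float) -> List[str]:
    """Return the contigs whose CPM is at least min_cpm, in input order."""
    cpm = counts_per_million(read_counts)
    return [contig for contig, value in cpm.items() if value >= min_cpm]


# Hypothetical read counts for three contigs of one sample.
example_counts = {
    "contig_1.1": 5,
    "contig_1.2": 600_000,
    "contig_1.3": 399_995,
}

# Hypothetical threshold: keep contigs with at least 10 CPM.
kept = filter_contigs_by_cpm(example_counts, min_cpm=10.0)
print(kept)
```

With these made-up numbers the low-coverage contig is discarded and the two others are kept; the real script may normalise differently (for example across samples), which should be checked in the metagWGS repository.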

Figure legend: Possible skipping (steps that can be skipped)

metagWGS: quantification and affiliation

Workflow diagram (part 2), starting from the input from part 1 (reads deduplication, assembly filter, annotation of genes). The steps recoverable from the figure are:

Clustering of genes: cd-hit-est [15] (e.g. Gene 1 and Gene 1 bis are grouped under "Gene 1")
Quantification of reads on genes: bwa mem, featureCounts [16], giving read counts per gene and per sample
Taxonomic affiliation of contigs: diamond [17], aln2taxaffi.py, giving a table such as:

#contig   consensus taxid   consensus lineage
S1_1      1263              cellular organisms; ; Terrabacteria group; ; ; Clostridiales
S1_10     84108             cellular organisms; Bacteria; Terrabacteria group; ;

Binning of contigs: bowtie2 [18], metabat2 [19]
Taxonomic affiliation of bins: BAT [20]
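The clustering and quantification steps listed above combine into one table of read counts per gene cluster and per sample. The sketch below illustrates that idea only; it is not the metagWGS implementation. It assumes that clustering yields a mapping from each gene to its cluster and that quantification yields, for each sample, read counts for that sample's own genes. The function name, gene identifiers and counts are hypothetical.

```python
from typing import Dict


def sum_counts_per_cluster(
    gene_to_cluster: Dict[str, str],
    counts_per_sample: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, int]]:
    """Sum per-gene read counts into one row per cluster and one column per sample."""
    clusters = sorted(set(gene_to_cluster.values()))
    # Start every (cluster, sample) cell at zero so absent genes count as 0.
    table = {
        cluster: {sample: 0 for sample in counts_per_sample}
        for cluster in clusters
    }
    for sample, gene_counts in counts_per_sample.items():
        for gene, count in gene_counts.items():
            table[gene_to_cluster[gene]][sample] += count
    return table


# Hypothetical clustering result: three genes fall into the same cluster.
gene_to_cluster = {
    "S1_gene1": "Gene 1",
    "S1_gene1bis": "Gene 1",
    "S2_gene1": "Gene 1",
    "S1_gene2": "Gene 2",
    "S2_gene3": "Gene 3",
}

# Hypothetical per-sample read counts on each sample's own genes.
counts_per_sample = {
    "S1": {"S1_gene1": 2, "S1_gene1bis": 3, "S1_gene2": 2},
    "S2": {"S2_gene1": 1, "S2_gene3": 4},
}

table = sum_counts_per_cluster(gene_to_cluster, counts_per_sample)
for cluster, row in table.items():
    print(cluster, row)
```

Initialising every cell to zero keeps the table rectangular, which makes it straightforward to write out as a tab-separated file with one column per sample.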

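Quantification table shown in the diagram (partially recovered; the figure also carries a Taxonomy label):

         S1    S2    …
Gene 1   5+    1+    …
Gene 2   2+    0+    …

Figure legend: Possible skipping

MultiQC [21] graphic outputs of metagWGS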

Length and number of contigs

Read count metrics

Assessment of bins

Taxonomic affiliation of reads metrics

Output tables and other graphs of metagWGS

Taxonomic classification of contigs

#contig      consensus taxid   consensus lineage
first1       210               cellular organisms; Bacteria; ; delta/epsilon subdivisions; ; Campylobacterales; Helicobacteraceae; Helicobacter; Helicobacter pylori
first10      1681              cellular organisms; Bacteria; Terrabacteria group; Actinobacteria; Actinobacteria; Bifidobacteriales; ; ; Bifidobacterium bifidum
first100     1358              cellular organisms; Bacteria; Terrabacteria group; Firmicutes; ; Lactobacillales; ; ; Lactococcus lactis
first1000    1322              cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; Blautia; Blautia hansenii
first10004   644               cellular organisms; Bacteria; Proteobacteria; ; Aeromonadales; Aeromonadaceae; Aeromonas; Aeromonas hydrophila

Kronas (figure not reproduced)

Quantification table of reads on genes

id_cluster             first.featureCounts.tsv   second.featureCounts.tsv
first59.Prot_28193     844                       887
first82.Prot_34066     847                       891
first992.Prot_96159    4092                      4389
first8.Prot_06503      5279                      3702
first21.Prot_13879     584                       611

metagWGS will be used to analyze ExpoMycoPig data

How can we characterize the digestive microbiome ecosystems of pigs exposed to mycotoxins?

Experimental strategy

Two groups followed over 3 weeks:
n = 8, control diet
n = 8, diet contaminated with FB1 mycotoxins [22]

Fecal sampling (shown twice in the figure) → Total DNA extraction → DNA samples → Whole genome shotgun sequencing → Output files .fastq.gz → metagWGS

References

[1] Di Tommaso et al. Nextflow enables reproducible computational workflows. Nat Biotechnol., 2017.
[2] Kurtzer et al. Singularity: Scientific containers for mobility of compute. PLoS One, 2017.
[3] Kieser et al. ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data. bioRxiv, 2019.
[4] Köster et al. Snakemake - A scalable bioinformatics workflow engine. Bioinformatics, 2012.
[5] Martin. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 2011.
[6] Joshi and Fass. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files [Software]. Available at https://github.com/najoshi/sickle, 2011.
[7] Li and Durbin. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 2009.
[8] Menzel et al. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun., 2016.
[9] Ondov et al. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 2011.
[10] Andrews. FastQC. Available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc, 2010.
[11] Nurk et al. MetaSPAdes: a new versatile metagenomic assembler. Genome Res., 2017.
[12] Li et al. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 2015.
[13] Li et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009.
[14] Seemann. Prokka: rapid prokaryotic genome annotation. Bioinformatics, 2014.
[15] Fu et al. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 2012.
[16] Liao et al. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 2014.
[17] Buchfink et al. Fast and sensitive protein alignment using DIAMOND. Nature Methods, 2015.
[18] Langmead and Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 2012.
[19] Kang et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ, 2019.
[20] von Meijenfeldt et al. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biology, 2019.
[21] Ewels et al. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 2016.
[22] Pinton and Oswald. Effect of deoxynivalenol and other Type B trichothecenes on the intestine: a review. Toxins, 2014.