1 Predictive Functionality of Bacteria in Naturally Fermented Milk Products of India Using

bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455378; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 Predictive functionality of bacteria in naturally fermented milk products of India using 2 PICRUSt2 and Piphillin pipelines 3 4 H. Nakibapher Jones Shingling and Jyoti Prakash Tamang* 5 6 DAICENTER (DBT-AIST International Centre for Translational and Environmental 7 Research) and Bioinformatics Centre, Department of Microbiology, School of Life Sciences, 8 Sikkim University, Gangtok 737102, Sikkim, India 9 10 *Corresponding author: Professor Dr. Jyoti Prakash Tamang, Department of Microbiology, 11 Sikkim University, Tadong 737102, Sikkim, India (e-mail: [email protected]; 12 Mobile: +91-9832061073) 13 14 Running Title: Metagenome gene prediction of fermented milk 15 1 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455378; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 16 Abstract 17 Naturally fermented milk (NFM) products are popular food delicacies in Indian states of 18 Sikkim and Arunachal Pradesh. Bacterial communities in these NFM products of India were 19 previously analysed by high-throughput sequence method. However, predictive gene 20 functionality of NFM products of India has not been studied. In this study, raw sequences of 21 NFM products of Sikkim and Arunachal Pradesh were accessed from MG-RAST/NCBI 22 database server. PICRUSt2 and Piphillin tools were applied to study microbial functional 23 gene prediction. MUSiCC-normalized KOs and mapped KEGG pathways from both 24 PICRUSt2 and Piphillin resulted in higher percentage of the former in comparison to the 25 latter. Though, functional features were compared from both the pipelines, however, there 26 were significant differences between the predictions. Therefore, a consolidated presentation 27 of both the algorithms presented an overall outlook into the predictive functional profiles 28 associated with the microbiota of the NFM products of India. 29 30 Keywords: Metagenome gene prediction, PICRUSt2, Piphillin, naturally fermented milk 31 products, lactic acid bacteria 32 33 2 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455378; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 34 Introduction 35 Naturally fermented milk (NFM) products are popular food items in daily diets of ethnic 36 people of Arunachal Pradesh and Sikkim in India, which include dahi, mohi, gheu, soft- 37 chhurpi, hard-chhurpi, dudh-chhurpi, chhu, somar, maa, philu, shyow, mar, chhurpi/churapi, 38 churkam and churtang/chhurpupu (Rai et al. 2016; Tamang et al. 2021). Previously, 39 taxonomic analysis using high-throughput sequencing (HTS) of NFM products of Arunachal 40 Pradesh and Sikkim viz. chhurpi, churkam mar/gheu and dahi, have been studied 41 (Shangpliang et al. 2018). We have recorded the abundance of phylum Firmicutes with 42 predominated species of lactic acid bacteria (LAB) viz. Lactococcus lactis (19.7%) and 43 Lactobacillus helveticus (9.6%) and Leuconostoc mesenteroides (4.5%) and acetic acid 44 bacteria (AAB): Acetobacter lovaniensis (5.8%), Acetobacter pasteurianus (5.7%), 45 Gluconobacter oxydans (5.3%), and Acetobacter syzygii (4.8%) (Shangpliang et al. 2018). 46 Application of shotgun metagenomics is one of the commonly used methods for 47 understanding the microbial-associated gene functional characteristics (Quince et al. 2017). 48 However, alternately functional profiles of a microbial community can also be inferred 49 indirectly by marker-gene surveys such as 16S rRNA gene (Ortiz-Estrada et al. 2019; 50 Bokulich et al. 2020). Bioinformatics pipelines such as Phylogenetic Investigation of 51 Communities by Reconstruction of Unobserved States version2 (PICRUSt2) (Douglas et al. 52 2020) and Piphillin (Narayan et al. 2020) among others are some of the well-known tools for 53 microbial predictive functionality studies from various NGS-related metagenomic data 54 (Ortiz-Estrada et al. 2019; Bokulich et al. 2020). These pipelines have also been applied in 55 fermented milk products to infer the functional gene predictions (Zhang et al. 2017; Zhu et al. 56 2018; Chen et al. 2020; Choi et al. 2020a,b). Microbiota present in NFM products harbour 57 probiotic properties and impart several health-promoting benefits to consumers (Bengoa et al. 58 2019; Tamang et al. 2020; García-Burgos et al. 2020). Predictive gene functionality in NFM 3 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455378; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 59 products of India has not been analysed yet. Hence, the present study is aimed to predict the 60 microbial functional contents of 16S rRNA gene sequencing data of NFM products of India, 61 previously analysed by high-throughput sequencing method (Shangpliang et al. 2018), using 62 PICRUSt2 and Piphillin pipelines. 63 64 Material and Methods 65 Pre-analysis prior to predictive functionality analysis 66 Raw sequences of NFM products of Arunachal Pradesh and Sikkim in India analysed by HTS 67 method (Supplementary Table 1) were accessed from MG-RAST/NCBI database server and 68 were used in this study. Raw reads were processed using QIIME2-2020.6 69 (https://docs.qiime2.org/2020.6/) (Bolyen et al. 2019). After importing into QIIME2 70 environment, Q-score based filtering and denoising was performed using Divisive Amplicon 71 Denoising Algorithm (DADA2) (Callahan et al. 2016) via qiime dada2 denoise-paired 72 plugin. Quality-filtered sequences were then clustered against SILVA v132 (Quast et al. 73 2012) databases and followed by taxonomic assignment using q2-vsearch-cluster-features- 74 closed-reference (Rognes et al. 2016). 75 76 Predictive functionality analysis 77 PICRUSt2 analysis (https://github.com/picrust/picrust2/wiki) 78 Quality-filtered clustered sequences were feed into PICRUSt2 algorithm (Douglas et al. 79 2020) using via q2-vsearch-cluster-features-closed-reference (Rognes et al. 2016). PICRUSt2 80 deduced the predictive functionality of the marker genes by using a standard integrated 81 genomes database. Firstly, multiple assignment of the exact sequence variants (ESVs) was 82 performed using HMMER (http://www.hmmer.org/). Placements of ESVs in the reference 83 tree with evolutionary placement-ng (EPA-ng) algorithm (Barbera et al. 2019) and Genesis 4 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455378; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 84 Applications for Phylogenetic Placement Analyses (GAPPA) omics (Czech and Stamatakis 85 2019) were applied. Prediction of gene families was run using a default castor R package 86 (Louca and Doebeli 2018) with the default algorthim run (maximum parsimony) and 87 metagenome prediction was acquired using metagenome_pipeline.py (Ye and Doak 2009). 88 89 Piphillin analysis (https://piphillin.secondgenome.com/) 90 Additionally, predictive functionality was also inferred using Piphillin (Narayan et al. 2020), 91 a web-server analysis pipeline. DADA2-clustered representative sequences (.fasta) and 92 abundance frequency table (.csv) were used as inputs for the analysis. 93 94 Statistical analysis and data visualization 95 Unnormalized Kyoto Encyclopaedia of Genes and Genomes (KEGG) ortholog (KO) profiles 96 of PICRUSt2 and Piphillin predictive were normalized using Metagenomic Universal Single- 97 Copy Correction (MUSiCC) (Manor and Borenstein 2015). The output features were then 98 mapped to KEGG database for systematic analysis of gene functions (Kanehisa et al. 2012). 99 Relative abundance at the category level was plotted as stacked bar-plot using MSEXCEL 100 v365. Statistical analysis for significant features (pathways) was carried out using STAMP 101 (Parks et al. 2014). Normalized predictive features were log-transformed and the differences 102 between PICRUSt2 and Piphillin predictive features were calculated using White’s non- 103 parametric with Benjamini-Hochberg FDR (false discovery rate) (Parks et al. 2014). Non- 104 parametric Spearman’s correlation of the bacteria and functionality was analyzed through 105 Statistical Package for the Social Sciences (SPSS) v20 and the heatmap representation was 106 plotted using ClustVis (Metsalu and Vilo 2015). 107 108 Results 5 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.06.455378; this version posted August 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 109 Microbial predictive gene functionality 110 A total of 1109 error-corrected ESVs was obtained from DADA2 analysis and about 268 111 SILVA-clustered sequences were used for the downstream predictive analysis. A total of 112 5995 MUSiCC-normalized KOs and 181 mapped KEGG pathways was obtained from 113 PICRUSt2 analysis. Similarly, a total of 5245 MUSiCC-normalized KOs and

1 Predictive Functionality of Bacteria in Naturally Fermented Milk Products of India Using

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support