Figure S1 (A) 16S rRNA Gene Based Bacterial Taxonomy; TEFAP (Tag Encoded FLX Amplicon

<p>Figure S1 (a) 16S rRNA gene-based bacterial taxonomy. TEFAP (tag-encoded FLX amplicon pyrosequencing) data from Sangwan et al., 2012 and metagenomic 16S rRNA gene sequences from this study (minimum length = 150 bp) were compared against the SILVA database. The normalised genus versus metagenome matrix was used to perform two-way clustering: the top 50 genera with a minimum abundance of 0.8% and a standard deviation > 0.6 were clustered using a Manhattan distance matrix. (b) Pairwise comparisons (Fisher's exact test with Benjamini and Hochberg false discovery rate multiple testing correction) of the EGT (environmental gene typing) results for the dumpsite (Illumina dataset, this study; pyrosequencing dataset, Sangwan et al., 2012).</p>
<p>Figure S2 (a) Bacterial diversity patterns at the HCH dumpsite. EGT (environmental gene typing) mapping was performed against the NCBI RefSeq database (n > 15,000); the sample tree was clustered using a Manhattan distance matrix and average linkage clustering, and the 50 statistically significant genera (minimum relative abundance = 0.8%, standard deviation > 0.4) are shown. (b) Comparative analysis of community metabolism across the HCH contamination gradient. Hierarchical clustering (bootstrap n = 1000; clustering algorithm = average linkage with a correlation distance matrix) was performed on the KEGG enzyme annotations (BLASTX; E-value 10<sup>-5</sup>). (c) Taxon-based enrichment over the HCH contamination gradient. Hierarchical clustering was performed on the principal components obtained by ordination (PCA on the correlation matrix with 1000 bootstrap replicates) of the genus versus metagenome matrix obtained from metagenomic 16S rRNA domain-specific typing (including TEFAP data from Sangwan et al., 2012) and EGT analysis. Genera with a minimum relative abundance of 0.8% and a standard deviation > 0.4% across the HCH gradient (Dumpsite Illumina = Dumpsite 454 > 1 km dataset > 5 km dataset) were selected for clustering and visualization.</p>
<p>Figure S3 Phylogenomics of the genus Sphingobium and metagenomic fragment recruitment from an HCH dumpsite sample. (a) Hierarchical clustering of whole-genome ANI comparisons (measure = distance; clustering algorithm = unweighted average; bootstrap %, n = 1000), with the metagenomic recruitment of each genotype plotted alongside. Purple dots represent metagenome reads mapped onto the reference genome sequence (X-axis) with their measured percentage sequence identity (Y-axis); yellow bands represent the genomic coordinates of the lin genes. (b) Tetranucleotide-based comparisons of S. japonicum UT26, S. indicum B90A, S. chlorophenolicum L-1, Sphingobium sp. SYK-6 and the Meta-Sphingobium assembly. The lower panels show scatter plots of the 256 tetranucleotide z-scores for each pairwise comparison, and the upper panels show the corresponding Pearson correlation coefficients. (c) Correlation between MGI (metagenomic island) predictions for the complete genomes of Sphingobium indicum B90A and Sphingobium japonicum UT26. Each protein-coding gene was compared against the NCBI COG database, and Fisher's exact test with FDR correction was used to assess statistical significance. (d) Each 'foreign gene' (putative CDS located on a genomic island) was compared (at class level) against NCBI's COG database and a custom database (see Materials and Methods). For S. indicum B90A, genomic islands predicted by SIGI-HMM were plotted separately.</p>
<p>Figure S4 (a) Average dN/dS ratio (Y axis) plotted against the synonymous substitution rate (dS) (X axis). Encircled points represent the genes for which pyrosequencing data of the dumpsite metagenome (Sangwan et al., 2012) were used for the pairwise dN/dS calculations. (b) Strain- and/or environment-specific dynamics of recalcitrant compound degradation potential. The X axis represents the putative genes from S. japonicum UT26 involved in the microbial degradation of phenol/toluene, chlorophenol, anthranilate, homogentisate and hexachlorocyclohexane. The upper panel shows the recruitment of ancestral protein-coding sequences, the middle panel the recruitment of individual metagenomic reads, and the lower panel the recruitment of raw sequences from the 2 kb genomic library of S. indicum B90A. The locations of the linA, linB and linC genes are highlighted with a color gradient. (c) Comparative metabolism analysis. The metabolism of the ancestor genotype (KEGG subsystem) was compared against the HCH-degrading subspecies and the metagenome contigs corresponding to Sphingobium genomes (binned using tetra-ESOM and % GC). Two-way clustering was performed using Euclidean distance and Kendall's tau matrices with average linkage clustering.</p>
<p>Figure S5 (a) Graph-based clustering of the Sphingobium indicum B90A genome. Paired-end reads were assembled using the CAP3 program, and clustering was performed with a minimum overlap length of 40% of the length (140 bp) and a minimum percentage identity of 80%. (b) Graph-based clustering of the HCH dumpsite metagenome. Paired-end reads were assembled using the CAP3 program, and clustering was performed with a minimum overlap length of 40% of the length (33 bp) and a minimum percentage identity of 80%. (c) Plasmid genotype enrichment over HCH contamination. Metagenomic reads were compared against the NCBI plasmid database, and abundances were normalised as the reads assigned to a genome divided by the total reads assigned against the database. Euclidean and Manhattan distance matrices were used for the two-way clustering. [A] Genotypes significantly enriched (P < 0.0001 for all pairwise comparisons) with increasing HCH contamination (Dumpsite > 1 km > 5 km). [B] Genotypes enriched (P < 0.0001 for all pairwise comparisons) in the Dumpsite Illumina dataset. Names and relative abundances of all the genera used to generate this heat map are provided in Supplementary file 8.</p>
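Several legends (Figures S1a, S2a, S2c and S5c) describe normalising a genus- or genotype-by-metagenome count matrix and then keeping only genera above a minimum relative abundance and standard deviation. The sketch below is one possible reading, not the authors' pipeline: it assumes that "minimum abundance" means a genus must reach the threshold in at least one metagenome, that the standard deviation is the sample standard deviation of the percentages, and that "top 50" means ranking by that standard deviation. The function names are hypothetical.

```python
from statistics import stdev


def normalise(counts):
    """Convert per-sample read counts to percentages of each sample's assigned reads.

    counts: dict mapping genus -> list of read counts, one entry per metagenome
    (all lists in the same sample order).
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(values[j] for values in counts.values()) for j in range(n_samples)]
    return {
        genus: [100.0 * values[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for genus, values in counts.items()
    }


def select_genera(rel, min_abundance=0.8, min_sd=0.4, top_n=50):
    """Return up to top_n genera, most variable first (hypothetical selection rule).

    A genus is kept if it reaches min_abundance (%) in at least one sample and its
    sample standard deviation across metagenomes exceeds min_sd.
    """
    kept = [
        (stdev(values), genus)
        for genus, values in rel.items()
        if max(values) >= min_abundance and stdev(values) > min_sd
    ]
    # Sort by decreasing standard deviation; break ties alphabetically.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [genus for _, genus in kept[:top_n]]
```

Under this rule a genus that never reaches 0.8% in any metagenome is dropped regardless of how much it varies.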
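Figures S1a and S2a combine a Manhattan distance matrix with average linkage clustering, and Figure S3a uses unweighted average (UPGMA) clustering. As a minimal, hypothetical sketch, average linkage can be written as repeatedly merging the two clusters with the smallest mean pairwise distance between their members. A two-way clustering would apply the same procedure to the transposed matrix (genera instead of metagenomes). This naive version recomputes distances on every pass and is only meant for small matrices; it omits the bootstrap step mentioned in the legends.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length abundance profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(profiles, distance=manhattan):
    """Naive average-linkage (UPGMA) agglomerative clustering.

    profiles: dict mapping a label (e.g. a metagenome) to its abundance profile.
    Returns (tree, heights): tree is a nested tuple of labels, heights lists the
    distance at which each merge happened, in merge order.
    """
    # cluster id -> (subtree, list of member labels)
    clusters = {i: (label, [label]) for i, label in enumerate(profiles)}
    next_id = len(clusters)
    heights = []

    def cluster_distance(members_a, members_b):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(distance(profiles[x], profiles[y]) for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Merge the closest pair; ties go to the pair found first.
        best = None
        for i, j in combinations(sorted(clusters), 2):
            d = cluster_distance(clusters[i][1], clusters[j][1])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        tree_i, members_i = clusters.pop(i)
        tree_j, members_j = clusters.pop(j)
        clusters[next_id] = ((tree_i, tree_j), members_i + members_j)
        heights.append(d)
        next_id += 1

    ((tree, _),) = clusters.values()
    return tree, heights
```

For example, with profiles Dumpsite = [60, 20, 20], 1km = [50, 25, 25] and 5km = [10, 30, 60], the Dumpsite and 1km samples merge first (distance 20) and 5km joins at the average distance of 90.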
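Figures S1b, S3c and S5c rely on Fisher's exact test followed by Benjamini and Hochberg FDR correction. The sketch below implements a two-sided Fisher's exact test on a 2x2 table and the Benjamini and Hochberg step-up adjustment in plain Python (3.8 or later, for `math.comb`). The legends do not state how the contingency tables were built; arranging each genus as [[reads of the genus in dataset A, all other reads in A], [reads of the genus in dataset B, all other reads in B]] is an assumption made here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denominator

    p_observed = prob(a)
    low, high = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(low, high + 1)]
    # A small relative tolerance keeps tables tied with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The p-values for all genera in one comparison would be adjusted together in a single `benjamini_hochberg` call, and a genus would be reported as significant when its adjusted value falls below the chosen threshold.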
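Figure S2b clusters with a correlation distance, and Figure S4c uses Euclidean distance and Kendall's tau matrices. The helpers below compute Euclidean distance, Pearson correlation distance (1 - r) and Kendall's tau-b in plain Python, as a sketch that any average-linkage routine could consume. Turning tau into a distance as 1 - tau (the hypothetical `kendall_distance`) is an assumption; the legend does not say how the tau matrix was converted.

```python
from math import sqrt


def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    """1 - r: 0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1.0 - pearson(x, y)


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, with the usual correction for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denominator = sqrt((concordant + discordant + tied_x_only)
                       * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denominator


def kendall_distance(x, y):
    """Hypothetical distance built from tau-b: 0 for identical rankings, 2 for reversed ones."""
    return 1.0 - kendall_tau_b(x, y)
```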
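Figure S3b compares genomes by their 256 tetranucleotide z-scores and summarises each pair by a Pearson correlation coefficient. The sketch below follows the commonly used maximal-order Markov formulation: the expected count of a tetranucleotide is derived from its two constituent trinucleotides and its central dinucleotide, with k-mers counted on both strands. The legend does not name the tool used, so the formula, the strand handling and the function names here are assumptions.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]  # all 256 tetranucleotides


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_kmers(strands, k):
    """Count k-mers made only of A/C/G/T across all given strands."""
    counts = {}
    for s in strands:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(base in BASES for base in word):
                counts[word] = counts.get(word, 0) + 1
    return counts


def tetra_zscores(seq):
    """Return 256 z-scores (observed vs. Markov-expected counts), one per tetranucleotide."""
    seq = seq.upper()
    strands = [seq, reverse_complement(seq)]
    n2, n3, n4 = (count_kmers(strands, k) for k in (2, 3, 4))
    scores = []
    for t in TETRAMERS:
        mid = n2.get(t[1:3], 0)                      # central dinucleotide
        left, right = n3.get(t[:3], 0), n3.get(t[1:], 0)
        expected = left * right / mid if mid else 0.0
        variance = expected * (mid - left) * (mid - right) / mid ** 2 if mid else 0.0
        scores.append((n4.get(t, 0) - expected) / sqrt(variance) if variance > 0 else 0.0)
    return scores


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tetra_correlation(seq_a, seq_b):
    """Pearson correlation of the two sequences' tetranucleotide z-score profiles."""
    return pearson(tetra_zscores(seq_a), tetra_zscores(seq_b))
```

Because both strands are counted, a sequence and its reverse complement produce identical profiles (correlation 1), which is the desired behaviour for assemblies whose contig orientation is arbitrary.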
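Figures S5a and S5b describe graph-based clustering with a minimum overlap of 40% of the length and a minimum identity of 80%. The sketch below treats each sequence as a node, links two sequences when an ungapped suffix-prefix overlap meets both thresholds, and reports the connected components using union-find. It is a simplified stand-in and not CAP3's algorithm (CAP3 uses gapped alignments and its own scoring); reading "40% of the length" as 40% of the shorter sequence of each pair, and the function names, are assumptions.

```python
def overlap_length(a, b, min_overlap, min_identity):
    """Longest ungapped overlap of a's suffix with b's prefix that meets both
    thresholds (length >= min_overlap, percent identity >= min_identity); 0 if none."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        matches = sum(x == y for x, y in zip(a[-length:], b[:length]))
        if 100.0 * matches / length >= min_identity:
            return length
    return 0


def cluster_sequences(seqs, min_overlap_pct=40, min_identity=80.0):
    """Group sequences into connected components of the overlap graph.

    min_overlap_pct must be an integer percentage.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            shorter = min(len(seqs[i]), len(seqs[j]))
            # Integer ceiling of min_overlap_pct % of the shorter sequence (at least 1 bp).
            min_overlap = max(1, (min_overlap_pct * shorter + 99) // 100)
            linked = (overlap_length(seqs[i], seqs[j], min_overlap, min_identity)
                      or overlap_length(seqs[j], seqs[i], min_overlap, min_identity))
            if linked:
                parent[find(i)] = find(j)

    components = {}
    for i in range(len(seqs)):
        components.setdefault(find(i), []).append(i)
    return sorted(components.values())
```

Containment (one sequence lying entirely inside another) and gapped overlaps are not handled here; a production pipeline would delegate the overlap detection to an assembler or aligner.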
