Species-specific 16S Bacterial rRNA Microbiome Analyses of the Lung Identifies a Unique Bacterial Species Associated with a Specific T-53 Mutation

Garth D. Ehrlich, Ph.D., F.A.A.A.S.
Executive Director, Center for Genomic Sciences
Executive Director, Center for Advanced Microbial Processing
Professor of Microbiology and Immunology
Drexel University College of Medicine
Director, Meta-Omics Shared Resource, Sidney Kimmel Cancer Center

PacBio Users Group Meeting, St. Louis, MO, September 19, 2018

Thanks and acknowledgements
The Lab Team and The Bioinformatic Team: Jarek Krol, PhD; Azad Ahmed, MD; Josh Earl, ABD; Josh Mell, PhD; Steven Lang, BS; Bhaswati Sen, PhD; Carol Hope, BS; Archana Bhat, MS; Rachael Ehrlich, MS

Objectives of Pan-Domain Technology
• Broad-based identification of Bacteria and Fungi
– No a priori guessing of what to test for within these domains
– No culture necessary
• Mixed populations of microbes (polymicrobial assemblages)
• Quantitative
• High-resolution genotyping
• Identify emerging agents not previously seen
• Cost-effective, high throughput

16S Microbiome Analyses
• Revolutionized community-based microbial studies
– Revealed unprecedented levels of complexity
• Natural systems: soils, water
• Organismal systems
– Holobiont
– Metazoans and their microbiota
– Plants and their microbiota

The Problem with NextGen Microbiome Analyses
Lack of specificity – good for global analyses of diversity
• Short read lengths
– Genus-level taxonomic characterization at best
– Sometimes only family level
• High error rates
– Single-pass sequencing makes error correction very difficult
– Vastly increases the number of OTUs

The problem of false OTUs
• They create differences where there are none
• They amplify any real differences

                              S1, 0 errors (400 OTUs)   S1, with errors (500 OTUs)
S2, 0 errors (400 OTUs)       0/400                     100/500
S2, with errors (500 OTUs)    100/500                   200/600
S3, 0 errors (400 OTUs)       200/500                   300/600
S3, with errors (500 OTUs)    300/600                   400/700

S1 and S2 are biological replicates. S3 is different from S1 and S2.
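The fractions on the false-OTU slide can be reproduced with plain set arithmetic. The sketch below is only an illustration under assumptions inferred from the slide's numbers, not taken from the talk itself: each fraction is read as (OTUs found in only one sample) / (OTUs found in either sample), sequencing errors add 100 sample-specific false OTUs, and S3 shares 300 of its 400 true OTUs with S1. The names `otu_difference` and `with_errors` are hypothetical.

```python
def otu_difference(a, b):
    """Return (unshared, total): OTUs seen in only one sample, OTUs seen in either."""
    return len(a ^ b), len(a | b)


def with_errors(sample, label, n_false=100):
    """Add n_false spurious OTUs that are unique to this sample (assumed error model)."""
    return sample | {f"false_{label}_{i}" for i in range(n_false)}


# True OTU content (assumption: S2 replicates S1, S3 shares 300 of 400 OTUs with S1).
s1 = {f"true_{i}" for i in range(400)}
s2 = set(s1)
s3 = {f"true_{i}" for i in range(100, 500)}

s1_err = with_errors(s1, "S1")
s2_err = with_errors(s2, "S2")
s3_err = with_errors(s3, "S3")

comparisons = {
    "S1 vs S2, no errors": otu_difference(s1, s2),
    "S1 vs S2, errors in one": otu_difference(s1_err, s2),
    "S1 vs S2, errors in both": otu_difference(s1_err, s2_err),
    "S1 vs S3, no errors": otu_difference(s1, s3),
    "S1 vs S3, errors in one": otu_difference(s1_err, s3),
    "S1 vs S3, errors in both": otu_difference(s1_err, s3_err),
}

for name, (unshared, total) in comparisons.items():
    print(f"{name}: {unshared}/{total}")
```

Under these assumptions the replicates S1 and S2 go from 0/400 to 200/600 once both carry false OTUs, which is the "difference where there is none" the slide warns about.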
Current technical methods fail to provide the necessary taxonomic details to follow up associative studies

How Circular Consensus Sequencing Works
Multiple pass

CCS Analysis Algorithm
1) Bin amplimers via barcodes
2) Filter out all amplimers that are not sequenced for ≥ 5 passes (circular sequencing), and do not have ≥ 90% PacBio quality scores
3) Filter out any reads aligning partially to the human genome
4) Remove all reads of > 1500 bp (< 1% of remaining reads) as likely chimeras
5) Remove all reads not having both forward and reverse primers, and trim primers
6) Correct for strandedness so that all reads are on the same strand
7) Remove any remaining reads which have > 10 expected errors calculated from PHRED scores (< 5% of remaining reads)
8) Use only reads with < 1 expected error (so as not to inflate the number of OTUs) for clusters formed at the level of 97% sequence identity
9) Use the OTUs from (8) to map back all of the reads that passed filters 2-7 to provide the final counts for each taxon
Note: with this approach essentially all of the surviving reads map back to one of the high-fidelity OTUs

Goals of Microbiome Data Processing
• Minimize numbers of operational taxonomic units (OTUs) = species so as not to artificially inflate the diversity
• This is critical for comparisons among specimens – for if each specimen has its own set of false OTUs it produces a greatly exaggerated picture of differences

Expected Error vs. Average Error
The Uclust pipeline recommends using the “Expected Error” score rather than the average error:

Q scores in read        Avg. Q   Expected number of errors
140 x Q35 + 10 x Q2     33       6.4 !
150 x Q25               25       0.5

We can see that average quality is actually a poor representation of the probable number of errors in a sequence.

Filtering
• Phred Quality
– Highest quality score in PacBio sequences is 42, where Quality is equal to -10*log10(P), and P is the probability that the base is wrong.
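The expected-error figures in the table above follow directly from the Phred definition: each base contributes P = 10^(-Q/10), and the expected number of errors is the sum over the read. The sketch below is a minimal illustration, not the pipeline used in the talk. The function names are hypothetical, and `read_tier` simply mirrors the thresholds quoted in steps 7 and 8 of the algorithm (more than 10 expected errors is discarded, fewer than 1 is used for clustering).

```python
def expected_errors(quals):
    """Expected number of errors: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def average_q(quals):
    """Arithmetic mean of the Phred scores (the misleading summary)."""
    return sum(quals) / len(quals)


def read_tier(quals):
    """Hypothetical helper: classify a read using the thresholds from steps 7 and 8."""
    ee = expected_errors(quals)
    if ee > 10:
        return "discard"    # step 7: more than 10 expected errors
    if ee < 1:
        return "cluster"    # step 8: good enough to seed OTU clusters
    return "map_only"       # kept, but only mapped back to OTUs in step 9


mixed = [35] * 140 + [2] * 10   # 140 x Q35 + 10 x Q2
uniform = [25] * 150            # 150 x Q25

for name, quals in [("140 x Q35 + 10 x Q2", mixed), ("150 x Q25", uniform)]:
    print(f"{name}: avg Q = {average_q(quals):.1f}, "
          f"expected errors = {expected_errors(quals):.2f}, "
          f"tier = {read_tier(quals)}")
```

The first read has the higher average Q (32.8) but about 6.35 expected errors, while the second has a lower average (25) and only about 0.47, in line with the rounded values in the table.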
– Our longest sequence was trimmed to 1500 bp, therefore:
– The *best* expected error we can get is 10^(-4.2)*1500 ~= 0.0946 (or about 0.1)

Filtering applied to controls
• Perfect (i.e. single OTU) clustering of positive controls happened only at an Expected Error (EE) level of one or less.
• This suggests that we should filter to EE ≤ 1 for OTU clustering and classification, or expect that we will have spurious OTUs based on error profile, rather than true biological sequences

BEI Mock Community (22 species)

BEI Mock Community
• Network of all BEI strains' 16S alignment

Some results and graphs: BEI dataset
Multiple expected error filtering and how it affects the number (and ‘correctness’) of the OTUs

Figure: Number of OTUs in BEI (y-axis: number of OTUs; x-axis: expected error threshold, from 0 to 10). Labelled values: 19, 19, 19, 19, 20, 22, 25, 36, 50, 65, 80, 210, 472.

BEI Mock Community
• Initial Results
– Able to resolve 19 out of 21
– No False Positives

100% species-level identity from mock community analysis

Species                                       % identity
Streptococcus mutans UA159                    100.00%
Streptococcus agalactiae 2603V/R              100.00%
Lactobacillus gasseri ATCC 33323              100.00%
Enterococcus faecalis OG1RF                   100.00%
Actinomyces odontolyticus                     100.00%
Streptococcus pneumoniae TIGR4                99.90%
Acinetobacter baumannii ATCC 17978            99.90%
Neisseria meningitidis MC58                   99.80%
Listeria monocytogenes strain EGD             99.80%
Bacillus cereus ATCC 10987                    99.80%
Staphylococcus epidermidis ATCC 12228         99.70%
Rhodobacter sphaeroides 2.4.1 chromosome 1    99.70%
Pseudomonas aeruginosa PAO1                   99.70%
Escherichia coli str. K-12 substr. MG1655     99.70%
Clostridium beijerinckii NCIMB 8052           99.70%
Propionibacterium acnes KPA171202             99.60%
Helicobacter pylori 26695                     99.40%
Deinococcus radiodurans                       99.20%
Bacteroides vulgatus ATCC 8482                98.90%

No species level confidence
• Database had taxonomic classification up to species level
• Command to assign taxonomy only gave confidences up to the genus level
• We wanted the confidence values for the species levels as well
• Spoof the database
– Shift taxonomy so Domain is now Phylum etc.
• Command to assign taxonomy so program thinks of the species level as genus
• Confidence of ‘g’ is actually species confidence

Species-level confidence

OTUId               genus               genus_conf   species                      species_conf
OTU_1;size=5193;    Helicobacteraceae   0.9934       Helicobacter_pylori          0.9128
OTU_2;size=2845;    Streptococcus       0.9878       Streptococcus_agalactiae     0.9387
OTU_3;size=2582;    Lactobacillus       0.9922       Lactobacillus_gasseri        0.767
OTU_4;size=2658;    Neisseria           0.9684       Neisseria_meningitidis       0.9128
OTU_5;size=1755;    Propionibacterium   0.9913       Propionibacterium_acnes      0.956
OTU_6;size=1983;    Streptococcus       0.9913       Streptococcus_mutans         0.9474
OTU_7;size=1812;    Acinetobacter       0.9824       Acinetobacter_baumannii      0.9214
OTU_8;size=1901;    Bacillus            0.9852       Bacillus_anthracis           0.4684
OTU_9;size=886;     Enterococcus        0.9835       Enterococcus_faecalis        0.9214
OTU_10;size=2442;   Listeria            0.987        Listeria_monocytogenes       0.767
OTU_11;size=4118;   Bacteroidaceae      0.9931       Bacteroides_vulgatus         0.9301
OTU_12;size=3798;   Staphylococcus      0.9843       Staphylococcus_epidermidis   0.767
OTU_13;size=597;    Rhodobacter         0.9684       Rhodobacter_sphaeroides      0.6177
OTU_14;size=2281;   Shigella            0.7125       Shigella_flexneri            0.4684
OTU_15;size=417;    Pseudomonas         0.9684       Pseudomonas_aeruginosa       0.9128
OTU_16;size=294;    Deinococcus         0.9936       Deinococcus_radiodurans      0.9301
OTU_17;size=238;    Streptococcus       0.9887       Streptococcus_pneumoniae     0.4684
OTU_18;size=2227;   Clostridium         0.9824       Clostridium_beijerinckii     0.319
OTU_19;size=101;    Actinomyces         0.9878       Actinomyces_odontolyticus    0.9214

OTUId               genus               genus_conf   species
OTU_1;size=5193;    Helicobacteraceae   0.9887       Helicobacter_pylori
OTU_2;size=2845;    Streptococcus       0.9797       Streptococcus_agalactiae
OTU_3;size=2582;    Lactobacillus       0.9879       Lactobacillus_gasseri
OTU_4;size=2658;    Neisseria           0.9678       Neisseria_meningitidis
OTU_5;size=1755;    Propionibacterium   0.9865       Propionibacterium_acnes
OTU_6;size=1983;    Streptococcus       0.9865       Streptococcus_mutans
OTU_7;size=1812;    Acinetobacter       0.9695       Acinetobacter_baumannii
OTU_8;size=1901;    Bacillus            0.9746       Bacillus_anthracis
OTU_9;size=886;     Enterococcus        0.9712       Enterococcus_faecalis
OTU_10;size=2442;   Listeria            0.978        Listeria_monocytogenes
OTU_11;size=4118;   Bacteroidaceae      0.9883       Bacteroides_vulgatus
OTU_12;size=3798;   Staphylococcus      0.9729       Staphylococcus_epidermidis
OTU_13;size=597;    Rhodobacter         0.9678       Rhodobacter_sphaeroides
OTU_14;size=2281;   Shigella            0.6849       Shigella_flexneri
OTU_15;size=417;    Pseudomonas         0.9678       Pseudomonas_aeruginosa
OTU_16;size=294;    Deinococcus         0.9891       Deinococcus_radiodurans
OTU_17;size=238;    Streptococcus       0.9814       Streptococcus_pneumoniae
OTU_18;size=2227;   Clostridium         0.9746       Clostridium_beijerinckii
OTU_19;size=101;    Actinomyces         0.9797       Actinomyces_odontolyticus

CAMI Mock Community Sequencing (Or, How Did We Do?)
• 250 OTUs found (232 mapped to unique species) out of 243 unique NCBI OTUs
– 6.9% (16/232) labelled with incorrect species name (after correcting with reference OTU*)
– 93.1% (216/232) correct identification of OTU to species level
– 88.9% (216/243) correct identification and classification to species level out of entire expected OTU number in community
– 2.2% (5/232) instances of the incorrect genus (only 2 of which had more than a single read mapping to it)
– 93.4% (227/243) correct identification and classification to genus level of entire community
– 66 correct unique species identified from OTU pipeline not present in predicted reference sequences
– 27 missing/incorrect* 16S genes predicted from species, genome assemblies 16 were missing/incorrect assigned taxonomy from the genera NCBI.
OTU Instances where a predicted 16s gene’s taxonomy matched the taxonomic assignment
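
The percentages quoted for the CAMI mock community can be re-derived from the counts on the slide. The short check below is only a sanity sketch: the counts (232 OTUs mapped to species, 243 expected OTUs, 16 wrong species, 5 wrong genus) are taken from the slide, while the variable names and the `pct` helper are hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported on the slide."""
    return round(100 * numerator / denominator, 1)


mapped_to_species = 232   # OTUs mapped to unique species
expected_otus = 243       # unique NCBI OTUs expected in the community
wrong_species = 16        # labelled with an incorrect species name
wrong_genus = 5           # instances of the incorrect genus

correct_species = mapped_to_species - wrong_species   # 216
correct_genus = mapped_to_species - wrong_genus       # 227

print("wrong species (of mapped):     ", pct(wrong_species, mapped_to_species))
print("correct species (of mapped):   ", pct(correct_species, mapped_to_species))
print("correct species (of expected): ", pct(correct_species, expected_otus))
print("wrong genus (of mapped):       ", pct(wrong_genus, mapped_to_species))
print("correct genus (of expected):   ", pct(correct_genus, expected_otus))
```

The two denominators answer different questions: 232 measures accuracy among the OTUs that were actually recovered, while 243 measures how much of the expected community was correctly recovered.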