Presentation on MEGAN
MEGAN
Taxonomic binning of sequence data

Overview
• Introduction
• Why use MEGAN
• Using MEGAN
• Exercises

High throughput sequencing
Shotgun sequencing → huge amounts of data
454 GS-FLX generates 400-600 million bp per run, with a read length between 400 and 500 bp.

Understanding this amount of information in a quick manner?
→ classification of sequences

Sequence Classification
Sequence classification (binning) is the process of separating sequence data using specific information → creating bins.

This information can be based on:
- Similarity, e.g. MEGAN, SORT-ITEMS
- Phylogeny, e.g. tools like CARMA
- Functional annotation, e.g. GO classifiers

MEGAN uses similarity searches with BLAST to bin sequences into taxa.
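
To make the idea of similarity-based binning concrete, here is a minimal sketch that places each read in the bin of its best-scoring BLAST hit. It is deliberately simpler than what MEGAN does and is not MEGAN's method. It assumes BLAST's 12-column tabular output (query ID in column 1, subject ID in column 2, bitscore in column 12); the `SUBJECT_TAXON` lookup, the sequence IDs and the function name are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical lookup from database sequence ID to taxon name (illustration only).
SUBJECT_TAXON = {"seqA": "Proteobacteria", "seqB": "Euryarchaeota"}


def bin_by_best_hit(blast_lines, subject_taxon):
    """Group read IDs into bins named after the taxon of their best-scoring hit."""
    best = {}  # read ID -> (bitscore, subject ID) of the best hit seen so far
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        read_id, subject_id, bitscore = fields[0], fields[1], float(fields[11])
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id)

    bins = defaultdict(list)
    for read_id, (_, subject_id) in best.items():
        # Subjects missing from the lookup end up in an "unknown" bin.
        bins[subject_taxon.get(subject_id, "unknown")].append(read_id)
    return dict(bins)


EXAMPLE_LINES = [
    "read1\tseqA\t98.0\t400\t8\t0\t1\t400\t1\t400\t1e-100\t700",
    "read1\tseqB\t80.0\t400\t80\t0\t1\t400\t1\t400\t1e-20\t200",
    "read2\tseqB\t95.0\t450\t22\t0\t1\t450\t1\t450\t1e-90\t650",
]

if __name__ == "__main__":
    print(bin_by_best_hit(EXAMPLE_LINES, SUBJECT_TAXON))
```

Taking only the best hit ignores near-equal hits to other taxa, which is the weakness that a lowest-common-ancestor placement is meant to address.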

MEGAN
Metagenome: the collective genome of all the microorganisms in an environment (Handelsman et al., 1998).

Metagenomics is the study of the metagenome using high throughput sequencing.

The question:
How to determine species composition in a metagenomic dataset?

• Sequence comparison with known sequences from a database, e.g. GenBank.
• Metagenomic datasets contain many sequences, so manual inspection is impossible → MEGAN is your assistant.

MEGAN
[Figure: MEGAN (Huson et al., Genome Research, 2007)]

Why use MEGAN?
Easy to work with on a desktop / laptop computer.
Extra things needed: Java, a BLAST server (e.g. Bioportal)

MEGAN gives a visualization of BLAST results:
• Study diversity
• Compare samples
• Contamination filtering
• Special gene of interest
• Extraction of sequences based on taxonomic information

Bioportal: http://www.bioportal.uio.no//

The basics of MEGAN
MEGAN uses BLAST, a database and a taxonomy file.
• BLASTN: nucleotides against a nucleotide database.
• BLASTX: translated nucleotides against a protein database.
• Which database? One of the many available databases, like the NCBI non-redundant database, or your own custom database.
• Taxonomy: NCBI taxonomy, or your own custom taxonomy.

The BLAST output file is used to bin sequences into specific taxa using the LCA assignment algorithm.

The basics of MEGAN
• The LCA algorithm = “Lowest Common Ancestor” algorithm

“In this approach, every read is assigned to some taxon. If the read aligns very specifically only to a single taxon, then it is assigned to that taxon. The less specifically a read hits taxa, the higher up in the taxonomy it is placed. Reads that hit ubiquitously may even be assigned to the root node of the NCBI taxonomy.”
(the MEGAN manual)

The basics of MEGAN
The default LCA parameters are:
• Min Support → default = 5 reads per taxon
• Min Score → default bitscore = 35
• Min score / length → bitscore divided by the read length; default = 0
• Top percentage → the maximum percentage by which the score of a hit may fall below the best score achieved for a given read; default = 10
• Win score → if a win score is set, then, for a given read, if any match exceeds the win score, only matches exceeding the win score (“winners”) are used to place the given read; default = 0

The basics of MEGAN
[Figure not recovered]
No hits: none of the BLAST hits reached the minimum bitscore.
Not assigned: the sequence did not have enough hits to be classified to a taxon (Min support & top percentage).

Playing with min support
[Figure: two taxonomy trees with read counts per node, labelled "= 10" and "= 1". Nodes shown include Proteobacteria 215, Bacteria 93, cellular organisms 56, root 10, Mariprofundus ferrooxydans 22, Gammaproteobacteria 20, Methanosarcina 10 and Not assigned 107; the full node list is not reliably recoverable.]

An example
Comparison between reads assigned to Phosphorus metabolism and Nitrogen metabolism
[Figure: taxonomy tree with read counts per node. Nodes shown include Proteobacteria 880, Bacteria 719, Gammaproteobacteria 293, cellular organisms 265, Deltaproteobacteria 82, Epsilonproteobacteria 73 and Endoriftia 69; the full node list is not reliably recoverable.]
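
The LCA assignment and its parameters, as described under "The basics of MEGAN", can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MEGAN's actual implementation: the toy `PARENT` taxonomy and all function names are hypothetical, the "Min score / length" filter is left out, and taxa that fall below Min Support are simply relabelled "Not assigned", which is one reading of the slide and may differ from what MEGAN does internally.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); the NCBI taxonomy is far larger.
PARENT = {
    "cellular organisms": "root",
    "Bacteria": "cellular organisms",
    "Archaea": "cellular organisms",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Deltaproteobacteria": "Proteobacteria",
}


def lineage(taxon):
    """Path from the root down to the given taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa):
    """Deepest taxon shared by the lineages of all given taxa."""
    lca = "root"
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) > 1:
            break  # lineages diverge here, keep the last shared taxon
        lca = level[0]
    return lca


def assign_read(hits, min_score=35, top_percent=10, win_score=0):
    """Place one read, given its hits as (taxon, bitscore) pairs."""
    # Min Score: discard hits below the minimum bitscore.
    hits = [(t, s) for t, s in hits if s >= min_score]
    if not hits:
        return "No hits"
    # Win score: if set and exceeded by any hit, keep only the "winners".
    winners = [(t, s) for t, s in hits if s > win_score]
    if win_score > 0 and winners:
        hits = winners
    # Top percentage: keep hits within top_percent of the best bitscore.
    best = max(s for _, s in hits)
    cutoff = best * (1 - top_percent / 100)
    return lowest_common_ancestor({t for t, s in hits if s >= cutoff})


def assign_all(reads, min_support=5, **lca_params):
    """Place all reads, then apply Min Support per taxon (assumed behaviour)."""
    placed = {r: assign_read(h, **lca_params) for r, h in reads.items()}
    support = Counter(placed.values())
    return {
        r: t if t == "No hits" or support[t] >= min_support else "Not assigned"
        for r, t in placed.items()
    }


EXAMPLE_READS = {
    "r1": [("Gammaproteobacteria", 100), ("Deltaproteobacteria", 95)],
    "r2": [("Gammaproteobacteria", 100), ("Archaea", 50)],
    "r3": [("Archaea", 30)],
}

if __name__ == "__main__":
    print(assign_all(EXAMPLE_READS, min_support=1))
```

With `min_support=1`, read r1 hits two classes almost equally well and is placed at their common ancestor Proteobacteria, r2 keeps only its top hit and is placed at Gammaproteobacteria, and r3 falls below the minimum bitscore and is reported as "No hits". Raising `min_support` to 2 turns both placed reads into "Not assigned", since each taxon then holds only one read.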