<<

MEGAN
taxonomic binning of sequence data

Overview

• Introduction
• Why use MEGAN
• Using MEGAN
• Exercises

High throughput sequencing

Shotgun sequencing → huge amounts of data
The 454 GS-FLX generates 400-600 million bp per run, with read lengths of 400-500 bp.

How can this amount of information be understood quickly?
→ classification of sequences

Sequence Classification

Sequence classification (binning) is the process of separating sequence data using specific information → creating bins

This information can be based on:
- Similarity, e.g. MEGAN, SOrt-ITEMS
- Phylogeny, e.g. tools like CARMA
- Functional annotation, e.g. GO classifiers

MEGAN uses similarity searches with BLAST to bin sequences into taxa.
MEGAN

Metagenome: the collective genome of all the microorganisms in an environment (Handelsman et al., 1998).

Metagenomics is the study of the metagenome using high-throughput sequencing.

The question:
How do we determine the composition of a metagenomic dataset?

• Sequence comparison with known sequences from a database, e.g. GenBank.
• Metagenomic datasets contain many sequences, so manual inspection is impossible → MEGAN is your assistant.

[Figure: from Huson et al., Genome Research, 2007]

Why use MEGAN?

Easy to work with on a desktop / laptop computer.
Extra things needed: Java, a BLAST server (e.g. Bioportal)

MEGAN gives a visualization of BLAST results:
• Study diversity
• Compare samples
• Contamination filtering
• Special gene of interest
• Extraction of sequences based on taxonomic information.
Bioportal: http://www.bioportal.uio.no/

The basics of MEGAN

MEGAN uses BLAST, a database and a taxonomy file:
• BLASTN: nucleotides against a nucleotide database.
• BLASTX: translated nucleotides against a protein database.
• Which database? One of the many available databases, such as the NCBI non-redundant database, or your own custom database.
• Taxonomy: the NCBI taxonomy, or your own custom taxonomy.

The BLAST output file is used to bin sequences into specific taxa using the LCA assignment algorithm.

• The LCA algorithm = “Lowest Common Ancestor” algorithm

“In this approach, every read is assigned to some taxon. If the read aligns very specifically only to a single taxon, then it is assigned to that taxon. The less specifically a read hits taxa, the higher up in the taxonomy it is placed. Reads that hit ubiquitously may even be assigned to the root node of the NCBI taxonomy.”
(the MEGAN manual)

The default LCA parameters are:
• Min Support → default = 5 reads per taxon
• Min Score → default bitscore = 35
• Min Score / Length → bitscore divided by the read length; default = 0
• Top Percentage → the maximum percentage by which the score of a hit may fall below the best score achieved for a given read; default = 10
• Win Score → if a win score is set, then, for a given read, if any match exceeds the win score, only matches exceeding the win score (“winners”) are used to place the given read; default = 0

No hits: none of the BLAST hits reached the minimum bitscore.
Not assigned: the sequence did not have enough hits to be classified to a taxon (Min Support & Top Percentage).

Playing with min support

[Figure: two MEGAN trees of the same dataset at different Min Support values (panel labels: = 10 and = 1). The lower setting keeps many taxa that are supported by a single read.]
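
The LCA placement and the thresholds listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MEGAN's implementation: the toy taxonomy `PARENT`, the function names and the bitscores in the usage note are hypothetical, Min Score / Length and Win Score are left out, and reads on taxa below Min Support are simply moved to "Not assigned", as the slide describes.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a real run would use the NCBI taxonomy.
PARENT = {
    "Bacteria": "root",
    "Archaea": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Deltaproteobacteria": "Proteobacteria",
    "Methanosarcina": "Archaea",
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def lca(taxa):
    """Lowest common ancestor of a non-empty collection of taxa."""
    common = "root"
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def place_read(hits, min_score=35.0, top_percent=10.0):
    """Place one read; `hits` is a list of (taxon, bitscore) pairs."""
    kept = [(taxon, score) for taxon, score in hits if score >= min_score]
    if not kept:
        return "No hits"
    best = max(score for _, score in kept)
    cutoff = best * (1 - top_percent / 100)  # hits within top_percent of the best
    return lca([taxon for taxon, score in kept if score >= cutoff])


def bin_reads(reads, min_support=5, **lca_options):
    """Bin all reads, then move reads on weakly supported taxa to 'Not assigned'."""
    placed = {name: place_read(hits, **lca_options) for name, hits in reads.items()}
    support = Counter(placed.values())
    return {
        name: taxon if taxon == "No hits" or support[taxon] >= min_support else "Not assigned"
        for name, taxon in placed.items()
    }
```

With these assumptions, a read hitting Gammaproteobacteria (bitscore 100) and Deltaproteobacteria (bitscore 95) is placed on Proteobacteria, because both hits fall within the top 10 %. If the second hit only scores 50, the read stays on Gammaproteobacteria.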

An example

Comparison between reads assigned to Phosphorus metabolism and Nitrogen metabolism.

[Figure: MEGAN comparison tree of the two read sets, with read counts per taxon (e.g. Proteobacteria 880, Bacteria 719, Not assigned 377).]

Reads were annotated using MG-RAST.

An example

[Figure: the same comparison tree (1: Phosphorus metabolism, 2: Nitrogen metabolism) with the node Sulfurovum sp. NBC37-1 (165 reads) selected: Ass = 112, 53; Sum = 112, 53.]

Reads at node Epsilonproteobacteria: total in comparison 73
Assigned = 23, 50
Summarized = 167, 139

What do these numbers mean?

Other tools in MEGAN

• Comparing datasets
• GO annotation
• Rarefaction analysis
• Extracting sequences
• Identification of Clusters of Orthologous Groups of proteins (COGs).
• Microbial attributes

Comparing datasets
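
Before comparing datasets, it helps to be clear about the two counts shown per node in the Epsilonproteobacteria example. "Assigned" counts the reads placed exactly on a node, and "Summarized" counts the reads placed on the node or anywhere below it. The sketch below checks this with the totals from the comparison tree. The counts come from the slides, but the parent/child layout in `CHILDREN` is an assumption made for illustration.

```python
# Assigned reads per node (both datasets added together), as shown in the comparison tree.
ASSIGNED = {
    "Epsilonproteobacteria": 73,  # 23 + 50
    "Campylobacterales": 19,
    "Arcobacter": 5,
    "Sulfurimonas": 44,
    "Sulfurovum": 165,            # 112 + 53
}

# Assumed layout of the subtree; only the set of descendants matters for the sum.
CHILDREN = {
    "Epsilonproteobacteria": ["Campylobacterales", "Sulfurimonas", "Sulfurovum"],
    "Campylobacterales": ["Arcobacter"],
}


def summarized(node):
    """Reads assigned to `node` plus reads assigned to all of its descendants."""
    return ASSIGNED[node] + sum(summarized(child) for child in CHILDREN.get(node, []))


print(summarized("Epsilonproteobacteria"))  # 306, i.e. 167 + 139 from the slide
```

For a leaf such as Sulfurovum, the assigned and summarized values are identical, which matches the "Ass = 112, 53; Sum = 112, 53" label in the figure.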

• Absolute numbers
• Normalized → 100,000 reads

More than 2 sets can be compared.

[Figure: comparison tree of the Phosphorus metabolism and Nitrogen metabolism reads.]
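
Normalization matters when the datasets differ in size. A minimal sketch of scaling each dataset to 100,000 reads is given below; the function name and the two example datasets are hypothetical, and MEGAN's own rounding is not modeled.

```python
def normalize(counts, target=100_000):
    """Scale per-taxon read counts so that the dataset total equals `target`."""
    total = sum(counts.values())
    return {taxon: n * target / total for taxon, n in counts.items()}


# Hypothetical datasets of different size (500 and 1000 reads).
sample_a = {"Proteobacteria": 300, "Euryarchaeota": 100, "Not assigned": 100}
sample_b = {"Proteobacteria": 300, "Euryarchaeota": 600, "Not assigned": 100}

norm_a = normalize(sample_a)
norm_b = normalize(sample_b)
print(norm_a["Proteobacteria"], norm_b["Proteobacteria"])  # 60000.0 30000.0
```

Both samples contain 300 Proteobacteria reads in absolute numbers, but after normalization the taxon is twice as abundant in the smaller sample.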

Comparing datasets

• MEGAN offers a statistical test of whether the number of reads found in a taxon differs between two datasets.
  → but it is only visual.
• If you want to test it yourself
  → extract the assignments after the comparison
  → this gives a table with the taxa and the number of reads per taxon for each dataset.

GO annotation

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.

http://www.geneontology.org/

How is this done in MEGAN?
GO annotation

Overview
[Figure: overview screenshot with the labels “GO terms” and “zoomable GO map”.]

Choose the right GO map for analyzing your data.

Three maps:
- Biological processes
- Molecular functions
- Cellular components

Biological processes
[Figure: the Biological processes GO map.]

Specificity score:
The score of a term reflects the frequency of gene annotations to that term (or to descendants in the subgraph of that term).
Terms that are often used to annotate gene products are assigned a lower specificity than infrequently used terms.

Acetyl-CoA
The reads belonging to a specific node can be extracted.

Rarefaction analysis

Rarefaction curves are usually generated from sequence similarity.
With shotgun data this is not possible, because the sequences are not similar.

MEGAN samples sequences and counts how many “leaves” are found after binning.

The calculation in MEGAN depends on how the taxonomy is represented!
If you collapse one node, your rarefaction results may change.

[Figure: rarefaction curves calculated at phylum and at species level.]
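
The leaf-counting idea can be sketched as follows: subsample the binned reads at increasing depths and count the distinct leaves seen so far. This is an illustration under stated assumptions, not MEGAN's procedure. The function name, the number of steps and the fixed random seed are hypothetical. The genus counts are taken from the comparison tree shown earlier, and `HIGHER_RANK` stands in for collapsing the tree to a higher level.

```python
import random

# Reads per genus, taken from the comparison tree on the earlier slides.
GENUS_COUNTS = {
    "Sulfurovum": 165,
    "Sulfurimonas": 44,
    "Arcobacter": 5,
    "Methanosarcina": 8,
    "Methanococcoides": 11,
}

# Collapsing: each genus is replaced by a higher-rank node.
HIGHER_RANK = {
    "Sulfurovum": "Epsilonproteobacteria",
    "Sulfurimonas": "Epsilonproteobacteria",
    "Arcobacter": "Epsilonproteobacteria",
    "Methanosarcina": "Euryarchaeota",
    "Methanococcoides": "Euryarchaeota",
}

# One list entry per read, holding the leaf the read was binned to.
READS = [genus for genus, n in GENUS_COUNTS.items() for _ in range(n)]


def rarefaction_curve(read_taxa, steps=10, seed=0):
    """Return (sample size, number of distinct leaves) for growing subsamples."""
    rng = random.Random(seed)
    shuffled = list(read_taxa)
    rng.shuffle(shuffled)
    curve = []
    for i in range(1, steps + 1):
        size = len(shuffled) * i // steps
        curve.append((size, len(set(shuffled[:size]))))
    return curve


genus_curve = rarefaction_curve(READS)
collapsed_curve = rarefaction_curve([HIGHER_RANK[g] for g in READS])
print(genus_curve[-1], collapsed_curve[-1])  # (233, 5) (233, 2)
```

The same reads give a curve that levels off at five leaves at genus level but at two leaves after collapsing, which is why the taxonomic level has to be fixed before curves are compared.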
Rarefaction analysis

• Make sure you know what level of taxonomy you use.
• Uncollapse your taxonomy to species level.
• Comparison of rarefaction curves of different datasets is possible.
• For amplicon sequences it is better to use UniFrac, mothur, or EstimateS.

Brazelton et al., 2009

Extracting sequences

MEGAN offers several ways of extracting sequences:
- Export reads
- Extract reads per taxon.

This only works when you created your MEGAN taxonomy using both the BLAST output file and your original FASTA sequences.

Extracting assignments can be interesting for analysis later on.

Taxon-ID: numeric ID for the taxon, not the name!
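
Because the extracted assignments carry numeric taxon IDs rather than names, a later analysis usually starts by counting reads per ID and translating the IDs. The sketch below assumes a two-column, comma-separated export (read name, taxon ID). That layout, the read names and the function name are assumptions for illustration and should be checked against the file MEGAN actually writes. The three IDs in `TAXON_NAMES` are NCBI taxonomy IDs.

```python
import csv
import io
from collections import Counter

# A few NCBI taxonomy IDs; a full analysis would load the complete NCBI names table.
TAXON_NAMES = {2: "Bacteria", 2157: "Archaea", 1224: "Proteobacteria"}


def reads_per_taxon(handle):
    """Count reads per numeric taxon ID in an assumed 'read name,taxon ID' table."""
    counts = Counter()
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        counts[int(row[1])] += 1
    return counts


# Hypothetical export with four reads.
example = io.StringIO("read_001,1224\nread_002,1224\nread_003,2157\nread_004,2\n")
counts = reads_per_taxon(example)
for taxon_id, n in sorted(counts.items()):
    print(taxon_id, TAXON_NAMES.get(taxon_id, "unknown"), n)
```

Keeping the numeric ID as the key and translating only for display avoids problems with taxa that share a name or have been renamed.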
Identification of Clusters of Orthologous Groups of proteins

Not very easy to use, since you have to look up which COGs are actually found; the GO analyzer is easier.

Microbial attributes

Depends on the information in the NCBI database about the species found in your dataset. Usefulness is limited.

MEGAN exercise

Two datasets:
- Carbon metabolism: 3795 reads
- Nitrogen metabolism: 999 reads

Both datasets contain annotated reads extracted from a deep sea sediment metagenome.

Annotation was done using the MetaGenomic RAST Server (MG-RAST: Meyer et al., 2008).

Any questions?

Adapted from: Thomas Haverkamp