Garbage in = Garbage Out
USER RESPONSIBILITY
GARBAGE IN = GARBAGE OUT
Each step relies on the accuracy of the previous steps.
Just because you get an answer does not make it right: Appropriate test? Correct parameters? Applicable dataset?

ANALYSIS PIPELINE
Multiple Alignment: CLUSTALW, T-COFFEE, MAFFT, MUSCLE, PROBCONS
Visualization & Adjustment: GENEDOC, JALVIEW
Format Input Data: FASTA, PHYLIP, NEXUS, Newick
Phylogenetics
  Methods: Distance Matrix, Max Parsimony, Max Likelihood
  Programs: PHYLIP, RAxML, MrBayes
Evolutionary Analyses: r8s, PAML, BEAST, Multidivtime

ALIGNMENT PROGRAMS
ClustalW (1994) http://www.ebi.ac.uk/Tools/msa/clustalw2/
Uses a progressive multiple alignment. Parameters, e.g. gap penalties, are adjusted according to the input, i.e. divergence, length, local hydropathy, etc.
T-Coffee (2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
Performs pairwise local and global alignments, then combines them in a progressive multiple alignment.
MAFFT (2002) http://mafft.cbrc.jp/alignment/server/
Detects local homologous regions by Fast Fourier Transform (considers aa size & polarity), then uses a restricted global DP and a progressive algorithm and horizontal refinement.
MUSCLE (2004) http://www.drive5.com/muscle
Uses k-mer distances and log-expectation scores, with progressive and horizontal refinement.
PROBCONS (2005) http://probcons.stanford.edu
<30 taxa** Pairwise consistency based on an objective function.

COMPARISON OF ALIGNMENT PROGRAMS
ALIGNMENT: CLUSTALW
ALIGNMENT: MUSCLE
ALIGNMENT: MAFFT

ALIGNMENT VIEWERS/MANIPULATORS
GENEDOC
Program Description: A Full Featured Multiple Sequence Alignment Editor, Analyser and Shading Utility for Windows. http://www.nrbsc.org/gfx/genedoc/
Platform: Windows
Input: Amino acid and nucleotide FASTA, Clustal (.aln), Phylip, PIR, GCG (.msf), and GenBank formats.
Output: Default is .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.
JALVIEW
Program Description: Jalview is a multiple alignment editor written in Java.
It is used widely in a variety of web pages but is also available as a general-purpose alignment editor and analysis workbench. http://www.jalview.org/
Platform: Mac, Windows, Linux, Solaris, Unix, etc.
Input: Amino acid and nucleotide FASTA, Clustal (.aln), BLC, PIR, GCG (.msf), and PFAM formats.
Output: Default is .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

ALIGNMENT VIEWERS/MANIPULATORS
[Figure panels: BLOSUM62, PERCENT IDENTITY, CLUSTAL, HYDROPHOBICITY]

REGIONS OF PROBLEMATIC ALIGNMENT
Accuracy of the alignment has an impact on the resulting phylogenetic tree!!

ALIGNMENT: MUSCLE - FULL LENGTH
ALIGNMENT: MUSCLE - CONSERVED REGIONS
Gblocks: Castresana (2000) Mol. Biol. Evol. 17: 540-552
[Figure: two phylogenetic trees of the same taxa, panels CONSERVED REGIONS and FULL LENGTH, with numeric values at the nodes; scale bars 0.1 and 0.2]

EFFECTS BRANCH/NODE SUPPORT
[Figure: the CONSERVED REGIONS and FULL LENGTH trees shown again; scale bars 0.1 and 0.2]
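The FULL LENGTH versus CONSERVED REGIONS comparison depends on which alignment columns are kept. The slides use Gblocks for this; the Python sketch below is not Gblocks, only a much simpler stand-in that drops columns whose share of gap characters exceeds a threshold. The function name drop_gappy_columns, its parameters, the choice to count "?" as a gap, and the toy sequences are all illustrative assumptions.

```python
def drop_gappy_columns(seqs, max_gap_fraction=0.0, gap_chars="-?"):
    """Return the aligned sequences with gap-rich columns removed.

    Hypothetical helper for illustration only; it is a plain gap filter,
    not the block-selection procedure implemented by Gblocks.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Keep a column only if its fraction of gap characters is small enough.
    keep = [
        col for col in range(length)
        if sum(s[col] in gap_chars for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return ["".join(s[col] for col in keep) for s in seqs]


# Toy alignment (made up, not taken from the slides).
toy = ["AC-GT", "ACTG-", "A--GT"]
print(drop_gappy_columns(toy))                        # gap-free columns only
print(drop_gappy_columns(toy, max_gap_fraction=0.5))  # tolerate some gaps
```

Changing max_gap_fraction changes which columns reach the tree-building step, which is one concrete way a parameter choice can propagate into the resulting tree and its support values.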
NO “CORRECT” SOLUTION
KNOW IMPLICATIONS OF YOUR DECISIONS

ANALYSIS PIPELINE
Multiple Alignment: CLUSTALW, T-COFFEE, MAFFT, MUSCLE, PROBCONS
Manual Adjustment: GENEDOC, JALVIEW
Format Input Data: FASTA, PHYLIP, NEXUS
Phylogenetics
  Methods: Distance Matrix, Max Parsimony, Max Likelihood
  Programs: PHYLIP, RAxML, MrBayes
Evolutionary Analyses: r8s, PAML, BEAST, Multidivtime

FILE FORMATS
FASTA FORMAT
>Struthio_camelus
VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
>Rhea_americana
VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
>Pterocnemia_pennata
VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
>Casuarius_casuarius
VKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
>Dromaius_novaehollandiae
VKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
>Nothoprocta_cinerascens
VKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
>Eudromia_elegans
VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ
80 chars

PHYLIP FORMAT
7 69
S_camelus  VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
R_american VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
P_pennata  VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
C_casuariu VKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
D_novaehol VKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
N_cinerasc VKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
E_elegans  VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ
10 chars NO WHITE SPACE

FILE FORMATS
NEXUS FORMAT
#NEXUS
begin data;
dimensions ntax=7 nchar=69;
format datatype=protein missing=? gap=- matchchar=.;

matrix
Struthio_camelus         VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSK??????FAT--VDDYKPVPLDYMLDSK
Rhea_americana           .............L..E..N.V.T................?.D?????...--...H...S.E.....
Pterocnemia_pennata      .............L..E..N.V.A..................DHD?EV...--...H...S.E.....
Casuarius_casuarius      ........D....L.....N.........DD......A....DHDKEV...--..E....SPE.....
Dromaius_novaehollandiae ........D....L.....N..........D......A..??D?????...--.......S.E.....
Nothoprocta_cinerascens  .....A.D.....P...TP...A.NA.FGS....V....I..DHDK?????T-..G...AT.E.F..N
Eudromia_elegans         .R.....D.....P...TP..V.AN....S....V....I?.?????????S-I.G...AT.EFF..N
;
end;

begin mrbayes;
prset aamodelpr=mixed;
end;
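As a rough illustration of moving between the FASTA and PHYLIP layouts shown in the FILE FORMATS slides, the Python sketch below parses FASTA text and writes a sequential PHYLIP block: a header with the number of taxa and the alignment length, then one line per taxon with a name limited to 10 characters and no white space. The function names, the truncate-to-10 rule, and the name, space, sequence layout are assumptions made for this example; the slides abbreviate names by hand (S_camelus), and individual programs may expect slightly different spacing.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first white-space-delimited token as the name.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records, name_width=10):
    """Format aligned (name, sequence) records as sequential PHYLIP text.

    Assumption: names are simply truncated to name_width characters.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    names = [name[:name_width] for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("names are no longer unique after truncation")
    lines = [f"{len(records)} {lengths.pop()}"]
    for short, (_, seq) in zip(names, records):
        lines.append(f"{short:<{name_width}} {seq}")
    return "\n".join(lines) + "\n"


# First two records from the FASTA slide.
example = """>Struthio_camelus
VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
>Rhea_americana
VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
"""
print(to_phylip(parse_fasta(example)))
```

The two checks in to_phylip reflect the theme of the slides: unequal lengths or names that collide after truncation would still produce a file, but one that misleads the programs downstream.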