Garbage in = Garbage Out

USER RESPONSIBILITY

GARBAGE IN = GARBAGE OUT

Each step relies on the accuracy of the previous steps.
Just because you get an answer does not make it right:
    Appropriate test?
    Correct parameters?
    Applicable dataset?

ANALYSIS PIPELINE

    Multiple Alignment:                  CLUSTALW, T-COFFEE, MAFFT, MUSCLE, PROBCONS
    Visualization & Manual Adjustment:   GENEDOC, JALVIEW
    Format Input Data:                   FASTA, PHYLIP, NEXUS, Newick
    Phylogenetics
        Methods:                         Distance Matrix, Max Parsimony, Max Likelihood
        Programs:                        PHYLIP, RAxML, MrBayes
    Evolutionary Analyses:               r8s, PAML, BEAST, Multidivtime

ALIGNMENT PROGRAMS

ClustalW (1994) http://www.ebi.ac.uk/Tools/msa/clustalw2/
    Uses a progressive multiple alignment. Parameters (e.g. gap penalties) are adjusted according to the input (i.e. divergence, length, local hydropathy, etc.).

T-Coffee (2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
    Performs pairwise local and global alignments, then combines them in a progressive multiple alignment.

MAFFT (2002) http://mafft.cbrc.jp/alignment/server/
    Detects local homologous regions by Fast Fourier Transform (considers aa size & polarity), then uses a restricted global DP, a progressive algorithm, and horizontal refinement.

MUSCLE (2004) http://www.drive5.com/muscle
    Uses kmer distances and log-expectation scores, with progressive and horizontal refinement.

PROBCONS (2005) http://probcons.stanford.edu
    <30 taxa**. Pairwise consistency based on an objective function.

COMPARISON OF ALIGNMENT PROGRAMS

[Figure panels: ALIGNMENT: CLUSTALW, ALIGNMENT: MUSCLE, ALIGNMENT: MAFFT]

ALIGNMENT VIEWERS/MANIPULATORS

GENEDOC
    Program Description: A Full Featured Multiple Sequence Alignment Editor, Analyser and Shading Utility for Windows.
    http://www.nrbsc.org/gfx/genedoc/
    Platform: Windows
    Input: Amino acid and nucleotide FASTA, Clustal (.aln), Phylip, PIR, GCG (.msf), and GenBank formats.
    Output: Default are .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

JALVIEW
    Program Description: Jalview is a multiple alignment editor written in Java. It is used widely in a variety of web pages but is available as a general purpose alignment editor and analysis workbench.
    http://www.jalview.org/
    Platform: Mac, Windows, Linux, Solaris, Unix, etc.
    Input: Amino acid and nucleotide FASTA, Clustal (.aln), BLC, PIR, GCG (.msf), and PFAM formats.
    Output: Default are .msf files. Can also export in FASTA, Clustal (.aln), Phylip, PIR, and text.

[Figure panels: BLOSUM62, PERCENT IDENTITY, CLUSTAL, HYDROPHOBICITY]

REGIONS OF PROBLEMATIC ALIGNMENT

Accuracy of alignment has an impact on the resulting phylogenetic tree!!

[Figure panels: ALIGNMENT: MUSCLE - FULL LENGTH, ALIGNMENT: MUSCLE - CONSERVED REGIONS]
Gblocks: Castresana (2000) Mol. Biol. Evol. 17: 540-552

[Figure: phylogenetic trees with node support values inferred from the FULL LENGTH and CONSERVED REGIONS alignments; scale bars 0.1 and 0.2]

AFFECTS BRANCH/NODE SUPPORT
NO "CORRECT" SOLUTION
KNOW IMPLICATIONS OF YOUR DECISIONS

FILE FORMATS

FASTA FORMAT

    >Struthio_camelus
    VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
    >Rhea_americana
    VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
    >Pterocnemia_pennata
    VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
    >Casuarius_casuarius
    VKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
    >Dromaius_novaehollandiae
    VKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
    >Nothoprocta_cinerascens
    VKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
    >Eudromia_elegans
    VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ

    Note: 80 chars

PHYLIP FORMAT

    7 69
    S_camelus  VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSKNNNNNNFAT--VDDYKPVPLDYMLDSK
    R_american VKYPNTNEEGKEVLLPEILNPVGTDGVYSNELANIEYTNVNKDNNNNNFAT--VDDHKPVSLEYMLDSK
    P_pennata  VKYPNTNEEGKEVLLPEILNPVGADGVYSNELANIEYTNVSKDHDNEVFAT--VDDHKPVSLEYMLDSK
    C_casuariu VKYPNTNEDGKEVLLPKILNPIGSDGVYSDDLANIEYANVSKDHDKEVFAT--VDEYKPVSPEYMLDSK
    D_novaehol VKYPNTNEDGKEVLLPKILNPIGSDGVYSNDLANIEYANVNNDNNNNNFAT--VDDYKPVSLEYMLDSK
    N_cinerasc VKYPNANDDGKEVPLPKTPSPIAANAVFGSDLANVEYTNISKDHDKNNNNNT-VDGYKPATLEYFLDNQ
    E_elegans  VRYPNANDDGKEVPLPKTPSPVGANGVYSSDLANVEYTNINKNNNNNNNNNS-IDGYKPATLEFFLDNQ

    Note: names are 10 chars, NO WHITE SPACE

NEXUS FORMAT

    #NEXUS
    begin data;
    dimensions ntax=7 nchar=69;
    format datatype=protein missing=? gap=- matchchar=.;

    matrix
    Struthio_camelus VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSK??????FAT--VDDYKPVPLDYMLDSK
    Rhea_americana .............L..E..N.V.T................?.D?????...--...H...S.E.....
    Pterocnemia_pennata .............L..E..N.V.A..................DHD?EV...--...H...S.E.....
    Casuarius_casuarius ........D....L.....N.........DD......A....DHDKEV...--..E....SPE.....
    Dromaius_novaehollandiae ........D....L.....N..........D......A..??D?????...--.......S.E.....
    Nothoprocta_cinerascens .....A.D.....P...TP...A.NA.FGS....V....I..DHDK?????T-..G...AT.E.F..N
    Eudromia_elegans .R.....D.....P...TP..V.AN....S....V....I?.?????????S-I.G...AT.EFF..N
    ;
    end;

    begin mrbayes;
        prset aamodelpr=mixed;
    end;
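Moving an alignment from the FASTA layout to the PHYLIP layout described under FILE FORMATS is a mechanical step, and it is also a convenient place to catch "garbage in" before tree building. The Python sketch below is one possible way to do it, not the procedure used in the slides. The function names `parse_fasta` and `to_phylip` are hypothetical, and truncating names to 10 characters is an assumption (the slide example uses hand-abbreviated names instead). The sketch refuses to write output if the sequences are not all the same length or if the shortened names collide.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name, chunks = header[0], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records, name_width=10):
    """Format aligned (name, sequence) pairs as sequential PHYLIP text.

    Assumption: names are simply truncated to `name_width` characters
    and padded with spaces; this is an illustrative choice only.
    """
    if not records:
        raise ValueError("no sequences to write")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    names = [name[:name_width] for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("names are not unique after shortening")
    nchar = lengths.pop()
    lines = [f"{len(records)} {nchar}"]  # header: number of taxa, number of characters
    for short, (_, seq) in zip(names, records):
        lines.append(f"{short:<{name_width}} {seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy input: the first ten columns of two sequences from the FASTA example.
    toy = ">Struthio_camelus\nVKYPNTNEEG\n>Rhea_americana\nVKYPNTNEEG\n"
    print(to_phylip(parse_fasta(toy)), end="")
```

Raising an error on unequal lengths is a deliberate choice here: a file that parses but is not a true alignment would otherwise pass silently into the phylogenetics step.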