Phylogenetic Inference

Christian M. Zmasek, PhD
https://sites.google.com/site/cmzmasek/home
GABRIEL Network/J. Craig Venter Institute
APPLICATIONS OF GENOMICS & BIOINFORMATICS TO INFECTIOUS DISEASES
2017-12-07

Overview
• General concepts, common misconceptions
• Tree of Life (Eukaryotic, Bacterial, Viral)
• Homologs, gene duplications, orthologs, …
• Methods
• Unix command line refresher
• Multiple Sequence Alignment (MAFFT)
• Pairwise distance calculation (Phylip)
• Distance based methods (Neighbor Joining, FastME)
• Maximum Likelihood methods (RAxML, PhyML)
• Bayesian methods (MrBayes, BEAST)
• Visualization: Archaeopteryx
• Selection Analysis ("dN/dS", Datamonkey ~ HyPhy)
• Gene Duplication Inference (GSDI algorithm)
• Select papers are available here for download (in zipped archive): https://goo.gl/o2NPDj

Why perform phylogenetic inference?
• To infer the evolutionary relationships amongst different species/classes/sub-classes/strains/… of organisms
• To infer the evolutionary relationships amongst molecular sequences (genes, proteins)
• To infer the functions of genes/proteins
  • Paper: "Eisen_1998_Phylogenomics"
• To use the resulting tree as a basis for additional analyses

Theoretical Background
• A phylogeny is the evolutionary history of a species or a group of species
• "Lately", the term is also being applied to the evolutionary history of individual DNA or protein sequences
• The evolutionary history of organisms or sequences can be illustrated using a tree-like diagram: a phylogenetic tree

A phylogenetic tree proposed in 1866 by Häckel

Many Misconceptions about Phylogenetic Trees!
• Example of a misconception: "the order of the external nodes provides information about their relatedness"
• The order of external nodes is meaningless!
• Paper: "Ryan_2008_Understanding_evolutionary_trees"

Types of Trees (Displays)
• Rooted vs. unrooted
• Cladogram vs. Phylogram

Eukaryotic Tree of Life
• Still not resolved
• Two major groups (probably):
  • Unikonta (single, or no, flagellum)
  • Bikonta (two flagella)
• No monophyletic group of "protists"
• Papers:
  • "Cavalier-Smith_2015_Multiple-origins"
  • "Zmasek_20111_Strong_functional_patterns"
  • "Roger_2009_Revisiting_the_root_of_the_eukaryote_tree"
  • "Baldauf_2003_The_Deep_Roots_of_Eukaryotes"

Bacterial Tree of Life
• Based on a concatenated set of 16 ribosomal protein sequences
• Paper: "Hug_2016_A_new_view_of_the_tree_of_life"

Viruses
• No universal "tree of life" for viruses
• Instead, "superfamilies" of (probably unrelated) viruses:
  • Double-stranded RNA Viruses (monophyly uncertain)
  • Single-stranded Negative Sense RNA Viruses (monophyly uncertain)
  • Single-stranded Positive Sense RNA Viruses (monophyly uncertain)
  • Single-stranded DNA Viruses (non-monophyletic)
  • Double-stranded DNA Viruses (non-monophyletic)
  • DNA-RNA Reverse Transcribing Viruses (monophyly uncertain)
• Papers:
  • "Castro-Nallar_2012_The_evolution_of_HIV"
  • "Forterre_2013_The_major_role_of_viruses_in_cellular_evolution"
  • "Koonin_2013_A_virocentric_perspective_on_the_evolution_of_life"
  • "Krupovic_2013_Networks_of_evolutionary_interactions"

A special case? Nucleocytoplasmic large DNA viruses
• Nucleocytoplasmic large DNA virus (NCLDV) superfamily
• Diverse group of viruses that infect a wide range of eukaryotic hosts (e.g. vertebrates, insects, single-celled organisms)
• Huge range in genome size (between 100 kb and 1.2 Mb)
• Examples:
  • Mimiviridae
  • Marseilleviridae
  • Phycodnaviridae
  • Poxviridae
• Papers:
  • "Krupovic_2013_Networks_of_evolutionary_interactions"
  • "Nasir_2012_Giant_viruses_coexisted"

Nucleocytoplasmic large DNA viruses (NCLDV)
• Bayesian Inference (BI) tree based on conserved regions of DNA polymerase B
• Paper: "Fischer_2010_Giant_virus_with_a_remarkable_complement"

Gene Trees/Species Trees
• Initially, phylogenetic trees were built based on the morphology of organisms.
• Around 1960, molecular sequences were recognized as containing phylogenetic information and hence as valuable for tree building
• A tree built from sequence data is called a gene tree, since it represents the evolutionary history of genes
• A tree illustrating the evolutionary history of organisms is called a species tree

A gene tree which is also a species tree
A gene tree of orthologs and paralogs based on Bcl-2 family protein sequences

The Number of All Possible Tree Topologies…
… gets quickly larger than the number of all H-atoms in the Universe
• The number of different tree topologies increases rapidly with the number of external nodes. The number of topologies $T_p$ for unrooted, completely binary trees with $N$ external nodes is:

$$T_p(N) = \frac{(2N-5)!}{2^{N-3}\,(N-3)!}$$

• $T_p(N=5) = 15$
• $T_p(N=10) \approx 2 \times 10^{6}$
• $T_p(N=20) \approx 2 \times 10^{20}$
• $T_p(N=100) \approx 1 \times 10^{182}$

Homologs
• Homologs are defined as sequences which share a common ancestor (Fitch, 1966)
• This definition becomes unclear if mosaic proteins, which are composed of structural units originating from different genes, are considered
• Phylogenetic trees make sense only if constructed based on homologous sequences (whole genes/proteins, or domains)

Globin Family: an example of homologous proteins

Orthologs, Paralogs, Xenologs
• Homologous sequences can be divided into orthologs, paralogs and xenologs:
  • Orthologs: diverged by a speciation event (their last common ancestor on a phylogenetic tree corresponds to a speciation event)
  • Paralogs: diverged by a duplication event (their last common ancestor corresponds to a duplication)
  • Xenologs: related to each other by horizontal gene transfer (via retroviruses, for example)

Orthologs, Paralogs example

Caveat emptor: Orthology vs. Function
• Orthologous sequences tend to have more similar “functions” than paralogs
• Yet: orthologs are mathematically defined, whereas there is no definition of sequence “function” (i.e. it is a subjective term)

Gene Duplication: Significance
• New genes evolve if mutations accumulate while selective constraints are relaxed by gene duplication
• First recognized by Haldane (“… it [mutation pressure] will favour polyploids, and particularly allopolyploids, which possess several pairs of sets of genes, so that one gene may be altered without disadvantage…”)

Gene Trees vs. Species Trees: How Gene Duplications Can Be Detected
(Figure: species tree S of wheat, rat and human compared with gene trees G1 and G2)

Rooting
• Almost all methods and algorithms produce unrooted or randomly rooted trees!!
• Rooting by:
  • Midpoint rooting (minimizing overall tree height)
  • Known "outgroup"
  • Minimizing gene duplications
  • …

Methods
Starting point: a multiple sequence alignment of homologous sequences
• Pairwise distance calculation, followed by:
  • Algorithmic methods based on pairwise distances: Neighbor Joining
  • Optimality criteria based on pairwise distances: Fitch-Margoliash, Minimal Evolution
• Optimality criteria based on character data: Maximum Parsimony, Maximum Likelihood
• Bayesian methods (MCMC)
• Diagram labels: “More accurate”; “Fast (in general)”

Pairwise Distance Calculation
The simplest method to measure the distance between two amino acid sequences is by their fractional dissimilarity $p$, where $n_d$ is the number of aligned sequence positions containing non-identical amino acids and $n_s$ is the number of aligned sequence positions containing identical amino acids:

$$p = \frac{n_d}{n_d + n_s}$$

• Unfortunately, this is unrealistic: it does not take into account
  • superimposed changes: multiple mutations at the same sequence location
  • different chemical properties of amino acids: for example, changing leucine into isoleucine is more likely, and should be weighted less, than changing leucine into proline
• A more realistic approach for estimating evolutionary distances is to apply maximum likelihood to empirical amino acid replacement models, such as PAM transition probability matrices.
• The likelihood $L_H$ of a hypothesis $H$ (an evolutionary distance, for example) given some data $D$ (an alignment, for example) is the probability of $D$ given $H$:

$$L_H = P(D \mid H)$$

Algorithmic Methods Based on Pairwise Distances
• UPGMA
• Neighbor Joining

UPGMA vs. …
• UPGMA stands for unweighted pair group method using arithmetic averages
• This is clustering
• This algorithm produces rooted trees under the assumption of a molecular clock
• Do not use!!

… Neighbor Joining
• As opposed to UPGMA, neighbor joining (NJ) is not misled by the absence of a molecular clock
• NJ produces phylogenetic trees (not cluster diagrams)

Optimality Criteria Based on Pairwise Distances
• Fitch-Margoliash
• Minimal evolution (ME)

Fitch-Margoliash
An optimal tree is selected by minimizing the disagreement E between the tree and the pairwise distances estimated from a multiple alignment.

Minimal Evolution
Branch lengths are fitted to a tree according to an unweighted least-squares criterion, but the optimality criterion used to evaluate and compare trees is to minimize the sum of all branch lengths.

Optimality Criteria Based on Character Data
• Maximum Parsimony (MP)
• Maximum Likelihood (ML)

Maximum Parsimony
• Evaluate a given topology
• Example:
  • Sequence1: TGC
  • Sequence2: TAC
  • Sequence3: AGG
  • Sequence4: AAG

Maximum Likelihood
• Probabilistic methods can be used to assign a likelihood to a given tree and therefore allow the selection of the tree that is most likely given the observed sequences.
• Probability for one residue $a$ to change to $b$ in time $t$ along a branch of a tree: $P(b \mid a,t)$
• Its actual calculation depends on which model of sequence evolution is used.
• Poisson process:
  $P(b \mid a,t) = \frac{1}{20} + \frac{19}{20}e^{-ut}$ for …
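The topology-count formula from the slides can be checked numerically. The following is a minimal Python sketch; the function name `num_unrooted_topologies` is illustrative and not taken from any package. It evaluates $(2N-5)!/(2^{N-3}(N-3)!)$ with exact integer arithmetic, so even $N=100$ does not overflow.

```python
from math import factorial


def num_unrooted_topologies(n: int) -> int:
    """Number of unrooted, completely binary tree topologies with n labeled leaves.

    Implements T_p(N) = (2N - 5)! / (2^(N - 3) * (N - 3)!), defined for n >= 3.
    """
    if n < 3:
        raise ValueError("need at least 3 external nodes")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (5, 10, 20, 100):
    t = num_unrooted_topologies(n)
    # Print exact counts for small n and scientific notation for huge ones.
    print(n, t if t < 10**7 else f"{float(t):.1e}")
```

Because exhaustive search has to evaluate every one of these topologies, it becomes infeasible beyond a few dozen taxa. This is one reason the fast distance methods and heuristic searches listed under Methods are used in practice.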
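The fractional dissimilarity $p = n_d/(n_d + n_s)$ and a simple correction for superimposed changes can be sketched as follows. The correction assumes the equal-rates Poisson model from the Maximum Likelihood slide, with all 20 amino acids equally likely. Under that model, two sequences separated by time $t$ differ at a fraction $p = \frac{19}{20}(1 - e^{-ut})$ of sites. This inverts to $d = -\frac{19}{20}\ln\left(1 - \frac{20}{19}p\right)$ expected substitutions per site. This is the 20-state analogue of the Jukes-Cantor correction, not the PAM-based maximum likelihood estimate recommended above. Skipping gapped columns is also an assumption of this sketch.

```python
import math


def p_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Fractional dissimilarity p = n_d / (n_d + n_s) of two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    the slides do not say how gaps are treated).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    n_d = n_s = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        if a == b:
            n_s += 1
        else:
            n_d += 1
    if n_d + n_s == 0:
        raise ValueError("no comparable positions")
    return n_d / (n_d + n_s)


def poisson_distance(p: float, states: int = 20) -> float:
    """Correct p for superimposed changes under an equal-rates Poisson model.

    Inverts p = (k-1)/k * (1 - exp(-ut)) for k states, giving the expected
    number of substitutions per site d = -(k-1)/k * ln(1 - k/(k-1) * p).
    """
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: too many differences to estimate d
    return -b * math.log(1.0 - p / b)


p = p_distance("MKV-LLA", "MRVQLIA")  # hypothetical toy alignment
print(round(p, 3), round(poisson_distance(p), 3))
```

The corrected distance is always at least as large as $p$, and the gap widens as $p$ grows. Hidden multiple substitutions matter most for divergent sequences.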
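Neighbor joining is compact enough to sketch. The version below follows the standard Saitou-Nei steps: it picks the pair minimizing $Q(i,j) = (n-2)\,d(i,j) - r_i - r_j$, estimates the two new branch lengths, and updates distances to the new node. It is an illustrative implementation, not the code used by PHYLIP, FastME or any other program named in the slides. It does not clamp negative branch lengths, and it returns the unrooted tree as a Newick string. The distance matrix in the usage example is made-up, additive toy data.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree as a Newick string with branch lengths."""
    if len(names) < 3:
        raise ValueError("need at least 3 taxa")
    nodes = list(names)  # current clusters, labeled by their Newick fragment
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Select the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and to b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new node to every remaining cluster.
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three clusters at one central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = d[a][b] - la
    lc = d[a][c] - la
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, D))
```

This toy matrix is additive, so the branch lengths NJ recovers reproduce every pairwise distance exactly. Distances estimated from real alignments are rarely additive, and NJ then returns an approximation. Unlike UPGMA, no molecular clock is assumed at any step.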
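The four-sequence Maximum Parsimony example can be worked through in code. The slides do not name a scoring algorithm, so this sketch uses Fitch's small-parsimony algorithm as one standard way to "evaluate a given topology". Four taxa have exactly three unrooted topologies. Each is rooted on its internal edge, which does not change the Fitch score. The labels S1 to S4 stand for Sequence1 to Sequence4.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Empty intersection: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the Fitch cost over all alignment columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


seqs = {"S1": "TGC", "S2": "TAC", "S3": "AGG", "S4": "AAG"}
# The three unrooted topologies for four taxa, each rooted on its internal edge.
topologies = {
    "((S1,S2),(S3,S4))": (("S1", "S2"), ("S3", "S4")),
    "((S1,S3),(S2,S4))": (("S1", "S3"), ("S2", "S4")),
    "((S1,S4),(S2,S3))": (("S1", "S4"), ("S2", "S3")),
}
for label, tree in topologies.items():
    print(label, parsimony_score(tree, seqs))  # 4, 5 and 6 changes respectively
```

The grouping ((S1,S2),(S3,S4)) requires the fewest changes (4), so it is the most parsimonious topology for this toy alignment.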