Supertree Construction by Matrix Representation with Flip

Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations 1-1-2003 Supertree construction by matrix representation with flip Duhong Chen Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/rtd Recommended Citation Chen, Duhong, "Supertree construction by matrix representation with flip" (2003). Retrospective Theses and Dissertations. 19926. https://lib.dr.iastate.edu/rtd/19926 This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Supertree construction by matrix representation with flip b~- D uhong Chen ~ thesis submitted to the graduate faculty in partial fulfillment of the requirements for the degree of :MASTER OF SCIE~'CE Major: Computer Science Program of Study COIIlI211ttee: David Fernandez-Baca. Major Professor Oliver Eulenstein Gavin ?~ aylor Iowa State University Ames , Iowa 2003 Cop~~right ©Duhong Chen, 2003. All rights reserved. 11 Graduate College Iowa State University This is to certify that the master's thesis of D uhong Chen has met the thesis requirements of Iowa State University Signatures have been redacted for privacy 111 DEDICATION I would like to dedicate this thesis to my wife Jing and to my daughter Vivian without whose support I would not have been able to complete this work. I would also like to thank my friends and family for their loving guidance during the writing of this work. 1V TABLE OF CONTENTS LIST OF FIGURES vi CHAPTER 1. INTRODUCTION 1 1.1 Basic Concepts and Methods 1 l.l.l Phylogenetic Zees 1 1.1.2 Phylogenetic Inference 2 1.1.3 Character Compatibility and Phylogenetic Inference 3 1.1.4 Parsimony 6 1.2 Review of Supertree Reconstruction Methods 7 1.2.1 Phylogenetic Zees and their Binary Character Coding 8 1.2.2 Matrix Representation with Parsimony(MRP} and Matrix Representa- tion with Flip(MRF) 9 1.2.3 MinCut(MC) and Modified MinCut(MMC) Algorithm 9 1.3 Outline 10 CHAPTER 2. COMPLEXITY OF THE MINIMUM FLIP PROBLEM 12 2.1 Introduction 12 2.2 Flip and the Minimum Flip Problem 12 2.3 Complexity of the Minimum Flip Problem 13 CHAPTER 3. ALGORITHMS FOR SOLZTING THE MINIMUM FLIP PROB- LEM 18 3.1 Introduction 18 3.2 Exact Methods 20 3.2.1 Exhaustive Enumeration 20 3.2.2 Branch-and-bound Method 20 3.3 Heuristics Algorithm for Minimum Flip 25 3.3.1 Branch-swapping Operations 25 3.3.2 Greedy Algorithm 27 3.3.3 Heuristics 28 CHAPTER 4. SIMULATION STUDIES OF SUPERTREE CONSTRUC- TION METHODS 29 4.1 Introduction 29 4.2 Criteria Used for 'I~ee Comparison 29 4.3 Exact Solution Comparison Experiment 31 4.3.1 Materials and Methods 31 4.3.2 Results and Analysis 34 4.4 Heuristic Solution Comparison Experiment 36 4.4.1 Materials and Methods 36 4.4.2 Results and Analysis 39 CHAPTER 5. CONCLUSION AND FUTURE STUDY 45 BIBLIOGRAPHY 47 ACKNOWLEDGEMENTS 50 Vl LIST OF FIGURES Figure 1.1 (a) an unresolved triplet and (b) a resolved triplet 2 Figure 1.2 (a) a tuple of characters ~ _ {Cl, C2, C3, C4}over h7 = {ml, m2, m3, m4, m5} is denoted by a 5 x 4 matrix .~[.. (b) A binary matrix B represents the completion of .A. and the set of characters ~. (c) A tree T~ is consistent with the set of characters ~ where each C~. induces a cluster in T~. 4 Figure 1.3 A set of pairwise compatible incomplete characters, but the whole set is not compatible 5 Figure 1.4 Flip to achieve perfect phylogeny 6 Figure 1.5 A collection of trees and its matrix representation 8 Figure 2.1 Example of a flip tuple ~", and s(~') = 2 13 Figure 3.1 A binary matrix representing a tree T 19 Figure 3.2 Enumeration of all possible trees for 4 species, An initial tree is formed from the first two taxa. The third taxon is added transversely to each possible location of the initial tree to obtain all the three taxon trees. The four taxon tree is formed from adding the fourth taxon transversely to each location of every possible three taxon trees. 21 Figure 3.3 An example of lower bound for minimum flip problem, the thick edges is a maximum weighted matching whose total weight is 3. Thus, the lower bound for the input characters ~ _ (Cl , . C5) is 3. 22 Figure 3.4 Two rearrangements are possible for. one NNI operation on an internal node 25 vii Figure 3.5 A general case of SPR operation. Subtree A is detached from the original tree T, and attached to any possible branch of remaining part of the tree T — A, such that different trees are created. 1,2 and 6 are represented any branch inside subtree B,C and D respectively. B (A) denotes a tree formed by regrafting subtree A to a branch in subtree B. C(A), D(A) have similar meanings. 26 Figure 3.6 A special case of SPR operation. The prune node's parent is the root of original tree T. A general SPR operation is performed, and then the root is changed to the prune node's sister. 26 Figure 3.7 A tree T is divided into two subtrees t and T — t by a cut. Subtree t` is formed by "bending" the branch 1 in subtree t. A TBR tree is created when subtree t' is regrafted to one of the branches in subtree T — t. 27 Figure 4.1 Flowchart of the experiment to assess the accuracy of different supertree methods 32 Figure 4.2 Variable taxa, and fixed sequence length experiment. 35 Figure 4.3 Fixed sequence length, and fixed taxa experiment 36 Figure 4.4 Proportional sequence length, and fixed taxa experiment 37 Figure 4.5 Flowchart of the heuristics solution comparison experiment 38 Figure 4.6 Tree comparison criteria used in heuristic solution comparison experiment. (a) and (~) was used to evaluate how well source trees incor- porate the true tree and how well supertrees recovering information in source trees respectively. ('y) was used to measure the degree of supertrees reflecting the true tree 40 Figure 4.7 Fixed sequence model with 1,000 characters: average MASTf it (first column) and tripl et f it (second column) of supertrees and source trees. The figures from top to down represents deletion probabilities of 0.25, 0.50, and 0.75 respectively 42 Vlll Figure 4.8 Proportional sequence model with a total of 20,000 characters: average MASTf it (first column) and triplet f it (second column) of supertrees and source trees. The figures from top to down represents deletion probabilities of 0.25, 0.50, and 0.75 respectively 43 Figure 4.9 Average MASTf it of of supertrees and true tree: The figures of fixed sequence length model is listed in the first column and proportional model in the second column. Figures in both columns from top to down represents deletion probabilities of 0.25, 0.50, and 0.75 respectively. 44 1 CHAPTER 1. INTRODUCTION This thesis is about a new method for evolutionary tree inference and phylogenetic supertree construction. A phylogenetic tree or simply phylogeny represents the evolutionary history of a set of species. It can be used to determine the genetic connections and relationships between species. In recent years the explosive growth in phylogenetic trees on difFerent small sets of species, systematic biologists have become interested in assembling those trees in one supertree called the "tree of life" which represents evolutionary history among all living things(1) . 1.1 Basic Concepts and Methods In this section, I will introduce some basic terminology, theorems, and notation that are used in this thesis. 1.1.1 phylogenetic Trees A tree is a acyclic, connected graph which consists of a set of nodes(vertices) connected by a set of branches (edges) . The degree of a vertex equals to the number of edges to which it is connected. Let M = {ml, . , m s } be a taxon set. A phylogenetic tree T over M is a tree with exactly s leaves, each of which is labelled with a distinct element of M. Such a tree is used to represent biological evolutionary relationships among the taxon set M. A phylogenetic tree can be rooted or unrooted. A rooted tree is a tree in which one of the nodes is stipulated to be the root, from which a unique path leading to every other node. An unrooted tree, as could be imagined, has no pre-determined root and therefore induces no hierarchy. Suppose T = (V, E) is a tree. A vertex v E V is internal if its degree is greater than one; otherwise, it is a leaf. An edge e = (u, v) E E is internal if both u and v are internal vertices; 2 otherwise we say e is an external edge. We write .G (T) to denote the leaf set of tree T. A vertex a is a descendent of a vertex b if the path from a to the root passes through b. Then b is called the ancestor of a. For a vertex v E V , the set containing all leaves that are descendants of v is called a cluster. The set M is always a cluster of T; every other cluster is said to be proper.

Load more