Algorithms for Weighted Multidimensional Search and Perfect Phylogeny Richa Agarwala Iowa State University
Iowa State University Capstones, Theses and Dissertations
Retrospective Theses and Dissertations, 1994
Part of the Computer Sciences Commons

Recommended Citation
Agarwala, Richa, "Algorithms for weighted multidimensional search and perfect phylogeny" (1994). Retrospective Theses and Dissertations. 10669. https://lib.dr.iastate.edu/rtd/10669
UMI Order Number 9503622

Algorithms for weighted multidimensional search and perfect phylogeny

by

Richa Agarwala

A Dissertation Submitted to the Graduate Faculty in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY

Department: Computer Science
Major: Computer Science

Iowa State University
Ames, Iowa
1994

Copyright © Richa Agarwala, 1994. All rights reserved.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
1. INTRODUCTION
   1.1 Problem Definitions and Related Work
       1.1.1 Weighted Multidimensional Search
       1.1.2 Perfect Phylogeny
   1.2 Dissertation Organization
2. WEIGHTED MULTIDIMENSIONAL SEARCH AND ITS APPLICATION TO CONVEX OPTIMIZATION
   2.1 Abstract
   2.2 Introduction
   2.3 Weighted Multidimensional Search
       2.3.1 Preliminaries
       2.3.2 The Search Algorithm
       2.3.3 Improving the Efficiency of the Search
   2.4 Convex Optimization in Fixed Dimension
       2.4.1 The Basic Scheme
       2.4.2 Speeding up the Search
       2.4.3 Some Remarks on Constant Factors
   2.5 Solving the Lagrangian Dual when the Number of Constraints is Fixed
       2.5.1 Matroidal Knapsack Problems
       2.5.2 Constrained Optimum Subgraph Problems
3. FAST ALGORITHMS FOR INFERRING EVOLUTIONARY TREES
   3.1 Abstract
   3.2 Introduction
   3.3 An algorithm for BP
   3.4 An online algorithm for characters
   3.5 An online algorithm for species
4. A POLYNOMIAL-TIME ALGORITHM FOR THE PERFECT PHYLOGENY PROBLEM WHEN THE NUMBER OF CHARACTER STATES IS FIXED
   4.1 Abstract
   4.2 Introduction
   4.3 Preliminaries
   4.4 Subphylogenies
   4.5 Building a subphylogeny
   4.6 Building a perfect phylogeny
   4.7 Concluding remarks
5. A SIMPLE AND FAST ALGORITHM FOR THE PERFECT PHYLOGENY PROBLEM WHEN THE NUMBER OF CHARACTERS IS FIXED
   5.1 Abstract
   5.2 Introduction
   5.3 Preliminaries
   5.4 Finding c-clusters
   5.5 Basic Algorithm for Perfect Phylogeny
   5.6 Improved Algorithm for Perfect Phylogeny
6. CONCLUSIONS AND FUTURE WORK
BIBLIOGRAPHY
APPENDIX A: WEIGHTED SEARCH IN EUCLIDEAN PLANE
APPENDIX B: ACCOMPANYING DISKETTE AND OPERATING INSTRUCTIONS

LIST OF FIGURES

3.1 (a) A matrix M and (b) its phylogenetic tree P
3.2 (a) Characters Cj, ..., Cg and (b) their character tree T
3.3 Compact character tree R for T in Figure 2
3.4 Dividing iV^ into Ni^ and adding Nj
4.1 Forced and unforced states
6.1 3-restricted Steiner trees
6.2 4-restricted Steiner trees
6.3 Wp and

ACKNOWLEDGEMENTS

The author spent three busy, constructive, and pleasant years at Iowa State University on the research leading to this dissertation. During this time, several people made direct or indirect contributions to this work, and the author wishes to take this opportunity to thank some of them personally.

Thanks to my advisor, Dr. David Fernandez-Baca, without whom it would have been difficult for me to organize my ideas in a useful way. I also appreciate the understanding he showed during this rapidly changing period of my life.

I would like to thank my Program of Study committee members, Dr. Giora Slutzki and Dr. Vasant Honavar from the Department of Computer Science, Dr. Clifford Bergman from the Department of Mathematics, and Dr. Ricardo F. Rosenbusch from the Department of Microbiology, Immunology and Preventive Medicine, for the time and suggestions they gave to this dissertation. Special thanks are due to Dr. Daniel A. Ashlock for attending my final oral examination in place of Dr. Clifford Bergman.

I am also grateful to the Department of Computer Science and the Liberal Arts and Science College at Iowa State University for providing financial assistance during this period.

ABSTRACT

This dissertation is a collection of papers from two independent areas: convex optimization problems in $\mathbb{R}^d$ and the construction of evolutionary trees.

The paper on convex optimization problems in $\mathbb{R}^d$ gives improved algorithms for solving the Lagrangian duals of problems that have both of the following properties. First, in the absence of the bad constraints, the problems can be solved in strongly polynomial time by combinatorial algorithms. Second, the number of bad constraints is fixed.
As part of our solution to these problems, we extend Cole's circuit simulation approach and develop a weighted version of Megiddo's multidimensional search technique.

The papers on evolutionary tree construction deal with the perfect phylogeny problem, where species are specified by a set of characters and each character can occur in a species in one of a fixed number of states. This problem is known to be NP-complete. The dissertation contains the following results on the perfect phylogeny problem:

• A linear time algorithm when all the characters have two states.
• A polynomial time algorithm when the number of character states is fixed.
• A polynomial time algorithm when the number of characters is fixed.

1. INTRODUCTION

1.1 Problem Definitions and Related Work

This dissertation can be viewed as having two independent parts. In the first part (Chapter 2 and Appendix A), we study some convex optimization problems in $\mathbb{R}^d$. Their solution involves the development of a technique that we shall call weighted multidimensional search. The second part (Chapters 3, 4, 5 and Appendix B) deals with the combinatorial problem of inferring the evolutionary relationships of a set of species; specifically, we consider what is known as the perfect phylogeny problem.

1.1.1 Weighted Multidimensional Search

Lagrangian relaxation is a widely used and efficient way to obtain good estimates of the value of the optimum solution to many NP-hard combinatorial problems [40, 41]. These bounds are very useful in branch-and-bound methods. As the name suggests, the idea is to "relax" the problem constraints enough so that the resulting problem is easier to solve, but not so much that the result obtained from the relaxed problem is too far from the true optimum.

The optimization problems that we consider have the following form:

$$g = \min\{c^T x : Ax \le 0,\ x \in X\}, \qquad (P)$$

where $c$ is an $n \times 1$ vector, $A$ is a $d \times n$ matrix, $x$ is an $n \times 1$ vector and $X$ is a polyhedral subset of $\mathbb{R}^n$. We obtain a Lagrangian relaxation of (P) by pricing out the complicating set of constraints $Ax \le 0$ into the objective function by introducing a vector $\lambda = (\lambda_1, \ldots, \lambda_d)$ of Lagrange multipliers as follows:

$$g_D(\lambda) = \min\{c^T x + \lambda A x : x \in X\}. \qquad (D(\lambda))$$

It is well known that $g_D(\lambda) \le g$ for all $\lambda \ge 0$ [35]. Thus, the relaxation will provide a useful bounding tool if, for any fixed $\lambda \ge 0$, problem $(D(\lambda))$ is easier to solve than (P). The best lower bound on $g$ that is attainable via $(D(\lambda))$ is given by:

$$g^* = \max\{g_D(\lambda) : \lambda \ge 0\}.$$

This problem is called the Lagrangian dual of (P) with respect to the set of constraints $Ax \le 0$.
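The bound described above can be illustrated on a toy instance. The following is a minimal sketch, not the algorithm developed in this dissertation. The data are assumptions chosen for illustration: X is the unit square given by its four vertices, there is one priced-out constraint 2x_1 - x_2 <= 0, and the helper name `lagrangian_value` is hypothetical. The Lagrangian dual is approximated by a crude grid over the multiplier, which merely stands in for a real search method.

```python
from itertools import product


def lagrangian_value(c, A, vertices, lam):
    """Return min over x in X of c^T x + lam A x.

    X is assumed to be a bounded polyhedron described by its vertices, so
    the minimum of a linear objective is attained at one of them.
    """
    n = len(c)
    # Reduced cost vector c + lam A, with lam treated as a row vector.
    reduced = [c[j] + sum(lam[i] * A[i][j] for i in range(len(A)))
               for j in range(n)]
    return min(sum(reduced[j] * x[j] for j in range(n)) for x in vertices)


# Illustrative data (assumed): minimize -x1 - 2*x2 over the unit square,
# subject to the single complicating constraint 2*x1 - x2 <= 0.
c = [-1.0, -2.0]
A = [[2.0, -1.0]]
vertices = list(product([0.0, 1.0], repeat=2))

# Worked out by hand for this instance: the optimum of the constrained
# problem is x = (0.5, 1), which is not a vertex of X.
g = -2.5

# Crude grid over the multiplier; a stand-in for a real dual search.
grid = [[k / 4] for k in range(9)]  # lambda = 0, 0.25, ..., 2
values = [lagrangian_value(c, A, vertices, lam) for lam in grid]
best_value = max(values)
best_lam = grid[values.index(best_value)]

print("lower bounds:", values)
print("best multiplier:", best_lam, "best bound:", best_value)
```

Every value on the grid is a lower bound on g. In this instance the best bound equals g, as expected when the underlying problem is a linear program; for combinatorial choices of X a gap may remain.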