98-01-011.Pdf

Total Page:16

File Type:pdf, Size:1020Kb

Minimal Cycle Bases of Outerplanar Graphs
Josef Leydold, Peter F. Stadler
SFI WORKING PAPER: 1998-01-011

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant. ©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.
www.santafe.edu
SANTA FE INSTITUTE

Minimal Cycle Bases of Outerplanar Graphs

Josef Leydold^a and Peter F. Stadler^{b,c}

^a Dept. for Applied Statistics and Data Processing, University of Economics and Business Administration, Augasse, Wien, Austria. E-Mail: Josef.Leydold@wu-wien.ac.at, URL: http://statistik.wu-wien.ac.at/staff/leydold
^b Institut für Theoretische Chemie, Universität Wien, Währingerstraße, Wien, Austria. E-Mail: studla@tbi.univie.ac.at (address for correspondence)
^c The Santa Fe Institute, Hyde Park Road, Santa Fe, NM, USA. E-Mail: stadler@santafe.edu, URL: http://www.tbi.univie.ac.at/~studla

Abstract. 2-connected outerplanar graphs have a unique minimal cycle basis with length $2|E| - |V|$. They are the only Hamiltonian graphs with a cycle basis of this length.

Keywords: Minimal Cycle Basis, Outerplanar Graphs
AMS Subject Classification: …

Introduction

The description of cyclic structures is an important problem in graph theory. Cycle bases of graphs have a variety of applications in science and engineering, among them structural analysis and chemical structure storage and retrieval systems. Naturally, minimal cycle bases are of particular practical interest. In this contribution we prove that outerplanar graphs have a unique minimal cycle basis. This result was motivated by the analysis of the structures of biopolymers. In addition, we derive upper and lower bounds on the length of minimal cycle bases in 2-connected graphs.

Biopolymers such as RNA, DNA, or proteins form well-defined three-dimensional structures. These structures are of utmost importance for their biological function. Their most salient features are captured by the contact graph, which represents the set $E$ of all pairs of monomers ($V$) that are spatially adjacent. This simplification of the 3D shape obviously neglects a wealth of structural details. However, it encapsulates the type of structural information that can be obtained by a variety of experimental and computational methods.

Nucleic acids (both RNA and DNA) form a special type of contact structure known as secondary structures. These graphs are subcubic and outerplanar. A particular type of cycle, commonly termed a loop in the RNA literature, plays an important role for RNA and DNA secondary structures: the energy of a secondary structure can be computed as the sum of the energy contributions of its loops. These loops form the unique minimal cycle basis of the contact graph. Experimental energy parameters are available for the contribution of an individual loop as a function of its size, of the type of bonds it contains, and of the monomers (nucleotides) it is composed of. Based on this energy model, it is possible to compute the secondary structure with minimal energy for a given sequence of nucleotides using a dynamic programming technique.

Preliminaries

In this contribution we consider only finite simple graphs $G(V,E)$ with vertex set $V$ and edge set $E$, i.e., there are no loops or multiple edges. $G(V,E)$ is 2-connected if the deletion of a single vertex does not disconnect the graph. Let $G_1(V_1,E_1)$ and $G_2(V_2,E_2)$ be two subgraphs of a graph $G(V,E)$. We shall write $G_1 \setminus G_2$ for the subgraph of $G$ induced by the edge set $E_1 \setminus E_2$.

The set $\mathcal{E}$ of all subsets of $E$ forms an $m$-dimensional vector space over GF(2), where $m = |E|$. Its vector addition is $X \oplus Y = (X \cup Y) \setminus (X \cap Y)$, and its scalar multiplication is $1 \cdot X = X$, $0 \cdot X = \emptyset$ for all $X, Y \in \mathcal{E}$. A cycle is a subgraph in which every vertex has even degree. We represent a cycle by its edge set $C$. Sometimes it will be convenient to regard $C$ as a subgraph $(V_C, C)$ of $G(V,E)$. The set $\mathcal{C}$ of all cycles forms a subspace of $(\mathcal{E}, \oplus, \cdot)$, which is called the cycle space of $G$. A basis $\mathcal{B}$ of the cycle space $\mathcal{C}$ is called a cycle basis of $G(V,E)$. The dimension of the cycle space is the cyclomatic number or first Betti number; for connected $G$ it equals $\mu(G) = |E| - |V| + 1$. The cycle space of a graph is obviously the direct sum of the cycle spaces of its 2-connected components. It is therefore sufficient to consider only 2-connected graphs in this contribution.

A connected (or elementary) cycle is a cycle $C$ for which $(V_C, C)$ is a connected minimal subgraph such that every vertex in $V_C$ has degree 2. We say that a cycle basis is connected if all its cycles are connected. A cycle $C$ is chordless if $(V_C, C)$ is an induced subgraph of $G(V,E)$, i.e., if no edge in $E \setminus C$ is incident to two vertices of $V_C$. We say that a cycle basis is chordless if all its cycles are chordless.

The length $|C|$ of a cycle $C$ is the number of its edges. The length $\ell(\mathcal{B})$ of a cycle basis $\mathcal{B}$ is the sum of the lengths of its cycles, $\ell(\mathcal{B}) = \sum_{C \in \mathcal{B}} |C|$. A minimal cycle basis is a cycle basis with minimal length. Let $c(\mathcal{B})$ be the length of the longest cycle in the cycle basis $\mathcal{B}$. Chickering showed that if $\mathcal{B}$ is minimal, then $c(\mathcal{B})$ is minimal, i.e., a minimal cycle basis has a shortest longest cycle.

A cycle $C$ is relevant if it is contained in some minimal cycle basis. Vismara proved the following.

Proposition. A cycle $C$ is relevant if and only if it cannot be represented as a sum of shorter cycles.

An immediate consequence is:

Corollary. A relevant cycle is chordless.

Hence a minimal cycle basis is chordless and, of course, connected.

Fundamental Cycle Bases

In what follows, let $G(V,E)$ be a 2-connected graph. Suppose $T$ is a spanning tree of $G$. Then for each edge $\varepsilon \notin T$ there is a unique cycle in $T \cup \{\varepsilon\}$, which is called a fundamental cycle. The set of fundamental cycles belonging to a given spanning tree forms a basis of the cycle space, which is called the fundamental basis w.r.t. $T$.

A collection of $\mu(G)$ cycles in $G$ is called fundamental if there exists an ordering $C_1, \dots, C_{\mu(G)}$ of these cycles such that $C_j \setminus (C_1 \cup \dots \cup C_{j-1}) \neq \emptyset$ for $2 \le j \le \mu(G)$. Of course, such a collection is a cycle basis. Not all cycle bases are fundamental.

Lemma. An elementary fundamental cycle basis can be ordered such that (i) $C_1$ is an elementary cycle, and (ii) $C_j \setminus (C_1 \cup \dots \cup C_{j-1}) = P_j$ is a nonempty path for $2 \le j \le \mu(G)$.

Proof. Let $G_i = C_1 \cup \dots \cup C_i$. Since $C_i$ contains an edge that is not in $G_{i-1}$, we have $\mu(G_i) \ge \mu(G_{i-1}) + 1$ for $i \ge 2$, and consequently $\mu(G) = \mu(G_{\mu(G)}) \ge \mu(G_i) + \mu(G) - i \ge \mu(G_1) + \mu(G) - 1 \ge \mu(G)$. Therefore equality holds throughout, and $\mu(G_i) = i$, i.e., $\mathcal{B}_i = \{C_1, \dots, C_i\}$ is a cycle basis for $G_i$.

Next, notice that some ordering satisfying the defining condition of a fundamental collection also makes $G_i$ connected for all $i$, i.e., $C_i \cap G_{i-1} \neq \emptyset$. Suppose otherwise. Then for every ordering satisfying the condition there is a $j$ such that $C \cap G_j = \emptyset$ for all $C \in \mathcal{B} \setminus \mathcal{B}_j$. But then $C_{j+1} \cup \dots \cup C_{\mu(G)}$ has empty intersection with $G_j$. This is a contradiction, since $G = C_1 \cup \dots \cup C_{\mu(G)}$ is connected. ($G_i$ is connected since, by assumption, all $C_j$ are connected.)

An immediate consequence is that $C_j \setminus G_{j-1}$ must be either a path, as claimed, or an elementary cycle that has exactly one vertex in common with $G_{j-1}$; otherwise we would have $\mu(G_j) > \mu(G_{j-1}) + 1$. If $C_j \setminus G_{j-1}$ is a cycle, this one vertex must be a cut vertex of $G_j$. Then there must be a cycle $C_k \in \mathcal{B} \setminus \mathcal{B}_j$ that has at least one edge in common with $G_{j-1}$ and at least one edge in common with $C_j \setminus G_{j-1}$; otherwise $G$ cannot be 2-connected. We can then reorder the basis by exchanging $C_j$ and $C_k$. ∎

A weaker result holds for nonfundamental cycle bases.

Lemma. Any connected nonfundamental cycle basis can be ordered such that $C_1 \cup \dots \cup C_i$ is connected for all $i$.

Proof. This follows as in the second part of the proof of the previous lemma: all $G_i = C_1 \cup \dots \cup C_i$ are connected. ∎

If $\mathcal{B}$ is a nonfundamental cycle basis of $G$, then there is a subgraph $G'$ with cycle basis $\mathcal{B}' \subseteq \mathcal{B}$ such that each edge of $G'$ is contained in at least two cycles of $\mathcal{B}'$. Furthermore, the published examples of nonfundamental bases are much longer than the minimal cycle bases. One might therefore be tempted to conjecture that every minimal cycle basis is fundamental. This statement is easily verified for planar graphs (see the corresponding corollary), but it is not true in general. Consider the complete graph $K_n$ with vertices $v_1, \dots, v_n$. It is straightforward (we used Mathematica) to check that the following cycles are independent, and hence form a basis of the cycle space because their number equals $\mu(K_n)$: … Here $(v_i v_j v_k)$ denotes the cycle $\{v_iv_j, v_jv_k, v_kv_i\}$. This basis is minimal, since every cycle has length 3. But it is nonfundamental, since every edge is covered at least twice.

Outerplanar Graphs

A graph $G(V,E)$ is outerplanar if it can be embedded in the plane such that all vertices lie on the boundary of its exterior region. Given such an embedding, we will refer to the set of edges on the boundary of the exterior region as the boundary $B$ of $G$. A graph is outerplanar if and only if it does not …
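The vector-space and fundamental-basis definitions above can be turned into a small executable check. The Python sketch below illustrates them under stated assumptions and is not code from the paper. Vertices are assumed to be integers. Cycles are stored as frozensets of sorted vertex pairs, so GF(2) addition is the symmetric difference `^`. The spanning tree is taken to be a breadth-first-search tree rooted at `vertices[0]`, although any spanning tree would give a fundamental basis. All function names are hypothetical. `is_fundamental_collection` tests the ordering condition $C_j \setminus (C_1 \cup \dots \cup C_{j-1}) \neq \emptyset$ by peeling cycles off from the end.

```python
from collections import Counter, deque
from functools import reduce


def norm(u, v):
    """Store an undirected edge {u, v} as a sorted tuple."""
    return (u, v) if u <= v else (v, u)


def gf2_sum(*cycles):
    """Vector addition in the edge space over GF(2): symmetric difference."""
    return reduce(lambda x, y: x ^ y, cycles, frozenset())


def is_cycle(edge_set):
    """A cycle in the sense used above: every vertex has even degree."""
    deg = Counter()
    for u, v in edge_set:
        deg[u] += 1
        deg[v] += 1
    return all(d % 2 == 0 for d in deg.values())


def cyclomatic_number(vertices, edges):
    """mu(G) = |E| - |V| + 1 for a connected graph G."""
    return len(edges) - len(vertices) + 1


def fundamental_basis(vertices, edges):
    """Fundamental cycles w.r.t. a BFS spanning tree T (G assumed connected)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = vertices[0]
    parent, depth = {root: None}, {root: 0}
    tree, queue = set(), deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w], depth[w] = u, depth[u] + 1
                tree.add(norm(u, w))
                queue.append(w)
    basis = []
    for u, v in edges:
        if norm(u, v) in tree:
            continue
        # The non-tree edge plus the tree path between its endpoints
        # (climb from the deeper endpoint until both meet) is the cycle.
        cycle, a, b = {norm(u, v)}, u, v
        while a != b:
            if depth[a] < depth[b]:
                a, b = b, a
            cycle.add(norm(a, parent[a]))
            a = parent[a]
        basis.append(frozenset(cycle))
    return basis


def is_fundamental_collection(cycles):
    """True iff some order satisfies C_j minus (C_1 u ... u C_{j-1}) nonempty.

    Peel from the back: a cycle owning an edge that no other remaining
    cycle uses can always be placed last."""
    remaining = list(cycles)
    while remaining:
        count = Counter(e for c in remaining for e in c)
        for i, c in enumerate(remaining):
            if any(count[e] == 1 for e in c):
                del remaining[i]
                break
        else:
            return False  # every remaining edge lies in >= 2 remaining cycles
    return True


if __name__ == "__main__":
    # A square 0-1-2-3 with the diagonal {0, 2}: mu = 5 - 4 + 1 = 2.
    V = [0, 1, 2, 3]
    E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    B = fundamental_basis(V, E)
    print(len(B), cyclomatic_number(V, E))  # 2 2
    print(is_fundamental_collection(B))     # True
    print(sorted(gf2_sum(*B)))              # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

Peeling greedily is safe. If a cycle owns an edge that no other remaining cycle uses, placing it last cannot invalidate a valid ordering of the others. The check therefore fails exactly when some sub-collection covers each of its edges at least twice, which is the obstruction described above for nonfundamental bases. For example, the four triangles of $K_4$ cover every edge exactly twice, so the check rejects them. (Those triangles are dependent, so they only illustrate the check and are not a basis.)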
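The length $2|E| - |V|$ from the abstract can be checked on small examples. In a 2-connected outerplanar graph, the boundary of the exterior region is a Hamiltonian cycle with $|V|$ edges, and the remaining $|E| - |V|$ edges are chords. Each boundary edge lies on exactly one interior face and each chord on exactly two. The interior faces (the "loops" of the introduction) therefore have total length $|V| + 2(|E| - |V|) = 2|E| - |V|$. The sketch below makes several assumptions. The graph is drawn as a convex polygon, its vertices are labelled 0, ..., n−1 in boundary order, and the chord list contains no crossing pair. The function names are hypothetical. The code computes the faces by repeatedly splitting a face along a chord and then sums their lengths.

```python
def interior_faces(n, chords):
    """Interior faces of a polygon 0, 1, ..., n-1 cut by non-crossing chords.

    Assumes the vertices are numbered in boundary order, so the boundary
    edges are {i, i+1 mod n}, and that no two chords cross."""
    faces = [list(range(n))]
    for a, b in chords:
        for i, face in enumerate(faces):
            if a not in face or b not in face:
                continue
            ia, ib = sorted((face.index(a), face.index(b)))
            if ib - ia in (1, len(face) - 1):
                continue  # {a, b} is already a side of this face
            # A chord joining two non-adjacent corners of a convex face runs
            # through its interior and splits it into two smaller faces.
            faces[i] = face[ia:ib + 1]
            faces.append(face[ib:] + face[:ia + 1])
            break
    return faces


def basis_length(faces):
    """Total length: a face with k corners is a cycle with k edges."""
    return sum(len(face) for face in faces)


if __name__ == "__main__":
    # Hexagon with the fan of chords from vertex 0: |V| = 6, |E| = 6 + 3 = 9.
    faces = interior_faces(6, [(0, 2), (0, 3), (0, 4)])
    print(faces)                           # four triangles
    print(basis_length(faces), 2 * 9 - 6)  # 12 12
```

Convexity keeps the splitting step simple. A chord joining two corners of a convex face either is one of its sides or cuts it in two, so no geometric test is needed.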
Recommended publications
  • Minimum Cycle Bases and Their Applications
    Minimum Cycle Bases and Their Applications Franziska Berger¹, Peter Gritzmann², and Sven de Vries³ ¹ Department of Mathematics, Polytechnic Institute of NYU, Six MetroTech Center, Brooklyn NY 11201, USA [email protected] ² Zentrum Mathematik, Technische Universität München, 80290 München, Germany [email protected] ³ FB IV, Mathematik, Universität Trier, 54286 Trier, Germany [email protected] Abstract. Minimum cycle bases of weighted undirected and directed graphs are bases of the cycle space of the (di)graphs with minimum weight. We survey the known polynomial-time algorithms for their construction, explain some of their properties and describe a few important applications. 1 Introduction Minimum cycle bases of undirected or directed multigraphs are bases of the cycle space of the graphs or digraphs with minimum length or weight. Their intriguing combinatorial properties and their construction have interested researchers for several decades. Since minimum cycle bases have diverse applications (e.g., electric networks, theoretical chemistry and biology, as well as periodic event scheduling), they are also important for practitioners. After introducing the necessary notation and concepts in Subsections 1.1 and 1.2 and reviewing some fundamental properties of minimum cycle bases in Subsection 1.3, we explain the known algorithms for computing minimum cycle bases in Section 2. Finally, Section 3 is devoted to applications. 1.1 Definitions and Notation Let G = (V, E) be a directed multigraph with m edges and n vertices. Let E = {e1, ..., em}, and let w: E → R+ be a positive weight function on E. A cycle C in G is a subgraph (actually ignoring orientation) of G in which every vertex has even degree (= in-degree + out-degree).
  • Lab05 - Matroid Exercises for Algorithms by Xiaofeng Gao, 2016 Spring Semester
    Lab05 - Matroid Exercises for Algorithms by Xiaofeng Gao, 2016 Spring Semester Name: Student ID: Email: 1. Provide an example of (S, C) which is an independence system but not a matroid. Give an instance of S such that v(S) ≠ u(S) (it should be different from the example posted in class). 2. Matching matroid M_C: Let G = (V, E) be an arbitrary undirected graph, and let C be the collection of all vertex sets that can be covered by a matching in G. (a) Prove that M_C = (V, C) is a matroid. (b) Given a graph G = (V, E) where each vertex v_i has a weight w(v_i), give an algorithm to find the matching that maximizes the total weight of the covered vertices. Prove the correctness and analyze the time complexity of your algorithm. Note: Given a graph G = (V, E), a matching M in G is a set of pairwise non-adjacent edges; that is, no two edges share a common vertex. A vertex is covered (or matched) if it is an endpoint of one of the edges in the matching. Otherwise the vertex is uncovered. 3. A Dyck path of length 2n is a path in the plane from (0, 0) to (2n, 0), with steps U = (1, 1) and D = (1, −1), that never passes below the x-axis. For example, P = UUDUDUUDDD is a Dyck path of length 10. Each Dyck path defines an up-step set: the subset of [2n] consisting of the integers i such that the i-th step of the path is U.
  • BIOINFORMATICS Doi:10.1093/Bioinformatics/Btt213
    Vol. 29 ISMB/ECCB 2013, pages i352–i360 BIOINFORMATICS doi:10.1093/bioinformatics/btt213 Haplotype assembly in polyploid genomes and identical by descent shared tracts Derek Aguiar and Sorin Istrail* Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA ABSTRACT Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups …
    Standard genome sequencing workflows produce contiguous DNA segments of an unknown chromosomal origin. De novo assemblies for genomes with two sets of chromosomes (diploid) or more (polyploid) produce consensus sequences in which the relative haplotype phase between variants is undetermined. The set of sequencing reads can be mapped to the phase-ambiguous reference genome and the diploid chromosome origin can be determined but, without knowledge of the haplotype sequences, reads cannot be mapped to the particular haploid chromosome sequence. As a result, reference-based genome assembly algorithms also produce unphased assemblies. However, sequence reads are derived from a single haploid fragment and thus provide valuable phase information when they contain two or more variants.
  • Short Cycles
    Short Cycles: Minimum Cycle Bases of Graphs from Chemistry and Biochemistry. Dissertation submitted for the academic degree of Doctor rerum naturalium at the Faculty of Natural Sciences and Mathematics of the University of Vienna, presented by Petra Manuela Gleiss in September 2001. At this point I would like to sincerely thank everyone who contributed to the making of this thesis. First and foremost Peter Stadler, who supported me with his scientific guidance, his overwhelming knowledge and his patience, as well as Josef Leydold, without whom I would not have gained many a deeper insight into mathematics. Ivo Hofacker, who often rescued me from the infinite expanses of the "computer universe". My brother Jürgen Gleiss, for the introduction to, and help with, my struggle with C++. Daniela Dorigoni, who entered the data of the atmospheric networks into the computer. All colleagues at the institute, for their helpfulness. My parents Erika and Franz Gleiss, whose support made my studies possible. My grandmother Maria Fischer, for her everlasting faith in me. My parents-in-law Irmtraud and Günther Scharner, for frequently looking after my children. Finally Roland Scharner, Florian and Sarah Gleiss, my three loved ones, who kept building me up and bringing me back to the real world. I was partly supported financially by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung, Proj. No. P14094-MAT. Summary: In biochemistry, cycle bases are used not only in the study of small, simple organic molecules, but also in structural investigations of highly complex biomolecules and to visualize chemical reaction networks.
The smallest canonical set of cycles describing the cyclic structure of an undirected graph is the set of relevant cycles (the union of all minimal cycle bases).
  • A Cycle-Based Formulation for the Distance Geometry Problem
    A cycle-based formulation for the Distance Geometry Problem Leo Liberti, Gabriele Iommazzo, Carlile Lavor, and Nelson Maculan Abstract The distance geometry problem consists in finding a realization of a weighted graph in a Euclidean space of given dimension, where the edges are realized as straight segments of length equal to the edge weight. We propose and test a new mathematical programming formulation based on the incidence between cycles and edges in the given graph. 1 Introduction The Distance Geometry Problem (DGP), also known as the realization problem in geometric rigidity, belongs to a more general class of metric completion and embedding problems. DGP. Given a positive integer K and a simple undirected graph G = (V, E) with an edge weight function d: E → R≥0, establish whether there exists a realization x: V → R^K of the vertices such that Eq. (1) below is satisfied: ∀{i, j} ∈ E: ‖x_i − x_j‖ = d_ij, (1) where x_i ∈ R^K for each i ∈ V and d_ij is the weight on edge {i, j} ∈ E. L. Liberti LIX CNRS Ecole Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France, e-mail: [email protected] G. Iommazzo LIX Ecole Polytechnique, France and DI Università di Pisa, Italy, e-mail: giommazz@lix.polytechnique.fr C. Lavor IMECC, University of Campinas, Brazil, e-mail: [email protected] N. Maculan COPPE, Federal University of Rio de Janeiro (UFRJ), Brazil, e-mail: [email protected] In its most general form, the DGP might be parametrized over any norm.
  • Dominating Cycles in Halin Graphs*
    Discrete Mathematics 86 (1990) 215–224, North-Holland. DOMINATING CYCLES IN HALIN GRAPHS* Mirosława SKOWRONSKA, Institute of Mathematics, Copernicus University, Chopina 12/18, 87-100 Toruń, Poland. Maciej M. SYSŁO, Institute of Computer Science, University of Wroclaw, Przesmyckiego 20, 51-151 Wroclaw, Poland. Received 2 December 1988. A cycle in a graph is dominating if every vertex lies at distance at most one from the cycle, and a cycle is a D-cycle if every edge is incident with a vertex of the cycle. In this paper, we first provide recursive formulae for finding a shortest dominating cycle in a Halin graph; minor modifications give formulae for finding a shortest D-cycle. Then, dominating cycles and D-cycles in a Halin graph H are characterized in terms of the cycle graph, the intersection graph of the faces of H. 1. Preliminaries The various domination problems have been extensively studied. Among them is the problem of whether a graph has a dominating cycle. All graphs in this paper have no loops and no multiple edges. A dominating cycle in a graph G = (V(G), E(G)) is a subgraph C of G which is a cycle such that every vertex of V(G) \ V(C) is adjacent to a vertex of C. There are graphs which have no dominating cycles; moreover, determining whether a graph has a dominating cycle on at most l vertices is NP-complete even in the class of planar graphs [7] and in the classes of chordal, bipartite and split graphs [3].
  • Exam DM840 Cheminformatics (2016)
    Exam DM840 Cheminformatics (2016) Time and Place Time: Thursday, June 2nd, 2016, starting 10:30. Place: The exam takes place in U-XXX (to be decided). Even though the expected total examination time per student is about 27 minutes (see below), it is not possible to calculate the exact examination time from the placement on the list, since students earlier on the list may not show up. Thus, students are expected to show up well in advance. In principle, all students who are taking the exam on a particular date are supposed to show up when the examination starts, i.e., at the time the first student is scheduled. This is partly because of the way external examiners are paid, which is by the number of students who show up for examination. For this particular exam, we do not expect many no-shows, so showing up one hour before the estimated time of the exam should be safe. Procedure The exam is in English. When it is your turn for examination, you will draw a question. Note that you have no preparation time. The list of questions can be found below. Then the actual exam takes place. The whole exam (without the censor and the examiner agreeing on a grade) lasts approximately 25–30 minutes. You should start by presenting material related to the question you drew. Aim for a reasonably high pace and focus on the most interesting material related to the question. You are not supposed to use notes, textbooks, transparencies, a computer, etc. You are allowed to bring keywords for each question, so that you can remember what you want to present during your presentation.
  • Cheminformatics for Genome-Scale Metabolic Reconstructions
    CHEMINFORMATICS FOR GENOME-SCALE METABOLIC RECONSTRUCTIONS John W. May European Molecular Biology Laboratory European Bioinformatics Institute University of Cambridge Homerton College A thesis submitted for the degree of Doctor of Philosophy June 2014 Declaration This thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This dissertation is not substantially the same as any I have submitted for a degree, diploma or other qualification at any other university, and no part has already been, or is currently being submitted for any degree, diploma or other qualification. This dissertation does not exceed the specified length limit of 60,000 words as defined by the Biology Degree Committee. This dissertation has been typeset using LaTeX in 11 pt Palatino, one and half spaced, according to the specifications defined by the Board of Graduate Studies and the Biology Degree Committee. June 2014 John W. May to Róisín Acknowledgements This work was carried out in the Cheminformatics and Metabolism Group at the European Bioinformatics Institute (EMBL-EBI). The project was funded by Unilever, the Biotechnology and Biological Sciences Research Council [BB/I532153/1], and the European Molecular Biology Laboratory. I would like to thank my supervisor, Christoph Steinbeck for his guidance and providing intellectual freedom. I am also thankful to each member of my thesis advisory committee: Gordon James, Julio Saez-Rodriguez, Kiran Patil, and Gos Micklem who gave their time, advice, and guidance. I am thankful to all members of the Cheminformatics and Metabolism Group.
  • Exam DM840 Cheminformatics (2014)
    Exam DM840 Cheminformatics (2014) Time and Place Time: Thursday, January 22, 2015, starting XX:00. Place: The exam takes place in U-XXX. Even though the expected total examination time per student is about 27 minutes (see below), it is not possible to calculate the exact examination time from the placement on the list, since students earlier on the list may not show up. Thus, students are expected to show up well in advance. In principle, all students who are taking the exam on a particular date are supposed to show up when the examination starts, i.e., at the time the first student is scheduled. This is partly because of the way external examiners are paid, which is by the number of students who show up for examination. For this particular exam, we do not expect many no-shows, so showing up one hour before the estimated time of the exam should be safe. Procedure The exam is in English. When it is your turn for examination, you will draw a question. Note that you have no preparation time. The list of questions can be found below. Then the actual exam takes place. The whole exam (without the censor and the examiner agreeing on a grade) lasts approximately 25–30 minutes. You should start by presenting material related to the question you drew. Aim for a reasonably high pace and focus on the most interesting material related to the question. You are not supposed to use notes, textbooks, transparencies, a computer, etc. You are allowed to bring keywords for each question, so that you can remember what you want to present during your presentation.
  • From Las Vegas to Monte Carlo and Back: Sampling Cycles in Graphs
    From Las Vegas to Monte Carlo and back: Sampling cycles in graphs Konstantin Klemm SEHRLASSIGER¨ LEHRSASS--EL¨ for Bioinformatics University of Leipzig Agenda • Cycles in graphs: Why? What? Sampling? • Method I: Las Vegas • Method II: Monte Carlo • Robust cycle bases Why care about cycles? (1) Chemical ring perception Why care about cycles? (2) Analysis of chemical reaction networks (Io's atmosphere) Why care about cycles? (3) • Protein interaction networks • Internet graph • Social networks • … Aim: detailed comparison between network models and empirical networks with respect to the presence/absence of cycles Sampling • Interested in the average value of a cycle property f(C): ⟨f⟩ = (1/|{cycles}|) Σ_{C ∈ {cycles}} f(C), with f(C) = |C| or f(C) = δ_{|C|,h} or … • Exhaustive enumeration of {cycles} is not feasible • Approximate ⟨f⟩ by summing over a representative, randomly selected subset S ⊂ {cycles} • How do we generate S then? Las Vegas • Sampling method based on a self-avoiding random walk [Rozenfeld et al. (2004), cond-mat/0403536] • probably motivated by the movie "Lost in Las Vegas" (though the authors do not say so explicitly) 1. Choose a starting vertex s. 2. Hop to a randomly chosen neighbour, avoiding previously visited vertices except s. 3. Repeat 2. unless reaching s again or getting stuck. Las Vegas: trouble • Number of cycles of length h in the complete graph K_N: W(h) = (2h)^{−1} N!/(N − h)! • For N = 100, W(100)/W(3) ≈ 10^150 • Flat cycle length distribution in K_N from the Rozenfeld method: p(h) = 1/(N − 2) • Undersampling of long cycles Las Vegas: results [figure: N_h versus h; inset: h* versus N] Generalized random graphs ("static model") with N = 100, 200, 400, 800, ⟨k⟩ = 2, β = 0.5 Leaving Las Vegas … Monte Carlo: summing cycles • The sum of two cycles yields a new cycle [illustration] • (generalized) cycle: subgraph, all degrees even • simple cycle: connected subgraph, all degrees = 2.
  • Hapcompass: a Fast Cycle Basis Algorithm for Accurate Haplotype Assembly of Sequence Data
    HapCompass: A fast cycle basis algorithm for accurate haplotype assembly of sequence data Derek Aguiar^{1,2}, Sorin Istrail^{1,2,∗} Dedicated to Professors Michael Waterman's 70th birthday and Simon Tavaré's 60th birthday ¹Department of Computer Science, Brown University, Providence, RI, USA ²Center for Computational Molecular Biology, Brown University, Providence, RI, USA Abstract: Genome assembly methods produce haplotype-phase-ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational methods of determining haplotype phase from sequence data (known as haplotype assembly) have difficulties producing accurate results for large (1000 Genomes-type) data or operate on restricted optimizations that are unrealistic considering modern high-throughput sequencing technologies. We present a novel algorithm, HapCompass, for haplotype assembly of densely sequenced human genome data. The HapCompass algorithm operates on a graph where single nucleotide polymorphisms (SNPs) are nodes and edges are defined by sequence reads and viewed as supporting evidence of co-occurring SNP alleles in a haplotype. In our graph model, haplotype phasings correspond to spanning trees and each spanning tree uniquely defines a cycle basis. We define the minimum weighted edge removal global optimization on this graph and develop an algorithm based on local optimizations of the cycle basis for resolving conflicting evidence. We then estimate the amount of sequencing required to produce a complete haplotype assembly of a chromosome. ∗To whom correspondence should be addressed: Sorin [email protected]
  • Cycle Bases in Graphs Characterization, Algorithms, Complexity, and Applications
    Cycle Bases in Graphs: Characterization, Algorithms, Complexity, and Applications Telikepalli Kavitha∗, Christian Liebchen†, Kurt Mehlhorn‡, Dimitrios Michail, Romeo Rizzi§, Torsten Ueckerdt¶, Katharina A. Zweig‖ August 25, 2009 Abstract Cycles in graphs play an important role in many applications, e.g., analysis of electrical networks, analysis of chemical and biological pathways, periodic scheduling, and graph drawing. From a mathematical point of view, cycles in graphs have a rich structure. Cycle bases are a compact description of the set of all cycles of a graph. In this paper, we survey the state of knowledge on cycle bases and also derive some new results. We introduce different kinds of cycle bases, characterize them in terms of their cycle matrix, and prove structural results and a priori length bounds. We provide polynomial algorithms for the minimum cycle basis problem for some of the classes and prove APX-hardness for others. We also discuss three applications and show that they require different kinds of cycle bases. Contents: 1 Introduction; 2 Definitions; 3 Classification of Cycle Bases: 3.1 Existence, 3.2 Characterizations, 3.3 Simple Examples, 3.4 Variants of the MCB Problem, 3.5 Directed and GF(p)-Bases, 3.6 Circuits versus Cycles, 3.7 Reductions ∗Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India †DB Schenker Rail Deutschland AG, Mainz, Germany. Part of this work was done at the DFG Research Center Matheon in Berlin ‡Max-Planck-Institut für Informatik, Saarbrücken, Germany §Dipartimento di Matematica ed Informatica (DIMI), Università degli Studi di Udine, Udine, Italy ¶Technische Universität Berlin, Berlin, Germany, Supported by .
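The Dyck-path definition in the matroid exercise sheet listed above can be checked mechanically. Below is a minimal Python sketch. It assumes a path is given as a string over `U` and `D`, which is a hypothetical input convention. `up_step_set` is an illustrative name. It returns the up-step set with 1-based positions, or `None` if the string is not a Dyck path.

```python
def up_step_set(path):
    """Up-step set {i : i-th step is U} of a Dyck path given as a 'U'/'D' string.

    Returns None if the string is not a Dyck path (unknown step, a dip below
    the x-axis, or a final height other than 0)."""
    height, ups = 0, set()
    for i, step in enumerate(path, start=1):
        if step == "U":
            height += 1
            ups.add(i)
        elif step == "D":
            height -= 1
            if height < 0:
                return None  # the path passes below the x-axis
        else:
            return None  # not a U/D step
    return ups if height == 0 else None


if __name__ == "__main__":
    print(sorted(up_step_set("UUDUDUUDDD")))  # [1, 2, 4, 6, 7]
    print(up_step_set("UDDU"))                # None
```

A single pass suffices because both conditions of the definition (never below the x-axis, ending at height 0) depend only on the running height.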
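Eq. (1) of the distance geometry excerpt listed above can be read as a direct test. A map $x: V \to \mathbb{R}^K$ is a realization exactly when every edge length matches its weight. The sketch below checks that condition up to a floating-point tolerance. The tolerance, the dictionary-based inputs and the function name are assumptions for illustration. The sketch does not implement the cycle-based formulation that the paper proposes.

```python
import math


def is_realization(x, d, K, tol=1e-9):
    """Check Eq. (1): x maps V into R^K and ||x_i - x_j|| = d_ij for all {i, j} in E.

    x: dict vertex -> coordinate tuple; d: dict (i, j) -> edge weight d_ij."""
    if any(len(p) != K for p in x.values()):
        return False  # not a map into R^K
    return all(math.isclose(math.dist(x[i], x[j]), dij, abs_tol=tol)
               for (i, j), dij in d.items())


if __name__ == "__main__":
    # A 4-cycle with unit weights, realized as the unit square in R^2.
    x = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
    d = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}
    print(is_realization(x, d, K=2))                   # True
    print(is_realization(x, {**d, (0, 2): 1.0}, K=2))  # False: diagonal is sqrt(2)
```

The tolerance matters because edge lengths computed with `math.dist` are floating-point numbers. An exact `==` comparison would reject valid realizations such as the square's diagonal of length $\sqrt{2}$.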
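The Las Vegas sampler on the Klemm slides listed above is a self-avoiding random walk that stops when it returns to its start vertex s or gets stuck. The sketch below implements one attempt per call. Several details are assumptions rather than details taken from the slides or from Rozenfeld et al.: the adjacency-dict input, the function name, and the rule that the walk may close at s only after visiting at least three vertices. That rule prevents stepping straight back along an edge from counting as a cycle.

```python
import random


def sample_cycle(adj, s, rng=random):
    """One attempt of a self-avoiding-walk cycle sampler started at s.

    adj: dict vertex -> list of neighbours. Returns the cycle as the list of
    its vertices in walk order (starting with s), or None if the walk gets
    stuck. Assumption: the walk may step back onto s only once the path has
    at least three vertices, so s -> u -> s is never reported as a cycle."""
    path, visited = [s], {s}
    while True:
        u = path[-1]
        options = [w for w in adj[u]
                   if w not in visited or (w == s and len(path) >= 3)]
        if not options:
            return None  # stuck: this attempt produces no sample
        w = rng.choice(options)
        if w == s:
            return path  # closed a cycle of length len(path)
        path.append(w)
        visited.add(w)


if __name__ == "__main__":
    rng = random.Random(42)
    K5 = {v: [w for w in range(5) if w != v] for v in range(5)}
    samples = [sample_cycle(K5, 0, rng) for _ in range(1000)]
    lengths = [len(c) for c in samples if c is not None]
    print({h: lengths.count(h) for h in sorted(set(lengths))})
```

Repeated calls produce the sample S. The slides' point is that this walk does not pick cycles uniformly. In a complete graph, long cycles vastly outnumber short ones, yet the walk returns each length with comparable frequency.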
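The "trouble" slide listed above rests on the count $W(h) = (2h)^{-1} N!/(N-h)!$ of cycles of length $h$ in $K_N$. Here $N!/(N-h)!$ counts ordered sequences of $h$ distinct vertices, and each cycle arises from $2h$ of them ($h$ starting vertices, two directions). The short computation below reproduces the slide's order-of-magnitude claim for $N = 100$. The function name is hypothetical.

```python
import math


def num_cycles_complete(N, h):
    """W(h) = N! / ((N - h)! * 2h): number of cycles of length h in K_N.

    N!/(N-h)! counts ordered sequences of h distinct vertices; each cycle is
    counted 2h times (h starting points, 2 directions). Zero unless 3 <= h <= N."""
    if not 3 <= h <= N:
        return 0
    return math.factorial(N) // (math.factorial(N - h) * 2 * h)


ratio = num_cycles_complete(100, 100) / num_cycles_complete(100, 3)
print(f"W(100)/W(3) = 10^{math.log10(ratio):.2f}")
```

The printed exponent is a little above 150, consistent with the slide's $W(100)/W(3) \approx 10^{150}$. A sampler with the flat length distribution $p(h) = 1/(N-2)$ therefore returns long cycles far less often than their share of all cycles.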