Multipartite Graph Algorithms for the Analysis of Heterogeneous Data

University of Tennessee, Knoxville
TRACE: Tennessee Research and Creative Exchange
Doctoral Dissertations, Graduate School
12-2015

Multipartite Graph Algorithms for the Analysis of Heterogeneous Data
Charles Alexander Phillips
University of Tennessee - Knoxville, [email protected]

Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss
Part of the Computational Biology Commons

Recommended Citation
Phillips, Charles Alexander, "Multipartite Graph Algorithms for the Analysis of Heterogeneous Data." PhD diss., University of Tennessee, 2015. https://trace.tennessee.edu/utk_graddiss/3600

This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

To the Graduate Council:

I am submitting herewith a dissertation written by Charles Alexander Phillips entitled "Multipartite Graph Algorithms for the Analysis of Heterogeneous Data." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Computer Science.

Michael A. Langston, Major Professor

We have read this dissertation and recommend its acceptance:
Bruce J. MacLennan, Brynn H. Voy, David J. Icove

Accepted for the Council:
Carolyn R. Hodges, Vice Provost and Dean of the Graduate School

(Original signatures are on file with official student records.)

Multipartite Graph Algorithms for the Analysis of Heterogeneous Data

A Dissertation Presented for the Doctor of Philosophy Degree
The University of Tennessee, Knoxville

Charles Alexander Phillips
December 2015

Copyright © 2015 by Charles A.
Phillips All rights reserved.

Acknowledgements

In the course of my education and research I have had the good fortune to cross paths with many bright and dedicated people. I extend my gratitude to many of them here, although I am certain the list is not complete.

First I would like to thank my advisor, Dr. Michael A. Langston, for his guidance, patience and above all the example he sets for high standards in scientific research and work ethics. My special thanks go out to those who served on my dissertation committee: Drs. David Icove, Bruce MacLennan, Lynne Parker and Brynn Voy.

Former and present students I have worked with as part of Dr. Langston’s research team here at the University of Tennessee include John Eblen, Ron Hagan, Jeremy Jay, Jordan Lefebvre, Allan Lu, Sudhir Naswa, Clinton Nolan, Andy Perkins, Gary Rogers, Kai Wang, Dinesh Weerapurage and Yun Zhang. Research collaborators include Erich Baker, Jason Bubier, Elissa Chesler, Frank Dehne, Dan Goldowitz, Mike Miles and Aaron Wolen. My thanks go to Suzanne Baktash for her encouragement, helpfulness and support. Other professors at UT with whom I have been fortunate to collaborate include Drs. Arnold Saxton and Meg Staton.

My appreciation goes to instructors at Moberly Area Community College, where I completed my associate degree, and Columbia College, where I did my bachelor’s degree, for helping to fan the spark of my interest in computer science and encouraging me to pursue a graduate degree. These instructors include David Heise, Yihsiang Liow, David Pence and Lawrence West.

And last but not least, my gratitude and appreciation go to my family: my sister Lisa, brothers Chris and Mike, stepmother Sylvia, and especially my father, Alex, whose support and encouragement through the years have been beyond price.

Abstract

The explosive growth in the rate of data generation in recent years threatens to outpace the growth in computer power, motivating the need for new, scalable algorithms and big data analytic techniques. No field may be more emblematic of this data deluge than the life sciences, where technologies such as high-throughput mRNA arrays and next generation genome sequencing are routinely used to generate datasets of extreme scale. Data from experiments in genomics, transcriptomics, metabolomics and proteomics are continuously being added to existing repositories. A goal of exploratory analysis of such omics data is to illuminate the functions and relationships of biomolecules within an organism. This dissertation describes the design, implementation and application of graph algorithms, with the goal of seeking dense structure in data derived from omics experiments in order to detect latent associations between often heterogeneous entities, such as genes, diseases and phenotypes. Exact combinatorial solutions are developed and implemented, rather than relying on approximations or heuristics, even when problems are exceedingly large and/or difficult. Datasets on which the algorithms are applied include time series transcriptomic data from an experiment on the developing mouse cerebellum, gene expression data measuring acute ethanol response in the prefrontal cortex, and the analysis of a predicted protein-protein interaction network. A bipartite graph model is used to integrate heterogeneous data types, such as genes with phenotypes and microbes with mouse strains. The techniques are then extended to a multipartite algorithm to enumerate dense substructure in multipartite graphs, constructed using data from three or more heterogeneous sources, with applications to functional genomics. Several new theoretical results are given regarding multipartite graphs and the multipartite enumeration algorithm.
In all cases, practical implementations are demonstrated to expand the frontier of computational feasibility.

Table of Contents

Chapter 1 Introduction and Background
  1.1 Definitions, Notation and Preliminaries
  1.2 Omics Data
  1.3 Constructing Graphs from High-Throughput Data
    1.3.1 Similarity Metrics
    1.3.2 Thresholding
  1.4 The Quest for Dense Subgraphs
    1.4.1 Maximum Clique
    1.4.2 Maximal Clique Enumeration
    1.4.3 The Paraclique Algorithm
Chapter 2 Algorithms for General Graphs
  2.1 Ethanol Responsive Gene Networks in the Prefrontal Cortex
    2.1.1 Paraclique and Network Analysis
    2.1.2 Functional Analysis
    2.1.3 Combining Transcriptomic and Phenotype Data
    2.1.4 QTL Analysis
    2.1.5 Maximal Clique Enumeration
  2.2 Time Series Analysis of the Developing Mouse Cerebellum
    2.2.1 Data Description
    2.2.2 Paraclique Method
    2.2.3 Paraclique Results
  2.3 A Custom Algorithm for Protein-Protein Interaction Prediction
    2.3.1 Motivation
    2.3.2 Algorithm
    2.3.3 Results
  2.4 Maximum Clique Enumeration
    2.4.1 Background
    2.4.2 Results and Discussion
      2.4.2.1 Algorithms
      2.4.2.2 Basic Backtracking
      2.4.2.3 Finding a Single Maximum Clique
      2.4.2.4 Intelligent Backtracking
      2.4.2.5 Parameterized Enumeration
      2.4.2.6 Maximum Clique Covers
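
Sections 1.3 and 1.4 of the outline name a pipeline of similarity metrics, thresholding and maximal clique enumeration, but this excerpt does not give the details. The following is only a minimal sketch of that general idea under stated assumptions: Pearson correlation as the similarity metric, an absolute-value cutoff as the thresholding rule, and Bron-Kerbosch with pivoting as the enumeration method. None of these choices is confirmed by the text above as the dissertation's own method, and the function names and toy expression profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_graph(profiles, cutoff):
    """Join two items when |correlation| >= cutoff (assumed rule)."""
    adj = {name: set() for name in profiles}
    for u, v in combinations(profiles, 2):
        if abs(pearson(profiles[u], profiles[v])) >= cutoff:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            found.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Toy data, invented for illustration only.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [1.1, 1.9, 3.2, 3.9],
    "g4": [4.0, 1.0, 3.5, 0.5],
}
adj = threshold_graph(profiles, cutoff=0.9)
cliques = maximal_cliques(adj)
```

In this toy input the first three profiles rise together while the fourth does not, so the thresholded graph has one triangle and one isolated vertex. Enumerating maximal cliques is exponential in the worst case, which is why the choice of cutoff matters for the size of the resulting graph.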
Recommended publications
  • Efficient Domination and Polarity
    Efficient Domination and Polarity. Dissertation for the attainment of the academic degree Doktor-Ingenieur (Dr.-Ing.), Universität Rostock, Faculty of Computer Science and Electrical Engineering, submitted by Ragnar Christopher Nevries. Reviewers: Prof. Dr. Andreas Brandstädt, Institut für Informatik, Universität Rostock; Prof. Dr. Peter Widmayer, Institut für Theoretische Informatik, ETH Zürich; Prof. Dr. Jing Huang, Mathematics and Statistics, University of Victoria. Submitted on 5 February 2014, defended on 25 July 2014. Abstract: This thesis considers Efficient Domination, Efficient Edge Domination, Polarity, and Monopolarity, graph problems that ask for a vertex or edge subset that is a packing and a covering at the same time. Efficient Domination seeks an independent vertex subset D such that all other vertices have exactly one neighbor in D. Here, packing means that the vertices of D must not be too close to each other and, in contrast, covering requires that they have to be near to the other vertices. Efficient Edge Domination is the edge version of Efficient Domination. Polarity asks for a vertex subset that induces a complete multipartite graph (the packing aspect) and that contains a vertex of every induced P3 (the covering aspect). Monopolarity is the special case of Polarity where the complete multipartite graph has to be edgeless. Since all these problems are NP-complete in general, for each problem a lot of effort has been put into separating the graph classes on which the problem remains NP-complete from those that admit an efficient algorithm. This thesis pursues both directions. On the one hand, we introduce a framework for our NP-completeness proofs and use it to sharpen known results for all mentioned problems.
  • A Characterization of General Position Sets in Graphs
    A characterization of general position sets in graphs. Bijo S. Anand a, Ullas Chandran S. V. b, Manoj Changat c, Sandi Klavžar d,e,f, Elias John Thomas g. November 16, 2018. a Department of Mathematics, Sree Narayana College, Punalur-691305, Kerala, India; bijos [email protected] b Department of Mathematics, Mahatma Gandhi College, Kesavadasapuram, Thiruvananthapuram-695004, Kerala, India; [email protected] c Department of Futures Studies, University of Kerala, Thiruvananthapuram-695034, Kerala, India; [email protected] d Faculty of Mathematics and Physics, University of Ljubljana, Slovenia; [email protected] e Faculty of Natural Sciences and Mathematics, University of Maribor, Slovenia f Institute of Mathematics, Physics and Mechanics, Ljubljana, Slovenia g Department of Mathematics, Mar Ivanios College, Thiruvananthapuram-695015, Kerala, India; [email protected] Abstract: A vertex subset S of a graph G is a general position set of G if no vertex of S lies on a geodesic between two other vertices of S. The cardinality of a largest general position set of G is the general position number gp(G) of G. It is proved that S ⊆ V(G) is a general position set if and only if the components of G[S] are complete subgraphs, the vertices of which form an in-transitive, distance-constant partition of S. If diam(G) = 2, then gp(G) is the maximum of the clique number of G and the maximum order of an induced complete multipartite subgraph of the complement of G. As a consequence, gp(G) of a cograph G can be determined in polynomial time.
  • Diestel: Graph Theory
    Reinhard Diestel, Graph Theory, Electronic Edition 2000. © Springer-Verlag New York 1997, 2000. This is an electronic version of the second (2000) edition of the above Springer book, from their series Graduate Texts in Mathematics, vol. 173. The cross-references in the text and in the margins are active links: click on them to be taken to the appropriate page. The printed edition of this book can be ordered from your bookseller, or electronically from Springer through the Web sites referred to below. Softcover $34.95, ISBN 0-387-98976-5. Hardcover $69.95, ISBN 0-387-95014-1. Further information (reviews, errata, free copies for lecturers etc.) and electronic order forms can be found on http://www.math.uni-hamburg.de/home/diestel/books/graph.theory/ http://www.springer-ny.com/supplements/diestel/ Preface: Almost two decades have passed since the appearance of those graph theory texts that still set the agenda for most introductory courses taught today. The canon created by those books has helped to identify some main fields of study and research, and will doubtless continue to influence the development of the discipline for some time to come. Yet much has happened in those 20 years, in graph theory no less than elsewhere: deep new theorems have been found, seemingly disparate methods and results have become interrelated, entire new branches have arisen. To name just a few such developments, one may think of how the new notion of list colouring has bridged the gulf between invariants such as average degree and chromatic number, how probabilistic methods and the regularity lemma have pervaded extremal graph theory and Ramsey theory, or how the entirely new field of graph minors and tree-decompositions has brought standard methods of surface topology to bear on long-standing algorithmic graph problems.
  • Decomposition and Domination of Some Graphs Fairouz Beggas
    Decomposition and Domination of Some Graphs. Fairouz Beggas. To cite this version: Fairouz Beggas. Decomposition and Domination of Some Graphs. Data Structures and Algorithms [cs.DS]. Université Claude Bernard Lyon 1, 2017. English. tel-02168197. HAL Id: tel-02168197, https://hal.archives-ouvertes.fr/tel-02168197. Submitted on 28 Jun 2019. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Order number NNT: 2017LYSE1051. Doctoral thesis of the Université de Lyon, prepared at Université Claude Bernard Lyon 1, Doctoral School ED 512 Informatique et Mathématiques (InfoMaths). Doctoral specialty: Computer Science. Publicly defended on 28/03/2017 by Fairouz BEGGAS: Decomposition and Domination of Some Graphs. Before the jury composed of: Jean-Luc Baril, Professor, Université de Bourgogne, Reviewer; Lhouari Nourine, Professor, Université Blaise Pascal Clermont-Ferrand, Reviewer; Daniela Grigori, Professor, Université de Paris Dauphine, Examiner; Salima Benbernou, Professor, Université de Paris Descartes, Examiner; Norma Zagaglia Salvi, Professor, École polytechnique de Milan, Examiner; Hamamache Kheddouci, Professor, Université Lyon 1, Thesis advisor; Mohammed Haddad, Maître de Conférences, Université Lyon 1, Thesis co-advisor. Acknowledgments: I would like to express my deepest gratitude to my advisor, Pr.
  • Characterization of General Position Sets and Its Applications to Cographs and Bipartite Graphs
    Characterization of general position sets and its applications to cographs and bipartite graphs. Bijo S. Anand a, Ullas Chandran S. V. b, Manoj Changat c, Sandi Klavžar d,e,f, Elias John Thomas g. April 16, 2019. a Department of Mathematics, Sree Narayana College, Punalur-691305, Kerala, India; bijos [email protected] b Department of Mathematics, Mahatma Gandhi College, Kesavadasapuram, Thiruvananthapuram-695004, Kerala, India; [email protected] c Department of Futures Studies, University of Kerala, Thiruvananthapuram-695034, Kerala, India; [email protected] d Faculty of Mathematics and Physics, University of Ljubljana, Slovenia; [email protected] e Faculty of Natural Sciences and Mathematics, University of Maribor, Slovenia f Institute of Mathematics, Physics and Mechanics, Ljubljana, Slovenia g Department of Mathematics, Mar Ivanios College, Thiruvananthapuram-695015, Kerala, India; [email protected] Abstract: A vertex subset S of a graph G is a general position set of G if no vertex of S lies on a geodesic between two other vertices of S. The cardinality of a largest general position set of G is the general position number gp(G) of G. It is proved that S ⊆ V(G) is in general position if and only if the components of G[S] are complete subgraphs, the vertices of which form an in-transitive, distance-constant partition of S. If diam(G) = 2, then gp(G) is the maximum of ω(G) and the maximum order of an induced complete multipartite subgraph of the complement of G. As a consequence, gp(G) of a cograph G can be determined in polynomial time.
  • Characterizations of Cographs As Intersection Graphs of Paths on a Grid
    Discrete Applied Mathematics 178 (2014) 46–57. Journal homepage: www.elsevier.com/locate/dam. Characterizations of cographs as intersection graphs of paths on a grid. Elad Cohen a,∗, Martin Charles Golumbic a, Bernard Ries b,c. a Caesarea Rothschild Institute and Department of Computer Science, University of Haifa, Mount Carmel, Haifa 31905, Israel b PSL, Université Paris-Dauphine, 75775 Paris Cedex 16, France c CNRS, LAMSADE UMR 7243, France. Article history: Received 20 June 2013; received in revised form 18 June 2014; accepted 25 June 2014; available online 26 July 2014. Keywords: Cograph, Cotree, Grid, Intersection graph, Induced subgraph. Abstract: A cograph is a graph which does not contain any induced path on four vertices. In this paper, we characterize those cographs that are intersection graphs of paths on a grid in the following two cases: (i) the paths on the grid all have at most one bend and the intersections concern edges (B1-EPG); (ii) the paths on the grid are not bended and the intersections concern vertices (B0-VPG). In both cases, we give a characterization by a family of forbidden induced subgraphs. We further present linear-time algorithms to recognize B1-EPG cographs and B0-VPG cographs using their cotree. © 2014 Elsevier B.V. All rights reserved. 1. Introduction: Edge intersection graphs of paths on a grid (or EPG graphs) are graphs whose vertices can be represented as paths on a rectangular grid such that two vertices are adjacent if and only if the corresponding paths share at least one edge of the grid.
  • Graph Theory, an Antiprism Graph Is a Graph That Has One of the Antiprisms As Its Skeleton
    Graph families. From Wikipedia, the free encyclopedia.
    Chapter 1 Antiprism graph
    In the mathematical field of graph theory, an antiprism graph is a graph that has one of the antiprisms as its skeleton. An n-sided antiprism has 2n vertices and 4n edges. They are regular, polyhedral (and therefore by necessity also 3-vertex-connected, vertex-transitive, and planar graphs), and also Hamiltonian graphs.[1]
    1.1 Examples
    The first graph in the sequence, the octahedral graph, has 6 vertices and 12 edges. Later graphs in the sequence may be named after the type of antiprism they correspond to:
    • Octahedral graph – 6 vertices, 12 edges
    • Square antiprismatic graph – 8 vertices, 16 edges
    • Pentagonal antiprismatic graph – 10 vertices, 20 edges
    • Hexagonal antiprismatic graph – 12 vertices, 24 edges
    • Heptagonal antiprismatic graph – 14 vertices, 28 edges
    • Octagonal antiprismatic graph – 16 vertices, 32 edges
    • ...
    Although geometrically the star polygons also form the faces of a different sequence of (self-intersecting) antiprisms, the star antiprisms, they do not form a different sequence of graphs.
    1.2 Related graphs
    An antiprism graph is a special case of a circulant graph, Ci_{2n}(1,2). Other infinite sequences of polyhedral graphs formed in a similar way from polyhedra with regular-polygon bases include the prism graphs (graphs of prisms) and wheel graphs (graphs of pyramids). Other vertex-transitive polyhedral graphs include the Archimedean graphs.
    1.3 References
    [1] Read, R. C. and Wilson, R. J. An Atlas of Graphs, Oxford, England: Oxford University Press, 2004 reprint, Chapter 6 special graphs pp. 261, 270.
    1.4 External links
    • Weisstein, Eric W., “Antiprism graph”, MathWorld.
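
The antiprism excerpt above states that an n-sided antiprism graph has 2n vertices and 4n edges and is a circulant graph on 2n vertices with jumps 1 and 2. A minimal sketch of that construction follows, assuming vertices are labelled 0 to 2n-1 and n is at least 3; the function names are hypothetical and chosen only for illustration.

```python
def antiprism_graph(n):
    """Edge set of the n-sided antiprism graph, built as a circulant
    graph on 2n vertices where v is joined to v+1 and v+2 (mod 2n)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    size = 2 * n
    edges = set()
    for v in range(size):
        for jump in (1, 2):
            edges.add(frozenset((v, (v + jump) % size)))
    return edges


def degrees(edges, size):
    """Degree of each vertex 0..size-1 in an undirected edge set."""
    deg = [0] * size
    for edge in edges:
        for v in edge:
            deg[v] += 1
    return deg


# n = 3 gives the octahedral graph from the list of examples.
octahedral = antiprism_graph(3)
octahedral_degrees = degrees(octahedral, 6)
```

Each vertex gains two edges from its own jumps and two from the jumps of its predecessors, which is where the 4-regularity and the count of 4n edges come from.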