Efficiently Answering Regular Simple Path Queries on Large Labeled

Total Page:16

File Type:pdf, Size:1020Kb

Efficiently Answering Regular Simple Path Queries on Large Labeled Efficiently Answering Regular Simple Path Queries on Large Labeled Networks Sarisht Wadhwa, Anagh Prasad, Sayan Ranu, Amitabha Bagchi, Srikanta Bedathur IIT Delhi, Dept. of Computer Science and Engineering, New Delhi, India [sarishtwadhwa.1996,anaghprasad95]@gmail.com,[sayanranu,bagchi,srikanta]@cse.iitd.ac.in ABSTRACT 1 INTRODUCTION A fundamental query in labeled graphs is to determine if A fundamental query in labeled graphs is to determine if there there exists a path between a given source and target vertices, exists a path from a given source vertex to a specified target such that the path satisfies a given label constraint. One of vertex, such that the labels encountered along the path form a the powerful forms of specifying label constraints is through string that is from a given regular language. Such queries are regular expressions, and the resulting problem of reachability termed as Regular Path Queries (RPQs) and are often spec- queries under regular simple paths (RSP) form the core of ified with a starting node, a destination node anda regular many practical graph query languages such as SPARQL from expression over the node and edge labels. For example, in a W3C, Cypher of Neo4J, Oracle’s PGQL and LDBC’s G-CORE. PPI, does there exist a pathway from protein P1 to protein P2 Despite its importance, since it is known that answering that proceeds only through either cleavage or covalent bond- RSP queries is NP-Hard, there are no scalable and practical ing interactions? Does there exist a cascade of interactions solutions for answering reachability with full-range of regu- from user U on Twitter to user V such that all intermediate lar expressions as constraints. In this paper, we circumvent nodes are females of age between 20 and 30? Regular path this computational bottleneck by designing a random-walk queries have been studied under different possible semantics: based sampling algorithm called ARRIVAL, which is backed arbitrary, shortest and simple [13]. In this paper, we will focus by theoretical guarantees on its expected quality. Extensive on the evaluation of RPQs under simple path semantics, or experiments on billion-sized real graph datasets with thou- Regular Simple Path Queries (RSPQs), which require that in sands of labels show that ARRIVAL to be 100 times faster a valid path no node is visited more than once. RPQs form than baseline strategies with an average accuracy of 95%. a proper superset of “label-constrained reachability” (LCR) queries (e.g., [11, 21, 23]) wherein the path from source to KEYWORDS destination must use labels belonging to a specified subset of the graph’s label set. reachability query, regular path query, random walks, knowl- Owing to their edge graphs, regular expression Limitations of existing techniques: wide applications, RPQs have been extensively studied [2, ACM Reference Format: 4, 6, 7, 12, 13, 16]. Nonetheless, several challenges, including Sarisht Wadhwa, Anagh Prasad, Sayan Ranu, Amitabha Bagchi, scalability to large networks, remain open. Srikanta Bedathur. 2019. Efficiently Answering Regular Simple 1. They handle a very small subclass of regular expres- Path Queries on Large Labeled Networks. In 2019 International sions. Existing techniques for RSP queries do not support Conference on Management of Data (SIGMOD ’19), June 30-July 5, most regular expressions as label constraints [10, 11, 21, 23]. 2019, Amsterdam, Netherlands. ACM, New York, NY, USA, 18 pages. Most existing techniques, including the most recent one by https://doi.org/10.1145/3299869.3319882 Valstar et al. [21], take a set of labels L = fa1; ··· ; ak g as input, and every edge (or node) in the path defining the reachability must contain a label from this set L. This constraint reduces Permission to make digital or hard copies of all or part of this work for ∗ personal or classroom use is granted without fee provided that copies are to the regular expression ¹a1j · · · jak º . Such a formulation sig- not made or distributed for profit or commercial advantage and that copies nificantly reduces the expressive power of the user in posing bear this notice and the full citation on the first page. Copyrights for com- effective queries. An analysis of SPARQL query logs on Wiki- ponents of this work owned by others than the author(s) must be honored. data17 knowledge graph reveals that ≈ 35% of the RSP queries Abstracting with credit is permitted. To copy otherwise, or republish, to cannot be expressed through this restricted form [5]. Fan et post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. al. [8] propose a technique that is slightly more generic and SIGMOD ’19, June 30-July 5, 2019, Amsterdam, Netherlands supports a subclass of regular expressions, which are compu- © 2019 Copyright held by the owner/author(s). Publication rights licensed tationally easier to solve. Fletcher et al.[10] replaces Kleene to the Association for Computing Machinery. closure with paths of bounded length recursion. Hence, they ACM ISBN 978-1-4503-5643-5/19/06...$15.00 operate on a restricted set. https://doi.org/10.1145/3299869.3319882 Age: 19 Age: 43 Table 1: Limitations of existing techniques. Gender: Male Gender: Male Algorithm Regular Non- Query- Dynamic v2 v3 Expressions exponential time label Networks growth with definition Age: 26 v7 Gender: Female label space Age: 27 v1 Valstar et al. [21] Only LCR x x x v4 v5 Gender: Male Jin et al. [11] Only LCR x x x Age: 25 Zou et al. [23] Only LCR x x X Age: 32 Gender: Female Gender: Female Fan et al. [8] X (partially) x x x Fletcher et al. [10] X (partially) X x x v6 Koschmieder et al. [12] XX x x Age: 17 ARRIVAL XXXX Gender: Male Figure 1: A sample graph dataset. For simplicity, we as- sume that labels are attached only with nodes. • We develop an index-free, sampling-based algorithm called 2. Costs grow exponentially with number of labels. Ex- ARRIVAL, which is remarkably simple in its approach, and isting techniques to answer LCR queries [8, 11, 21, 23] (which yet backed by theoretical analysis. are a proper subset of RSP queries) employ index structures • We perform extensive experiments on several real graphs whose memory and computation costs grow exponentially containing millions of nodes, billions of edges and thou- with the number of labels. Consequently, they are unable to sands of unique labels. Our empirical evaluation establishes scale to networks containing more than a handful of labels. that ARRIVAL is up to 100 times faster than BBFS. Further- In the real world, this assumption is not true. For example, more, ARRIVAL guarantees a precision of 1, while the recall in a social network, if each node is tagged with the resident is 95% on average. (Sec: 5). country of the corresponding user, the number of unique labels would run into hundreds. 2 PROBLEM FORMULATION 3. Do not support query-time label definition. In some scenarios, the constraints are defined as a function over node There are four pieces of input to the RSPQ problem: the graph, or edge labels at query time. For example, recall the previous the source and destination nodes, and the label constraint. query where one wishes to check if there exists a cascade of Definition 1 (Graph). A multi-labeled graph (directed interactions from user U on Twitter to user V such that all in- or undirected) is a triple G = ¹V; E; Lº, where V is the set of termediate nodes are females of age between 20 and 30. Here, nodes, E ⊆ V ×V is the set of edges, and L is a finite non-empty the constraint is not defined based on the presence or absence set of labels over nodes and edges in the graph. A node or an of an existing label, but as a function that is specified at query edge may be labeled with zero or more labels from the set L. time, whose output is like a new label for that node. Existing We do not assume the graph to adhere to any structural techniques do not support query-time label definitions since property such as acyclicity or strong connectedness. However, they assume that the label set space is known. certain quality guarantees may be provided only when the 4. Unable to handle dynamic networks. Many networks graph satisfies certain conditions. We explicitly mention those today are highly dynamic in nature. For example, new nodes conditions in our proofs. are added to the Wikidata17 knowledge base every day and We use the notation l¹vº and l¹eº to denote the set of labels existing nodes get updated by creating new links among them. present in node v and edge e respectively. Several index-based techniques for LCR reachability [8, 11, 21] assume a static network. This limits their applicability to Example 1. A sample multi-labeled graph is shown in dynamic networks. Fig. 1. In this graph, every node is characterized by two features: Proposed solution: Table 1 summarizes the limitations of the age and the gender. The label of a node is formed by concate- existing techniques, most of which result due to relying on an nating the feature name and feature value. For example, v1’s index structure. This property forces us to ask the following labels are “Age=26” and “Gender=Female”. We use the notation question: Is it possible to develop an index-free, near-optimal al- v:Aдe andv:Gender to note the value of the “Age” and “Gender” gorithm with linear storage and time complexity? In this paper, features of v. For example, v1:Aдe = 26.
Recommended publications
  • Networkx: Network Analysis with Python
    NetworkX: Network Analysis with Python Salvatore Scellato Full tutorial presented at the XXX SunBelt Conference “NetworkX introduction: Hacking social networks using the Python programming language” by Aric Hagberg & Drew Conway Outline 1. Introduction to NetworkX 2. Getting started with Python and NetworkX 3. Basic network analysis 4. Writing your own code 5. You are ready for your project! 1. Introduction to NetworkX. Introduction to NetworkX - network analysis Vast amounts of network data are being generated and collected • Sociology: web pages, mobile phones, social networks • Technology: Internet routers, vehicular flows, power grids How can we analyze this networks? Introduction to NetworkX - Python awesomeness Introduction to NetworkX “Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.” • Data structures for representing many types of networks, or graphs • Nodes can be any (hashable) Python object, edges can contain arbitrary data • Flexibility ideal for representing networks found in many different fields • Easy to install on multiple platforms • Online up-to-date documentation • First public release in April 2005 Introduction to NetworkX - design requirements • Tool to study the structure and dynamics of social, biological, and infrastructure networks • Ease-of-use and rapid development in a collaborative, multidisciplinary environment • Easy to learn, easy to teach • Open-source tool base that can easily grow in a multidisciplinary environment with non-expert users
    [Show full text]
  • 1 Hamiltonian Path
    6.S078 Fine-Grained Algorithms and Complexity MIT Lecture 17: Algorithms for Finding Long Paths (Part 1) November 2, 2020 In this lecture and the next, we will introduce a number of algorithmic techniques used in exponential-time and FPT algorithms, through the lens of one parametric problem: Definition 0.1 (k-Path) Given a directed graph G = (V; E) and parameter k, is there a simple path1 in G of length ≥ k? Already for this simple-to-state problem, there are quite a few radically different approaches to solving it faster; we will show you some of them. We’ll see algorithms for the case of k = n (Hamiltonian Path) and then we’ll turn to “parameterizing” these algorithms so they work for all k. A number of papers in bioinformatics have used quick algorithms for k-Path and related problems to analyze various networks that arise in biology (some references are [SIKS05, ADH+08, YLRS+09]). In the following, we always denote the number of vertices jV j in our given graph G = (V; E) by n, and the number of edges jEj by m. We often associate the set of vertices V with the set [n] := f1; : : : ; ng. 1 Hamiltonian Path Before discussing k-Path, it will be useful to first discuss algorithms for the famous NP-complete Hamiltonian path problem, which is the special case where k = n. Essentially all algorithms we discuss here can be adapted to obtain algorithms for k-Path! The naive algorithm for Hamiltonian Path takes time about n! = 2Θ(n log n) to try all possible permutations of the nodes (which can also be adapted to get an O?(k!)-time algorithm for k-Path, as we’ll see).
    [Show full text]
  • K-Path Centrality: a New Centrality Measure in Social Networks
    Centrality Metrics in Social Network Analysis K-path: A New Centrality Metric Experiments Summary K-Path Centrality: A New Centrality Measure in Social Networks Adriana Iamnitchi University of South Florida joint work with Tharaka Alahakoon, Rahul Tripathi, Nicolas Kourtellis and Ramanuja Simha Adriana Iamnitchi K-Path Centrality: A New Centrality Measure in Social Networks 1 of 23 Centrality Metrics in Social Network Analysis Centrality Metrics Overview K-path: A New Centrality Metric Betweenness Centrality Experiments Applications Summary Computing Betweenness Centrality Centrality Metrics in Social Network Analysis Betweenness Centrality - how much a node controls the flow between any other two nodes Closeness Centrality - the extent a node is near all other nodes Degree Centrality - the number of ties to other nodes Eigenvector Centrality - the relative importance of a node Adriana Iamnitchi K-Path Centrality: A New Centrality Measure in Social Networks 2 of 23 Centrality Metrics in Social Network Analysis Centrality Metrics Overview K-path: A New Centrality Metric Betweenness Centrality Experiments Applications Summary Computing Betweenness Centrality Betweenness Centrality measures the extent to which a node lies on the shortest path between two other nodes betweennes CB (v) of a vertex v is the summation over all pairs of nodes of the fractional shortest paths going through v. Definition (Betweenness Centrality) For every vertex v 2 V of a weighted graph G(V ; E), the betweenness centrality CB (v) of v is defined by X X σst (v) CB
    [Show full text]
  • The On-Line Shortest Path Problem Under Partial Monitoring
    Journal of Machine Learning Research 8 (2007) 2369-2403 Submitted 4/07; Revised 8/07; Published 10/07 The On-Line Shortest Path Problem Under Partial Monitoring Andras´ Gyor¨ gy [email protected] Machine Learning Research Group Computer and Automation Research Institute Hungarian Academy of Sciences Kende u. 13-17, Budapest, Hungary, H-1111 Tamas´ Linder [email protected] Department of Mathematics and Statistics Queen’s University, Kingston, Ontario Canada K7L 3N6 Gabor´ Lugosi [email protected] ICREA and Department of Economics Universitat Pompeu Fabra Ramon Trias Fargas 25-27 08005 Barcelona, Spain Gyor¨ gy Ottucsak´ [email protected] Department of Computer Science and Information Theory Budapest University of Technology and Economics Magyar Tudosok´ Kor¨ utja´ 2. Budapest, Hungary, H-1117 Editor: Leslie Pack Kaelbling Abstract The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1=pn and depends only polynomially on the number of edges of the graph.
    [Show full text]
  • An Efficient Reachability Indexing Scheme for Large Directed Graphs
    1 Path-Tree: An Efficient Reachability Indexing Scheme for Large Directed Graphs RUOMING JIN, Kent State University NING RUAN, Kent State University YANG XIANG, The Ohio State University HAIXUN WANG, Microsoft Research Asia Reachability query is one of the fundamental queries in graph database. The main idea behind answering reachability queries is to assign vertices with certain labels such that the reachability between any two vertices can be determined by the labeling information. Though several approaches have been proposed for building these reachability labels, it remains open issues on how to handle increasingly large number of vertices in real world graphs, and how to find the best tradeoff among the labeling size, the query answering time, and the construction time. In this paper, we introduce a novel graph structure, referred to as path- tree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We show path-tree can be generalized to chain-tree which theoretically can has smaller labeling cost. On top of path-tree and chain-tree index, we also introduce a new compression scheme which groups vertices with similar labels together to further reduce the labeling size. In addition, we also propose an efficient incremental update algorithm for dynamic index maintenance. Finally, we demonstrate both analytically and empirically the effectiveness and efficiency of our new approaches. Categories and Subject Descriptors: H.2.8 [Database management]: Database Applications—graph index- ing and querying General Terms: Performance Additional Key Words and Phrases: Graph indexing, reachability queries, transitive closure, path-tree cover, maximal directed spanning tree ACM Reference Format: Jin, R., Ruan, N., Xiang, Y., and Wang, H.
    [Show full text]
  • An Optimal Path Cover Algorithm for Cographs R
    Old Dominion University ODU Digital Commons Computer Science Faculty Publications Computer Science 1995 An Optimal Path Cover Algorithm for Cographs R. Lin S. Olariu Old Dominion University Follow this and additional works at: https://digitalcommons.odu.edu/computerscience_fac_pubs Part of the Applied Mathematics Commons, and the Computer Sciences Commons Repository Citation Lin, R. and Olariu, S., "An Optimal Path Cover Algorithm for Cographs" (1995). Computer Science Faculty Publications. 118. https://digitalcommons.odu.edu/computerscience_fac_pubs/118 Original Publication Citation Lin, R., Olariu, S., & Pruesse, G. (1995). An optimal path cover algorithm for cographs. Computers & Mathematics with Applications, 30(8), 75-83. doi:10.1016/0898-1221(95)00139-p This Article is brought to you for free and open access by the Computer Science at ODU Digital Commons. It has been accepted for inclusion in Computer Science Faculty Publications by an authorized administrator of ODU Digital Commons. For more information, please contact [email protected]. Computers Math. Applic. Vol. 30, No. 8, pp. 75-83, 1995 Pergamon Copyright©1995 Elsevier Science Ltd Printed in Great Britain. All rights reserved 0898-1221/95 $9.50 + 0.00 0898-1221(95)00139-$ An Optimal Path Cover Algorithm for Cographs* R. LIN Department of Computer Science SUNY at Geneseo, Geneseo, NY 14454, U.S.A. S. OLARIU Department of Computer Science Old Dominion University, Norfolk, VA 23529-0162, U.S.A. G. PRUESSE Department of Computer Science and Electrical Engineering University of Vermont, Burlington, VT 05405-0156, U.S.A. (Received November 1993; accepted March 1995) Abstract--The class of cographs, or complement-reducible graphs, arises naturally in many dif- ferent areas of applied mathematics and computer science.
    [Show full text]
  • PDF Reference
    NetworkX Reference Release 1.0 Aric Hagberg, Dan Schult, Pieter Swart January 08, 2010 CONTENTS 1 Introduction 1 1.1 Who uses NetworkX?..........................................1 1.2 The Python programming language...................................1 1.3 Free software...............................................1 1.4 Goals...................................................1 1.5 History..................................................2 2 Overview 3 2.1 NetworkX Basics.............................................3 2.2 Nodes and Edges.............................................4 3 Graph types 9 3.1 Which graph class should I use?.....................................9 3.2 Basic graph types.............................................9 4 Operators 129 4.1 Graph Manipulation........................................... 129 4.2 Node Relabeling............................................. 134 4.3 Freezing................................................. 135 5 Algorithms 137 5.1 Boundary................................................. 137 5.2 Centrality................................................. 138 5.3 Clique.................................................. 142 5.4 Clustering................................................ 145 5.5 Cores................................................... 147 5.6 Matching................................................. 148 5.7 Isomorphism............................................... 148 5.8 PageRank................................................. 161 5.9 HITS..................................................
    [Show full text]
  • Single-Source Bottleneck Path Algorithm Faster Than Sorting For
    Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs 1 1 1 1 Ran Duan ∗ , Kaifeng Lyu † , Hongxun Wu ‡ , and Yuanhang Xie § 1Institute for Interdisciplinary Information Sciences, Tsinghua University Abstract In a directed graph G = (V, E) with a capacity on every edge, a bottleneck path (or widest path) between two vertices is a path maximizing the minimum capacity of edges in the path. For the single-source all-destination version of this problem in directed graphs, the previous best algorithm runs in O(m+n log n) (m = E and n = V ) time, by Dijkstra search with Fibonacci heap [Fredman and Tarjan 1987]. We| improve| this| time| bound to O(m√log n), thus it is the first algorithm which breaks the time bound of classic Fibonacci heap when m = o(n√log n). It is a Las-Vegas randomized approach. By contrast, the s-t bottleneck path has an algorithm with k running time O(mβ(m,n)) [Chechik et al. 2016], where β(m,n) = min k 1 : log( ) n m . { ≥ ≤ n } 1 Introduction The bottleneck path problem is a graph optimization problem finding a path between two ver- tices with the maximum flow, in which the flow of a path is defined as the minimum capacity of edges on that path. The bottleneck problem can be seen as a mathematical formulation of many network routing problems, e.g. finding the route with the maximum transmission speed between two nodes in a network, and it has many other applications such as digital compositing [FGA98]. It is also the important building block of other algorithms, such as the improved Ford-Fulkerson algorithm [EK72, FF56], and k-splittable flow algorithm [BKS02].
    [Show full text]
  • Dynamic Programming
    Dynamic Programming Comp 122, Fall 2004 Review: the previous lecture • Principles of dynamic programming: optimization problems, optimal substructure property, overlapping subproblems, trade space for time, implementation via bottom-up/memoization. • Example of Match Game. • Example of Fibinacci: from Divide and Conquer (exponential time complexity) to Dynamic Programming (O(n) time and space) , then to more efficient dynamic programming (constant space, O(logn) time) • Example of LCS in O(m n) time and space. Today we will show how to reduce the space complexity of LCS to O(n) and in future lectures we will show how to reduce the time complexity of LCS. Comp 122, Spring 2004 Longest Common Subsequence • Problem: Given 2 sequences, X = x1,...,xm and Y = y1,...,yn, find a common subsequence whose length is maximum. X: springtime Y: printing LCS(X,Y): printi Comp 122, Spring 2004 0 if empty or empty, c[, ] c[ prefix, prefix ]1 if end() end( ), max(c[ prefix, ],c[, prefix ]) if end() end( ). p r i n t i n g •Keep track of c[,] in a table of nm 0 0 0 0 0 0 0 0 0 entries: s 0 0 0 0 0 0 0 0 0 •top/down: increasing row order p 0 1 1 1 1 1 1 1 1 r 0 1 2 2 •within each row left-to-right: increasing column order i n g t i m Comp 122,e Spring 2004 0 if empty or empty, c[, ] c[ prefix, prefix ]1 if end() end( ), max(c[ prefix, ],c[, prefix ]) if end() end( ).
    [Show full text]
  • Longest Path in a Directed Acyclic Graph (DAG)
    Longest path in a directed acyclic graph (DAG) Mumit Khan CSE 221 April 10, 2011 The longest path problem is the problem of finding a simple path of maximal length in a graph; in other words, among all possible simple paths in the graph, the problem is to find the longest one. For an unweighted graph, it suffices to find the longest path in terms of the number of edges; for a weighted graph, one must use the edge weights instead. To explain the process of developing this algorithm, we’ll first develop an algorithm for computing the single-source longest path for an unweighted directed acyclic graph (DAG), and then generalize that to compute the longest path in a DAG, both unweighted or weighted. We’ve already seen how to compute the single-source shortest path in a graph, cylic or acyclic — we used BFS to compute the single-source shortest paths for an unweighted graph, and used Dijkstra (non-negative edge weights only) or Bellman-Ford (negative edge weights allowed) for a weighted graph without negative cycles. If we needed the shortest path between all pairs, we could always run the single-source shortest path algorithm using each vertex as the source; but of course there is Floyd-Warshall ready to do just that. The use of dynamic programming or a greedy algorithm is made possible by the fact the shortest path problem has optimal substructure — the subpaths in the shortest path between two vertices must themselves be the shortest ones! When it comes to finding the longest path however, we find that the problem does not have optimal substructure, and we lose the ability to use dynamic programming (and of course greedy strategy as well!).
    [Show full text]
  • Depth-First Search, Topological Sort
    And, for the hous is crinkled to and fro, And hath so queinte weyes for to go— For hit is shapen as the mase is wroght— Therto have I a remedie in my thoght, That, by a clewe of twyne, as he hath goon, The same wey he may returne anoon, Folwing alwey the threed, as he hath come. — Geoffrey Chaucer, The Legend of Good Women (c. 1385) Una lámpara ilustraba el andén, pero las caras de los niños quedaban en la zona de la sombra. Uno me interrogó: ¿Usted va a casa del doctor Stephen Albert?. Sin aguardar contestación, otro dijo: La casa queda lejos de aquí, pero usted no se perderá si toma ese camino a la izquierda y en cada encrucijada del camino dobla a la izquierda. — Jorge Luis Borges, “El jardín de senderos que se bifurcan” (1941) “Com’è bello il mondo e come sono brutti i labirinti!” dissi sollevato. “Come sarebbe bello il mondo se ci fosse una regola per girare nei labirinti,” rispose il mio maestro. — Umberto Eco, Il nome della rosa (1980) Chapter 6 Depth-First Search Status: Beta. Could add more, but don’t. In the previous chapter, we considered a generic algorithm—whatever-first search— for traversing arbitrary graphs, both undirected and directed. In this chapter, we focus on a particular instantiation of this algorithm called depth-first search, and primarily on the behavior of this algorithm in directed graphs. Rather than using an explicit stack, depth-first search is normally implemented recursively as follows: DFS(v): if v is unmarked mark v for each edge v w DFS(w) We can make this algorithm slightly faster (in practice) by checking whether a node is marked before we recursively explore it.
    [Show full text]
  • Generalized Pseudoforest Deletion: Algorithms and Uniform Kernel⋆
    Generalized Pseudoforest Deletion: Algorithms and Uniform Kernel? Geevarghese Philip1, Ashutosh Rai2, Saket Saurabh2;3?? 1 Chennai Mathematical Institute, Chennai, India. [email protected] 2 The Institute of Mathematical Sciences, HBNI, Chennai, India. fashutosh|[email protected] 3 University of Bergen, Bergen, Norway. Abstract. Feedback Vertex Set (FVS) is one of the most well stud- ied problems in the realm of parameterized complexity. In this problem we are given a graph G and a positive integer k and the objective is to test whether there exists S ⊆ V (G) of size at most k such that G − S is a forest. Thus, FVS is about deleting as few vertices as possible to get a forest. The main goal of this paper is to study the following interesting problem: How can we generalize the family of forests such that the nice structural properties of forests and the interesting algorithmic proper- ties of FVS can be extended to problems on this class? Towards this we define a graph class, Fl, that contains all graphs where each connected component can transformed into forest by deleting at most l edges. The class F1 is known as pseudoforest in the literature and we call Fl as l-pseudoforest. We study the problem of deleting k vertices to get into Fl, l-pseudoforest Deletion, in the realm of parameterized complex- ity. We show that l-pseudoforest Deletion admits an algorithm with k O(1) 2 running time cl n and admits a kernel of size f(l)k . Thus, for every fixed l we have a kernel of size O(k2).
    [Show full text]