Aggregating Inconsistent Information: Ranking and Clustering
Recommended publications
An Efficient Implementation of Sugiyama's Algorithm for Layered
Journal of Graph Algorithms and Applications, http://jgaa.info/, vol. 9, no. 3, pp. 305–325 (2005). An Efficient Implementation of Sugiyama’s Algorithm for Layered Graph Drawing. Markus Eiglsperger (Zühlke Engineering AG, 8952 Schlieren, Switzerland), Martin Siebenhaller and Michael Kaufmann (WSI für Informatik, Universität Tübingen, Sand 13, 72076 Tübingen, Germany). Abstract: Sugiyama’s algorithm for layered graph drawing is very popular and commonly used in practical software. The extensive use of dummy vertices to break long edges between non-adjacent layers often leads to unsatisfying performance. The worst-case running-time of Sugiyama’s approach is O(|V||E| log |E|) requiring O(|V||E|) memory, which makes it unusable for the visualization of large graphs. By a conceptually simple new technique we are able to keep the number of dummy vertices and edges linear in the size of the graph without increasing the number of crossings. We reduce the worst-case time complexity of Sugiyama’s approach by an order of magnitude to O((|V| + |E|) log |E|) requiring O(|V| + |E|) space. Article type: regular paper. Communicated by E. R. Gansner and J. Pach. Submitted November 2004; revised July 2005. This work has partially been supported by the DFG-grant Ka512/8-2. It was performed while the first author was with the Universität Tübingen. 1 Introduction: Most approaches for drawing directed graphs used in practice follow the framework developed by Sugiyama et al.
Sorting Heuristics for the Feedback Arc Set Problem
Franz J. Brandenburg and Kathrin Hanauer, Department of Informatics and Mathematics, University of Passau, Germany. Technical Report, Number MIP-1104, February 2011. Abstract. The feedback arc set problem plays a prominent role in the four-phase framework to draw directed graphs, also known as the Sugiyama algorithm. It is equivalent to the linear arrangement problem where the vertices of a graph are ordered from left to right and the backward arcs form the feedback arc set. In this paper we extend classical sorting algorithms to heuristics for the feedback arc set problem. Established algorithms are considered from this point of view, where the directed arcs between vertices serve as binary comparators. We analyze these algorithms and afterwards design hybrid algorithms by their composition in order to gain further improvements. These algorithms primarily differ in the use of insertion sort and sifting and they are very similar in their performance, which varies by about 0.1%. The differences mainly lie in their run time and their convergence to a local minimum. Our studies extend related work by new algorithms and our experiments are conducted on much larger graphs. Overall we can conclude that sifting performs better than insertion sort. 1 Introduction: The feedback arc set problem (FAS) asks for a minimum sized subset of arcs of a directed graph whose removal or reversal makes the graph acyclic.
FOCS 2005 Program SUNDAY October 23, 2005
FOCS 2005 Program. The 46th Annual IEEE Symposium on Foundations of Computer Science, October 22-25, 2005, Omni William Penn Hotel, Pittsburgh, PA. Sponsored by the IEEE Computer Society Technical Committee on Mathematical Foundations of Computing. In cooperation with ACM SIGACT. FOCS ’05 gratefully acknowledges financial support from Microsoft Research, Yahoo! Research, and the CMU Aladdin center.
SATURDAY October 22, 2005. Tutorials held at CMU University Center. Reception at Omni William Penn Hotel, Monongahela Room, 17th floor.
Tutorial 1: 1:30pm – 3:30pm (McConomy Auditorium), Chair: Irit Dinur
Subhash Khot: On the Unique Games Conjecture
Break 3:30pm – 4:00pm
SUNDAY October 23, 2005. Talks in Grand Ballroom, 17th floor.
Session 1: 8:50am – 10:10am, Chair: Éva Tardos
8:50 Agnostically Learning Halfspaces. Adam Kalai, Adam Klivans, Yishay Mansour and Rocco Servedio
9:10 Noise stability of functions with low influences: invariance and optimality. Elchanan Mossel, Ryan O’Donnell and Krzysztof Oleszkiewicz
9:30 Every decision tree has an influential variable. Ryan O’Donnell, Michael Saks, Oded Schramm and Rocco Servedio
9:50 Lower Bounds for the Noisy Broadcast Problem. Navin Goyal, Guy Kindler and Michael Saks
Break 10:10am – 10:30am
Session 2: 10:30am – 12:10pm, Chair: Satish Rao
10:30 The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into ℓ1 [Best paper award]. Subhash Khot and Nisheeth Vishnoi
10:50 The Closest Substring problem with small distances. Daniel Marx
11:10 Fitting tree metrics: Hierarchical clustering and Phylogeny. Nir Ailon and Moses Charikar
11:30 Metric Embeddings with Relaxed Guarantees. Ittai Abraham, Yair Bartal, T-H.
Hierarchical Clustering with Global Objectives: Approximation Algorithms and Hardness Results
A dissertation submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Evangelos Chatziafratis, June 2020. © 2020 by Evangelos Chatziafratis. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/ This dissertation is online at: http://purl.stanford.edu/bb164pj1759 I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Tim Roughgarden, Primary Adviser. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Moses Charikar, Co-Adviser. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Li-Yang Tan. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Gregory Valiant. Approved for the Stanford University Committee on Graduate Studies. Stacey F. Bent, Vice Provost for Graduate Education. This signature page was generated electronically upon submission of this dissertation in electronic format.
An Exact Method for the Minimum Feedback Arc Set Problem
Ali Baharev, Hermann Schichl, Arnold Neumaier, Tobias Achterberg. Abstract. Given a directed graph G, a feedback arc set of G is a subset of its edges containing at least one edge of every cycle in G. Finding a feedback arc set of minimum cardinality is the minimum feedback arc set problem. The minimum set cover formulation of the minimum feedback arc set problem is appropriate as long as all the simple cycles in G can be enumerated. Unfortunately, even sparse graphs can have Ω(2^n) simple cycles, and such graphs appear in practice. An exact method is proposed for sparse graphs that enumerates simple cycles in a lazy fashion, and extends an incomplete cycle matrix iteratively. In all cases encountered so far, only a tractable number of cycles has to be enumerated until a minimum feedback arc set is found. Numerical results are given on a test set containing computationally challenging sparse graphs, relevant for industrial applications. Key words: minimum feedback arc set, maximum acyclic subgraph, minimum feedback vertex set, linear ordering problem, tearing. 1. Introduction. A directed graph G is a pair (V, E) of finite sets, the vertices V and the edges E ⊆ V × V. It is a simple graph if it has no multiple edges or self-loops (edges of the form (v, v)). Let G = (V, E) denote a simple directed graph, and let n = |V| denote the number of vertices (nodes), and m = |E| denote the number of edges. A subgraph H of the graph G is said to be induced if, for every pair of vertices u and v of H, (u, v) is an edge of H if and only if (u, v) is an edge of G.
Constraint Clustering and Parity Games
Constrained Clustering Problems and Parity Games. Clemens Rösner, born in Ulm. Dissertation for the degree of Doctor of Natural Sciences (Dr. rer. nat.), Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn. Bonn 2019. First reviewer: Prof. Dr. Heiko Röglin. Second reviewer: Prof. Dr. Anne Driemel. Date of the oral examination: 5 September 2019. Year of publication: 2019. Prepared with the approval of the Faculty of Mathematics and Natural Sciences of the Rheinische Friedrich-Wilhelms-Universität Bonn. Abstract: Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. We study several clustering objectives. We begin with studying the Euclidean k-center problem. The k-center problem is a classical combinatorial optimization problem which asks to select k centers and assign each input point in a set P to one of the centers, such that the maximum distance of any input point to its assigned center is minimized. The Euclidean k-center problem assumes that the input set P is a subset of a Euclidean space and that each location in the Euclidean space can be chosen as a center. We focus on the special case with k = 1, the smallest enclosing ball problem: given a set of points in m-dimensional Euclidean space, find the smallest sphere enclosing all the points. We combine known results about convex optimization with structural properties of the smallest enclosing ball to create a new algorithm. We show that on instances with rational coefficients our new algorithm computes the exact center of the optimal solutions and has a worst-case run time that is polynomial in the size of the input.
A Generalization of the Directed Graph Layering Problem
Ulf Rüegg¹, Thorsten Ehlers¹, Miro Spönemann², and Reinhard von Hanxleden¹. ¹ Dept. of Computer Science, Kiel University, Kiel, Germany. ² TypeFox GmbH, Kiel, Germany. arXiv:1608.07809v1 [cs.OH], 28 Aug 2016. Abstract. The Directed Layering Problem (DLP) solves a step of the widely used layer-based approach to automatically draw directed acyclic graphs. To cater for cyclic graphs, usually a preprocessing step is used that solves the Feedback Arc Set Problem (FASP) to make the graph acyclic before a layering is determined. Here we present the Generalized Layering Problem (GLP), which solves the combination of DLP and FASP simultaneously, allowing general graphs as input. We present an integer programming model and a heuristic to solve the NP-complete GLP and perform thorough evaluations on different sets of graphs and with different implementations for the steps of the layer-based approach. We observe that GLP reduces the number of dummy nodes significantly, can produce more compact drawings, and improves on graphs where DLP yields poor aspect ratios. Keywords: layer-based layout, layer assignment, linear arrangement, feedback arc set, integer programming. 1 Introduction: The layer-based approach is a well-established and widely used method to automatically draw directed graphs. It is based on the idea to assign nodes to subsequent layers that show the inherent direction of the graph, see Fig. 1a for an example. The approach was introduced by Sugiyama et al. [19] and remains a subject of ongoing research.
Implementation of Locality Sensitive Hashing Techniques
Project report submitted in partial fulfillment of the requirement for the degree of Bachelor of Technology in Computer Science & Engineering, under the supervision of Dr. Nitin Chanderwal, by Srishti Tomar (111210), to Jaypee University of Information Technology, Waknaghat, Solan – 173234, Himachal Pradesh. Certificate: This is to certify that the project report entitled “Implementation of Locality Sensitive Hashing Techniques”, submitted by Srishti Tomar in partial fulfillment for the award of the degree of Bachelor of Technology in Computer Science & Engineering to Jaypee University of Information Technology, Waknaghat, Solan, has been carried out under my supervision. This work has not been submitted partially or fully to any other University or Institute for the award of this or any other degree or diploma. Supervisor’s name: Dr. Nitin Chanderwal. Designation: Associate Professor. Acknowledgement: I am highly indebted to Jaypee University of Information Technology for their guidance and constant supervision, as well as for providing necessary information regarding the project and for their support in completing it. I would like to express my gratitude towards my parents and Project Guide for their kind co-operation and encouragement, which helped me in the completion of this project. I would like to express my special gratitude and thanks to industry persons for giving me such attention and time. My thanks and appreciation also go to my colleagues in developing the project and to the people who have willingly helped me out with their abilities. Name of the student: Srishti Tomar. Table of Contents: 1. Abstract 2.
On the Unique Games Conjecture
Subhash Khot, Courant Institute of Mathematical Sciences, NYU. Abstract: This article surveys recently discovered connections between the Unique Games Conjecture and computational complexity, algorithms, discrete Fourier analysis, and geometry. 1 Introduction: Since it was proposed in 2002 by the author [60], the Unique Games Conjecture (UGC) has found many connections between computational complexity, algorithms, analysis, and geometry. This article aims to give a survey of these connections. We first briefly describe how these connections play out, which are then further explored in the subsequent sections. Inapproximability: The main motivation for the conjecture is to prove inapproximability results for NP-complete problems that researchers have been unable to prove otherwise. The UGC states that a specific computational problem called the Unique Game is inapproximable. A gap-preserving reduction from the Unique Game problem then implies inapproximability results for other NP-complete problems. Such a reduction was exhibited in [60] for the Min-2SAT-Deletion problem and it was soon followed by a flurry of reductions for various problems, in some cases using variants of the conjecture. Discrete Fourier Analysis: The inapproximability reductions from the Unique Game problem often use gadgets constructed from a boolean hypercube (also used with a great success in earlier works [18, 49, 51]). The reductions can be alternately viewed as constructions of Probabilistically Checkable Proofs (PCPs) and the gadgets can be viewed as probabilistic checking procedures to check whether a given codeword is a Long Code. Fourier analytic theorems on hypercube play a crucial role in ensuring that the gadgets indeed “work”.
Model Checking Large Design Spaces: Theory, Tools, and Experiments
Iowa State University Capstones, Theses and Dissertations. Graduate Theses and Dissertations, 2020. Rohit Dureja, Iowa State University. Recommended Citation: Dureja, Rohit, "Model checking large design spaces: Theory, tools, and experiments" (2020). Graduate Theses and Dissertations. 18304. https://lib.dr.iastate.edu/etd/18304 A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Major: Computer Science. Program of Study Committee: Kristin Y. Rozier, Co-major Professor; Gianfranco Ciardo, Co-major Professor; Samik Basu; Robyn Lutz; Hridesh Rajan. The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University, Ames, Iowa, 2020. Copyright © Rohit Dureja, 2020. All rights reserved. Dedication: To my family. Table of Contents: List of Figures; List of Tables; Abstract; Chapter 1. Introduction; 1.1 Motivation.
Detecting Near-Duplicates for Web Crawling
WWW 2007 / Track: Data Mining, Session: Similarity Search. Gurmeet Singh Manku (Google Inc.), Arvind Jain (Google Inc.), Anish Das Sarma (Stanford University). Abstract: Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the course of developing a near-duplicate detection system for a multi-billion page repository, we make two research contributions. First, we demonstrate that Charikar's fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k. Introduction: Documents that are exact duplicates of each other (due to mirroring and plagiarism) are easy to identify by standard checksumming techniques. A more difficult problem is the identification of near-duplicate documents. Two such documents are identical in terms of content but differ in a small portion of the document such as advertisements, counters and timestamps. These differences are irrelevant for web search. So if a newly-crawled page P_duplicate is deemed a near-duplicate of an already-crawled page P, the crawl engine should ignore P_duplicate and all its out-going links (intuition suggests that these are probably near-duplicates of pages reachable from P). Elimination of near-duplicates saves network bandwidth, reduces storage costs and im-
Linear Programming Based Approximation Algorithms for Feedback Set Problems in Bipartite Tournaments
Theoretical Computer Science 412 (2011) 2556–2561. Linear programming based approximation algorithms for feedback set problems in bipartite tournaments. Anke van Zuylen, Institute for Theoretical Computer Science, Tsinghua University, Beijing, China. Keywords: Feedback vertex set; Feedback arc set; Bipartite tournament; Approximation algorithm; Linear programming. Abstract: We consider the feedback vertex set and feedback arc set problems on bipartite tournaments. We improve on recent results by giving a 2-approximation algorithm for the feedback vertex set problem. We show that this result is the best that we can attain when using optimal solutions to a certain linear program as a lower bound on the optimal value. For the feedback arc set problem on bipartite tournaments, we show that a recent 4-approximation algorithm proposed by Gupta (2008) [8] is incorrect. We give an alternative 4-approximation algorithm based on an algorithm for the feedback arc set on (non-bipartite) tournaments given by van Zuylen and Williamson (2009) [14]. © 2010 Elsevier B.V. All rights reserved. 1. Introduction: We consider the feedback vertex set problem and the feedback arc set problem on bipartite tournaments. The feedback vertex set problem on a directed graph G = (V, A) asks for a set of vertices V′ of minimum size such that the subgraph of G induced by V \ V′ is acyclic.