A Constant Factor Approximation Algorithm for Fault-Tolerant K-Median

25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014)

Portland, Oregon, USA
5-7 January 2014

Volume 1 of 2

Editor: C. Chekuri

ISBN: 978-1-5108-1330-4

Printed from e-media with permission by: Curran Associates, Inc., 57 Morehouse Lane, Red Hook, NY 12571

Some format issues inherent in the e-media version may also appear in this print version.

Copyright © (2014) by the Association for Computing Machinery, Inc. and the Society for Industrial and Applied Mathematics. All rights reserved.

Printed by Curran Associates, Inc. (2015)

For permission requests, please contact SIAM: Society for Industrial and Applied Mathematics at the address below.

SIAM
3600 Market Street, 6th Floor
Philadelphia, PA 19104-2688 USA
Phone: (215) 382-9800
Fax: (215) 386-7999
[email protected]

Additional copies of this publication are available from:

Curran Associates, Inc.
57 Morehouse Lane
Red Hook, NY 12571 USA
Phone: 845-758-0400
Fax: 845-758-2634
Email: [email protected]
Web: www.proceedings.com

TABLE OF CONTENTS

VOLUME 1

A CONSTANT FACTOR APPROXIMATION ALGORITHM FOR FAULT-TOLERANT K-MEDIAN .... 1
Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, Shi Li, Barna Saha

IMPROVED APPROXIMATION ALGORITHM FOR TWO-DIMENSIONAL BIN PACKING .... 13
Nikhil Bansal, Arindam Khan

A MAZING 2+ε APPROXIMATION FOR UNSPLITTABLE FLOW ON A PATH .... 26
Aris Anagnostopoulos, Fabrizio Grandoni, Stefano Leonardi, Andreas Wiese

BETTER APPROXIMATION BOUNDS FOR THE JOINT REPLENISHMENT PROBLEM .... 42
Marcin Bienkowski, Jaroslaw Byrka, Marek Chrobak, Lukasz Jez, Dorian Nogneng, Jiri Sgall

BETTER ALGORITHMS AND HARDNESS FOR BROADCAST SCHEDULING VIA A DISCREPANCY APPROACH .... 55
Nikhil Bansal, Moses Charikar, Ravishankar Krishnaswamy, Shi Li

AN EXCLUDED GRID THEOREM FOR DIGRAPHS WITH FORBIDDEN MINORS .... 72
Ken-Ichi Kawarabayashi, Stephan Kreutzer

FINDING SMALL PATTERNS IN PERMUTATIONS IN LINEAR TIME .... 82
Sylvain Guillemot, Daniel Marx

MINIMUM COMMON STRING PARTITION PARAMETERIZED BY PARTITION SIZE IS FIXED-PARAMETER TRACTABLE .... 102
Laurent Bulteau, Christian Komusiewicz

INTERVAL DELETION IS FIXED-PARAMETER TRACTABLE .... 122
Yixin Cao, Dániel Marx

EFFICIENT COMPUTATION OF REPRESENTATIVE SETS WITH APPLICATIONS IN PARAMETERIZED AND EXACT ALGORITHMS .... 142
Fedor V. Fomin, Daniel Lokshtanov, Saket Saurabh

ON THE COMPUTATIONAL COMPLEXITY OF BETTI NUMBERS: REDUCTIONS FROM MATRIX RANK .... 152
H. Edelsbrunner, S. Parsa

IMPLICIT MANIFOLD RECONSTRUCTION .... 161
Siu-Wing Cheng, Man-Kwun Chiu

APPROXIMATING LOCAL HOMOLOGY FROM SAMPLES .... 174
Primoz Skraba, Bei Wang

ROBUST SATISFIABILITY OF SYSTEMS OF EQUATIONS .... 193
Peter Franek, Marek Krcál

SOLVING 1-LAPLACIANS IN NEARLY LINEAR TIME: COLLAPSING AND EXPANDING A TOPOLOGICAL BALL .... 204
Michael B. Cohen, Brittany Terese Fasy, Gary L. Miller, Amir Nayyeri, Richard Peng, Noel Walkington

AN ALMOST-LINEAR-TIME ALGORITHM FOR APPROXIMATE MAX FLOW IN UNDIRECTED GRAPHS, AND ITS MULTICOMMODITY GENERALIZATIONS .... 217
Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, Aaron Sidford

COMPUTING CUT-BASED HIERARCHICAL DECOMPOSITIONS IN ALMOST LINEAR TIME .... 227
Harald Räcke, Chintan Shah, Hanjo Täubig

NEAR LINEAR TIME APPROXIMATION SCHEMES FOR UNCAPACITATED AND CAPACITATED B-MATCHING PROBLEMS IN NONBIPARTITE GRAPHS .... 239
Kook Jin Ahn, Sudipto Guha

IMPROVED BOUNDS AND ALGORITHMS FOR GRAPH CUTS AND NETWORK RELIABILITY .... 259
David G. Harris, Aravind Srinivasan

TOWARDS (1 + ε)-APPROXIMATE FLOW SPARSIFIERS .... 279
Alexandr Andoni, Anupam Gupta, Robert Krauthgamer

UNIFORM RANDOM SAMPLING OF SIMPLE BRANCHED COVERINGS OF THE SPHERE BY ITSELF .... 294
Enrica Duchi, Dominique Poulalhon, Gilles Schaeffer

MCMC SAMPLING COLOURINGS AND INDEPENDENT SETS OF G(N, D/N) NEAR UNIQUENESS THRESHOLD .... 305
Charilaos Efthymiou

ARBORICITY AND SPANNING-TREE PACKING IN RANDOM GRAPHS WITH AN APPLICATION TO LOAD BALANCING .... 317
Pu Gao, Xavier Pérez-Giménez, Cristiane M. Sato

CLUSTERING AND MIXING TIMES FOR SEGREGATION MODELS ON Z^2 .... 327
Prateek Bhakta, Sarah Miracle, Dana Randall

A SIMPLE FPTAS FOR COUNTING EDGE COVERS .... 341
Chengyu Lin, Jingcheng Liu, Pinyan Lu

SPACE COMPLEXITY OF LIST H-COLOURING: A DICHOTOMY .... 349
László Egri, Pavol Hell, Benoit Larose, Arash Rafiey

POSITIVITY PROBLEMS FOR LOW-ORDER LINEAR RECURRENCE SEQUENCES .... 366
Joël Ouaknine, James Worrell

POLYNOMIAL SOLVABILITY OF VARIANTS OF THE TRUST-REGION SUBPROBLEM .... 380
Daniel Bienstock, Alexander Michalka

ON THE LATTICE ISOMORPHISM PROBLEM .... 391
Ishay Haviv, Oded Regev

THE COMPLEXITY OF ORDER TYPE ISOMORPHISM .... 405
Greg Aloupis, John Iacono, Stefan Langerman, Özgür Ozkan, Stefanie Wuhrer

DYNAMIC TASK ALLOCATION IN ASYNCHRONOUS SHARED MEMORY .... 416
Dan Alistarh, James Aspnes, Michael A. Bender, Rati Gelashvili, Seth Gilbert

COMPETITIVE ANALYSIS VIA REGULARIZATION .... 436
Niv Buchbinder, Shahar Chen, Joseph Naor

FIRST COME FIRST SERVED FOR ONLINE SLOT ALLOCATION AND HUFFMAN CODING .... 445
Monik Khare, Claire Mathieu, Neal E. Young

ONLINE STEINER TREE WITH DELETIONS .... 455
Anupam Gupta, Amit Kumar

MAINTAINING ASSIGNMENTS ONLINE: MATCHING, SCHEDULING, FLOWS .... 468
Anupam Gupta, Amit Kumar, Cliff Stein

(NEARLY) SAMPLE-OPTIMAL SPARSE FOURIER TRANSFORM .... 480
Piotr Indyk, Michael Kapralov, Eric Price

LEARNING SPARSE POLYNOMIAL FUNCTIONS .... 500
Alexandr Andoni, Rina Panigrahy, Gregory Valiant, Li Zhang

LEARNING ENTANGLED SINGLE-SAMPLE GAUSSIANS .... 511
Flavio Chierichetti, Anirban Dasgupta, Ravi Kumar, Silvio Lattanzi

EXPLOITING METRIC STRUCTURE FOR EFFICIENT PRIVATE QUERY RELEASE .... 523
Zhiyi Huang, Aaron Roth

ON THE COMPATIBILITY OF QUARTET TREES .... 535
Noga Alon, Sagi Snir, Raphael Yuster

A NEW PERSPECTIVE ON VERTEX CONNECTIVITY .... 546
Keren Censor-Hillel, Mohsen Ghaffari, Fabian Kuhn

PACKING A-PATHS IN GROUP-LABELLED GRAPHS VIA LINEAR MATROID PARITY .... 562
Yutaro Yamaguchi

INDEPENDENT SET IN P5-FREE GRAPHS IN POLYNOMIAL TIME .... 570
Daniel Lokshtanov, Martin Vatshelle, Yngve Villanger

LARGE INDUCED SUBGRAPHS VIA TRIANGULATIONS AND CMSO .... 582
Fedor V. Fomin, Ioan Todinca, Yngve Villanger

COUNTING THIN SUBGRAPHS VIA PACKINGS FASTER THAN MEET-IN-THE-MIDDLE

Recommended publications
  • FOCS 2005 Program SUNDAY October 23, 2005
    FOCS 2005 Program
    The 46th Annual IEEE Symposium on Foundations of Computer Science, October 22-25, 2005, Omni William Penn Hotel, Pittsburgh, PA. Sponsored by the IEEE Computer Society Technical Committee on Mathematical Foundations of Computing. In cooperation with ACM SIGACT. FOCS ’05 gratefully acknowledges financial support from Microsoft Research, Yahoo! Research, and the CMU Aladdin center.
    SATURDAY October 22, 2005. Tutorials held at CMU University Center. Reception at Omni William Penn Hotel, Monongahela Room, 17th floor.
    Tutorial 1: 1:30pm – 3:30pm (McConomy Auditorium) Chair: Irit Dinur. Subhash Khot, On the Unique Games Conjecture.
    Break 3:30pm – 4:00pm
    SUNDAY October 23, 2005. Talks in Grand Ballroom, 17th floor.
    Session 1: 8:50am – 10:10am Chair: Éva Tardos
    8:50 Agnostically Learning Halfspaces. Adam Kalai, Adam Klivans, Yishay Mansour and Rocco Servedio
    9:10 Noise stability of functions with low influences: invariance and optimality. Elchanan Mossel, Ryan O’Donnell and Krzysztof Oleszkiewicz
    9:30 Every decision tree has an influential variable. Ryan O’Donnell, Michael Saks, Oded Schramm and Rocco Servedio
    9:50 Lower Bounds for the Noisy Broadcast Problem. Navin Goyal, Guy Kindler and Michael Saks
    Break 10:10am – 10:30am
    Session 2: 10:30am – 12:10pm Chair: Satish Rao
    10:30 The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into ℓ1 [Best paper award]. Subhash Khot and Nisheeth Vishnoi
    10:50 The Closest Substring problem with small distances. Daniel Marx
    11:10 Fitting tree metrics: Hierarchical clustering and Phylogeny. Nir Ailon and Moses Charikar
    11:30 Metric Embeddings with Relaxed Guarantees. Ittai Abraham, Yair Bartal, T-H.
  • Distance-Sensitive Hashing
    Distance-Sensitive Hashing. Martin Aumüller (IT University of Copenhagen, [email protected]), Tobias Christiani (IT University of Copenhagen, [email protected]), Rasmus Pagh (IT University of Copenhagen, [email protected]), Francesco Silvestri (University of Padova, [email protected]).
    ABSTRACT: Locality-sensitive hashing (LSH) is an important tool for managing high-dimensional noisy or uncertain data, for example in connection with data cleaning (similarity join) and noise-robust search (similarity search). However, for a number of problems the LSH framework is not known to yield good solutions, and instead ad hoc solutions have been designed for particular similarity and distance measures. For example, this is true for output-sensitive similarity search/join, and for indexes supporting annulus queries that aim to report a point close to a certain given distance from the query point. In this paper we initiate the study of distance-sensitive hashing (DSH), a generalization of LSH that seeks a family of hash functions such that the probability of two points having the same hash value is a given function of the distance between them. More precisely, given [...] tight up to lower order terms. In particular, we extend existing LSH lower bounds, showing that they also hold in the asymmetric setting.
    CCS CONCEPTS: • Theory of computation → Randomness, geometry and discrete structures; Data structures design and analysis; • Information systems → Information retrieval.
    KEYWORDS: locality-sensitive hashing; similarity search; annulus query.
    ACM Reference Format: Martin Aumüller, Tobias Christiani, Rasmus Pagh, and Francesco Silvestri. 2018. Distance-Sensitive Hashing. In Proceedings of 2018 International Conference on Management of Data (PODS’18).
  • arXiv:2102.08942v1 [cs.DB]
    A Survey on Locality Sensitive Hashing Algorithms and their Applications OMID JAFARI, New Mexico State University, USA PREETI MAURYA, New Mexico State University, USA PARTH NAGARKAR, New Mexico State University, USA KHANDKER MUSHFIQUL ISLAM, New Mexico State University, USA CHIDAMBARAM CRUSHEV, New Mexico State University, USA Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many diverse application domains. Locality Sensitive Hashing (LSH) is one of the most popular techniques for finding approximate nearest neighbor searches in high-dimensional spaces. The main benefits of LSH are its sub-linear query performance and theoretical guarantees on the query accuracy. In this survey paper, we provide a review of state-of-the-art LSH and Distributed LSH techniques. Most importantly, unlike any other prior survey, we present how Locality Sensitive Hashing is utilized in different application domains. CCS Concepts: • General and reference → Surveys and overviews. Additional Key Words and Phrases: Locality Sensitive Hashing, Approximate Nearest Neighbor Search, High-Dimensional Similarity Search, Indexing 1 INTRODUCTION Finding nearest neighbors in high-dimensional spaces is an important problem in several diverse applications, such as multimedia retrieval, machine learning, biological and geological sciences, etc. For low-dimensions (< 10), popular tree-based index structures, such as KD-tree [12], SR-tree [56], etc. are effective, but for higher number of dimensions, these index structures suffer from the well-known problem, curse of dimensionality (where the performance of these index structures is often out-performed even by linear scans) [21]. Instead of searching for exact results, one solution to address the curse of dimensionality problem is to look for approximate results.
  • Hierarchical Clustering with Global Objectives: Approximation Algorithms and Hardness Results
    HIERARCHICAL CLUSTERING WITH GLOBAL OBJECTIVES: APPROXIMATION ALGORITHMS AND HARDNESS RESULTS. A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. Evangelos Chatziafratis, June 2020. © 2020 by Evangelos Chatziafratis. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/ This dissertation is online at: http://purl.stanford.edu/bb164pj1759 I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Tim Roughgarden, Primary Adviser. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Moses Charikar, Co-Adviser. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Li-Yang Tan. I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Gregory Valiant. Approved for the Stanford University Committee on Graduate Studies. Stacey F. Bent, Vice Provost for Graduate Education. This signature page was generated electronically upon submission of this dissertation in electronic format.
  • SIGMOD Flyer
    SIGMOD 2006: 25th ACM SIGMOD International Conference on Management of Data, June 26 - June 29, 2006, Chicago, USA
    The annual ACM SIGMOD conference is a leading international forum for database researchers, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. We invite the submission of original research contributions as well as proposals for demonstrations, tutorials, industrial presentations, and panels. We encourage submissions relating to all aspects of data management defined broadly and particularly encourage work that represents deep technical insights or presents new abstractions and novel approaches to problems of significance. We especially welcome submissions that help identify and solve data management systems issues by leveraging knowledge of applications and related areas, such as information retrieval and search, operating systems & storage technologies, and web services. Areas of interest include but are not limited to:
    • Benchmarking and performance evaluation
    • Data cleaning and integration
    • Database monitoring and tuning
    • Data privacy and security
    • Data warehousing and decision-support systems
    • Embedded, sensor, mobile databases and applications
    • Managing uncertain and imprecise information
    • Peer-to-peer data management
    • Personalized information systems
    • Query processing and optimization
    • Replication, caching, and publish-subscribe systems
    • Text search and database querying
    • Semi-structured data
    DATES: Research paper abstracts: Nov. 15, 2005. Research papers, demonstrations, industrial talks, tutorials, panels: Nov. 29, 2005. Author Notification: Feb. 24, 2006.
    ORGANIZERS: General Chair: Clement Yu, U. of Illinois at Chicago. Vice Gen. Chair: Peter Scheuermann, Northwestern Univ. PC Chair: Surajit Chaudhuri, Microsoft Research. Demo. Chair: Anastassia Ailamaki, CMU. Industrial PC Chair: Alon Halevy, U. of Washington, Seattle. Panels Chair: Christian S. Jensen, Aalborg University. Tutorials Chair: David DeWitt, U.
  • Lower Bounds on Lattice Sieving and Information Set Decoding
    Lower bounds on lattice sieving and information set decoding. Elena Kirshanova (Immanuel Kant Baltic Federal University, Kaliningrad, Russia, [email protected]) and Thijs Laarhoven (Eindhoven University of Technology, Eindhoven, The Netherlands, [email protected]). April 22, 2021. Abstract: In two of the main areas of post-quantum cryptography, based on lattices and codes, nearest neighbor techniques have been used to speed up state-of-the-art cryptanalytic algorithms, and to obtain the lowest asymptotic cost estimates to date [May–Ozerov, Eurocrypt'15; Becker–Ducas–Gama–Laarhoven, SODA'16]. These upper bounds are useful for assessing the security of cryptosystems against known attacks, but to guarantee long-term security one would like to have closely matching lower bounds, showing that improvements on the algorithmic side will not drastically reduce the security in the future. As existing lower bounds from the nearest neighbor literature do not apply to the nearest neighbor problems appearing in this context, one might wonder whether further speedups to these cryptanalytic algorithms can still be found by only improving the nearest neighbor subroutines. We derive new lower bounds on the costs of solving the nearest neighbor search problems appearing in these cryptanalytic settings. For the Euclidean metric we show that for random data sets on the sphere, the locality-sensitive filtering approach of [Becker–Ducas–Gama–Laarhoven, SODA 2016] using spherical caps is optimal, and hence within a broad class of lattice sieving algorithms covering almost all approaches to date, their asymptotic time complexity of $2^{0.292d+o(d)}$ is optimal. Similar conditional optimality results apply to lattice sieving variants, such as the $2^{0.265d+o(d)}$ complexity for quantum sieving [Laarhoven, PhD thesis 2016] and previously derived complexity estimates for tuple sieving [Herold–Kirshanova–Laarhoven, PKC 2018].
  • Constrained Clustering Problems and Parity Games
    Constrained Clustering Problems and Parity Games. Clemens Rösner, born in Ulm. Dissertation for the award of the doctoral degree (Dr. rer. nat.) of the Faculty of Mathematics and Natural Sciences of the Rheinische Friedrich-Wilhelms-Universität Bonn. Bonn 2019. First reviewer: Prof. Dr. Heiko Röglin. Second reviewer: Prof. Dr. Anne Driemel. Date of the oral examination: 5 September 2019. Year of publication: 2019. Prepared with the approval of the Faculty of Mathematics and Natural Sciences of the Rheinische Friedrich-Wilhelms-Universität Bonn. Abstract: Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. We study several clustering objectives. We begin with studying the Euclidean k-center problem. The k-center problem is a classical combinatorial optimization problem which asks to select k centers and assign each input point in a set P to one of the centers, such that the maximum distance of any input point to its assigned center is minimized. The Euclidean k-center problem assumes that the input set P is a subset of a Euclidean space and that each location in the Euclidean space can be chosen as a center. We focus on the special case with k = 1, the smallest enclosing ball problem: given a set of points in m-dimensional Euclidean space, find the smallest sphere enclosing all the points. We combine known results about convex optimization with structural properties of the smallest enclosing ball to create a new algorithm. We show that on instances with rational coefficients our new algorithm computes the exact center of the optimal solutions and has a worst-case run time that is polynomial in the size of the input.
  • Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality
    THEORY OF COMPUTING, Volume 8 (2012), pp. 321–350. www.theoryofcomputing.org. SPECIAL ISSUE IN HONOR OF RAJEEV MOTWANI. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. Sariel Har-Peled∗, Piotr Indyk†, Rajeev Motwani‡. Received: August 20, 2010; published: July 16, 2012. Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in $\mathbb{R}^d$, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors’ STOC’98 and FOCS’01 papers. It unifies, generalizes and simplifies the results from those papers. ACM Classification: F.2.2. AMS Classification: 68W25. Key words and phrases: approximate nearest neighbor, high dimensions, locality-sensitive hashing. 1 Introduction. The nearest neighbor (NN) problem is defined as follows: Given a set P of n points in a metric space defined over a set X with distance function D, preprocess P to efficiently answer queries for finding the point in P closest to a query point $q \in X$. A particularly interesting case is that of the d-dimensional Euclidean space where $X = \mathbb{R}^d$ under some $\ell_s$ norm. This problem is of major importance in several ∗Supported by NSF CAREER award CCR-0132901 and AF award CCF-0915984. †Supported by a Stanford Graduate Fellowship, NSF Award CCR-9357849 and NSF CAREER award CCF-0133849.
  • Implementation of Locality Sensitive Hashing Techniques
    Implementation of Locality Sensitive Hashing Techniques. Project Report submitted in partial fulfillment of the requirement for the degree of Bachelor of Technology in Computer Science & Engineering under the Supervision of Dr. Nitin Chanderwal. By Srishti Tomar (111210) to Jaypee University of Information Technology, Waknaghat, Solan – 173234, Himachal Pradesh. Certificate: This is to certify that the project report entitled “Implementation of Locality Sensitive Hashing Techniques”, submitted by Srishti Tomar in partial fulfillment for the award of degree of Bachelor of Technology in Computer Science & Engineering to Jaypee University of Information Technology, Waknaghat, Solan has been carried out under my supervision. This work has not been submitted partially or fully to any other University or Institute for the award of this or any other degree or diploma. Date: Supervisor’s Name: Dr. Nitin Chanderwal. Designation: Associate Professor. Acknowledgement: I am highly indebted to Jaypee University of Information Technology for their guidance and constant supervision as well as for providing necessary information regarding the project and also for their support in completing the project. I would like to express my gratitude towards my parents and Project Guide for their kind co-operation and encouragement, which helped me in the completion of this project. I would like to express my special gratitude and thanks to industry persons for giving me such attention and time. My thanks and appreciation also go to my colleagues in developing the project and to the people who have willingly helped me out with their abilities. Date: Name of the student: Srishti Tomar. Table of Content: S. No. Topic Page No. 1. Abstract 1 2.
  • Model Checking Large Design Spaces: Theory, Tools, and Experiments
    Iowa State University Capstones, Theses and Dissertations. Graduate Theses and Dissertations. 2020. Model checking large design spaces: Theory, tools, and experiments. Rohit Dureja, Iowa State University. Follow this and additional works at: https://lib.dr.iastate.edu/etd Recommended Citation: Dureja, Rohit, "Model checking large design spaces: Theory, tools, and experiments" (2020). Graduate Theses and Dissertations. 18304. https://lib.dr.iastate.edu/etd/18304 This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Model checking large design spaces: Theory, tools, and experiments by Rohit Dureja. A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY. Major: Computer Science. Program of Study Committee: Kristin Y. Rozier, Co-major Professor; Gianfranco Ciardo, Co-major Professor; Samik Basu; Robyn Lutz; Hridesh Rajan. The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University, Ames, Iowa, 2020. Copyright © Rohit Dureja, 2020. All rights reserved. DEDICATION: To my family. TABLE OF CONTENTS: LIST OF FIGURES, vi; LIST OF TABLES, x; ABSTRACT, xi; CHAPTER 1. INTRODUCTION, 1; 1.1 Motivation
  • Detecting Near-Duplicates for Web Crawling
    WWW 2007 / Track: Data Mining, Session: Similarity Search. Detecting Near-Duplicates for Web Crawling. Gurmeet Singh Manku (Google Inc., [email protected]), Arvind Jain (Google Inc., [email protected]), Anish Das Sarma (Stanford University, [email protected]).
    ABSTRACT: Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the course of developing a near-duplicate detection system for a multi-billion page repository, we make two research contributions. First, we demonstrate that Charikar's fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k.
    Documents that are exact duplicates of each other (due to mirroring and plagiarism) are easy to identify by standard checksumming techniques. A more difficult problem is the identification of near-duplicate documents. Two such documents are identical in terms of content but differ in a small portion of the document such as advertisements, counters and timestamps. These differences are irrelevant for web search. So if a newly-crawled page P_duplicate is deemed a near-duplicate of an already-crawled page P, the crawl engine should ignore P_duplicate and all its out-going links (intuition suggests that these are probably near-duplicates of pages reachable from P). Elimination of near-duplicates saves network bandwidth, reduces storage costs and im-
  • Scalable Nearest Neighbor Search for Optimal Transport
    Scalable Nearest Neighbor Search for Optimal Transport. Arturs Backurs (TTIC), Yihe Dong (Microsoft), Piotr Indyk (MIT), Ilya Razenshteyn (Microsoft Research), Tal Wagner (MIT). September 30, 2020. Abstract: The Optimal Transport (a.k.a. Wasserstein) distance is an increasingly popular similarity measure for rich data domains, such as images or text documents. This raises the necessity for fast nearest neighbor search algorithms according to this distance, which poses a substantial computational bottleneck on massive datasets. In this work we introduce Flowtree, a fast and accurate approximation algorithm for the Wasserstein-1 distance. We formally analyze its approximation factor and running time. We perform extensive experimental evaluation of nearest neighbor search algorithms in the W1 distance on real-world dataset. Our results show that compared to previous state of the art, Flowtree achieves up to 7.4 times faster running time. 1 Introduction. Given a finite metric space $M = (X, d_X)$ and two distributions $\mu$ and $\nu$ on $X$, the Wasserstein-1 distance (a.k.a. Earth Mover's Distance or Optimal Transport) between $\mu$ and $\nu$ is defined as $W_1(\mu, \nu) = \min_{\tau} \sum_{x_1, x_2 \in X} \tau(x_1, x_2) \cdot d_X(x_1, x_2)$ (1), where the minimum is taken over all distributions $\tau$ on $X \times X$ whose marginals are equal to $\mu$ and $\nu$. The Wasserstein-1 distance and its variants are heavily used in applications to measure similarity in structured data domains, such as images [RTG00] and natural language text [KSKW15]. In particular, [KSKW15] proposed the Word Mover Distance (WMD) for text documents. Each document is seen as a uniform distribution over the words it contains, and the underlying metric between words is given by high-dimensional word embeddings such as word2vec [MSC+13] or GloVe [PSM14].
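As a small illustration of the Wasserstein-1 distance defined in the last entry above, the sketch below evaluates it in one special case only: points on the real line with the metric |x - y|. In that case the minimum over couplings is known to equal the sum of the absolute differences between the two cumulative distributions, weighted by the gaps between consecutive points. This is not the Flowtree algorithm from that paper, and the function name `wasserstein1_on_line` and its arguments are hypothetical names chosen for this example.

```python
def wasserstein1_on_line(points, mu, nu):
    """W1 between two distributions on 1-D points under d(x, y) = |x - y|.

    points: list of real coordinates (assumed distinct).
    mu, nu: probability masses, mu[i] and nu[i] sitting on points[i].
    Special case only: uses the 1-D identity
    W1 = sum over gaps of |CDF_mu - CDF_nu| * gap_length.
    """
    # Process the support from left to right.
    order = sorted(range(len(points)), key=lambda i: points[i])
    xs = [points[i] for i in order]
    diff = [mu[i] - nu[i] for i in order]

    total = 0.0
    cdf_gap = 0.0  # running value of CDF_mu - CDF_nu
    for j in range(len(xs) - 1):
        cdf_gap += diff[j]
        # Mass |cdf_gap| has to cross the interval [xs[j], xs[j+1]].
        total += abs(cdf_gap) * (xs[j + 1] - xs[j])
    return total


if __name__ == "__main__":
    # All mass at 0 versus half the mass at 1 and half at 3:
    # moving 0.5 a distance of 1 and 0.5 a distance of 3 costs 2.0.
    print(wasserstein1_on_line([0, 1, 3], [1, 0, 0], [0, 0.5, 0.5]))
```

For a general finite metric space, definition (1) is a linear program over couplings and this shortcut does not apply.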