DA18 Abstracts
IP1
Differential Privacy: A Gateway Concept

Differential privacy is a definition of privacy tailored to statistical analysis of very large datasets. Invented just over one decade ago, the notion has become widely (if not deeply) deployed, and yet much remains to be done. The theoretical investigation of privacy/accuracy tradeoffs that shaped the field by delineating the boundary between possible and impossible motivates the continued search for new algorithmic techniques, as well as still meaningful relaxations of the basic definition. Differential privacy has also led to new approaches in other fields, most notably in algorithmic fairness and adaptive data analysis, in which the questions being asked of the data depend on the data themselves. We will highlight some recent algorithmic and definitional work, and focus on differential privacy as a gateway concept to these new areas of study.

Cynthia Dwork
Harvard University, USA
[email protected]

IP2
The Power of Theory in the Practice of Hashing with Focus on Similarity Estimation

Hash functions have become ubiquitous tools in modern data analysis, e.g., the construction of small randomized sketches of large data streams. We like to think of abstract hash functions, assigning independent uniformly random hash values to keys, but in practice we have to choose a hash function that only has an element of randomness, e.g., 2-independence. While this works for sufficiently random input, the real world has structured data where such simple hash functions fail, calling for more powerful hash functions. In this talk, we focus on hashing for set similarity, which is an integral component in the analysis of Big Data. The basic idea is to use the same hash function to do coordinated sampling from different sets. Depending on the context, we want subsets sampled without replacement, or fixed-length vectors of samples that may be with replacement. The latter is used as input to support vector machines (SVMs) and locality sensitive hashing (LSH). The most efficient constructions require very powerful hash functions that are also needed for efficient size estimation. We discuss the interplay between the hash functions and the algorithms using them. Finally, we present experiments on both real and synthetic data where standard 2-independent hashing yields systematically poor similarity estimates, while the right theoretical choice is sharply concentrated, and faster than standard cryptographic hash functions with no proven guarantees.

Mikkel Thorup
University of Copenhagen
[email protected]
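The coordinated-sampling idea in the abstract above can be illustrated with a minimal sketch of our own (not code from the talk): k independent min-hashes, drawn here from a simple 2-independent family, give each set a fixed-length sample vector, and the fraction of agreeing coordinates estimates Jaccard similarity. The talk's point is precisely that such weak families can fail on structured input, so the family below is an illustrative stand-in rather than a recommendation; all names are ours.

```python
import random

# A classic 2-independent hash family: h(x) = ((a*x + b) mod p), rescaled to [0, 1).
_P = (1 << 61) - 1  # a Mersenne prime larger than the integer key universe

def make_hash(rng=random):
    a = rng.randrange(1, _P)
    b = rng.randrange(_P)
    return lambda x: ((a * x + b) % _P) / _P

def minhash_signature(keys, hash_funcs):
    """Coordinated sampling: for each hash function, keep the key with the
    smallest hash value.  The same functions are reused for every set."""
    return [min(keys, key=h) for h in hash_funcs]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of coordinates on which the two sample vectors agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Usage: true Jaccard similarity of A and B is 500/1500 = 1/3.
hs = [make_hash() for _ in range(128)]
A = set(range(0, 1000))
B = set(range(500, 1500))
print(estimate_jaccard(minhash_signature(A, hs), minhash_signature(B, hs)))
```

The signature is a fixed-length vector of samples "with replacement" in the sense of the abstract; replacing the hash family with a stronger one changes only `make_hash`.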
IP3
Approximation Algorithms for Uncertain Environments

The past decade has seen considerable work on algorithms in models with uncertainty: where either the inputs to the algorithm or the algorithm's actions have some degree of uncertainty. Designing algorithms for these settings gives rise to new problems and techniques. I will survey some algorithmic models that try to capture uncertainty in optimization problems, talk about some example problems, and indicate some of the techniques and ideas used to tackle the uncertainty in these problems and get provable guarantees on the performance of these algorithms.

Anupam Gupta
Carnegie Mellon University
[email protected]

IP4
Title To Be Determined - Vassilevska Williams

Not available.

Virginia Vassilevska Williams
MIT
[email protected]

CP0
A Generic Framework for Engineering Graph Canonization Algorithms

The state-of-the-art tools for practical graph canonization are all based on the individualization-refinement paradigm, and their differences lie primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas affect the performance on different graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behaviour of different algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariants, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of different graph classes, we investigate the effect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of difficult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.

Jakob L. Andersen
Research Group Bioinformatics and Computational Biology
Faculty of Computer Science, University of Vienna
[email protected]

Daniel Merkle
Department of Mathematics and Computer Science
University of Southern Denmark
[email protected]

CP0
Computing Top-k Closeness Centrality in Fully-Dynamic Graphs

Closeness is a widely-studied centrality measure. Since it requires all pairwise distances, computing closeness for all nodes is infeasible for large real-world networks. However, for many applications it is only necessary to find the k most central nodes and not all closeness values. Prior work has shown that computing the top-k nodes with highest closeness can be done much faster than computing closeness for all nodes in real-world networks. However, for networks that evolve over time, no dynamic top-k closeness algorithm exists that improves on static recomputation. In this paper, we present several techniques that allow us to efficiently compute the k nodes with highest (harmonic) closeness after an edge insertion or an edge deletion. Our algorithms use information obtained during earlier computations to omit unnecessary work. However, they do not require asymptotically more memory than the static algorithms (i.e., linear in the number of nodes). We propose separate algorithms for complex networks (which exhibit the small-world property) and networks with large diameter such as street networks, and we compare them against static recomputation on a variety of real-world networks. On many instances, our dynamic algorithms are two orders of magnitude faster than recomputation; on some large graphs, we even reach average speedups between 10^3 and 10^4.

Patrick Bisenius
n/a
n/a

Elisabetta Bergamini
Institute of Theoretical Informatics
Karlsruhe Institute of Technology (KIT)
[email protected]

Eugenio Angriman
Karlsruhe Institute of Technology
[email protected]

Henning Meyerhenke
Karlsruhe Institute of Technology (KIT)
[email protected]

CP0
Scaling Up Group Closeness Maximization

Closeness is a widely-used centrality measure in social network analysis. While the identification of the k nodes with highest closeness has received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, only recently a greedy algorithm with approximation ratio (1 - 1/e) has been proposed [Chen et al., ADC 2016]. Since this algorithm's running time is still expensive for large networks, a heuristic without approximation guarantee has also been proposed in the same paper. In the present paper, we develop techniques to speed up the greedy algorithm without losing its guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., we always find a solution with better quality in a comparable running time in our experiments. Our method Greedy++ allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in at most a few hours. To have the same theoretical guarantee, the greedy approach by [Chen et al., ADC 2016] would take several days already on networks with hundreds of thousands of edges. In a comparison with the optimum, our experiments show that the solution found by Greedy++ is much better than the theoretical guarantee. Over all tested networks, the empirical approximation ratio is never lower than 0.97.

Elisabetta Bergamini
Institute of Theoretical Informatics
Karlsruhe Institute of Technology (KIT)
[email protected]

Tanya Gonser
Karlsruhe Institute of Technology
[email protected]

Henning Meyerhenke
University of Cologne
[email protected]
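For orientation, the "straightforward implementation" that the abstract above contrasts with can be sketched as a plain greedy loop over repeated BFS computations. This is our own minimal illustration, not the authors' Greedy++ and without its pruning techniques; it assumes an unweighted, connected graph given as an adjacency dictionary, and all function names are ours.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted single-source shortest-path distances via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def greedy_group_closeness(adj, k):
    """Baseline greedy: repeatedly add the node that most reduces the total
    distance from the remaining nodes to the group (assumes connectivity)."""
    group = set()
    d_to_group = {v: float("inf") for v in adj}  # distance from v to the current group
    for _ in range(k):
        best_node, best_total = None, float("inf")
        for u in adj:
            if u in group:
                continue
            dist_u = bfs_distances(adj, u)  # one BFS per candidate: the expensive part
            total = sum(min(d_to_group[v], dist_u.get(v, float("inf")))
                        for v in adj if v not in group and v != u)
            if total < best_total:
                best_node, best_total = u, total
        if best_node is None:
            break
        dist_b = bfs_distances(adj, best_node)
        group.add(best_node)
        for v in adj:
            d_to_group[v] = min(d_to_group[v], dist_b.get(v, float("inf")))
    return group

# Usage on a small path graph 0-1-2-3-4:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_group_closeness(adj, 2))
```

Each greedy step costs roughly one BFS per remaining candidate, which is exactly what makes the baseline impractical on large networks and what the paper's techniques aim to avoid.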
CP0
Polynomial Tuning of Multiparametric Combinatorial Samplers

Boltzmann samplers and the recursive method are prominent algorithmic frameworks for the approximate-size and exact-size random generation of large combinatorial structures, such as maps, tilings, RNA sequences or various tree-like structures. In their multiparametric variants, these samplers allow one to control the profile of expected values corresponding to multiple combinatorial parameters. One can control, for instance, the number of leaves, the profile of node degrees in trees, or the number of certain subpatterns in strings. However, such a flexible control requires an

[email protected]

Sergey Dovgal
Institut Galilée, Université Paris 13
L'IRIF, Université Paris 7; Moscow Inst. of Physics and Tech
[email protected]
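As background for the (truncated) abstract above, here is a minimal single-parameter Boltzmann sampler sketch of our own for binary trees counted by leaves. For this toy class the tuning step, choosing the control parameter so that the expected size hits a target, can be solved in closed form; the paper addresses the much harder multiparametric case. Function names and the closed-form tuner are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def tuned_parameter(target_leaves):
    """Choose x so that the expected number of leaves equals target_leaves (> 1).
    For binary trees with T(x) = x + T(x)^2 one gets E[leaves] = (1 + s) / (2 s)
    with s = sqrt(1 - 4x), which inverts in closed form."""
    s = 1.0 / (2.0 * target_leaves - 1.0)
    return (1.0 - s * s) / 4.0

def boltzmann_binary_tree(x, rng=random):
    """Boltzmann sampler for binary trees T = Leaf + Node(T, T), counted by
    leaves, so T(x) = x + T(x)^2 and 0 < x <= 1/4."""
    T = (1.0 - math.sqrt(1.0 - 4.0 * x)) / 2.0  # value of the generating function at x
    def gen():
        if rng.random() < x / T:   # emit a leaf with probability x / T(x)
            return "leaf"
        return (gen(), gen())      # otherwise an internal node; note x/T(x) + T(x) = 1
    return gen()

def count_leaves(t):
    return 1 if t == "leaf" else count_leaves(t[0]) + count_leaves(t[1])

# Usage: the expected number of leaves is 20; individual samples fluctuate around it.
x = tuned_parameter(20)
print(x, [count_leaves(boltzmann_binary_tree(x)) for _ in range(5)])
```

With several parameters (leaves, node degrees, subpatterns) no such closed form exists in general, which is where polynomial-time tuning procedures of the kind announced in the title come in.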