Quadratic Time Algorithms Appear to Be Optimal for Sorting Evolving Data
Juan José Besa, William E. Devanny, David Eppstein, Michael T. Goodrich, and Timothy Johnson
Dept. of Computer Science, Univ. of California, Irvine, Irvine, CA 92697 USA

arXiv:1805.05443v1 [cs.DS] 14 May 2018

Abstract

We empirically study sorting in the evolving data model. In this model, a sorting algorithm maintains an approximation to the sorted order of a list of data items while simultaneously, with each comparison made by the algorithm, an adversary randomly swaps the order of adjacent items in the true sorted order. Previous work studies only two versions of quicksort, and has a gap between the lower bound of Ω(n) and the best upper bound of O(n log log n). The experiments we perform in this paper provide empirical evidence that some quadratic-time algorithms such as insertion sort and bubble sort are asymptotically optimal for any constant rate of random swaps. In fact, these algorithms perform as well as or better than algorithms such as quicksort that are more efficient in the traditional algorithm analysis model.

1 Introduction

In the traditional Knuthian model [11], an algorithm takes an input, runs for some amount of time, and produces an output. Characterizing an algorithm in this model typically involves providing a function, f(n), such that the running time of the algorithm can be bounded asymptotically as O(f(n)) on average, with high probability (w.h.p.), or in the worst case. Although this has proven to be an extremely useful model for general algorithm design and analysis, there are nevertheless interesting scenarios where it doesn't apply.

1.1 The Evolving Data Model

One scenario where the Knuthian model doesn't apply is for applications where the input data is changing while an algorithm is processing it, which has given rise to the evolving data model [1]. In this model, input changes are coming so fast that it is hard for an algorithm to keep up, much like Lucy in the classic conveyor belt scene in the TV show I Love Lucy (e.g., see https://www.youtube.com/watch?v=8NPzLBSBzPI). Thus, rather than produce a single output, as in the Knuthian model, an algorithm in the evolving data model dynamically maintains an output instance over time. The goal of an algorithm in this model is to efficiently maintain its output instance so that it is "close" to the true output at that time. Therefore, analyzing an algorithm in the evolving data model involves defining a distance metric for output instances, parameterizing how input mutations occur over time, and characterizing how well an algorithm can maintain its output instance relative to this distance and these parameters for various types of input mutations.

The goal of a sorting algorithm in the evolving data model, then, is to maintain an output order close to the true total order even as that order is mutating. For example, the list could be an ordering of political candidates, tennis players, movies, songs, or websites, whose ranking is evolving over time, e.g., see [6]. In such contexts, performing a comparison of two elements is considered somewhat slow and expensive, in that it might involve a debate between two candidates, a match between two players, or an online survey or A/B test [5] comparing two movies, songs, or websites. In this model, a comparison-based sorting algorithm would therefore be executing at a rate commensurate with the speed at which the total order is changing, i.e., its mutation rate. Formally, to model this phenomenon, each time an algorithm performs a comparison, we allow an adversary to perform some changes to the true ordering of the elements.

There are several different adversaries one might consider with respect to the mutations performed after a sorting algorithm makes a comparison. For example, an adversary (who we call the uniform adversary) might choose r > 0 consecutive pairs of elements in the true total order uniformly at random and swap their relative order. Indeed, previous work [1] provided theoretical analyses for the uniform adversary for the case when r is a constant. Another type of adversary (who we call the hot spot adversary) might choose an element, x, in the total order and repeatedly swap x with a predecessor or successor each time a random "coin flip" comes up "tails," not stopping until the coin comes up "heads."
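As a minimal illustration of these two mutation processes (this is not the paper's implementation; the function names, the coin parameter p_heads, and the boundary handling are choices made here purely for concreteness), the adversaries can be simulated roughly as follows:

```python
import random

def uniform_mutation(true_order, r=1):
    """Uniform adversary: swap r uniformly random adjacent pairs in the true order."""
    n = len(true_order)
    for _ in range(r):
        i = random.randrange(n - 1)            # position of a random adjacent pair (i, i+1)
        true_order[i], true_order[i + 1] = true_order[i + 1], true_order[i]

def hot_spot_mutation(true_order, p_heads=0.5):
    """Hot spot adversary: pick one element uniformly at random and repeatedly
    swap it with a random neighbor on each "tails", stopping at the first "heads"."""
    n = len(true_order)
    i = random.randrange(n)                    # current position of the hot spot element
    while random.random() >= p_heads:          # tails: keep moving
        j = i + random.choice((-1, 1))         # predecessor or successor
        if 0 <= j < n:                         # ignore moves past either end of the list
            true_order[i], true_order[j] = true_order[j], true_order[i]
            i = j
```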
A natural distance metric to use in this context is the Kendall tau distance [10], which counts the number of inversions between a total ordering of n distinct elements and a given list of those elements. That is, a natural goal of a sorting algorithm in the evolving data model is to minimize the Kendall tau distance for its output list over time.
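For concreteness, this distance can be measured by directly counting the pairs of elements whose relative order disagrees between the algorithm's output list and the true order. The quadratic-time sketch below is illustrative only; for large n one would use an O(n log n) merge-sort-based inversion count instead.

```python
def kendall_tau_distance(output_list, true_order):
    """Number of element pairs whose relative order differs between the
    algorithm's output list and the true total order (i.e., inversions)."""
    rank = {x: i for i, x in enumerate(true_order)}   # true rank of every element
    ranks = [rank[x] for x in output_list]
    return sum(1
               for i in range(len(ranks))
               for j in range(i + 1, len(ranks))
               if ranks[i] > ranks[j])
```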
Here, we consider the empirical performance of sorting algorithms in the evolving data model. Whereas previous work looked only at quicksort with respect to theoretical analyses against the uniform adversary, we are interested in this paper in the "real-world" performance of a variety of sorting algorithms with respect to multiple types of adversaries in the evolving data model. Of particular interest are any experimental results that might be counter-intuitive or would highlight gaps in the theory.

1.2 Previous Work on Evolving Data

The evolving data model was introduced by Anagnostopoulos et al. [1], who study sorting and selection problems with respect to an evolving total order. They prove that quicksort maintains a Kendall tau distance of O(n log n) w.h.p. with respect to the true total order, against a uniform adversary that performs a small constant number of random swaps for every comparison made by the algorithm. Furthermore, they show that a batched collection of quicksort algorithms operating on overlapping blocks can maintain a Kendall tau distance of O(n log log n) against this same adversary. We are not aware of previous experimental work on sorting in the evolving data model. We are also not aware of previous work, in general, in the evolving data model for other sorting algorithms or for other types of adversaries.

In addition to this work on sorting, several papers have considered other problems in the evolving data model. Kanade et al. [9] study stable matching with evolving preferences. Huang et al. [8] show how to select the top-k elements from an evolving list. Zhang and Li [12] consider how to find shortest paths in an evolving graph. Anagnostopoulos et al. [2] study (s, t)-connectivity and minimum spanning trees in evolving graphs. Bahmani et al. [3] give several PageRank algorithms for evolving graphs and analyze them both theoretically and experimentally. To our knowledge, this work is the only previous experimental work for the evolving data model.

1.3 Our Results

In this paper, we provide an experimental investigation of sorting in the evolving data model. The starting point for our work is the previous theoretical work [1] on sorting in the evolving data model, which only studies quicksort. Thus, our first result is to report on experiments that address whether these previous theoretical results actually reflect real-world performance.

In addition, we experimentally investigate a number of other classic sorting algorithms to empirically study whether these algorithms lead to good sorting algorithms for evolving data and to study how sensitive they are to parameters in the underlying evolving data model. Interestingly, our experiments provide empirical evidence that quadratic-time sorting algorithms, including bubble sort, cocktail sort, and insertion sort, can outperform more sophisticated algorithms, such as quicksort and even the batched parallel blocked quicksort algorithm of Anagnostopoulos et al. [1], in practice. Furthermore, our results also show that even though these quadratic-time algorithms perform compare-and-swap operations only for adjacent elements at each time step, they are nevertheless robust to increasing the rate, r, at which random swaps occur in the underlying list for every comparison done by the algorithm. That is, our results show that these quadratic-time algorithms are robust even in the face of an adversary who can perform many swaps for each of an algorithm's comparisons. Moreover, this robustness holds in spite of the fact that, in such highly volatile situations, each element is, on average, moved more often by random swaps than by the sorting algorithm's operations.

We also introduce the hot spot adversary and study sorting algorithms in the evolving data model with respect to this adversary. Our experiments provide evidence that these sorting algorithms have similar robustness against the hot spot adversary as they do against the uniform adversary. Finally, we show that the constant factors in the Kendall tau distances maintained by classic quadratic-time sorting algorithms appear to be quite reasonable in practice.

Under the uniform adversary, a uniformly random adjacent swap is applied to the true order ℓ′, and this is repeated independently for a total of r > 0 swaps; we call each such change to ℓ′ a swap mutation. With respect to the hot spot adversary, we refer to the change made to ℓ′ as a hot spot mutation, i.e., an element is picked uniformly at random and an unbiased coin is flipped repeatedly, with the element swapped past a random neighbor on each tails until the first heads.
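To make this experimental setup concrete, a simulation harness along the following lines interleaves one adversary mutation with every comparison made by a quadratic-time sorter and samples the Kendall tau distance over time. This is only a sketch under assumptions of our own choosing (bubble-sort passes, the list size, the sampling interval, and the uniform adversary with rate r); it reuses uniform_mutation and kendall_tau_distance from the sketches above and is not the code used for the paper's experiments.

```python
import random

# Reuses uniform_mutation and kendall_tau_distance from the sketches above.

def simulate_bubble_sort(n=300, r=1, steps=50000, sample_every=1000):
    """Repeatedly run bubble-sort passes over the maintained list while the
    uniform adversary applies r random adjacent swaps to the true order after
    every comparison; return Kendall tau distances sampled over time."""
    true_order = list(range(n))
    random.shuffle(true_order)                 # the evolving true ranking
    maintained = list(range(n))                # the algorithm's output list

    def rank(x):                               # comparisons consult the current true order
        return true_order.index(x)             # (linear scan: simple, but slow for large n)

    distances = []
    step = 0
    while step < steps:
        for i in range(n - 1):                 # one pass of adjacent compare-and-swap steps
            if rank(maintained[i]) > rank(maintained[i + 1]):
                maintained[i], maintained[i + 1] = maintained[i + 1], maintained[i]
            uniform_mutation(true_order, r)    # the adversary moves after every comparison
            step += 1
            if step % sample_every == 0:
                distances.append(kendall_tau_distance(maintained, true_order))
            if step >= steps:
                break
    return distances
```

Plotting the returned distances against the step count then shows whether the maintained list converges to, and remains within, a small Kendall tau distance of the evolving true order.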