A New Mathematical Model for Unsorted Database Search

Vivek Kumar¹ and Sandeep Sharma²
¹,² Department of Electronics and Communication Engineering, Dehradun Institute of Technology, Dehradun-248009, Uttarakhand, India
[email protected], [email protected]

An unsorted database consists of N records, of which only one is of particular interest. Classical sequential search and quantum search algorithms on such a database have upper-bound complexities of O(N) and O(√N), respectively. We describe a new approach that uses simple arithmetic and comparison operations to search for a record with upper-bound space and time complexities of O(N) and O(√N), respectively.

A record in a sorted database can be found quickly either by binary search or with a hash table [1]. However, most real-world data arrive in no particular order, so the database elements are sorted before searching. The best-case complexity of any known sorting algorithm is no lower than O(N), e.g. timsort [2], cubesort [3], shellsort [4], while binary search itself costs O(log N); the cost of sorting therefore has to be added to the cost of any search on sorted data. For an unsorted database, only classical sequential search is used in practice, at an asymptotic complexity of O(N). Here we present a new approach that covers the complete database while searching for a particular record, within upper-bound asymptotic space and time complexities of O(N) and O(√N), respectively.

Consider a database of size N traversed by a loop that fetches √N records at a time and advances its index by √N on each new run, so that the complete database is traversed in √N runs.
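The block-wise traversal described above can be sketched as follows. This is a minimal illustration, not the authors' code: the name `blocks` and the choice to round the block size up to ⌈√N⌉ (so that the last block may be shorter) are assumptions made here.

```python
import math


def blocks(records):
    """Yield (start_index, block) pairs covering `records` in about sqrt(N) runs.

    Hypothetical helper: the block size is ceil(sqrt(N)), an assumption made
    so that databases whose size is not a perfect square are still covered.
    """
    n = len(records)
    if n == 0:
        return
    step = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n >= 1
    for start in range(0, n, step):
        yield start, records[start:start + step]


if __name__ == "__main__":
    for start, block in blocks(list(range(10))):
        print(start, block)
```

With this block size the number of runs never exceeds ⌈√N⌉, which matches the √N runs stated in the text.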
We derive a condition (see Methods for derivation) that tests for the presence of the particular record among the fetched records in an asymptotic complexity of O(√N) for each of the √N runs. The condition acts as a test of whether any of the fetched records is the one of interest. If any of the √N records satisfies the condition, the loop breaks and a second loop starts, which traverses those √N records and compares each one sequentially with the record being searched for, in O(√N). Hence, the final time complexity of the algorithm becomes O(√N) × O(√N) + O(√N) = O(N) + O(√N) = O(N).

To simplify the demonstration of the algorithm, we replace √N by N/2. Consider a record c to be searched for, along with content-storing functions f(N/2) and g(N/2), where on each fresh run a single value is stored in each function, giving a space complexity of O(1). With these conventions, the comparison condition is

$$\mathrm{float}\left(\frac{1}{1 + \mathrm{int}\left(\dfrac{\left(\left(f(N/2)\right)^2 - c^2\right)^2}{\left(\left(f(N/2)\right)^2 - c^2\right)^2 - 1}\right) + \mathrm{int}\left(\dfrac{\left(\left(g(N/2)\right)^2 - c^2\right)^2}{\left(\left(g(N/2)\right)^2 - c^2\right)^2 - 1}\right)}\right) \neq \mathrm{float}\left(\frac{1}{3}\right)$$

The factor

$$\frac{\left(\left(f(N/2)\right)^2 - c^2\right)^2}{\left(\left(f(N/2)\right)^2 - c^2\right)^2 - 1}$$

is denoted F(N/2), and the factor

$$\frac{\left(\left(g(N/2)\right)^2 - c^2\right)^2}{\left(\left(g(N/2)\right)^2 - c^2\right)^2 - 1}$$

is denoted G(N/2). Both factors are typecast to integer (int) so that the results of F(N/2) and G(N/2) carry no fractional part. The final values on the LHS and RHS, in contrast, have to be fractional and are therefore typecast to float. The complete condition is evaluated at each fresh run of the loop (here, N/2 times) and can only be true if either of the two functions f(N/2) or g(N/2) holds the required record c. Once it is satisfied, the program breaks out and enters another loop that checks each of the N/2 records sequentially.
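To see why the condition reacts only to a matching record, it may help to work through a single factor. The following is a sketch that assumes integer records; the symbol x is introduced here for illustration and does not appear in the original.

```latex
\[
x = \left(f^2 - c^2\right)^2, \qquad
\mathrm{int}\!\left(\frac{x}{x-1}\right) =
\begin{cases}
0, & x = 0 \quad (f = \pm c),\\[4pt]
1, & x \ge 4 \quad \left(1 < \dfrac{x}{x-1} \le \dfrac{4}{3}\right).
\end{cases}
\]
\[
\text{no match: } \frac{1}{1 + 1 + 1} = \frac{1}{3},
\qquad
\text{one match: } \frac{1}{1 + 0 + 1} = \frac{1}{2} \neq \frac{1}{3}.
\]
```

For integer f and c, x is a perfect square, so the values 2 and 3 cannot occur. Two edge cases are worth noting under this reading: x = 1 (which happens only when one of |f|, |c| is 0 and the other is 1) makes the denominator zero, and f = −c also gives x = 0, so the condition alone cannot tell c from −c; only the sequential comparison that follows can.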
The position of the required record is thus retrieved within an asymptotic time complexity of O(N/4), which is O(N). The proposed algorithm works for both positive and negative numbers. Decimal records, however, raise concerns that are left to future work, so the algorithm is currently not recommended for databases of decimal records.

The final generalized comparison condition is

$$\mathrm{float}\left(\frac{1}{1 + \displaystyle\sum^{\sqrt{N}} \mathrm{int}\left(\dfrac{\left(\left(f(\sqrt{N})\right)^2 - c^2\right)^2}{\left(\left(f(\sqrt{N})\right)^2 - c^2\right)^2 - 1}\right)}\right) \neq \mathrm{float}\left(\frac{1}{1 + \sqrt{N}}\right)$$

This function checks √N records at each fresh run with an asymptotic space complexity of O(N), giving a final space-time complexity of O(N) and O(√N) as stated.

References
[1] Knuth 1998, §6.2 ("Searching by Comparison of Keys").
[2] Peters, Tim. "[Python-Dev] Sorting". https://mail.python.org/pipermail/python-dev/2002-July/026837.html. Retrieved 2 June 2016.
[3] Cypher, Robert; Sanz, Jorge L. C. (1992). Cubesort: A parallel algorithm for sorting N data items with S-sorters.
[4] Pratt, Vaughan Ronald (1979). Shellsort and Sorting Networks. Garland. ISBN 0-8240-4406-1.

Methods
Derivation of algorithm complexity: The proposed algorithm consists of two loops, the first of which contains the proposed comparison condition and the second a sequential comparison. We analyze the complexity of each loop in turn.
Given below is the complete proposed algorithm:

    Initialize i, j, rec
    Given an unsorted database array A
    Retrieve record to be searched and save in rec
    // First loop (consisting of proposed comparison condition)
    for i = 0; run √N times; increment i by √N
        if (float)(1 / (1 + (int)(((A[i])² − rec²)² / (((A[i])² − rec²)² − 1)) + ... + (int)(((A[i+√N−1])² − rec²)² / (((A[i+√N−1])² − rec²)² − 1)))) != (float)(1 / (1 + √N))
            break
        end if
    end for
    // Second loop (consisting of sequential comparison condition)
    for j = 0; run √N times; increment j by 1
        if A[i] == rec
            Record found at ith position
            break
        end if
        increment i by 1
    end for

Here A[i+√N−1] is the √N-th record fetched in the current run.

Complexity analysis of the former loop: Ignoring the inner condition, the loop performs √N runs, which gives the first complexity factor, √N. The proposed condition is responsible for the major change in both space and time complexity. Since it has to hold √N records for analysis, the asymptotic space complexity becomes O(√N). To analyze the complexity of factors such as ((A[i])² − rec²)² / (((A[i])² − rec²)² − 1), we drop the constants (here, rec and 1), which leaves a term in (A[i])⁴ in both the numerator and the denominator. A single factor therefore costs O(1), and √N such factors are summed in each run, giving O(√N) per run. To obtain the overall complexity of the former loop, we multiply the two complexities: O(√N) × O(√N) = O(N).

Complexity analysis of the latter loop: The latter loop consists of a sequential comparison of √N records, giving an asymptotic space complexity of O(1) and an asymptotic time complexity of O(√N).

Overall, the asymptotic space complexity becomes O(√N) + O(1) = O(√N) and the asymptotic time complexity becomes O(N) + O(√N) = O(N).
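The two loops analyzed above can be sketched in Python as follows. This is an illustrative reading for integer records, not the authors' implementation. The names `term` and `block_search` are hypothetical, and three choices are assumptions made here: the block size is ⌈√N⌉, a term whose denominator would be zero is counted as a non-match, and the outer loop resumes if the sequential scan finds nothing (the condition compares squares, so a record equal to −rec also triggers it).

```python
import math


def term(value, rec):
    """One summand of the condition: int(x / (x - 1)), x = (value^2 - rec^2)^2."""
    x = (value * value - rec * rec) ** 2
    if x == 1:
        # Assumption: the source does not cover a zero denominator;
        # here it is treated as "not a match".
        return 1
    return int(x / (x - 1))  # 0 when value == +/-rec, otherwise 1


def block_search(records, rec):
    """Return the index of `rec` in `records`, or -1 if it is absent."""
    n = len(records)
    if n == 0:
        return -1
    step = math.isqrt(n - 1) + 1  # ceil(sqrt(n)), an assumed block size
    # First loop: evaluate the comparison condition once per block.
    for start in range(0, n, step):
        block = records[start:start + step]
        total = sum(term(v, rec) for v in block)
        if float(1 / (1 + total)) != float(1 / (1 + len(block))):
            # Second loop: sequential comparison inside the flagged block.
            for offset, v in enumerate(block):
                if v == rec:
                    return start + offset
            # Assumption: only -rec was present, so keep scanning later blocks.
    return -1


if __name__ == "__main__":
    data = [9, -3, 7, 12, 5, 8, 1, 0, 4]
    print(block_search(data, 5))
```

Each block still costs one pass over its records to evaluate the sum, so the sketch performs O(N) work in the worst case, in line with the Methods analysis.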