On the Usage of Sorting Networks to Big Data

Blanca López and Nareli Cruz-Cortés
Artificial Intelligence Laboratory, Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), México D.F., México

Abstract: Sorting data is perhaps the most popular classical task in Computer Science. For the majority of applications the main goal is to minimize the number of comparisons and the execution time that the sorting algorithm consumes. Sorting Networks are algorithms that perform exactly the same number of comparisons to order any input permutation of a given input size. That is, no step depends on the result of a previous comparison. Thus, designing Sorting Networks with a minimal number of comparisons becomes a very important task. However, it is an NP-hard problem. Sorting Networks with a minimal number of comparisons (or at least close to the optimum) have been published in the specialized literature for small input sizes, from 3 to 16. Of course, these input sizes are too small to be used in real-world problems. In this work we propose a new strategy to improve Quicksort performance on large input data by coupling it with some Sorting Networks. The results show that this helps reduce the sorting execution time.

Keywords: Sorting Networks, QuickSort

1. Introduction

Sorting is perhaps one of the most studied problems in Computer Science, from both the theoretical and the practical points of view. Applications can be found in Data Processing Systems, Network Communication Systems, Image Processing, Artificial Intelligence, Cryptography, Computer Security and Information Systems, among many others. A large set of sorting algorithms can be found in the specialized literature, such as quicksort, bubble sort, merge sort, shell sort, heapsort, insertion sort, introsort and shear sort. Choosing the most efficient algorithm usually depends on the type of application at hand.

In general, sorting algorithms can be classified into two groups: adaptive and non-adaptive. An adaptive algorithm executes its compare-interchange operations depending on the input data. On the other hand, non-adaptive algorithms have fixed operations which are executed no matter the configuration of the input data (e.g., any of the possible permutations). They always execute the same compare-interchange operations. Sorting Networks (SN) are an example of non-adaptive algorithms.

Taking advantage of the divide-and-conquer strategy used by Quicksort, we design a strategy in which some SN are coupled to it in order to reduce the number of comparisons performed by Quicksort.

The remainder of this paper is organized as follows. In Section 2 some basic concepts about Quicksort and Sorting Networks are presented. In Section 3 the proposal is explained. Section 4 presents the experiments and results. Finally, in Section 5 some conclusions are drawn.

2. Basic Concepts

2.1 Quicksort Algorithm

Quicksort (also known as Partition-Exchange Sort) was first presented in 1960 by Tony Hoare [4]. It uses a divide-and-conquer strategy, dividing a large list into two smaller sublists: one with the smallest values and another with the greatest. Then, each sublist is recursively ordered. The algorithm is as follows:

1) Choose an element from the list, which will be called the pivot.
2) Rearrange the list so that all the values less than the pivot are located to its left (before the pivot) and all the values greater than the pivot are located to its right (after the pivot). This way, the pivot is in its final position.
3) For each sublist, repeat the previous steps recursively until the sublist size is zero or one.

This idea is illustrated in Figure 1. Quicksort is a very efficient algorithm that, in the average and best cases, makes O(n log n) comparisons to sort n elements. In the worst case it makes O(n^2) comparisons. Some variants of this algorithm have been presented in [6][3], whose authors proposed modifications to reduce the execution time.

Fig. 1. Recursive partition operation of the Quicksort algorithm.

2.2 Sorting Networks

SN are algorithms whose main feature is being oblivious, meaning that their operations (comparisons) do not depend on the input data or on previous comparisons [5][7]. Unlike other well-known sorting algorithms (bubble sort, quicksort, etc.), the sequence and number of comparisons are exactly the same no matter the input configuration (permutation). A SN exhibits two main features:

• The comparisons (called comparators) are fixed before the SN execution,
• Some comparisons can be executed in parallel.

A SN is composed of a set of comparators, each of which executes a compare-interchange action between two elements (a, b). The element a must not be greater than b; if it is, the values are interchanged to (b, a). So, for a given input list of size n, the comparators that make up the SN are applied to it, and the output is the list in monotonically non-decreasing order.

Typically, a SN is represented graphically by n horizontal lines corresponding to the n input data, together with vertical lines that represent comparisons between the value at the top end and the value at the bottom end. If the value at the top is greater than the value at the bottom, these values must be swapped.

The input data are placed at the left; after they have traveled across the horizontal lines and the comparisons found along the way have been executed, the output is obtained at the right. The data must be sorted in ascending order from top to bottom.

See, for example, the SN for n = 4 inputs illustrated in Figure 2. Each input value is placed on one of the horizontal lines, labeled x0, x1, x2, x3. The vertical lines are the comparators c0, c1, c2, c3, c4, each receiving two values; for instance, the comparator c0 receives the values x0 and x1, and so on. All the data values go from left to right, executing a compare-interchange each time a comparator is found. So, the comparators c0 and c1 are executed first, then c2 and c3, and finally c4. With the inputs x0 = 4, x1 = 2, x2 = 1, x3 = 3, c0 evaluates 4 > 2, thus the values of x0 and x1 are swapped. c1 evaluates 1 < 3, so the values of x2 and x3 remain unchanged. This process continues until all the comparators have been applied, so the final sorted list y0, y1, y2, y3 at the right satisfies y0 ≤ y1 ≤ y2 ≤ y3 (here y0 = 1, y1 = 2, y2 = 3, y3 = 4).

Fig. 2. Sorting network for n = 4 inputs.

As a matter of fact, if an optimal SN for input size n can be designed (i.e., with a minimal number of comparators), then it is the best manner to sort data of that size. Designing SN with a minimal number of comparators and/or high parallelism is a classical and interesting problem in Computer Science, and it remains an open research area.

It is important to notice that optimal SN for input sizes greater than n = 16 are not known. Only lower bounds on the number of comparators are theoretically known [5]. The most studied SN is the one with input size n = 16, which is a relatively small value considering the huge quantity of information that modern systems must handle. The best known SN for n = 16 has only 60 comparators; for example, the one designed by Green [5] is illustrated in Figure 3.

Fig. 3. SN with input size n = 16 designed by Green. It is the best known, with 60 comparators.

In [2] K. E. Batcher proposed an interesting algorithm called Merge Odd-Even to merge two SN into one. That is, if we have a SN with input size n, then it is possible to obtain a SN with input size 2n by merging two copies of the original SN of size n. By following this algorithm it is possible to obtain SN with larger input sizes (SN for input sizes greater than n = 16 are usually considered large).

As an example, consider increasing the input size to 2n starting from a SN for n = 4. A set of ordering operations and two sorted lists of sizes g and h are considered. Figure 4 shows the two lists to be re-arranged. The list t holds the numbers {t1, t2, ..., tg} in order. Likewise, the second list, w, is composed of {w1, w2, ..., wh}. The output of the merging network has size g + h, and the numbers of the merged lists in ascending order are {u1, ..., ug+h−1, ug+h}. That is, first, a list of size g + h can be built by a merging network from the odd-indexed numbers of the two input lists and the even-

Fig. 5. Odd-even mergesort scheme. A SN for n = 8 inputs is constructed from two SN for n = 4.
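The 4-input network of Figure 2 and the doubling idea attributed to Batcher [2] can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the text only specifies the wiring of c0 and c1, so c2, c3 and c4 are assumed to follow the standard 4-input layout; all function names are hypothetical; and the merge step uses the usual recursive formulation of odd-even merging, which assumes the number of wires is a power of two.

```python
from itertools import product

# 4-input network of Figure 2 as (top, bottom) wire pairs, in execution order.
# c0 and c1 follow the text; the wiring of c2, c3 and c4 is an assumption.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def apply_network(network, values):
    """Run every comparator in order; swap when the top value is greater."""
    out = list(values)
    for top, bottom in network:
        if out[top] > out[bottom]:
            out[top], out[bottom] = out[bottom], out[top]
    return out


def odd_even_merge(lo, hi, r=1):
    """Yield comparators that merge two sorted halves of wires lo..hi.

    hi is inclusive, and hi - lo + 1 is assumed to be a power of two.
    """
    step = 2 * r
    if step < hi - lo:
        yield from odd_even_merge(lo, hi, step)      # even-indexed wires
        yield from odd_even_merge(lo + r, hi, step)  # odd-indexed wires
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def double_network(network, n):
    """Build a 2n-input network from two copies of an n-input network."""
    shifted = [(a + n, b + n) for a, b in network]
    return list(network) + shifted + list(odd_even_merge(0, 2 * n - 1))


def sorts_all_binary_inputs(network, n):
    """Zero-one principle: sorting every 0/1 input implies sorting any input."""
    return all(
        apply_network(network, bits) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


NETWORK_8 = double_network(NETWORK_4, 4)
NETWORK_16 = double_network(NETWORK_8, 8)
```

Under these assumptions, `apply_network(NETWORK_4, [4, 2, 1, 3])` reproduces the walk-through for Figure 2 and returns `[1, 2, 3, 4]`. The doubled networks have 19 comparators for n = 8 and 63 for n = 16, so this construction comes close to, but does not reach, the 60-comparator network cited for n = 16.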
