ISSN 2319-8885 Vol.04,Issue.04 February-2015,

Pages:0691-0695

www.ijsetr.com

An Enhanced an Efficient Sorting Architecture for High Throughput Applications

KAGITALA NAGARAJU¹, K.NARASIMHAREDDY²

¹PG Scholar, Dept of ECE (VLSI&ES), QIS College of Engineering and Technology, Ongole, AP, India, E-mail: [email protected].

²Assistant Professor, Dept of ECE, QIS College of Engineering and Technology, Ongole, AP, India, E-mail: [email protected].

Abstract: Sorting is an important operation in a wide range of applications, such as databases, digital signal processing, searching and network processing. Sorting modules are implemented using either ASICs or FPGAs to meet the required performance. Generally, the inputs of sorting modules are integers, floating-point numbers or even compressed data values. This paper focuses on the design and VLSI implementation of partial sorting and max-set-selection units with low latency, high throughput and modest resource requirements. The sorting modules are designed with modular techniques: small, regular building blocks are connected in a modular fashion, which reduces verification time and simplifies the design process. We propose parallel sorting algorithms for finding and sorting the M largest values from N inputs, and then design scalable architectures based on the proposed algorithms. A bubble sorting technique is also presented for sorting the values.

Keywords: Sorting, Modular Techniques, Partial Sorting, Max-Set-Selection, Sorting Modules, FPGAs.

I. INTRODUCTION

In the past, the major concerns of VLSI design were area, performance, cost and reliability. The sorting problem has been investigated under various parallel architectures, since using many functional units to sort concurrently can improve performance. To achieve a high throughput rate, today's computers perform several operations simultaneously. Input operations are performed concurrently with computation, and multiprocessors also perform several computing operations concurrently. In previous research, a sorting unit was required to produce all of its inputs in sorted order. In most applications, however, not all of the inputs need to be sorted. For example, in HEP applications only the M most energetic particles are considered. Similarly, in digital signal processing applications only the M strongest signals need to be analyzed. This paper focuses on partial sorting and max-set-selection units that discard small inputs as early as possible to reduce the sorting unit's latency and hardware complexity.

Batcher introduced the bitonic and the […]

[…] computing [8], searching, scheduling [9], pattern recognition, robotics [10], image and video processing, and high-energy physics (HEP) [4]. For applications that require very high-speed sorting, hardware sorting units are often implemented using either ASICs or FPGAs to meet performance requirements [7]. Based on the target applications, hardware sorting units vary greatly, not only in architecture but also in the number of inputs and the width of the inputs that they can process. For instance, only 9 to 25 inputs need to be processed in certain filters, while the number of inputs can vary from 25 to 81 (or even higher) in certain image processing applications [8]. High-speed sorters on FPGAs in HEP applications deal with 128 to 256 data samples in 100-ns processing cycles [4], [9]. Thousands of inputs are sorted in video [3] and database applications [3].

In general, inputs can be b-bit integers (8 ≤ b ≤ 64), floating-point numbers, or even compressed data values. Most previous research on sorting units has focused on the situation in which the sorting unit must produce all of its inputs in sorted (increasing or decreasing) order. In many applications, however, only the M largest (or smallest) output values need to be selected from a total of N input values, where M […]
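A small software reference model can pin down the selection problem described above. Hardware sorting units are often verified against a golden model of this kind. The Python sketch below is a hypothetical illustration, not the authors' implementation. It defines three functions, and all three names are assumptions made for this example:

- `partial_sort` returns the M largest of N inputs in decreasing order, which is what an N-to-M partial sorter produces.
- `max_set_select` returns the same M values in an arbitrary order, which is what a max-set-selection unit is allowed to produce.
- `is_valid_max_set` checks an output against that specification.

```python
import heapq
import random
from collections import Counter
from typing import List, Sequence


def partial_sort(values: Sequence[int], m: int) -> List[int]:
    """N-to-M partial sorter model: the m largest inputs in decreasing order."""
    if not 0 <= m <= len(values):
        raise ValueError("m must satisfy 0 <= m <= N")
    # heapq.nlargest returns the m largest items, largest first.
    return heapq.nlargest(m, values)


def max_set_select(values: Sequence[int], m: int) -> List[int]:
    """Max-set-selection model: the m largest inputs in an unspecified order."""
    selected = partial_sort(values, m)
    # Shuffle to stress that a max-set-selection unit need not order its outputs.
    random.shuffle(selected)
    return selected


def is_valid_max_set(values: Sequence[int], m: int, outputs: Sequence[int]) -> bool:
    """True if outputs equals the m largest inputs when compared as a multiset."""
    expected = sorted(values, reverse=True)[:m]
    return Counter(outputs) == Counter(expected)


if __name__ == "__main__":
    inputs = [7, 3, 9, 1, 12, 5, 8, 2]  # N = 8
    print(partial_sort(inputs, 4))  # [12, 9, 8, 7]
    print(is_valid_max_set(inputs, 4, max_set_select(inputs, 4)))  # True
```

A max-set-selection unit may emit its M outputs in any order. A testbench built on this model should therefore compare multisets, as `is_valid_max_set` does, rather than sequences.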

Copyright @ 2015 IJSETR. All rights reserved.

[…] which are in fact hardware implementations of the CE operation. A comparator has two inputs and two outputs. Depending on the sense of ordering, comparators can be of two kinds, as shown in Fig.1 (a) and (b).

Fig.1. Comparators.

Sorting is any process of arranging items according to a certain sequence or in different sets. It therefore has two common, yet distinct meanings:
• ordering: arranging items of the same kind, class or nature in some ordered sequence;
• categorizing: grouping and labeling items with similar properties together (by sorts).

Sorting information or data:
• In computer science, sorting is one of the most extensively researched subjects because of the need to speed up operations on thousands or millions of records during a search operation; see sorting algorithm.
• The main purpose of sorting information is to optimize its usefulness for specific tasks. In general, there are two ways of grouping information. The first is by category (nominal scale), e.g. a shopping catalogue where items are compiled under headings such as 'home', 'sport & leisure' and 'women's clothes'. The second is by the intensity of some property (ordinal scale), such as price, e.g. from the cheapest to the most expensive. Richard Saul Wurman, in his book Information Anxiety, proposes that the most common sorting purposes are by name, by location and by time (these are actually special cases of category and hierarchy). Together these give the acronym LATCH (Location, Alphabetical, Time, Category, Hierarchy), which can be used to describe just about every type of ordered information.
• Information is often sorted using different methods at different levels of abstraction. For example, UK telephone directories are sorted by location, then by category (business or residential), and then alphabetically. New media still subscribe to these basic sorting methods. For example, a Google search returns web pages in a hierarchical list based on its own scoring system for how closely they match the search criteria (from closest match downwards).
• The opposite of sorting, rearranging a sequence of items in a random or meaningless order, is called shuffling.

For sorting, either a weak order, "should not come after", or a strict weak order, "should come before", can be specified. Specifying one also defines the other, since each is the complement of the inverse of the other (see operations on binary relations). For the sorting to be unique, these two are restricted to a total order and a strict total order, respectively.
• Sorting n-tuples (depending on context also called records consisting of fields) can be done based on one or more of their components. More generally, objects can be sorted based on a property. Such a component or property is called a sort key. For example, if the items are books, the sort key may be the title, subject or author, and the order is alphabetical.
• A new sort key can be created from two or more sort keys by lexicographical order. The first is then called the primary sort key, the second the secondary sort key, and so on.
• If the sort key values are totally ordered, the sort key defines a weak order of the items: items with the same sort key are equivalent with respect to sorting.
• A standard order is often called ascending (corresponding to the fact that the standard order of numbers is ascending, i.e. A to Z, 0 to 9), and the reverse order descending (Z to A, 9 to 0).

II.

A sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require input data to be in sorted lists. It is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions. First, the output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order). Second, the output is a permutation (reordering) of the input. Further, the data is often taken to be in an array, which allows random access, rather than a list, which only allows sequential access, though algorithms can often be applied with suitable modification to either type of data. Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps because it is difficult to solve efficiently despite its simple, familiar statement. For example, […] was analyzed as early as 1956. A fundamental limit of comparison sorting algorithms is that they require linearithmic time, O(n log n), in the worst case. Better performance is possible on real-world data (such as almost-sorted data), and algorithms not based on comparison, such as counting sort, can also perform better. Although many consider sorting a solved problem – asymptotically optimal algorithms have been known since the mid-20th
century – useful new algorithms are still being invented, with the now widely used Timsort dating to 2002 and library sort first published in 2006. Sorting algorithms are prevalent in introductory computer science classes. There, the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as […], divide and conquer algorithms, data structures such as heaps and binary trees, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and upper and lower bounds.

International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.04, February-2015, Pages: 0691-0695

A. Bitonic Sorting

Bitonic sort is one of the fastest sorting networks. A sorting network is a special kind of sorting algorithm in which the sequence of comparisons is not data-dependent. This makes sorting networks suitable for implementation in hardware or in parallel processor arrays. The bitonic sorting network consists of Θ(n·log²(n)) comparators. It has the same asymptotic complexity as odd-even merge sort and Shell sort. A sorting network with only O(n·log(n)) comparators is known [AKS 83], but its large constant makes it slower than bitonic sort for all practical problem sizes.

Fig.2.

B. Odd Even Sorting

In computing, an odd–even sort or odd–even transposition sort (also known as brick sort) is a relatively simple sorting algorithm, developed originally for use on parallel processors with local interconnections. It is related to bubble sort, with which it shares many characteristics. It works by comparing all (odd, even)-indexed pairs of adjacent elements in the list. If a pair is in the wrong order (the first is larger than the second), the elements are switched. The next step repeats this for the (even, odd)-indexed pairs of adjacent elements. The algorithm then alternates between (odd, even) and (even, odd) steps until the list is sorted.

Sorting on processor arrays: consider parallel processors with one value per processor and only local left–right neighbor connections. There, the processors all concurrently perform a compare–exchange operation with their neighbors, alternating between odd–even and even–odd pairings. Habermann originally presented this algorithm in 1972 and showed that it is efficient on such processors. The algorithm extends efficiently to the case of multiple items per processor. In the Baudet–Stevenson odd–even merge-splitting algorithm, each processor sorts its own sublist at each step, using any efficient sort algorithm. It then performs a merge-splitting, or transposition–merge, operation with its neighbor, with neighbor pairing alternating between odd–even and even–odd on each step.

III. PARTIAL SORTING AND MAX-SET SELECTION UNITS

Partial sorters provide the 2^m largest values in sorted order, and max-set-selection units provide the 2^m largest values in arbitrary order. Partial sorters and max-set-selection units are key components in many applications. Fig.3 shows an 8-to-4 bitonic partial sorting unit. For example, in the LHC [6], low-latency max-set-selection units identify important particle interactions that correspond to high-energy collisions [9]. In multimedia applications, partial sorters speed up data sorting algorithms. To design 2^n-to-4 max-set-selection units, we take advantage of the fact that only the four largest inputs are needed, in no particular order. This decreases the resource requirements and the number of CAE stages, as shown in Fig.4.

Fig.3. 8-to-4 bitonic partial sorting unit.

In the max-set-selection unit, a level of CAE blocks is replaced with a level of Max units. The wiring of these Max units differs from that of the first level of parallel CAE blocks in the OEM-8 unit. These modifications decrease the required number of CAE stages from six in 8-input sorting units to four in 8-to-4 max-set-selection units.
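A short behavioral model can reproduce the stage counts quoted above. The Python sketch below is an illustration under stated assumptions, not the authors' design. It works as follows:

- It builds an 8-to-4 max-set-selection unit from two 4-input bitonic sorters, one ascending and one descending. These run in parallel for three CAE stages and together form a bitonic sequence of eight values.
- One level of Max units then keeps max(x[i], x[i+4]) for i = 0..3. For a bitonic input, this upper half of Batcher's half-cleaner holds the four largest values in no particular order.
- Appending a 2-stage, 4-input bitonic merger turns the result into an 8-to-4 partial sorter.

The function names (`cae`, `bitonic_sort4`, `bitonic_merge4`, `max_level`, `max_set_8to4`, `partial_sort_8to4`) are hypothetical. The OEM-8-based wiring mentioned in the text is not reproduced.

```python
from typing import List, Tuple


def cae(x: List[int], i: int, j: int, ascending: bool = True) -> None:
    """Compare-and-exchange block: afterwards x[i] <= x[j] (x[i] >= x[j] if descending)."""
    if (x[i] > x[j]) == ascending:
        x[i], x[j] = x[j], x[i]


def bitonic_sort4(x: List[int], base: int, ascending: bool) -> int:
    """Sort x[base:base + 4] in place with a 3-stage bitonic network; return the stage count."""
    a, d = ascending, not ascending
    # Stage 1: sort the two pairs in opposite directions, giving a bitonic 4-sequence.
    cae(x, base, base + 1, a)
    cae(x, base + 2, base + 3, d)
    # Stage 2: half-cleaner over the bitonic 4-sequence.
    cae(x, base, base + 2, a)
    cae(x, base + 1, base + 3, a)
    # Stage 3: sort each half.
    cae(x, base, base + 1, a)
    cae(x, base + 2, base + 3, a)
    return 3


def bitonic_merge4(x: List[int], ascending: bool) -> int:
    """Sort a bitonic 4-sequence in place with a 2-stage bitonic merger; return the stage count."""
    cae(x, 0, 2, ascending)
    cae(x, 1, 3, ascending)
    cae(x, 0, 1, ascending)
    cae(x, 2, 3, ascending)
    return 2


def max_level(x: List[int]) -> List[int]:
    """One level of Max units: output i is the larger of wires i and i + 4."""
    return [max(x[i], x[i + 4]) for i in range(4)]


def max_set_8to4(values: List[int]) -> Tuple[List[int], int]:
    """Return the 4 largest of 8 inputs (in arbitrary order) and the number of stages used."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 inputs")
    x = list(values)
    # In hardware the two 4-input sorters operate in parallel, so they share 3 stages.
    stages = bitonic_sort4(x, 0, ascending=True)
    bitonic_sort4(x, 4, ascending=False)
    # x is now bitonic (ascending, then descending); one Max level keeps the top half.
    return max_level(x), stages + 1


def partial_sort_8to4(values: List[int]) -> Tuple[List[int], int]:
    """Return the 4 largest of 8 inputs in decreasing order and the number of stages used."""
    top4, stages = max_set_8to4(values)
    # The Max outputs form a bitonic 4-sequence, so a 4-input merger sorts them.
    stages += bitonic_merge4(top4, ascending=False)
    return top4, stages


if __name__ == "__main__":
    data = [7, 3, 9, 1, 12, 5, 8, 2]
    print(max_set_8to4(data))       # ([12, 8, 7, 9], 4)
    print(partial_sort_8to4(data))  # ([12, 9, 8, 7], 6)
```

In this sketch, the max-set-selection unit needs 3 + 1 = 4 stages, matching the four CAE stages quoted above. A full 8-input bitonic sorter needs 6 stages. The partial sorter needs two extra merger stages, for 6 in total, but uses 20 two-input blocks instead of the 24 comparators of a full 8-input bitonic sorter.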

Fig.4. 8-to-4 bitonic max-set-selection unit.

IV. BUBBLE SORT

Bubble sort is a simple sorting algorithm. The algorithm starts at the beginning of the data set. It compares the first two elements and, if the first is greater than the second, swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. This algorithm's average and worst-case performance is O(n²), so it is rarely used to sort large, unordered data sets. Bubble sort can be used to sort a small number of items, where its asymptotic inefficiency is not a high penalty. It can also be used efficiently on a list of any length that is nearly sorted, that is, one whose elements are not significantly out of place. For example, suppose any number of elements are out of place by only one position (e.g. 0123546789 and 1032547698). Bubble sort's exchanges will put them in order on the first pass, and the second pass will find all elements in order, so the sort will take only 2n time.

V. RESULT

The result of this paper is shown in Fig.5.

Fig.5. Output waveform of bubble sorting.

VI. CONCLUSION

This paper has presented the design and implementation of flexible, low-latency, high-throughput N-to-M sorting and max-set-selection units. It has also discussed the structure, performance and resource requirements of these units. We propose modular techniques for designing N-to-M sorting and max-set-selection units based on Batcher's bitonic and odd-even merge sorting algorithms. We present new regular bitonic merging units that are used to construct efficient sorting and max-set-selection units. Our proposed parallel designs are built from Batcher's merging units, but they modify the original units to obtain efficient max-set-selection and partial sorting units. This reduces the time and area complexities of the original algorithm. A bubble sorting technique is also presented for sorting the values.

VII. REFERENCES

[1] S. Azuma, T. Sakuma, T. Takeo, T. Ando, and K. Shirai, “Diaprism Hardware Sorter - Sort a Million Records within a Second,” http://sortbenchmark.org/Y2000_Datamation_DiaprismSorter.pdf, 2000.
[2] N. Govindaraju, J. Gray, R. Kumar, and D. Manocha, “GPUTeraSort: High Performance Graphics Co-Processor Sorting for Large Database Management,” Proc. Conf. Management of Data, pp. 325-336, 2006.
[3] D. Koch and J. Torresen, “FPGASort: A High Performance Sorting Architecture Exploiting Run-Time Reconfiguration on FPGAs for Large Problem Sorting,” Proc. Symp. Field Programmable Gate Arrays, pp. 45-54, 2011.
[4] D. Pok, C.-I. Chen, J. Schamus, C. Montgomery, and J. Tsui, “Chip Design for Monobit Receiver,” IEEE Trans. Microwave Theory and Techniques, vol. 45, no. 12, pp. 2283-2295, Dec. 1997.
[5] I. Pitas and A.N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications. Kluwer Academic Publishers, 1990.
[6] J.P. Agrawal, “Arbitrary Size Bitonic (ASB) Sorters and Their Applications in Broadband ATM Switching,” Proc. IEEE Int’l Conf. Computers and Comm., pp. 454-458, Mar. 1996.
[7] K. Yun, K. James, R. Fairlie-Cuninghame, S. Chakraborty, and R. Cruz, “A Self-Timed Real-Time Sorting Network,” IEEE Trans. Very Large Scale Integration Systems, vol. 8, no. 3, pp. 356-363, June 2000.
[8] A. Colavita, E. Mumolo, and G. Capello, “A Novel Sorting Algorithm and Its Application to a Gamma-Ray Telescope Asynchronous Data Acquisition System,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 394, no. 3, pp. 374-380, 1997.
[9] D.C. Stephens, J.C. Bennett, and H. Zhang, “Implementing Scheduling Algorithms in High-Speed Networks,” IEEE J. Selected Areas in Comm., vol. 17, no. 6, pp. 1145-1158, June 1999.

[10] V. Brajovic and T. Kanade, “A VLSI Sorting Image Sensor: Global Massively Parallel Intensity-to-Time Processing for Low-Latency, Adaptive Vision,” IEEE Trans. Robotics and Automation, vol. 15, no. 1, pp. 67-75, Feb. 1999.
