Parallel Optimization of the Set Data Structure
Gustav Karlsson
Spring 2015
Bachelor's thesis, 15 hp
Supervisor: Jerry Eriksson
Examiner: Pedher Johansson
Bachelor's program in computer science, 180 hp

Abstract

The Set data structure is a powerful and popular programmer's tool based on set theory. Bulk operations such as addAll provide a simple way of working with big collections of elements but, if performed sequentially, can severely limit the performance of a multi-processor system when invoked on big sets.

Parallel processing is a technique that can significantly shorten the execution time of otherwise sequentially executed tasks. In this thesis, a new type of set is proposed that uses multiple threads to employ parallel processing for bulk operations.

A new set is designed based on a non-blocking hash table with a high level of concurrency. The parallelization of the set makes use of parallel Java 8 streams to allow for quick iteration over the source collection's elements, resulting in a speedup in the processing of bulk operations.

Testing the parallel set shows a significant increase in execution speed on bulk operations when operating on more than 10000 elements.

Acknowledgements

I would like to thank my supervisor Jerry Eriksson at Umeå University for helping me keep this work on track. I would also like to thank Pedher Johansson at Umeå University for giving me some valuable assistance and feedback. Finally, thank you Judith for your support, patience, and for putting up with my ramblings and rants.

Contents

1 Introduction
2 The set data structure
  2.1 Operations
    2.1.1 Inspecting operations
    2.1.2 Mutating operations
    2.1.3 Set creation operations
    2.1.4 Bulk operations
  2.2 Performance and common implementations
    2.2.1 Array
    2.2.2 Linked list
    2.2.3 Binary search tree
    2.2.4 Hash table
3 Concurrency and parallelism
  3.1 Turning concurrent into parallel
  3.2 Bulk operations
  3.3 The producer-consumer bottleneck
4 A case for parallel sets
5 Method
  5.1 Java 8 as a test platform
  5.2 Designing a parallel set
    5.2.1 Cliff Click's non-blocking hash table
    5.2.2 Achieving non-blocking concurrency
    5.2.3 Turning the hash table into a set
    5.2.4 Parallelization
  5.3 Testing
    5.3.1 Parameters
    5.3.2 Test setup
6 Results
7 Conclusion
8 Discussion
  8.1 CPU scaling
  8.2 Number of elements scaling
  8.3 Future work
  8.4 Closing
References

1 Introduction

Data structures are foundational in computer science and come in many different forms such as lists, maps, trees, queues, stacks and sets. They are used for many different purposes, such as assuring the flow of data, controlling relations between elements, or simply acting as containers. An interesting aspect of data structures is how they are represented in computer programs. The elements held in a data structure must be mapped to the computer's main memory, and as data structures grow in size, performance becomes an increasingly interesting topic of research. Execution time for lookups and modifications is one of the most important performance metrics of a data structure and can vary greatly depending on how it is implemented.

As development continues, computer systems keep getting faster, but in recent years processor speed has been improving more and more slowly due to difficulties in the manufacturing process known as the "power wall" [1]. Instead, focus has shifted towards increasing the number of processors (or processor cores). This has forced programmers to make better use of multiple processors in computations, a difficult task that requires careful design to avoid issues such as race conditions, deadlocks and resource starvation.

One way of simplifying the use of multiple processors is to hide the parallelization behind a layer of abstraction, such as a class in object-oriented design. By encapsulating the complex parts, the solution can be reused in other projects to help spread the use of parallel computing.

Goal

The goal of this work is to design and implement a mutable set¹ capable of executing a selection of its operations in parallel by encapsulating the parallelism in the implementation of the set. Sets are targeted both to limit the scope of the work and because they are interesting from a performance standpoint. The resulting set is tested to answer the following questions:

• How does a parallel set perform in comparison to a sequential set when running on a multi-processor system?
• How does the performance of a parallel set scale with the number of used processors?
• How does the performance of a parallel set scale with the size of the operation?

¹ A mutable set can be modified by inserting or removing elements.

2 The set data structure

The set data structure is a useful tool for programmers and originates from set theory, a branch of mathematical logic, in which it would more accurately be described as a finite set².

² A finite set has a limited number of elements, as opposed to an infinite set such as the set of all positive integers.

A set is a collection of elements where any given element may only occur once. The elements of a set are not ordered³, so the only two states an element can have in relation to a set are being a member of the set, or not being a member of the set.

³ Although some implementations order elements to provide a predictable iteration order or faster lookups.

2.1 Operations

Sets can be operated on in a number of ways depending on the type of set and what the intended purpose is. Some operations only inspect the state of the set, while others modify it or create new sets from existing sets.

2.1.1 Inspecting operations

Inspecting operations are used to inspect the state of the set. The most basic operations are:

contains: Checks whether the set contains a given element.
size: Returns the cardinality (number of contained elements) of the set.

2.1.2 Mutating operations

Mutable sets must provide at least the following operations for the set to be "fully mutable"⁴:

add: Adds a given element to the set. The element is rejected if it is already a member of the set. Many implementations return a boolean value indicating whether the insertion was accepted or rejected.
remove: Removes a given element from the set. Many implementations return a boolean value indicating whether the element was removed or was not a member of the set.

⁴ Fully mutable in this case means that a set can always go from any possible state to any other possible state through mutating operations.
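In Java, these operations map directly onto the java.util.Set interface. The snippet below is purely illustrative, using the standard HashSet rather than the set implementation developed later in this thesis:

    import java.util.HashSet;
    import java.util.Set;

    public class BasicSetOperations {
        public static void main(String[] args) {
            Set<String> set = new HashSet<>();

            // Mutating operations: add returns false when the element is
            // rejected because it is already a member of the set.
            set.add("a");                          // true, "a" was inserted
            set.add("a");                          // false, duplicate rejected
            set.add("b");                          // true

            // Inspecting operations.
            System.out.println(set.contains("a")); // true
            System.out.println(set.size());        // 2 (the cardinality)

            // remove returns false if the element was not a member.
            System.out.println(set.remove("a"));   // true
            System.out.println(set.remove("c"));   // false
        }
    }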
2.1.3 Set creation operations

In addition to operations performed on a single set, some operations can be used to create new sets from multiple input sets.

union: Creates a set from elements that are members of any of the input sets.
intersection: Creates a set from elements that are members of all input sets.
difference: Creates a set from two existing sets A and B, consisting of the elements of A that are not members of B.
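The Java Collections Framework does not expose union, intersection and difference as dedicated methods; they are commonly built by copying one input set and applying the bulk operations described in Section 2.1.4. The helper methods below are an illustrative sketch of that idiom, not code from the thesis:

    import java.util.HashSet;
    import java.util.Set;

    public final class SetCreation {
        // Union: elements that are members of any of the input sets.
        static <E> Set<E> union(Set<E> a, Set<E> b) {
            Set<E> result = new HashSet<>(a);
            result.addAll(b);
            return result;
        }

        // Intersection: elements that are members of all input sets.
        static <E> Set<E> intersection(Set<E> a, Set<E> b) {
            Set<E> result = new HashSet<>(a);
            result.retainAll(b);
            return result;
        }

        // Difference: elements of a that are not members of b.
        static <E> Set<E> difference(Set<E> a, Set<E> b) {
            Set<E> result = new HashSet<>(a);
            result.removeAll(b);
            return result;
        }
    }

Each helper leaves the input sets untouched and returns a new set, matching the definitions above.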
2.1.4 Bulk operations

For the convenience of the programmer, some implementations provide bulk versions of single-element operations. Bulk operations do not provide any extra functionality but instead act as a replacement for the typical "for each item in A, do X on B" approach. They take as input an arbitrary number of elements (often packed in a data structure of their own) and perform an operation on the target set with each of the input elements.

addAll: Adds all elements of the input collection to the set. If the input is also a set, the target set will become the union of both sets.
removeAll: Removes all elements of the input collection from the set. If the input is also a set, the target set will become the difference between the target set and the input set.
containsAll: Checks whether the set contains all elements in the input collection. If the input is also a set, this will effectively determine whether the input set is a subset of the target set.
retainAll: Removes all elements from the set that are not in the input collection. This is the inverse of removeAll.

2.2 Performance and common implementations

The restriction that a set may only contain a single instance of any element leads to an interesting observation: when adding an element to a set, the implementation must have some mechanism to rule out every other member as a duplicate of the element being inserted before accepting it. This can have a considerable impact on performance, and a poorly designed implementation might need to traverse the full set on every insertion.

Another interesting property is the disorderliness of sets. Since elements are not ordered, there is no straightforward way of finding an element, and the full set might have to be traversed in order to find the desired element.

Ultimately, this means that calls to add, remove and contains are O(n) for any trivial implementation.

There are numerous ways to implement sets. Implementations come with different characteristics that make them suitable for different situations. Some of the most common approaches are:

2.2.1 Array

Elements are kept in a fixed-size array where empty slots hold a null reference.
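To illustrate the linear scans discussed in Section 2.2, the class below is a minimal sketch of such an array-backed set (fixed capacity, no null elements, not thread-safe); it is not the implementation developed in this thesis. Both the duplicate check in add and the lookup in contains may have to traverse the whole array, giving the O(n) behaviour noted above:

    import java.util.Objects;

    public class ArraySet<E> {
        // Empty slots hold a null reference, so null elements are not supported.
        private final Object[] slots;

        public ArraySet(int capacity) {
            slots = new Object[capacity];
        }

        // O(n): every slot must be ruled out as a duplicate before inserting.
        public boolean add(E element) {
            int firstFree = -1;
            for (int i = 0; i < slots.length; i++) {
                if (slots[i] == null) {
                    if (firstFree < 0) firstFree = i;
                } else if (Objects.equals(slots[i], element)) {
                    return false;                  // already a member, reject
                }
            }
            if (firstFree < 0) {
                throw new IllegalStateException("set is full");
            }
            slots[firstFree] = element;
            return true;
        }

        // O(n): elements are unordered, so the whole array may be traversed.
        public boolean contains(E element) {
            for (Object slot : slots) {
                if (slot != null && Objects.equals(slot, element)) {
                    return true;
                }
            }
            return false;
        }
    }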