Parallelization of Sequential Applications using .NET Framework 4.5

An evaluation, application and discussion about the parallel extensions of the .NET managed concurrency library.

JOHAN LITSFELDT

KTH Information and Communication Technology

Master of Science Thesis Stockholm, Sweden 2013

TRITA-ICT-EX-2013:160

Parallelization of Sequential Applications using .NET Framework 4.5

An evaluation, application and discussion about the parallel extensions of the .NET managed concurrency library.

JOHAN LITSFELDT

Degree project in Program System Technology at KTH Information and Communication Technology
Supervisors: Mats Brorsson, Marika Engström
Examiner: Mats Brorsson

TRITA-ICT-EX-2013:160

Abstract

Modern processor construction has taken a new turn in that adding more cores to processors appears to be the norm instead of simply relying on clock speed improvements. Much of the responsibility for writing efficient applications has thus been moved from the hardware designers to the software developers. However, issues related to scalability, synchronization, data dependencies and debugging make this troublesome for developers.

By using the .NET Framework 4.5, a lot of the mentioned issues are alleviated through the use of the parallel extensions, including TPL, PLINQ and other constructs designed specifically for highly concurrent applications. Analysis, profiling and debugging of parallel applications have also been made less problematic in that the Visual Studio 2012 IDE provides such functionality to a great extent.

In this thesis, the parallel extensions as well as explicit threading techniques are evaluated along with a parallelization attempt on an application called the LTF/Reduce Interpreter by the company Norconsult Astando AB. The application turned out to be highly I/O dependent but even so, the parallel extensions proved useful as the running times of the parallelized parts were lowered by a factor of about 3.5–4.1.

Sammanfattning

Parallelization of Sequential Applications using .NET Framework 4.5

Modern processor construction has taken a turn in that the norm now appears to be adding more cores to processors instead of relying on increases in clock speed. Much of the responsibility has thus been moved from the hardware manufacturers to the software developers. Scalability, synchronization, data dependencies and debugging can, however, make this troublesome for developers.

Many of the problems mentioned above can be alleviated by using .NET Framework 4.5 and the new parallel extensions, which include TPL, PLINQ and other concepts designed for highly concurrent applications. Analysis, profiling and debugging have also been made less problematic in that the Visual Studio 2012 IDE provides such functionality to a great extent.

The parallel extensions as well as explicit threading techniques are evaluated in this thesis, together with an attempt to parallelize an application called the LTF/Reduce Interpreter by the company Norconsult Astando AB. The application turned out to be highly I/O dependent, but even so the parallel extensions proved useful as the running time of the parallelized parts could be reduced by a factor of about 3.5–4.1.

Contents

List of Tables

List of Figures

I Background

1 Introduction
  1.1 The Modern Processor
  1.2 .NET Framework 4.5 Overview
      1.2.1 Common Language Runtime
      1.2.2 The Parallel Extension
  1.3 Problem Definition
  1.4 Motivation

II Theory

2 Modern Processor Architectures
  2.1 Instruction Handling
  2.2 Memory Principles
  2.3 Threads and Processes
  2.4 Multi Processor Systems
      2.4.1 Parallel Processor Architectures
      2.4.2 Memory Architectures
      2.4.3 Simultaneous Multi-threading

3 Parallel Programming Techniques
  3.1 General Concepts
      3.1.1 When to go Parallel
      3.1.2 Overview of Parallelization Steps
  3.2 .NET Threading
      3.2.1 Using Threads
      3.2.2 The Thread Pool
      3.2.3 Blocking and Spinning
      3.2.4 Signaling Constructs
      3.2.5 Locking Constructs
  3.3 Task Parallel Library
      3.3.1 Task Parallelism
      3.3.2 The Parallel Class
  3.4 Parallel LINQ
  3.5 Thread-safe Data Collections
  3.6 Cancellation
  3.7 Exception Handling

4 Patterns of Parallel Programming
  4.1 Parallel Loops
  4.2 Forking and Joining
      4.2.1 Recursive Decomposition
      4.2.2 The Parent/Child Relation
  4.3 Aggregation and Reduction
      4.3.1 Map/Reduce
  4.4 Futures and Continuation Tasks
  4.5 Producer/Consumer
      4.5.1 Pipelines
  4.6 Asynchronous Programming
      4.6.1 The async and await Modifiers
  4.7 Passing Data

5 Analysis, Profiling and Debugging
  5.1 Application Analysis
  5.2 Visual Studio 2012
      5.2.1 Debugging Tools
      5.2.2 Concurrency Visualizer
  5.3 Common Performance Sinks
      5.3.1 Load Balancing
      5.3.2 Data Dependencies
      5.3.3 Processor Oversubscription
      5.3.4 Profiling

III Implementation

6 Pre-study
  6.1 The Local Traffic Prescription (LTF)
  6.2 Application Design Overview
  6.3 Choice of Method
  6.4 Application Profile
      6.4.1 Hardware
      6.4.2 Performance Overview
      6.4.3 Method Calls

7 Parallel Database Concepts
  7.1 I/O Parallelism
  7.2 Interquery Parallelism
  7.3 Intraquery Parallelism
  7.4 Query Optimization

8 Solution Design
  8.1 Problem Decomposition
  8.2 Applying Patterns
      8.2.1 Explicit Threading Approach
      8.2.2 Queuing Approach
      8.2.3 TPL Approaches
      8.2.4 PLINQ Approach

IV Execution

9 Results
  9.1 Performance Survey
  9.2 Discussion
      9.2.1 Implementation Analysis
      9.2.2 Parallel Extensions Design Analysis
  9.3 Conclusion
  9.4 Future Work

V Appendices

A Code Snippets
  A.1 The ReadData Method
  A.2 The ThreadObject Class

B Raw Data
  B.1 Different Technique Measurements
  B.2 Varying the Thread Count using Explicit Threading
  B.3 Varying the Core Count using TPL

Bibliography

List of Tables

2.1 Specifications for some common CPU architectures. [1]
2.2 Typical specifications for different memory units (as of 2013). [2]
2.3 Specifications for some general processor classifications.

3.1 Typical overheads for threads in .NET. [3]
3.2 Typical overheads for signaling constructs. [4]
3.3 Properties and typical overheads for locking constructs. [4]

6.1 Processor specifications of the hardware used for evaluating the application.
6.2 Other hardware specifications of the hardware used for evaluating the application.
6.3 Methods with respective exclusive instrumentation percentages.
6.4 Methods with respective CPU exclusive samplings.

9.1 Methods with the highest elapsed exclusive time spent executing for the sequential and TPL based approaches.
9.2 Perceived difficulty and applicability levels of implementing the techniques of parallelization.

B.1 Running times measured using the different techniques of parallelization.
B.2 Running time measurements using explicit threading with different amounts of threads.
B.3 Running time measurements using TPL with different amounts of cores.

List of Figures

1.1 Overview of .NET threading concepts. Concepts in white boxes with dotted borders are not covered to great extent in this thesis. [3]
1.2 A high level flow graph of typical parallel execution in the .NET Framework. [5]

2.1 Difference between UMA (left) and NUMA (right). Processors are denoted P1, P2, ..., Pn and memories M1, M2, ..., Mn. [6]

3.1 Thread 1 steals work from thread 2 since both its local as well as the global queue are empty. [7]
3.2 The different states of a thread. [3]
3.3 A possible execution plan for a PLINQ query. Note that the ordering may change after execution.

4.1 Example of dynamic task parallelism. New child tasks are forked from their respective parent tasks.
4.2 An illustration of typical Map/Reduce execution.

5.1 Load imbalance as shown by the concurrency visualizer.
5.2 Data dependencies as shown by the concurrency visualizer.
5.3 Processor oversubscription as shown by the concurrency visualizer.

6.1 The internal steps of the application showing database accesses.
6.2 The figure shows how the LTF/Reduce Interpreter is used by other applications.
6.3 The CPU usage of the application over time.
6.4 The call tree of the application showing inclusive percentages of time spent executing methods (instrumentation).
6.5 The call tree of the application showing inclusive percentages of CPU samplings of methods.

9.1 Average total running times for different techniques.
9.2 Total running times for the techniques represented as a box-and-whiskers graph.
9.3 Box-and-whiskers diagrams for the running times of the methods MapLastzon, Map, Geolocate and Stretch.
9.4 Average total running times for TPL using different amount of cores.
9.5 Total running times for different amount of threads using the explicit threading approach.

List of Listings

3.1 Explicit thread creation.
3.2 Locking causes threads unable to be granted the lock to block.
3.3 Explicit task creation (task parallelism).
3.4 The Parallel.Invoke method.
3.5 The Parallel.For loop.
3.6 An example of a PLINQ query.
4.1 Applying the aggregation pattern using PLINQ.
4.2 Example of the future pattern for calculating f4(f1(4), f3(f2(7))).
4.3 Usage of the BlockingCollection class for a producer/consumer problem.
4.4 An example of the async/await pattern for downloading HTML.
4.5 Downloading HTML asynchronously without async/await.
5.1 A loop with unbalanced work among iterations (we assume that the factorial method is completely independent between calls).
8.1 Bottleneck #1: Reducing LTFs as they become available.
8.2 Bottleneck #2.1: Geolocating service days.
8.3 Bottleneck #2.2: Stretching of LTFs.
8.4 Bottleneck #3: Storage/persist of modified LTFs.
8.5 Bottleneck #1: Reduce using explicit threading.
8.6 Bottleneck #2.1: Geolocate using explicit threading.
8.7 Bottleneck #2.2: Stretch using explicit threading.
8.8 Bottleneck #1: Reduce using thread pool queuing.
8.9 Bottleneck #2.1: Geolocate using thread pool queuing.
8.10 Bottleneck #2.2: Stretch using thread pool queuing.
8.11 Bottleneck #1: Reduce using TPL.
8.12 Bottleneck #2.1: Geolocate using TPL.
8.13 Bottleneck #2.2: Stretch using TPL.
8.14 Bottleneck #1: Reduce using PLINQ.
8.15 Bottleneck #2.1: Geolocate using PLINQ.
8.16 Bottleneck #2.2: Stretch using PLINQ.
A.1 The ReadData method used in bottleneck #1.
A.2 The ThreadObject class used for the explicit threading technique.

Part I

Background

Chapter 1

Introduction

The art of parallel programming has long been considered difficult and not worth the investment. This chapter describes why parallel programming has become ever so important and why there is a need for modern parallelization technologies.

1.1 The Modern Processor

A system composed of a single core processor executes instructions, performs calculations and handles data sequentially, switching between threads when necessary. Moore's law states that the number of transistors used in single processor cores doubles approximately every two years. Even so, this law, which is predicted to hold for only a few more years, has been subject to debate. [8] Recent developments in processor construction have enabled a paradigm shift. Instead of naively relying on Moore's law for future clock speeds, other technologies have emerged. After all, a transistor cannot be made infinitely small, as cost efficiency, heat dissipation and other physical constraints impose limits. [9] In recent years we have witnessed a turn towards multi core processors as well as systems consisting of multiple processors. A number of benefits come with such an approach, including less context switching and the ability to solve certain problems more efficiently. Unfortunately, the increased complexity also introduces problems with synchronization, data dependencies and large overheads.

“The way the processor industry is going, is to add more and more cores, but nobody knows how to program those things. I mean, two, yeah; four, not really; eight, forget it.” - Steve Jobs, Apple. [10]

Even though the subject is regarded as problematic, all hope is not lost. Developed by Microsoft, the .NET Framework (pronounced dot net) is a software framework which includes large class libraries while being highly interoperable, portable and scalable. A feature called the Parallel Extensions was introduced in version 4.0, released in 2010, targeting parallel and distributed systems. With the parallel extensions, writing parallel applications using .NET has been made tremendously less painful.


1.2 .NET Framework 4.5 Overview

The .NET Framework is a large and complex system framework with several layers of abstraction. This section scratches the surface by providing the basics of the Common Language Runtime as well as the new parallel extensions of .NET 4.0.

1.2.1 Common Language Runtime

The Common Language Runtime (CLR) is one of the main components in .NET. It constitutes an implementation of the Common Language Infrastructure (CLI) and represents a virtual machine as well as an execution environment. After code has been written in a programming language supported by .NET (e.g. C#, F# or VB.NET) it is compiled into an assembly consisting of Common Intermediate Language (CIL) code. When the assembly is executed, the CIL code is compiled into machine code by the Just In Time (JIT) compiler at runtime (alternatively at compile time for performance reasons). [11] See figure 1.2 for an overview.

The CLR also supports handling of issues regarding memory, threads and exceptions as well as garbage collection∗ and enforcement of security and robustness. Code executed by the CLR is called managed code, which means that the framework handles the above mentioned issues so that the programmer does not have to tend to them. Native code, in contrast, represents machine specific code executed on the operating system directly. This is typically associated with C/C++ and has certain potential performance benefits since such applications operate at a lower level than managed ones. [12]

1.2.2 The Parallel Extension

The Task Parallel Library (TPL) is made to simplify the process of parallelizing code by offering a set of types and application programming interfaces (APIs). The TPL has many useful features including work partitioning and proper thread scheduling. TPL may be used to solve issues related to data parallelism, task parallelism and other patterns (see chapter 4).

Another parallelization method introduced in the .NET Framework 4.0 is called Parallel LINQ (PLINQ), which is a parallel extension to the regular LINQ query. PLINQ offers a declarative and typically cleaner style of programming than that of TPL usage.

The parallel extensions further introduce a set of concurrent lock-free data structures as well as slimmed constructs for locking and event handling specifically designed to meet the new standards of highly concurrent programming models. See figure 1.1 for an overview.

∗The garbage collector reclaims objects no longer in use from memory when certain conditions are met and thus prevents memory leaks and other related problems.


1.3 Problem Definition

This thesis presents the steps and best practices for parallelizing a sequential application using the .NET Framework. Parallelization techniques, patterns and analysis are discussed and evaluated in detail along with an overview of modern processor design and the profiling/analysis methods of Visual Studio 2012. The thesis also includes a parallelization attempt on an industry deployed application developed by the company Norconsult Astando AB.

This thesis does cover:

• Decomposition and Scalability (potential parallelism, granularity, etc.).
• Coordination (data races, locks, thread safety, etc.).
• Regular Threading Concepts in .NET (incl. thread pool usage).
• Parallel Extensions (PLINQ, TPL, etc.).
• Patterns of Parallel Programming (i.e. best practices).
• Profiling and debugging using Visual Studio 2012.
• Implementation details, results and conclusions.

This thesis does not cover:

• Basics of .NET programming (C#, delegates, LINQ, etc.).
• Advanced parallel algorithms.
• Thread pool optimizations (e.g. custom thread schedulers).

Note that this thesis is based on the C# programming language, although the concepts are more or less the same when written in one of the other .NET languages such as VB.NET or F#. The goal of this thesis is to provide an evaluation of .NET parallel programming concepts and, in doing so, to give insights into how sequential applications should be parallelized, especially using .NET technology. The theory presented along with experiments, results and insights should provide a useful reference, both for Norconsult Astando and for future academic research.

1.4 Motivation

With rapid advancements in multi core application programming, it is often the case that companies using .NET technology are unable to keep up with emerging techniques for parallel programming, leaving them with inefficient software. It is therefore of utmost importance that best practices for identifying, applying and examining parallel code using modern technologies are investigated and thoroughly evaluated. Applying patterns of parallel programming to already existing code is often more difficult than writing parallel code from scratch, as extensive profiling and

possible redesign of the system architecture may be required. Because many .NET applications in use today are designed for sequential execution, this thesis targets the iterative approach to parallelization, which also includes thorough analysis and decomposition of sequential code.



Figure 1.1. Overview of .NET threading concepts. Concepts in white boxes with dotted borders are not covered to great extent in this thesis. [3]


Figure 1.2. A high level flow graph of typical parallel execution in the .NET Frame- work. [5]

Part II

Theory

Chapter 2

Modern Processor Architectures

The term processor has been used since the early 1960s, and the processor has since undergone many changes and improvements to become what it is today [13]. This chapter includes an overview of modern processor technology with a focus on multi core processors.

2.1 Instruction Handling

The processor architecture states how data paths, control units, memory components and clock circuitry are composed. The main goal of a processor is to fetch and execute instructions to perform calculations and handle data. The only part of the processor normally visible to the programmer is the set of registers where variables and results of calculations are stored. [14]

Instructions to be carried out by the processor include arithmetic, load/store and jump instructions among others. After an instruction has been fetched from memory using the value of the program counter (PC) register, the program counter needs to be updated and the instruction decoded. This is followed by fetching the operands of the instruction from the registers (or from the instruction itself) so that the instruction can be executed. The execution is typically an ALU operation, a memory reference or a jump in the program flow. If there is a result from the execution it will be stored in a register. [14]

For a processor to be able to execute instructions, certain hardware is needed. A memory holding the instructions is needed as well as space for the registers. An adder needs to be in place for incrementing the program counter along with a clock signal for synchronization. The arithmetic logic unit (ALU) is used for performing arithmetic and logical operations on binary numbers with operands fetched from the registers. Combining the ALU with multiplexers and control units allows the different instructions to be carried out. [14]

RISC stands for Reduced Instruction Set Computer and has certain properties such as easy-to-decode instructions, many registers without special functions and memory references restricted to dedicated load/store instructions.


Architecture   Bits   Design   Registers   Year
x86            32     CISC     8           1978
x86-64         64     CISC     16          2003
MIPS           64     RISC     32          1981
ARMv7          32     RISC     16          1983
ARMv8          64     RISC     30          2011
SPARC          64     RISC     31          1985

Table 2.1. Specifications for some common CPU architectures. [1]

The contrasting architecture is called CISC, which stands for Complex Instruction Set Computer, and is built on the philosophy that more complex instructions make it easier for programmers and compilers to write assembly code. [14]

2.2 Memory Principles

Memories are often divided into two groups: dynamic random access memory (DRAM) and static random access memory (SRAM). SRAMs are typically easier to use and faster while DRAMs are cheaper with a less complex design. [14]

When certain memory cells are used more often than others, the cells show locality of reference. Locality of reference can further be divided into temporal locality, where cells recently accessed are likely to be accessed again, and spatial locality, where cells close to previously accessed ones are more likely than others to be accessed. Locality of reference can be taken advantage of by placing the working set in a smaller and faster memory. [14]

The CLR of .NET improves locality of reference automatically. Examples of this include that objects allocated consecutively are placed adjacently in memory and that the garbage collector defragments memory so that objects are kept close together. [15]

The memory hierarchy of a system indicates the different levels of memory available to the processor. Closest to the processor are the SRAM cache memories, which may further be divided into several sub levels (L1, L2, etc.). At the next level is the DRAM primary memory, followed by the secondary hard drive memory. [14]

Memory                Size     Access time
Processor registers   128 B    1 cycle
L1 Cache              32 KB    1-2 cycles
L2 Cache              256 KB   8 cycles
Primary Memory        GBs      30-100 cycles
Secondary Memory      GBs+     > 500 cycles

Table 2.2. Typical specifications for different memory units (as of 2013). [2]


When data is requested from memory, the entire cache line in which it resides is fetched, for spatial locality reasons. Cache lines may vary in size depending on the level of memory but are always aligned at multiples of their size. After the line has been fetched it is stored in a lower level cache for temporal locality. Different levels of cache typically have different sizes, the reason being that finding an item in a smaller cache is faster when it is temporally referenced. [16]

The associativity of a cache refers to the number of positions in memory that map to a single position in the cache. Increasing associativity may reduce the possibility of conflicts in the cache. A directly mapped cache is a cache where each line in memory maps to exactly one position. In a fully associative cache, any position in memory can map to any line of the cache. This approach is however very complex and is thus rarely implemented. [16]

On multi core processors, cores typically share a single cache. Two issues related to these caches are capacity misses and conflict misses. In a conflict miss, one thread causes data needed by another thread to be evicted. This can potentially lead to thrashing, where multiple threads map their data to the same cache line. This is usually not an issue but may cause problems on certain systems. Conflict misses can be mitigated by padding and spacing of data. [16]

Capacity misses occur when the cache only fits the working sets of a certain number of threads. Adding more threads then causes data to be fetched from higher level caches or from main memory; the threads are thus no longer cache resident. Because of these issues, a high level of associativity should be preferred. [16]
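
To illustrate the padding idea mentioned above, the following C# sketch (illustrative only; the type name, worker count and stride values are assumptions, not taken from the thesis) gives each worker its own counter slot in a shared array. With a stride of 1 the counters end up on the same cache line and may suffer from false sharing, while a stride of 16 longs (128 bytes) keeps each counter on its own line.

using System.Threading.Tasks;

class PaddingSketch
{
    // Each worker increments only its own counter, so there is no data race,
    // but adjacent counters may still share a cache line (false sharing).
    static void Run(int workers, int stride)
    {
        long[] counters = new long[workers * stride];
        Parallel.For(0, workers, w =>
        {
            int slot = w * stride;          // private slot for this worker
            for (int i = 0; i < 10000000; i++)
                counters[slot]++;           // contended cache line if stride is small
        });
    }

    static void Main()
    {
        Run(workers: 4, stride: 1);   // counters packed together: possible false sharing
        Run(workers: 4, stride: 16);  // counters padded 128 bytes apart
    }
}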

2.3 Threads and Processes

A software thread is a stream of instructions for the processor to execute, whereas a hardware thread represents the resources that execute a single software thread. Processors usually have multiple hardware threads (also called virtual CPUs or strands) which are considered equal performance wise. [16]

Support for multiple threads on a single chip can be achieved in several ways. The simplest way is to replicate the cores and have each of them share an interface with the rest of the system. An alternative approach is to have multiple threads run on a single core, cycling between the threads. Having multiple threads share a core means that they will get a fair share of the resources depending on activity and currently available resources. Most modern multi core processors use a combination of the two techniques, e.g. a processor with two cores, each capable of running two threads. From the perspective of the user, the system appears to have many virtual CPUs running multiple threads; this is called chip multi threading (CMT). [16]

A process is a running application and consists of instructions, data and a state. The state consists of processor registers, currently executing instructions and other values that belong to the process. Multiple threads may run in a single process but not the other way around. A thread also has a state, although it is much simpler

than that of a process. Advantages of using threads over processes are that they can communicate to a high degree via the shared heap space, and it is often very natural to decompose problems into multiple threads with low costs of data sharing. Processes have the advantage of isolation, although this also means that they require their own TLB entries. If one thread fails, the entire application might fail. [16]

2.4 Multi Processor Systems

2.4.1 Parallel Processor Architectures

Proposed by M. J. Flynn, the taxonomy of computer systems divides them into four classifications (see table 2.3) based on the number of concurrent instruction and data streams available. [17]

Classification   I-Parallelism   D-Parallelism   Example
SISD             No              No              Uniprocessor machines.
SIMD             No              Yes             GPU or array processors.
MISD             Yes             No              Fault tolerant systems.
MIMD             Yes             Yes             Multi core processors.

Table 2.3. Specifications for some general processor classifications.

The Single Instruction, Single Data (SISD) architecture cannot constitute a parallel machine as there is only one instruction carried out per clock cycle, operating on only one data element. Using multiple data streams (SIMD), however, the model is extended to having instructions operate on several data elements. The instruction type is still limited to one per clock cycle. This is useful for graphics and array/matrix processing. [17]

The Multiple Instruction, Single Data stream (MISD) architecture supports different types of instructions to be carried out each clock cycle, but only on the same data elements. The MISD architecture is rarely used due to its limitations but can provide fault tolerance for usage in aircraft systems and the like. Most modern multi processor computers instead fall into the Multiple Instruction, Multiple Data (MIMD) stream category, where both the instruction and the data streams are parallelized. This means that every core can have its own instructions operating on its own data. [17]

2.4.2 Memory Architectures

When more than one processor is present in the system, memory handling becomes much more complex. Not only can data be stored in main memory but also in the caches of one of the other processors. An important concept is cache coherence, which essentially means that data requested from some memory should always be the most up-to-date version of it.



Figure 2.1. Difference between UMA (left) and NUMA (right). Processors are denoted P1, P2, ..., Pn and memories M1, M2, ..., Mn. [6]

There are several methods to maintain cache coherence often built on the concept of tracking the state of sharing of data blocks. One method is called directory based tracking in which the state of blocks is kept at a single location called the directory. Another approach is called snooping in which no centralized state is kept. Instead, every cache has its own sharing status of blocks and snoops other caches to determine whether the data is relevant. [18]

Shared Memory

When every processor has access to the same global memory space, the system constitutes a shared memory architecture. The advantages of such an approach are that data sharing is fast due to the short distance between processors and memory. Another advantage is that programming applications for such an architecture is rather simple, even though it is the programmer's responsibility to provide synchronization between processors. The shared memory approach however lacks scalability between memory and processor count and is rather difficult and expensive to design. [6]

Memory may be attached to CPUs in different constellations, and for every link between the CPU and the memory requested there is some latency involved. Typically one wants to have similar memory latencies for every CPU. This can be achieved through an interconnection network through which every processor communicates with memory. This is what uniform memory access (UMA) is based on and what is typically used in modern multi processor machines. [6]

The other approach to shared memory is non-uniform memory access (NUMA), in which processors have local memory areas. This results in constant access times when requested data is present in the local storage, but slightly slower access than UMA when it is not. [6] See figure 2.1 for an illustration of the UMA and NUMA designs.


Distributed Memory

Using distributed memory, every processor has its own memory space and addresses. Advantages of the distributed memory approach are that the processor count scales with memory and that it is rather cost effective. Each processor typically has instant access to its own memory as no external communication is needed for such cases. [6]

The disadvantages of distributed memory are that the programmer is responsible for inter-processor communication, as there is no straightforward way of sharing data. It may also prove difficult to properly store data structures across processor memories. [6]

2.4.3 Simultaneous Multi-threading

Simultaneous multi-threading (often known through the Intel technology hyper-threading) is a technology in which cores are able to execute two streams of instructions concurrently, improving CPU efficiency. This means that each core is divided into two logical ones with their own states and instruction pointers but with the same memory. By switching between the two streams, the use of the pipeline can be made more efficient. When one of the instruction streams is stalled waiting for other steps to finish (e.g. memory being fetched), the CPU may execute instructions from the other instruction stream until the stall is resolved. Studies have shown that this technology may improve performance by up to 40 percent. [15]

One of the key elements when optimizing code for simultaneously multi-threaded hardware is to make good use of the CPU cache. Storing data which is often accessed in close proximity is highly important due to locality of reference. Memory usage should generally be kept to a minimum. Instead of caching easily calculated values, it may be more efficient to just recalculate them. Besides the issues related to memory, all of the problems associated with multi core programming apply to simultaneous multi-threading as well. Note also that the CLR in .NET is able to optimize code for simultaneous multi-threading and different cache sizes to a certain extent. [15]

Chapter 3

Parallel Programming Techniques

Many techniques have been developed to make parallel programming a more accessible subject for developers. This chapter discusses topics related to identifying problems suited for parallelization as well as modern techniques used in parallel programming.

3.1 General Concepts

Identifying, decomposing and synchronizing units of work are all examples of general problems related to parallel programming. This section gives an overview of these subjects.

3.1.1 When to go Parallel

The main reason for writing parallel code is performance. Parallelizing an application so that it runs on four cores instead of one can potentially cut the computation time by a factor of 4. The parallel approach is however not always suitable, and one should always weigh the gains of parallelization against the introduced costs of increased complexity and overhead.

Amdahl's law is an approximation of the potential runtime gains when parallelizing applications. Let S represent time spent executing serial code and P time spent executing parallelized code. Amdahl's law states that the total runtime is S + P/N, where N is the number of threads executing the parallel code. This is obviously very unrealistic as the overheads (synchronization, thread handling etc.) of using multiple threads are not taken into account. [16]

Let the overhead of N threads be denoted F(N). The estimate of F(N) could vary between constant, linear or even exponential running time depending on implementation. A fair estimate is to let F(N) = K · ln(N), where K is some constant communication latency. The logarithm could for example represent the communication costs when threads form a tree structure.

The total runtime including the overhead cost would then be

S + (P/N) + K ln (N). (3.1)

By plotting the running time over an increasing number of threads it is apparent that the performance at some point will start decreasing. By differentiating the running time with respect to the thread count, this exact point may be calculated from

−P/N² + K/N = 0 (3.2)

which, when solved for N, gives

N = P/K. (3.3)

In other words, a suitable number of threads for a particular task is determined by the runtime of the parallelizable code divided by the communication latency K. This also means that the scalability of an application can be increased by finding larger proportions of code to parallelize or by minimizing the synchronization costs. [16]

A side note is that lower communication latencies can be achieved when threads share the same level of cached data than when they communicate through memory. Multi core processors therefore have the opportunity to lower the value of K through efficient memory handling. [16]
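
As a small worked example of equations (3.1)–(3.3), the C# sketch below plugs in made-up values for S, P and K (illustrative assumptions, not measurements from the thesis) and prints the estimated runtime for a few thread counts together with the analytic optimum N = P/K.

using System;

class AmdahlSketch
{
    // Estimated total runtime according to equation (3.1):
    // serial part S, parallel part P spread over N threads,
    // plus a logarithmic coordination overhead K * ln(N).
    static double Runtime(double s, double p, double k, int n)
    {
        return s + p / n + k * Math.Log(n);
    }

    static void Main()
    {
        double s = 10, p = 90, k = 5;   // illustrative values only

        for (int n = 1; n <= 64; n *= 2)
            Console.WriteLine("N = {0,2}: T = {1:F1}", n, Runtime(s, p, k, n));

        // Analytic optimum from equation (3.3): N = P / K.
        Console.WriteLine("Optimal N = {0}", p / k);
    }
}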

3.1.2 Overview of Parallelization Steps

When the sequential application has been properly analyzed and profiled (see chapter 5), the steps for parallelizing the application typically are as follows:

1. Decomposition of code into units of work.
2. Distribution of work among processor cores.
3. Synchronizing work.

Decomposition and Scalability

An important concept when parallelizing an application is that of potential parallelism. Potential parallelism means that an application should utilize the cores of the hardware regardless of how many of them are present. The application should be able to run on both single core systems as well as multi-core ones and perform accordingly. For some applications the level of parallelism may be hard coded based on the underlying hardware. This approach should typically be used only when the hardware running the application is known beforehand and is guaranteed not to change over time. Such cases are rarely seen in modern hardware but still exist to some extent, for example in gaming consoles. [7]

To provide potential parallelism in a proper way, one might use the concept of a task. Tasks are mostly independent units of work into which an application can

be divided. They are typically distributed among threads and work towards fulfilling a common goal. The size of a task is called its granularity and should be carefully chosen. If the granularity is too fine, the overheads of managing threads might dominate, while a too coarse granularity leads to a possible loss of potential parallelism. The general guideline for choosing task granularity is that tasks should be as large as possible while properly occupying the cores and being as independent of one another as possible. Making this choice requires good knowledge of the underlying algorithms, data structures and the overall design of the code to be parallelized. [7]
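
A minimal sketch of how granularity can be tuned in practice, assuming a cheap per-element operation (the array size and the Math.Sqrt workload are arbitrary stand-ins): the first loop schedules one tiny body per element, while the second uses Partitioner.Create to hand each worker a whole index range, coarsening the granularity.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class GranularitySketch
{
    static void Main()
    {
        double[] data = new double[1000000];

        // Fine-grained: one tiny task body per element, so scheduling
        // overhead can dominate the actual work.
        Parallel.For(0, data.Length, i =>
        {
            data[i] = Math.Sqrt(i);
        });

        // Coarser granularity: each worker receives a range of indices
        // and processes it in a tight sequential loop.
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = Math.Sqrt(i);
        });
    }
}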

Data Dependencies and Synchronization

Tasks of a parallel program are usually created to run in parallel. In some cases there is no synchronization between tasks. Such problems are called embarrassingly parallel and, as the name suggests, impose very few problems while providing good potential parallelism. One does not always have the luxury of encountering such problems, which is why the concept of synchronization is important. Task synchronization typically has different designs depending on the pattern used for parallelization (see chapter 4). Common for all patterns is however that the tasks must be coordinated in one way or another.

When data needs to be shared between tasks the problem of data races becomes prevalent. Data races can be solved in numerous ways. One way is to use a locking structure around the variable raced for (see section 3.2.5). Another solution is to make variables immutable, which can be enforced by using copies of data instead of references where possible. A final approach to utilize when all else fails is to redesign the code for lower reliance on shared variables.

It is important to note that synchronization limits parallelism and may in the worst case serialize the application. There is also the possibility of deadlocks when using locking constructs. As an alternative to explicit locking, a number of lock-free, concurrent collections were introduced in .NET Framework 4.0 to minimize synchronization for certain data structures such as queues, stacks and dictionaries (see section 3.5). Note that these constructs come with a set of quite heavy limitations which should be studied before designing an application based on them.

One should generally try to eliminate as much synchronization as possible, although it is important to note that it is not always possible to do so. Choosing the simplest, least error prone solution to the problem is then recommended, as parallel programming in itself is difficult enough as it is.
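
The following sketch shows the kind of data race described above and one way to remove it; the field names and iteration count are illustrative assumptions. The unsynchronized increment is a read-modify-write that loses updates, while the locked version (or, for a plain counter, Interlocked.Increment) is safe.

using System.Threading;
using System.Threading.Tasks;

class DataRaceSketch
{
    static int _unsafeCount;                       // incremented without synchronization
    static int _lockedCount;                       // protected by a lock
    static readonly object _sync = new object();

    static void Main()
    {
        Parallel.For(0, 1000000, i =>
        {
            _unsafeCount++;                        // data race: increments can be lost

            lock (_sync)                           // one fix: serialize access to the shared field
            {
                _lockedCount++;
            }

            // For a simple counter, Interlocked.Increment(ref _lockedCount)
            // would be a cheaper alternative to taking a full lock.
        });
    }
}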

3.2 .NET Threading

Before the introduction of .NET Framework 4.0, developers were limited to using explicit threading methods. Even though the new parallel extensions have been introduced, explicit threading is still used to great extent. This section describes how threading is carried out in .NET along with other relevant subjects.


3.2.1 Using Threads

Threads in .NET are handled by the thread scheduler provided by the CLR and are typically pre-empted after a certain time slice which depends on the underlying operating system (typically 10 to 15 ms [19]). On multi core systems, time slicing is mixed with true concurrency as multiple threads may run simultaneously on different cores.

Listing 3.1. Explicit thread creation.

new Thread(() => {
    Work();
}).Start();

Threads created explicitly are called foreground threads unless the property IsBackground is set to true, in which case the thread is a background thread. When every foreground thread has terminated the application ends, and background threads are terminated as a result. Waiting for threads to finish is typically done using event wait handles (see section 3.2.4). Note that exception handling should be carried out within the thread, where the catch block typically is used for signaling another thread or logging the error. [3]

As mentioned in previous sections, writing threaded code has issues regarding complexity. A good practice to follow is to encapsulate as much of the threaded code as possible for unit testing. The other main problem is the overhead of creating and destroying threads, as seen in table 3.1 below. These overheads can however be limited by using the .NET thread pool as described in the following section. [3]

Action                    Overhead
Allocating a thread       1 MB stack space
Context switch            6000-8000 CPU cycles
Creation of a thread      200 000 CPU cycles
Destruction of a thread   100 000 CPU cycles

Table 3.1. Typical overheads for threads in .NET. [3]
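
A minimal sketch tying together the points above (the Work method is a hypothetical placeholder): the thread is created explicitly, marked as a background thread, handles its own exceptions inside the thread body, and the caller waits for it with Join instead of a wait handle.

using System;
using System.Threading;

class ThreadSketch
{
    static void Main()
    {
        var worker = new Thread(() =>
        {
            try
            {
                Work();               // hypothetical unit of work
            }
            catch (Exception ex)
            {
                // Exceptions must be handled inside the thread; here we just log.
                Console.WriteLine("Worker failed: " + ex.Message);
            }
        });

        worker.IsBackground = true;   // will not keep the process alive on its own
        worker.Start();
        worker.Join();                // block the caller until the worker finishes
    }

    static void Work() { }
}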

3.2.2 The Thread Pool

The thread pool consists of a queue of waiting threads ready to be used. The easiest way of accessing the thread pool is by adding units of work to its global queue by calling the ThreadPool.QueueUserWorkItem method. The upper limit of threads for the thread pool in .NET 4.0 is 1023 and 32768 for 32-bit and 64-bit systems respectively, while the lower limit is determined by the number of processor cores. The thread pool is dynamic in that it typically starts out with few threads and injects more as long as performance is gained. [3]

The .NET Framework is designed to run applications using millions of tasks, each possibly as small as a couple of hundred clock cycles.



Figure 3.1. Thread 1 steals work from thread 2 since both its local as well as the global queue are empty. [7]

This would normally not be possible using a single, global thread pool because of synchronization overheads. However, the .NET Framework solves this issue by using a decentralized approach. [7]

In the .NET Framework, every thread of the thread pool is assigned a local task queue in addition to having access to the global queue. When new tasks are added, they are sometimes put on local queues (sub-level) and sometimes on the global one (top-level). Threads not in the thread pool always have to place tasks on the global queue. [7]

The local queues are typically double headed and lock-free, which opens up for the possibility of a concept called work stealing. A thread with a local queue operates on one end of that queue while others may pull work from the other, public end (see figure 3.1). Work stealing has also been shown to provide good cache properties and fairness of work distribution. [7]

In certain scenarios, a task has to wait for another task to complete. If threads have to wait for other tasks to be carried out it might lead to long waits and in the worst case deadlocks. The thread scheduler can detect such issues and let the thread waiting for another task run it inline. It is important to note that top-level as well as long running (see section 3.3.1) tasks are unable to be inlined. [7]

The number of threads in the pool is automatically managed in .NET using complex heuristics. Two main approaches should be noted. The first one tries to reduce starvation and deadlocks by injecting more threads if little progress is made. The other one is hill-climbing, which maximizes throughput while keeping the number of threads to a minimum. This is typically done by monitoring whether injected threads increase throughput or not. [7]

Threads can be injected when a task is completed or at 500 ms intervals. A good reason for keeping tasks short is therefore to let the task scheduler get more opportunities for optimization. A second approach is to implement a custom thread scheduler that injects threads in the sought ways. Note that the lower and upper limits of threads in the pool may also be explicitly set (within certain limits). [7]
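
The sketch below uses the ThreadPool.QueueUserWorkItem method mentioned above to push a few work items onto the global queue and then waits for all of them with WaitHandle.WaitAll; the job count and console output are illustrative assumptions.

using System;
using System.Threading;

class ThreadPoolSketch
{
    static void Main()
    {
        const int jobs = 4;
        var done = new ManualResetEvent[jobs];

        for (int i = 0; i < jobs; i++)
        {
            done[i] = new ManualResetEvent(false);
            int jobIndex = i;                       // capture a copy, not the loop variable

            // Queue the work item on the thread pool's global queue.
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine("Job {0} running", jobIndex);
                done[jobIndex].Set();               // signal completion
            });
        }

        WaitHandle.WaitAll(done);                   // block until every job has signaled
        Console.WriteLine("All queued work finished");
    }
}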



Figure 3.2. The different states of a thread. [3]

3.2.3 Blocking and Spinning

When a thread is blocked its execution is paused and its time slice yielded, resulting in a context switch. The thread is unblocked, and again context switched, either when the blocking condition is met, when the operation times out, or by interruption or abortion. To have a thread blocked until a certain condition is met, signaling and/or locking constructs may be used.

Instead of having a thread block and perform the context switch, it may spin for a short amount of time, constantly polling for the signal or lock. Obviously this wastes processing time but may be effective when the condition is expected to be met within a very short time. A set of slimmed variants of locks and signaling constructs was introduced in .NET Framework 4.0 targeting this issue. Slimmed constructs can be used between threads but not between processes. [3]

3.2.4 Signaling Constructs

Signaling is the concept of having one thread wait until it receives a notification from one or more other threads. Event wait handles are the simplest of signaling constructs and come in a number of forms. One particularly useful feature of signaling constructs is to wait for multiple threads to finish using the WaitAll(WaitHandle[]) method, where the wait handles usually are distributed among the threads waited for. [3]

• Using a ManualResetEvent allows communication between threads using a Set call. Threads waiting for the signal are all unblocked until the event is reset manually. If threads wait using the WaitOne call, they will form a queue.

• The AutoResetEvent is similar to ManualResetEvent with the difference that the event is automatically reset when a thread has been unblocked by it.


• A CountdownEvent unblocks waiting threads once its counter has reached zero. The counter is initially set to some number and is decremented by one for each signal.

Construct              Cross-process   Overhead
AutoResetEvent         Yes             1000 ns
ManualResetEvent       Yes             1000 ns
ManualResetEventSlim   No              40 ns
CountdownEvent         No              40 ns
Barrier                No              80 ns
Wait and Pulse         No              120 ns (for Pulse)

Table 3.2. Typical overheads for signaling constructs. [4]
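
A minimal sketch of the CountdownEvent construct described above, assuming three worker threads (the count and messages are illustrative): each worker signals the event once, and the main thread's Wait call unblocks when the counter reaches zero.

using System;
using System.Threading;

class CountdownSketch
{
    static void Main()
    {
        const int workers = 3;
        var countdown = new CountdownEvent(workers);   // counter starts at 3

        for (int i = 0; i < workers; i++)
        {
            new Thread(() =>
            {
                Console.WriteLine("Worker done");
                countdown.Signal();                    // decrement the counter by one
            }).Start();
        }

        countdown.Wait();                              // unblocks once the counter reaches zero
        Console.WriteLine("All workers have signaled");
    }
}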

3.2.5 Locking Constructs

Locking is used to ensure that only a set number of threads are able to be granted access to some resource at the same time. To ensure thread safety, locking should be performed around any writable shared field independently of its complexity. Writing thread-safe code is however usually more time consuming and typically induces performance costs. [3]

It is important to note that making one method of a class thread-safe does not mean that the whole object is thread-safe. Locking a complete, thread-unsafe object with one lock may prove to be inefficient. It is therefore important to choose the right level of locking so that the program may utilize multiple threads in a safe way while being as efficient as possible.

Exclusive Locks

The locking structures for exclusive locking in .NET are lock and Mutex, which both let at most one thread hold the lock at a time. The lock construct is typically faster but cannot be shared between processes, in contrast to the Mutex construct. [3]

Listing 3.2. Locking causes threads unable to be granted the lock to block.

lock (syncObj) {
    // thread-safe area
}

A SpinLock is a lock which continuously polls to see if the lock has been released. This consumes processor resources but typically grants good performance when locks are expected to be held only for short amounts of time.


Non-Exclusive Locks

A variant of the mutex is the Semaphore, which lets n threads hold the lock. Typical usage of the semaphore is to limit the maximum number of database connections or to limit certain CPU/memory intensive operations to avoid starvation.

The ReaderWriterLock allows multiple threads to simultaneously read a value while at most one thread may update it. Two locks are used, one for reading and one for writing. The writer acquires both locks when writing to ensure that no reader gets inconsistent data. The ReaderWriterLock should be used when reads of the protected data are frequent compared to updates. [3]

Lock                   Cross-process   Overhead
Mutex                  Yes             1000 ns
Semaphore              Yes             1000 ns
SemaphoreSlim          No              200 ns
ReaderWriterLock       No              100 ns
ReaderWriterLockSlim   No              40 ns

Table 3.3. Properties and typical overheads for locking constructs. [4]
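
As a sketch of the reader/writer idea, the class below (a hypothetical read-mostly cache, not taken from the thesis) uses the slimmed ReaderWriterLockSlim from table 3.3: many threads may read the dictionary concurrently, while writers take the exclusive write lock.

using System.Collections.Generic;
using System.Threading;

class ReadMostlyCache
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Many threads may hold the read lock at the same time.
    public bool TryGet(string key, out string value)
    {
        _lock.EnterReadLock();
        try { return _items.TryGetValue(key, out value); }
        finally { _lock.ExitReadLock(); }
    }

    // Writers get exclusive access while updating.
    public void Set(string key, string value)
    {
        _lock.EnterWriteLock();
        try { _items[key] = value; }
        finally { _lock.ExitWriteLock(); }
    }
}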

Thread-Local Storage

Data is typically shared among threads but may be isolated through thread-local storage methods; this is highly useful for parallel code. One such example is the usage of the Random class, which is not thread-safe. To use this class properly, the object either has to be locked around or be local to every thread. The latter is typically preferred for performance reasons and may be implemented using thread-local storage. [3]

Introduced in .NET 4.0, the ThreadLocal class offers thread-local storage for both static as well as instance fields. A useful feature of the ThreadLocal class is that the data it holds is lazily∗ evaluated. A second way of implementing thread-local storage is to use the GetData and SetData methods of the Thread class. These are used to access thread-specific slots in which data can be stored and retrieved. [3]
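
A minimal sketch of the ThreadLocal-based Random usage described above (the seeding strategy is an assumption chosen to avoid identical seeds): every thread lazily gets its own Random instance, so no locking is needed.

using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadLocalSketch
{
    // One Random instance per thread; the factory runs lazily on first use.
    private static readonly ThreadLocal<Random> Rng =
        new ThreadLocal<Random>(() => new Random(Thread.CurrentThread.ManagedThreadId));

    static void Main()
    {
        // Each worker uses its own Random without any synchronization.
        Parallel.For(0, 4, i =>
            Console.WriteLine("Thread {0} drew {1}",
                Thread.CurrentThread.ManagedThreadId, Rng.Value.Next(100)));
    }
}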

3.3 Task Parallel Library

The Task Parallel Library (TPL) consists of two techniques: task parallelism and the Parallel class. The two are quite similar but typically have different areas of usage. Both techniques are described in this section.

∗Lazy evaluation delays the evaluation of an expression until its value is needed. In some circumstances, lazy evaluation can also be used to avoid repeated evaluations among shared data.


3.3.1 Task Parallelism

The Task represents an object responsible for carrying out some independent unit of work. Both the Parallel class as well as PLINQ are built on top of task parallelism. Task parallelism offers the lowest level of parallelization without using threads explicitly while offering a simple way of utilizing the thread pool. It may be used for any concurrent application even though it was meant for multi core applications.

Listing 3.3. Explicit task creation (task parallelism).

Task.Factory.StartNew(() =>
    Work());

Tasks in .NET have several features which make them highly useful:

• The scheduling of tasks may be altered.
• Relationships between tasks can be established.
• Efficient cancellation and exception handling.
• Waiting on tasks and continuations.

Parallel Options

If no parallel options are provided during task creation there is no explicit fairness, no logical parent, and the task is assumed to be running for a short amount of time. These options may be specified using TaskCreationOptions. [3]

The PreferFairness task creation option forces the default task scheduler to place the task in the global (top-level) queue. This is also the case when a task is created from a thread which does not belong to one of the worker threads of the thread pool. Tasks created under this option typically follow a FIFO ordering if scheduled by the default task scheduler. [3]

Sometimes it is not preferred to have tasks using worker threads from the thread pool. This is often the case when there are a few threads that are known to be running for a long period of time (e.g. long I/O and other background work). When this is the case, including the LongRunning task creation option creates a new thread which bypasses the thread pool. As usual with parallel programming, one should generally carry out performance tests before deciding on whether to use this option or not. [3]

A parent-child relationship may be established using the AttachedToParent task creation option. When such an option is provided it is imposed that the parent will not finish until all of its children finish. [20]
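
The sketch below shows the LongRunning and AttachedToParent options described above; the MonitorSomething method and the console output are hypothetical placeholders.

using System;
using System.Threading.Tasks;

class TaskOptionsSketch
{
    static void Main()
    {
        // A long-running task bypasses the thread pool and gets a dedicated thread.
        var background = Task.Factory.StartNew(
            () => MonitorSomething(),
            TaskCreationOptions.LongRunning);

        // The child is attached to its parent, so the parent does not
        // complete until the child has finished.
        var parent = Task.Factory.StartNew(() =>
        {
            Task.Factory.StartNew(
                () => Console.WriteLine("child running"),
                TaskCreationOptions.AttachedToParent);
        });

        parent.Wait();       // waits for the parent and its attached child
        background.Wait();
    }

    // Placeholder for some long-lived background job.
    static void MonitorSomething() { Console.WriteLine("long-running work"); }
}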

3.3.2 The Parallel Class

The Parallel class includes the methods Parallel.Invoke, Parallel.For and Parallel.ForEach. These methods are highly useful for both data as well as task parallelism, and they all block until all of the work has completed, in contrast to the usage of explicit tasks.


Parallel.Invoke

The Parallel.Invoke static method of the Parallel class is used for executing multiple Action delegates in parallel and then waiting for the work to complete. The main difference between this way of performing parallel work and explicit task creation and waiting is that the work is partitioned into properly sized batches. Note that the tasks need to be known beforehand to properly utilize the method. [3]

Listing 3.4. The Parallel.Invoke method.

Parallel.Invoke(
    () => Work1(),
    () => Work2());

Parallel.For and Parallel.ForEach

The Parallel.For and Parallel.ForEach methods are used for performing the steps of looping constructs in parallel. Just like the Parallel.Invoke method, these methods partition the work properly. Parallel.ForEach is used for iterating over an enumerable data set just like its sequential counterpart, with the exception that the parallel method will use multiple threads. The data set should implement the IEnumerable† interface. [3]

Listing 3.5. The Parallel.For loop.

Parallel.For(0, 100,
    i => Work(i));

The two important concepts parallel break and parallel stop are used for exiting a loop early. A parallel break at index i guarantees that every iteration indexed lower than i will be or has been executed. It does not guarantee whether iterations indexed higher than i have been executed or not. A parallel stop does not guarantee anything other than the fact that the iteration indexed i has reached the statement. It is typically used when some particular condition is searched for using the parallel loop. Canceling a loop externally is typically done using cancellation tokens (see section 3.6). [3]
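
A minimal sketch of the parallel break semantics described above, assuming a small illustrative data array: the loop breaks when it finds the value 5, and the ParallelLoopResult reports the lowest iteration at which Break was called.

using System;
using System.Threading.Tasks;

class ParallelBreakSketch
{
    static void Main()
    {
        int[] data = { 3, 1, 4, 1, 5, 9, 2, 6 };

        ParallelLoopResult result = Parallel.For(0, data.Length, (i, state) =>
        {
            if (data[i] == 5)
            {
                // Break guarantees that all iterations with a lower index run;
                // iterations above i may or may not have started.
                state.Break();
                return;
            }
            Console.WriteLine("Processed index {0}", i);
        });

        Console.WriteLine("Lowest break iteration: {0}", result.LowestBreakIteration);
    }
}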

3.4 Parallel LINQ

The Parallel LINQ technique offers the highest level of parallelism by automating most of the implementation, such as partitioning of work into tasks, execution of the tasks using threads and collation of the results into a single output sequence.

†The IEnumerable interface ensures that the underlying type implements the GetEnumerator method which in turn should return an enumerator for iterating through some collection.


The PLINQ programming model works by combining set-based operators such as projections, filters, aggregations and more.

PLINQ does not work with non-local collections such as LINQ-to-SQL or LINQ-to-Entities. The reason for this is that the IQueryProvider implementation translates LINQ queries to single SQL queries that are processed by the SQL engine. The results, however, may be digested in a parallel manner. [21]

Some query operators cannot be parallelized by PLINQ for practical reasons: Take, TakeWhile, Skip, SkipWhile and the indexed versions of Select, SelectMany and ElementAt. The following operators are slow and should be used with care: Join, GroupBy, GroupJoin, Distinct, Union, Intersect and Except. Other operators are parallelizable, although PLINQ may decide to run them sequentially if it estimates that parallel execution would result in bad performance. [3]

Usually when parallelizing a query using PLINQ, the order in which the results are collated is not always the same as in the original collection. This can be resolved by supplying the AsOrdered() operator. Every operator supplied after the AsOrdered() operator will have its order preserved. To turn off ordering one may supply the AsUnordered() operator. Note that preserving order can impact the performance of the application. [22]

Listing 3.6. An example of a PLINQ query.

(new int[] {1,2,3,4,5}).AsParallel()
    .Select(i => i+1)
    .ToArray();

The PLINQ method should typically be used when the problem can be expressed as a query. PLINQ should, however, not be used when the subproblems are heavily coupled or do not need to have their results collated. Figure 3.3 below shows a possible execution plan for the PLINQ query in listing 3.6. As soon as AsParallel() is called, work is forked into tasks to be run on several threads. As threads finish, the work is collated back into an array.


Figure 3.3. A possible execution plan for a PLINQ query. Note that the ordering may change after execution.
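
A minimal sketch of how ordering is controlled follows; it only assumes the standard PLINQ operators discussed above (requires System.Linq).

int[] numbers = { 1, 2, 3, 4, 5 };

// Output order matches the input order.
var ordered = numbers.AsParallel()
                     .AsOrdered()
                     .Select(i => i + 1)
                     .ToArray();

// Ordering is relaxed again for the rest of the query.
var relaxed = numbers.AsParallel()
                     .AsOrdered()
                     .Select(i => i + 1)
                     .AsUnordered()
                     .Select(i => i * 2)
                     .ToArray();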


3.5 Thread-safe Data Collections

The .NET Framework 4.0 contains a number of concurrent collection classes. These classes should typically be preferred to locking around similar, non-concurrent data structures because of their simple use, safety precautions and optimized behavior. It is important to have an understanding of the scalability of the concurrent collections and how well they fare against their lock based counterparts. One should note that the main use for the concurrent collections is in parallel applications. Using them in sequential code may lead to unnecessary performance costs. Just because a concurrent collection is thread-safe does not mean that code using it is. Enumerating over a concurrent collection while another thread is modifying it will not throw an exception. ConcurrentStack, ConcurrentBag and ConcurrentQueue are less memory efficient because their implementations are based on linked lists. [23]

BlockingCollection

The BlockingCollection class is useful for producer-consumer scenarios. Multiple threads may add or take items from the collection concurrently. Bounding and blocking are supported in that if a thread tries to insert an item into a full collection, or remove an item from an empty one, it will block until the action can be carried out. Cancellation may be performed using cancellation tokens, and the collection can be enumerated using either read-only enumeration or removal of items during enumeration. A lock based implementation in which a queue is locked around would lead to similar performance, although the BlockingCollection should be preferred due to its rich functionality. [23]

ConcurrentQueue

The ConcurrentQueue provides a thread-safe FIFO ordered queue. The obvious lock based implementation would be locking around a regular Queue object. Studies have shown that ConcurrentQueue outperforms the lock based variant in almost every case. The only case where the lock based implementation can prove better is when the workload is very light (close to none) and the number of threads working on the collection is higher than two. For scenarios where the work is a few hundred FLOPS or larger, ConcurrentQueue scales and performs better. [23]

ConcurrentStack

ConcurrentStack is a thread-safe, LIFO ordered stack. The lock based counterpart provides almost identical performance and scalability to the concurrent variant. Just like the concurrent queue, however, ConcurrentStack provides richer functionality. One example is the PushRange() method for efficiently pushing several items onto the stack. [23]


ConcurrentBag

The ConcurrentBag collection is a new type in .NET 4 with no distinct counterpart. It works much like a stack or a queue with the difference that there is no order preservation. The collection uses ThreadLocal storage so that every thread using the collection has a local list of the items. Because of this, threads may add or take items from the collection with very little synchronization overhead. If ordering does not matter, ConcurrentBag provides an excellent alternative to BlockingCollection for mixed producer-consumer scenarios. ConcurrentBag has been shown to provide huge performance gains compared to ConcurrentStack and ConcurrentQueue for such scenarios. A note of caution is that the thread-local lists use quite a lot of memory, which is not released until the garbage collector collects it. [23]

ConcurrentDictionary

The concurrent dictionary class is a thread-safe variant of the regular Dictionary. The regular dictionary may be used in multi-threaded scenarios if it is only read from, although it has been shown to perform worse than the concurrent equivalent. When the level of concurrency is high, the concurrent dictionary also provides better scalability and performance for scenarios with both reading and writing. Note that the properties and methods Count, Keys, Values and ToArray() completely lock the data structure and should be used with care. The GetEnumerator() method allows for lock free enumeration, although the results may be a mixture of new and old data if other threads have used the object concurrently. [23]
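
As a brief illustration of thread-safe updates, the sketch below uses the atomic AddOrUpdate method to count words in parallel; the words collection is an assumption made for the example (requires System.Collections.Concurrent and System.Threading.Tasks).

var counts = new ConcurrentDictionary<string, int>();

Parallel.ForEach(words, word =>
{
    // Atomically inserts the value 1 or increments the existing count.
    counts.AddOrUpdate(word, 1, (key, old) => old + 1);
});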

Why is there no concurrent list?

There is no concurrent version of IList in the .NET Framework. There are several reasons behind this choice, but they mainly stem from the fact that data lookup should have O(1) access time. To get constant time access one typically needs to use an array for storage of elements instead of linked lists. The problem with using arrays, though, is that they have to be resized at certain intervals. At certain Add operations, new arrays need to be allocated and copied into. Several solutions have been proposed to solve this issue, so there is hope for a future implementation of a concurrent list. [24] Until then, it is recommended to simply lock around a regular list or rewrite the code so that it is not dependent on concurrent lists.

3.6 Cancellation

Cancellation of tasks is usually done by providing a cancellation token. The .NET Framework does not force tasks to finish; instead, the token has to be polled for explicitly. If the cancellation token is passed to several tasks, a single cancellation can shut down all of the involved tasks. Cancellation tokens are used in a similar fashion for the parallel static methods Parallel.Invoke, Parallel.For and Parallel.ForEach as well as for task parallelism and PLINQ. [3]


If no check for cancellation is performed, an exception is thrown after the current iteration has finished. Note that for PLINQ, the cancellation token polling is, unless otherwise specified, performed automatically at regular intervals. [7]
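
A minimal sketch of cooperative cancellation of a parallel loop is shown below; the Work method is a placeholder and the loop bound is arbitrary (requires System.Threading and System.Threading.Tasks).

var cts = new CancellationTokenSource();
var options = new ParallelOptions { CancellationToken = cts.Token };

try
{
    Parallel.For(0, 1000, options, i =>
    {
        Work(i); // placeholder for the actual loop body
    });
}
catch (OperationCanceledException)
{
    // Reached if cts.Cancel() is called from another thread.
}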

3.7 Exception Handling

A handled exception is one that has been observed. Exceptions may be observed in multiple ways, either by using one of the Wait methods or by getting the Exception property of the faulted task. The Parallel static methods as well as task parallelism and PLINQ handle observing implicitly. If an exception goes unhandled, the .NET exception escalation policy will force the process to be terminated once the faulted task is garbage collected. [7] The exceptions thrown when multiple tasks are waited for are bundled together into an AggregateException. To handle the inner exceptions of the aggregate exception, the Handle method is used. The Handle method invokes a delegate for every inner exception in order to observe them. The Flatten method can be used for observing nested exceptions by transforming the tree structure of exceptions into a flat sequence. [7]
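
The following sketch shows one common way of observing such exceptions; the tasks t1, t2 and t3 are assumed to exist and only cancellation exceptions are treated as handled here.

try
{
    Task.WaitAll(t1, t2, t3);
}
catch (AggregateException ae)
{
    // Flatten unwraps nested AggregateExceptions; Handle observes each
    // inner exception and rethrows the ones the delegate returns false for.
    ae.Flatten().Handle(ex => ex is OperationCanceledException);
}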

Chapter 4

Patterns of Parallel Programming

Parallel programming patterns are proven abstractions of the most important and frequently used ways of parallelizing code. Usage of these patterns typically leads to more readable, easier to maintain, better performing and less error-prone code.

4.1 Parallel Loops

Parallel loops should be used when the same independent operation is to be performed on every iteration of a loop. Many such problems parallelize well because of their few to non-existent dependencies. The parallel loop pattern can be applied either by using explicit threading or by using the Parallel class. Usage of PLINQ is typically not recommended for both design and technical reasons: [25]

1. It is easier to think of the pattern as a loop than as a query.
2. PLINQ overheads may prove too heavy.
3. A maximum number of threads cannot be specified, only an exact amount.

When parallelizing a for loop using explicit threads, loop iterations need to be distributed and provided as delegates for the threads. The work may be queued onto the thread pool for efficiency. Explicit threading is useful when results do not have to be collated or canceled and when the overheads of other approaches prove too large. The Parallel class is typically the most straightforward method for this pattern. The static methods Parallel.For and Parallel.ForEach use the thread pool by default and provide rich functionality (see section 3.3.2). Having this in mind, the Parallel class should be the go-to alternative when faced with the parallel loop pattern. The Parallel methods typically scale as more cores are introduced and continue to do so until the overhead of synchronization starts to limit throughput. [7]

The iteration steps of a parallel loop must be independent to the point where iterations may safely execute in any order. If the loop index value goes from a higher value to a lower one it often means that there is some kind of dependency between iterations. Such loops should be refactored into loops that go from 1 to n. Another problematic loop to watch out for is one where the looping index is incremented by anything other than 1. [7]

The main overhead of a for loop implemented using the Parallel class is delegate invocations and synchronization between threads for load balancing. The mentioned overhead costs are only noticeable when the loop bodies are very small. One way to overcome the problem is to use a partitioner to amortize the cost of delegate invocation while minimizing synchronization. [3]
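
A minimal sketch of the partitioner idea follows; Partitioner.Create hands each delegate invocation a whole range of indices instead of a single one. The names n and SmallWork are placeholders (requires System.Collections.Concurrent and System.Threading.Tasks).

Parallel.ForEach(Partitioner.Create(0, n), range =>
{
    // One delegate invocation now covers a whole chunk of iterations.
    for (int i = range.Item1; i < range.Item2; i++)
        SmallWork(i);
});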

4.2 Forking and Joining

When program flow can be divided into several chunks of operations it is possible to use parallel tasks. Work is forked into several tasks which are run asynchronously and later joined to ensure all processing has been completed. The parallel loop is an example of this and is typically regarded as a subproblem of the fork/join pattern.

4.2.1 Recursive Decomposition

Dynamic task parallelism is used in situations where tasks are added over time as the computation proceeds. Many such problems are of divide and conquer character, e.g. traversal of trees or graphs. It is also useful in scenarios where the problem can be partitioned spatially, such as for geographic or geometric problems. A typical problem solvable using dynamic tasks is the Quicksort algorithm, where the recursive steps can be divided among tasks (see the sketch below). [7] Note that recursive decomposition should be limited in some way so that the granularity does not become too fine. In the Quicksort example, one could for instance limit the recursion depth or keep track of how much data is left to process.

Figure 4.1. Example of dynamic task parallelism. New child tasks are forked from their respective parent tasks.
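
A minimal sketch of recursive decomposition applied to Quicksort is shown below. The Partition and SequentialQuickSort helpers are assumed to exist; the depth parameter limits how fine-grained the task creation becomes (requires System.Threading.Tasks).

void ParallelQuickSort(int[] a, int lo, int hi, int depth)
{
    if (lo >= hi) return;

    if (depth <= 0)
    {
        // Fall back to a plain sequential sort when deep enough.
        SequentialQuickSort(a, lo, hi);
        return;
    }

    int p = Partition(a, lo, hi);
    var left  = Task.Factory.StartNew(() => ParallelQuickSort(a, lo, p - 1, depth - 1));
    var right = Task.Factory.StartNew(() => ParallelQuickSort(a, p + 1, hi, depth - 1));
    Task.WaitAll(left, right);
}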

4.2.2 The Parent/Child Relation

Tasks may be chained together into parent/child relations. Two tasks are called attached if one of them is a child of the other, which is called the parent. Tasks not involved in parent/child relations are called detached tasks. When a task has finished its own work but has at least one unfinished child task, it will enter the blocking state WaitingForChildrenToComplete. Only when the child tasks have finished will the task leave the blocking state. The waiting is done implicitly. [26]
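
A small sketch of an attached child task follows; ChildWork is a placeholder for the actual work.

var parent = Task.Factory.StartNew(() =>
{
    // The child is attached, so the parent will not complete until it has finished.
    Task.Factory.StartNew(() => ChildWork(),
        TaskCreationOptions.AttachedToParent);
});

parent.Wait(); // returns only after the attached child is done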

4.3 Aggregation and Reduction

Parallel aggregation deals with performing some associative binary operation on the elements of a set in parallel and then combining the results. An example of this would be calculating the sum of n integers. This is normally an issue because steps performed concurrently will operate on some shared data area. There are several ways to solve the issue, such as by using locks or one of the concurrent collections. The PLINQ way of solving it probably leads to the simplest and most straightforward code:

Listing 4.1. Applying the aggregation pattern using PLINQ.

int[] seq = {1,2,3,4,5};
return (from i in seq.AsParallel()
        select i).Sum();

Other approaches such as Parallel.For or Parallel.ForEach lead to slightly more complex code. Threads need to maintain a thread-local storage (see section 3.2.5) for intermediate values before they are aggregated into the global collection. Because of such issues it is recommended to stick with the simpler PLINQ approach for the Aggregation and Reduction pattern. [25]
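
For completeness, the sketch below shows the thread-local variant using the localInit/localFinally overload of Parallel.ForEach; the numbers collection is an assumption made for the example (requires System.Threading and System.Threading.Tasks).

int total = 0;
Parallel.ForEach(
    numbers,                                     // assumed IEnumerable<int>
    () => 0,                                     // initialize the thread-local sum
    (n, state, local) => local + n,              // accumulate without locking
    local => Interlocked.Add(ref total, local)); // merge once per thread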

4.3.1 Map/Reduce

A variation of the aggregation pattern is the Map/Reduce pattern, used to simplify the problem of work distribution. The model is focused around the Map, Reduce and Group functions, in which most of the logic of the problem at hand is specified. To illustrate the pattern, imagine the problem of counting the number of words of lengths 1, 2, ..., k in n documents. [27] The input to the Map/Reduce pattern is the n documents, which are distributed to workers (e.g. processor cores) in groups of m. Each of the workers runs the Map function which goes through the documents and collects every word along with a mapping to its length. This data is passed on to the intermediate layer. In the intermediate layer, the Group function is called for the mappings obtained from the workers of the previous step. This function puts the words into one out of k buckets depending on the mapping of the word to its length; note that the grouping function does not have to compute the word length since that has already been done. The k buckets are distributed to workers for reduction.



Figure 4.2. An illustration of typical Map/Reduce execution.

The workers performing the Reduce function each take one of the k buckets and count how many words it contains. These numbers are then provided as the output of the Map/Reduce problem. The Map/Reduce pattern has the advantage that within each layer there is no synchronization, which makes the pattern highly amenable to parallelization.
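
The word-length example can be sketched directly as a PLINQ query, where SelectMany plays the role of Map, GroupBy the role of Group and the final count the role of Reduce. The documents collection (a set of strings) is an assumption made for the example (requires System.Linq).

var lengthCounts =
    documents.AsParallel()
             .SelectMany(doc => doc.Split(' '))   // map: extract the words
             .GroupBy(word => word.Length)        // group: bucket by length 1..k
             .Select(g => new { Length = g.Key, Count = g.Count() }) // reduce
             .ToList();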

4.4 Futures and Continuation Tasks

A Future is a task which returns a value that is initially unknown but available at a later time. Problems involving futures can typically be depicted as a graph of dependencies between tasks, where the futures are blocked until reached in the task graph. Futures are often implemented using the Task class so that the result may be accessed through the Result property. A related concept is continuation tasks, in which the specified task will start only when a certain task has finished executing. This is useful for chaining tasks, passing data, specifying precise continuation options etc. Continuation chaining is typically carried out using the Task.ContinueWith or Task.Factory.ContinueWhenAll method. Listing 4.2 below shows an example of such futures.

Listing 4.2. Example of the future pattern for calculating f4(f1(4),f3(f2(7))).

var t1 = Task.Factory.StartNew(() => f1(4));
var t2 = Task.Factory.StartNew(() => f3(f2(7)));
var t3 = Task.Factory.ContinueWhenAll(
    new[] {t1, t2},
    (tasks) => f4(t1.Result, t2.Result));

Speculative execution is a technique where multiple tasks that are dependent on each other are run in parallel anyway. This is done by having a task A, dependent on another task B, speculate on the result produced by B and start executing using the speculated parameters. If the speculation is wrong, A will have to start over.


4.5 Producer/Consumer

In a Producer/Consumer scenario, certain entities are responsible for producing items while others are responsible for consuming them. There may be multiple producers, multiple consumers, producers being consumers and consumers being producers. Synchronization is typically an issue with this pattern as producers need to store their items somewhere and the consumers have to retrieve the items from the same place. Additionally, consumers need to be aware of when there is work to be consumed and producers need to be aware of when the collection is full. The BlockingCollection is very well suited for the producer/consumer pattern as it solves many of the above mentioned issues in an elegant way.

Listing 4.3. Usage of the BlockingCollection class for a producer/consumer problem.

BlockingCollection<T> collection = new BlockingCollection<T>();

var consumer = Task.Factory.StartNew(() => {
    foreach (T item in collection.GetConsumingEnumerable())
        Consume(item);
});

var producer = Task.Factory.StartNew(() => {
    foreach (T item in producerCollection)
        collection.Add(item);
    collection.CompleteAdding();
});

Task.WaitAll(producer, consumer);

4.5.1 Pipelines

Pipelines are similar to futures in that they are used when tasks form a graph of sequential dependencies. The biggest difference is that in the pipeline pattern, tasks communicate via queues. Pipelines therefore represent series of producers and consumers. Note that slow steps of the process indicate bottlenecks and that such steps may have to be divided into multiple tasks to balance the load. The pipeline pattern is useful in scenarios where data is received from real-time streams such as packets arriving over a network. The packets may be passed through several filters, each representing a stage of the pipeline structure. When several dependent steps are involved in the pipelining process, the method typically does not scale with the number of cores. [7]

If exceptions in pipelines are not handled properly they may result in deadlocks. If an exception is thrown somewhere in the middle of the pipeline stages, the other stages are not implicitly notified. To handle a coordinated shutdown of the pipeline, a linked cancellation token needs to be provided to all of the tasks involved. The token should be canceled by the faulting task. [7]

Thread starvation may be an issue if there are not enough threads to support all of the pipeline stages. This is normally not a problem since the task scheduler will inject new threads when it detects that no progress is made. To guarantee that enough threads are present, the LongRunning option may be provided to the tasks. [7]
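
A minimal two-stage pipeline sketch with a linked cancellation token is shown below. The names Packet, Parse, Filter, incoming and externalToken are assumptions made for the example (requires System.Collections.Concurrent, System.Threading and System.Threading.Tasks).

var cts = CancellationTokenSource.CreateLinkedTokenSource(externalToken);
var stage1Out = new BlockingCollection<Packet>(boundedCapacity: 100);

var stage1 = Task.Factory.StartNew(() =>
{
    try
    {
        foreach (var packet in incoming.GetConsumingEnumerable(cts.Token))
            stage1Out.Add(Parse(packet), cts.Token);
    }
    catch (Exception) { cts.Cancel(); throw; } // coordinated shutdown on failure
    finally { stage1Out.CompleteAdding(); }
}, TaskCreationOptions.LongRunning);

var stage2 = Task.Factory.StartNew(() =>
{
    foreach (var item in stage1Out.GetConsumingEnumerable(cts.Token))
        Filter(item);
}, TaskCreationOptions.LongRunning);

Task.WaitAll(stage1, stage2);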

4.6 Asynchronous Programming

The concept of asynchronous programming builds on letting certain events or actions that would otherwise block the program execute in a non-blocking way. Instead of having the caller manage concurrency, the callee does so in asynchronous programming. Built on the concept of tasks, version 4.5 of the .NET Framework (released in 2012) introduces the modifiers async and await in version 5 of the C# language to greatly simplify this matter. Note that asynchronous programming is not the same as parallel programming, but the two share some similarities. [28] Asynchronous programming is useful when one wants the main thread to continue running while waiting for other threads or processes to finish. This is often the case for UI threads (e.g. in WPF or similar UI frameworks) as the window would otherwise freeze if the calls were made synchronously. The blocking calls are often of I/O-bound character, such as calls to some web server or database. [28]

4.6.1 The async and await Modifiers

The async modifier indicates that the method or expression it modifies is asynchronous, while the await operator applied to a task in an asynchronous method suspends further execution of the method until the awaited task completes. In fact, using this pattern (see listing 4.4) is almost identical to creating a task and hooking on ContinueWith methods for the other iterations (see listing 4.5). [28]

Listing 4.4. An example of the async/await pattern for downloading HTML.

private async void Execute() {
    foreach (string url in urls) {
        string result = await DownloadHTML(url);
        Process(result);
    }
}

private async Task<string> DownloadHTML(string url) {
    WebClient client = new WebClient();
    string html = await client.DownloadStringTaskAsync(url);
    return html;
}


Listing 4.5. Downloading HTML asynchronously without async/await.

Task.Factory.StartNew(() => DownloadHTML(urls[0]))
    .ContinueWith(t => DownloadHTML(t.Result, urls[1]))
    .ContinueWith(t => DownloadHTML(t.Result, urls[2]))
    ...,
    TaskScheduler.FromCurrentSynchronizationContext());

The FromCurrentSynchronizationContext method tells the task scheduler that the continuation must run on the thread calling it. When using await, this is assumed implicitly. Note in listing 4.4 that the signature of DownloadHTML says Task<string> while the body returns a string object. [28]

4.7 Passing Data

Passing data between threads is an issue which can be solved in multiple ways. [29]

1. Data may be shared using the concept of closures. The variable is made accessible to the delegates that close over it. One should watch out for closing over data which is not shared in a proper way (e.g. the index variable of a for loop). Creating local copies of such data is one possible solution to the problem (see the sketch after this list).

2. Many constructs for creating asynchronous work accept a state object to be passed into the work body. Such constructs are ThreadPool.QueueUserWorkItem and Task.Factory.StartNew. Note that when passing multiple data items they have to be wrapped into a single object, for example using the Tuple class.

3. A final approach is to supply data to member variables of an object representing the unit of work, which is then referenced from within the work body using the this keyword.
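
The sketch below illustrates the first two approaches; Process is a placeholder for the work performed on the data (requires System and System.Threading.Tasks).

for (int i = 0; i < 10; i++)
{
    int copy = i;                                // local copy, not the shared loop variable
    Task.Factory.StartNew(() => Process(copy));
}

// Passing several items through the state object using a Tuple.
Task.Factory.StartNew(state =>
{
    var data = (Tuple<int, string>)state;
    Process(data.Item1, data.Item2);
}, Tuple.Create(42, "hello"));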

Chapter 5

Analysis, Profiling and Debugging

When working with both sequential and parallel code it is important to be familiar with appropriate profiling and debugging tools. This chapter includes an overview of application analysis, common performance sinks and how to use Visual Studio 2012 for performance evaluation.

5.1 Application Analysis

There are two issues which are especially important when writing parallel code:

1. The first issue is performance, which is the sole reason parallel code is written to begin with. If a considerable decrease in running time cannot be achieved it is usually better to stick with sequential code, as it generally is both easier to understand and less prone to bugs. It is also important to note that even if the application at hand runs more efficiently after parallelization, it may hinder the performance of other processes on the system.

2. The second issue is correctness, which means that the parallel implementation should produce exactly the same results as its sequential counterpart every time it is run with the same input. Proving correctness in a parallel system is typically more difficult than in the sequential case because of the more complex and error-prone code. Parallel code can also produce irregular results if not implemented correctly.

Before applying patterns of parallelization it is important to understand the application design. To acquire such an understanding it may be helpful to create UML diagrams as well as program flow charts. Understanding how the application operates before parallelizing it is often key to getting good results. Profiling the code base as described in section 5.3.4 is helpful for identifying the performance bottlenecks. Often there are standards that need to be upheld, such as a responsive UI or executing within a certain time limit. Understanding the amount of resources available, which tasks are important and the price of parallelization is essential to gaining from the potential benefits.

5.2 Visual Studio 2012

Microsoft Visual Studio is an integrated development environment (IDE) including compilers for the .NET languages. The first version was released in 1997 and has since reached version 11.0 (Visual Studio 2012 as of June 2013). [30]

5.2.1 Debugging Tools

Parallel code is usually more difficult to debug than sequential code. This section describes the two windows Parallel Tasks and Parallel Stacks used for debugging parallel applications in Visual Studio 2012.

Parallel Tasks Window

The Parallel Tasks window shows information about tasks and is typically used as a complement to the debugger (e.g. together with breakpoints). This is especially useful when Task objects are used in managed code. Following are descriptions of the most important pieces of information available about tasks through the Parallel Tasks window. [31]

• Icons indicate which task is currently running on the current thread (yellow arrow), which task was current when the debugger was invoked (white arrow) or whether the thread was frozen by the user (pause icon).

• Status represents the current state of the task, which could be scheduled, running, deadlocked or waiting. An important note about the status is that it is only shown for tasks blocked on synchronization primitives supported by Wait Chain Traversal (WCT). [31] If not, the task is marked as waiting.

• Location is the current location of the task in its call stack.

• Thread Assignment states the ID and name of the thread running the task.

• Process ID states the ID of the process running the task.

Parallel Stacks Window

The Parallel Stacks window is able to show a bird's-eye view over threads or tasks along with their call stacks or methods. By selecting one of the options Threads or Tasks from the toolbar it is possible to get a graphical representation of the threads or tasks as nodes along with arrows showing how they are related. The call path of the current thread is highlighted. [31]


The icons of methods in threads or tasks indicate whether the method contains the active stack frame of the current thread (yellow arrow) or of a non-current thread. If the method shows a green arrow it contains the current stack frame. The Parallel Stacks window is highly useful for debugging concurrent applications in that it gives the user a great overview of how tasks interact. [31]

5.2.2 Concurrency Visualizer

The concurrency visualizer consists of three views: the CPU Utilization View, the Threads View and the Cores View. This subsection describes the different views and which problems are typically detected using them.

CPU Utilization View

The CPU Utilization View of the concurrency visualizer shows the core activity, i.e. which processes are occupying the cores at certain time units. Typically, when parallelizing an application one wants to look for areas of execution where the CPU utilization is high. Other potential issues to be investigated through this view are the following: [32]

• Parallel code not occupying the expected number of cores.
• Other processes occupying processor time.
• Load imbalance (see section 5.3.1).

Threads View

The Threads View of the concurrency visualizer shows details about thread execution during the profiling session. This view shows every thread of the system, not only the ones run by the evaluated process. The view does not show which task a thread is running, which may be problematic, although this can to some degree be alleviated through the stack traces of the threads. [32] Issues to identify through the Threads View are in particular the following:

• Processor oversubscription when yellow areas are frequent (see section 5.3.3).
• Lock contention when there are stair shaped green regions (see figure 5.2).
• Load imbalance (see section 5.3.1).

Cores View

Finally, the Cores View of the concurrency visualizer shows which threads the processor cores have been running over time. Ensuring that work has been divided properly among the cores is one issue that can be visualized using this view. The Cores View is also helpful when threads or tasks have been explicitly allocated to certain cores and the implementation needs to be evaluated for proper thread allocation. [32]


5.3 Common Performance Sinks

Issues related to parallel programming often appear as certain patterns. This section covers the most common performance sinks and how they can be observed using the concurrency visualizer.

5.3.1 Load Balancing

Load balancing issues arise when work is not evenly distributed between threads, so that some threads run for a longer time than others. This may result in inefficient usage of the processor cores (see figure 5.1). An example of a problem which may show load imbalance when parallelized is shown in listing 5.1.

Listing 5.1. A loop with unbalanced work among iterations (we assume that the factorial method is completely independent between calls).

for (var i = 0; i < n; i++) {
    factorial(i);
}

Higher indexed iterations are obviously more costly than lower ones. When statically partitioned, a parallelization may lead to load imbalance where some threads run for considerably more time than others. One solution to the problem is to let threads process iterations in a round-robin fashion. Choosing the right amount of dynamism is typically a trade-off between less synchronization and better load balancing. [7]


Figure 5.1. Load imbalance as shown by the concurrency visualizer.

5.3.2 Data Dependencies

Data dependencies occur when threads are dependent on one another to continue. An example of this is when a thread tries to acquire a lock held by another thread, resulting in blocking of the former thread. This may lead to efficiency problems, as certain tasks become bottlenecks of the system and additional time is spent blocking. In the worst case this may lead to a serialization of the program, as seen in figure 5.2. Data dependencies can be identified using the Threads View of the concurrency visualizer. [7]


Figure 5.2. Data dependencies as shown by the concurrency visualizer.

5.3.3 Processor Oversubscription

Processor oversubscription is the problem of having too many threads in relation to the number of cores. This results in a high number of context switches, which incurs additional processing time as well as caching issues. Processor oversubscription may be observed through the Threads View of the concurrency visualizer, as depicted in figure 5.3. [7]


Figure 5.3. Processor oversubscription as shown by the concurrency visualizer.

5.3.4 Profiling

Visual Studio 2012 offers the four methods Sampling, Instrumentation, Concurrency and .NET Memory for collecting performance data, described in this section.

Sampling

The main usage of the sampling profiling method is to get an overview of the performance of the application and gain insight into performance issues related to the CPU (CPU-bound applications). This method works by sampling the application at set intervals and incrementing exclusive and inclusive counts for methods. Exclusive time is the time spent within a method excluding time spent in methods called from that method. Inclusive time, on the other hand, also includes time spent in methods called from that method. [33]

Instrumentation

The instrumentation method collects timing information for method calls in the application. Instrumentation also identifies function calls into the OS, such as writing to a file. The method is thus useful for investigating I/O bottlenecks and for closer examination of method calls. [33]

Concurrency

The concurrency profiling method collects data regarding threads. The resource contention report contains call stack information for scenarios where competing threads wait for access to shared resources. This information includes the total number of contentions, time spent waiting for resources as well as the actual lines and instructions where the contention occurred, along with a time line graph. This profiling method also contains the concurrency visualizer described earlier. [33]

.NET Memory

During profiling for .NET memory, the program is interrupted when objects are created and destroyed. Memory information such as type, size and number of created/destroyed objects is collected at these points. The Object Lifetime report displays the number and size of objects used in the application and their generations in the garbage collector. [33]

Part III

Implementation

Chapter 6

Pre-study

This chapter contains an analysis of the application to be parallelized including background information, design overview and its profile. This analysis is highly important for the next steps of the parallelization process.

6.1 The Local Traffic Prescription (LTF)

The Swedish Transport Agency (sv. Transportstyrelsen) has, with the help of Swedish municipalities, the county administrative boards of Sweden and IT companies, developed a standard for formulating traffic regulations. The national database for traffic regulations is called RDT, from which the national road database (NVDB) is populated with traffic regulation data. [34] Local Traffic Regulations contain rules written in clear text explaining what is allowed for road users on certain road segments, e.g. ”On A-Street between the crossing with B-street and 50 meters north of the crossing with the C-street, vehicles are allowed to...”. Alternatively, the text contains an attached map sketch containing the information: ”On A-street, according to the attached map sketch, vehicles are allowed to...”. [34] Along with the descriptions, the RDT also handles linking of prescriptions to the road network of the national road database (NVDB). The road network in NVDB consists of nodes and links where the nodes represent connected roads. [34] As an extension to the traffic regulations, a concept called BTR has been developed for machine interpretation of the traffic prescriptions. The BTR is described in BNF (Backus-Naur Form), in which a traffic prescription typically consists of a header, an ingress, several traffic regulations and data for which the rule is active. [34]

6.2 Application Design Overview

The LTF/Reduce interpreter is a server side application built for interpretation and packaging of local traffic prescriptions (LTFs). The two external databases N2-LV and N2-LTF are queried for LTFs, which are then returned in a single table. The application iterates through the LTFs and reduces them into a simpler format. The transformed LTFs are finally stored in a PostGIS database for further usage in web, mobile or other applications. The program flow of the LTF/Reduce interpreter can be summarized as follows:

1. Create attribute phrases to be used in reduction.
2. Retrieval of LTFs from Oracle DB.
3. Reduce LTFs. For each row:
   a) Parse and convert phrases.
   b) Load features from Oracle DB depending on phrases.
   c) Store converted LTFs in lists.
4. Post process LTFs:
   a) Validate permissions and prohibitions.
   b) Derive and stretch service days.
5. Persist LTFs into Oracle DB.
6. Divide LTFs into smaller chunks and move them to PostGIS DB.

6.3 Choice of Method

The chosen method for parallelizing the sequential application is to begin by profiling it using both sampling and instrumentation. By sampling the application it is possible to investigate where CPU intensive work is spent, and by instrumentation to observe the total wall times. In doing so, it is possible to conclude where bottlenecks are located and where attempts at parallelization are favorable. Once the bottlenecks have been found, separate implementations for the different parallelization techniques (explicit threading, thread pool queuing, TPL and PLINQ) are implemented and compared using the Stopwatch class. Asynchronous programming was not tested as it is not regarded as a method of parallel programming. Two additional tests are conducted to gain further insight. To test different numbers of cores, the ProcessorAffinity property is used to alter the number of cores available to the process running the application. A test with different numbers of threads is also carried out using the explicit threading method. In doing so, one may investigate the optimal number of database connections. Usage of the Stopwatch class makes it possible to investigate several parts of the application separately. It is also important to run the applications several times to limit diverging results. Certain measures were taken to obtain proper results. The applications were all run in Release mode (instead of Debug mode) and the number of other running applications was kept to a minimum. Further, by running the applications once before the actual evaluation, the involved systems were warmed up and thus not run from a cold start.
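
A minimal sketch of how such a measurement can be set up is shown below; RunReduceStep is a placeholder for the measured part and the affinity mask 0x3 (the first two logical cores) is only an example value (requires System and System.Diagnostics).

// Restrict the process to a subset of the logical cores.
Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)0x3;

var sw = Stopwatch.StartNew();
RunReduceStep();                 // placeholder for the part being measured
sw.Stop();
Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);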



Figure 6.1. The internal steps of the application showing database accesses.


Figure 6.2. The figure shows how the LTF/Reduce Interpreter is used by other applications.

44 CHAPTER 6. PRE-STUDY

6.4 Application Profile

This section consists of the profile for the entire application when run sequentially. It includes the hardware on which the application is run as well as a performance overview.

6.4.1 Hardware

Processor            Intel(R) Core(TM) i7-3820QM CPU
Clock Rate           2.70 GHz
L1 D-Cache           4x32 KB
L1 I-Cache           4x32 KB
L2 Cache             4x256 KB
L3 Cache             6144 KB
Cores / Threads      4 / 8 (Hyper-Threading)
Instruction Set      64-bit

Table 6.1. Processor specifications of the hardware used for evaluating the application.

Memory               1x8192 MB (DDR3-SDRAM)
GPU                  Intel(R) HD Graphics 4000
Operating System     Windows 7 Ultimate (64-bit, SP1)

Table 6.2. Other hardware specifications of the hardware used for evaluating the application.

6.4.2 Performance Overview

The total wall clock time for running the application was 3 hours, 18 minutes and 47 seconds. The number of database accesses amounts to a total of 123 985. Below is the CPU usage of the application over time.


Figure 6.3. The CPU usage of the application over time.


6.4.3 Method Calls

One possible way of illustrating the results of the profile is by listing the method calls. This subsection presents the instrumentation and sampling results both as tables over the hottest methods (time spent/CPU samplings) and as call trees.

Instrumentation Results

Following are the instrumentation results from running the sequential code. From the results depicted below it is evident that most method calls eventually result in database accesses.

Method Name                      Excl. elapsed time %
IDbCommand.ExecuteNonQuery       61,2
IDbCommand.ExecuteReader         37,0
IDisposable.Dispose              1,0
IDbTransaction.Commit            0,2
Invoke                           0,2

Table 6.3. Methods with respective exclusive instrumentation percentages.


Figure 6.4. The call tree of the application showing inclusive percentages of time spent executing methods (instrumentation).

CPU Sampling Results

When measuring CPU sampling it can be observed that fewer samples are exclusive to database access methods. From this, along with the instrumentation results, we can conclude that not a lot of CPU processing is performed during the database calls.

46 CHAPTER 6. PRE-STUDY

Method Name                          Excl. Samples    Excl. Samples %
IDbCommand.ExecuteReader             5144             39,1
OleDbConnection.ExecuteNonQuery      1742             13,2
OleDbConnection.Open                 1310             10,0
DbDataReader.Dispose                 1034             7,9
Activator.CreateInstance             706              5,4

Table 6.4. Methods with respective CPU exclusive samplings.


Figure 6.5. The call tree of the application showing inclusive percentages of CPU samplings of methods.

Chapter 7

Parallel Database Concepts

Because the LTF/Reduce Interpreter is very database dependent, it is useful to investigate how databases handle parallelism. This chapter covers I/O parallelism, inter- and intraquery parallelism as well as query optimization.

7.1 I/O Parallelism

I/O parallelism refers to the distribution of relations over multiple disks in an attempt to improve retrieval times. Several partitioning strategies have been proposed to support I/O parallelism, the most useful being hash partitioning and range partitioning: [35]

• The hash partitioning strategy uses attributes of relations as parameters to a hash function which returns a value i between 0 and n − 1. The relation will then be placed on disk i out of n disks.

• Range partitioning uses a partitioning vector A = [v_0, v_1, ..., v_n]. A tuple indexed i will then be placed on disk j where v_j ≤ i < v_{j+1}.

After defining the partitioning strategy, the tuples may be retrieved in parallel. Tuples can be accessed in several ways, where the most common are scanning (over the entire relation), point queries (seeking tuples with specific values) and range queries (seeking tuples within a certain range). [35] The hash partitioning strategy is particularly useful for point queries as the disk to be scanned can be chosen in constant time, leaving other disks free to process other queries. The strategy is however not very well suited for range queries as hash functions may not preserve proximity. If the hash function is properly defined, scanning over the entire relation should take about 1/n of the time taken for sequential scanning. [35] Range partitioning is useful for scanning, point queries and range queries alike, as long as the tuples are properly distributed (low skew). Partitioning related skew is however more likely with range partitioning than with hash partitioning, as the hash function typically is designed to distribute the tuples evenly. [35]

48 CHAPTER 7. PARALLEL DATABASE CONCEPTS

7.2 Interquery Parallelism

Interquery parallelism deals with different queries and transactions being executed in parallel. This in turn increases throughput while individual response times are no faster than during sequential interquery processing. In a shared disk model, locking and logging may be needed to support interquery parallelism. These sorts of problems are very much like the ones discussed in section 3.1.2 with similar, more or less complex solutions. The Oracle database is an example of a shared disk parallel database supporting interquery parallelism. [35]

7.3 Intraquery Parallelism

Intraquery parallelism refers to executing a single query in parallel. Contrary to interquery parallelism, intraquery parallelism improves individual response times, which may be useful when there are long running queries. Single queries can be parallelized by either intra- or interoperation parallelism. [35] Intraoperation parallelism deals with executing an operation such as sort, select, project or join in parallel. This sort of parallelism is very natural for database systems as the number of tuples typically is very large and the operations are set based. A concept commonly used for intraoperation parallelism is the aggregation pattern (see section 4.3). [35] Interoperation parallelism refers to executing the different operations in parallel. Operations are typically either completely, partially or not dependent on one another, and different strategies may be used for the different cases. When completely dependent there are typically not many gains to be had from parallel execution, except that writing to disk may possibly be avoided. When partially dependent, one might use a pipeline similar to what is discussed in section 4.5.1. Finally, if the operations are not dependent they may be run completely in parallel (possibly with the added cost of aggregating the results). [35]

7.4 Query Optimization

Query optimization translates a query into as efficient an execution plan as possible. Optimizing queries to be performed in parallel is more complex than the already very complex problem of optimizing sequential queries. Partitioning costs, handling skew and resource contention are all parameters to weigh in when choosing how to parallelize the operations, which ones to pipeline and so forth. The number of possible execution plans for parallel queries is much larger than for sequential queries and, because of this, certain heuristics have to be used. [35]

Chapter 8

Solution Design

To be able to parallelize the LTF/Reduce Interpreter, the decomposition technique is used to identify independent units of work subject to parallelization. The division of work is based on the profile along with the code analysis described in the following sections. Section 8.2 gives an overview of the implementations (explicit threading, thread pool queuing, TPL and PLINQ) used for evaluation.

8.1 Problem Decomposition

From the application design overview along with the profile it is obvious that the application is very I/O intensive. The number of database accesses scales linearly with the number of LTFs to be interpreted. Since the Oracle database can handle several accesses simultaneously it is possible to query the database concurrently. The question, however, is how much performance gain can be achieved by utilizing multiple cores, as most of the processing occurs in the database. Simply running multiple threads for the database accesses might be sufficient to effectively lower the wall time. The Map.Reduce method has the highest percentage of inclusive samples and should therefore be considered the first bottleneck. The problem at hand has the character of data parallelism, where the same method (Reduce) is executed for every row of the initial query. The Reduce method is mostly independent, with the exception of when the results are aggregated into the shared List objects, for which proper thread-safety has to be implemented. As previously stated in section 3.5, there is no straightforward way of replacing the list with a concurrent, lock-free equivalent. The simplest approach is to use a lock around the Add operations on the list. As the work carried out within the locked area is quite small this should not result in too big a performance loss. If the locked work were larger one could, for example, use a thread-local state for each thread containing reduced LTFs. This would in turn lead to n accesses to the lock, where n is the number of threads.

50 CHAPTER 8. SOLUTION DESIGN

Listing 8.1. Bottleneck #1: Reducing LTFs as they become available.

using (var reader = Eval(sql))
{
    while (reader.Read())
    {
        var phrases = reader.GetValue(...);
        var reduced = Reduce(phrases); // Further DB calls
        AddToList(reduced);
    }
}

The second bottleneck is comprised of the post processing of LTFs, in which two subproblems are particularly time consuming: geolocating LTFs and stretching service days. The geolocation problem consists of four independent methods which in turn involve several database calls. Running the already defined Geolocate methods in parallel is a simple yet effective way of speeding up this subproblem.

Listing 8.2. Bottleneck #2.1: Geolocating service days.

void PostProcess()
{
    Geolocate(ParkingForBusses);
    Geolocate(ParkingForDisabled);
    Geolocate(ParkingForMotorcycles);
    Geolocate(ParkingForTrucks);
    ...
    Stretch(ServiceDays);
}

The second subproblem is the stretching of service days, which consists of a loop over several service days where the body of the loop consists of further database calls along with an Add call to a shared collection which needs to be locked around. The problem is similar to that of the first bottleneck, with the main difference that all of the data is initially present in memory for this subproblem.

Listing 8.3. Bottleneck #2.2: Stretching of LTFs.

private List<Foreskrift> Stretch(IEnumerable<Foreskrift> features) {
    ...
    foreach (Foreskrift feature in features)
    {
        var copy = feature.Clone();
        copy.Extent = feature.Extent.Stretch();
        stretched.Add(copy);
    }
}

51 CHAPTER 8. SOLUTION DESIGN

Persisting data to a database constitutes the third bottleneck. A large number of SQL INSERT queries are carried out as a transaction, leading to slow performance. There are however several ways to approach the problem. The first solution is to simply perform the INSERT queries completely in parallel. Depending on the database this is more or less of a viable option. If the entire table is locked during the insertion this will not lead to performance gains but to connection timeouts and slow performance. A better solution is to use bulk insertion, in which local data tables are created and populated before being pushed to the database. This has the benefit that populating the data tables can be performed locally in parallel while calls to the database are kept to a minimum. Furthermore, bulk insertion has proven to be one of the most efficient ways of quickly populating a database.

Listing 8.4. Bottleneck #3: Storage/Persist of modified LTFs.

void Persist() {
    DataAccess.Reduce.Persist("LTFR_LASTZON", ..., Lastzon);
    DataAccess.Reduce.Persist("LTFR_P_TILLATEN", ..., Permissions);
    ...
}

void Persist(..., IEnumerable features) {
    using (var connection = CreateConnection())
    {
        IDbTransaction tx = connection.BeginTransaction();
        using (var command = connection.CreateCommand())
        {
            foreach (var feature in features)
            {
                command.CommandText = feature.GetCreateSQL(...);
                command.ExecuteNonQuery();
            }
        }
        tx.Commit();
    }
}

Unfortunately, this third bottleneck could not be parallelized for reasons outside the scope of this work. For unknown reasons, the database crashed when INSERT queries were run concurrently against it. There may be several explanations for this, such as a bad configuration of the database or problems with the connection. Instead of spending too much work on solving this issue, the focus was put on the other bottlenecks.

52 CHAPTER 8. SOLUTION DESIGN

8.2 Applying Patterns

The following subsections describe how the different parallelization techniques are applied to bottlenecks 1, 2.1 and 2.2.

8.2.1 Explicit Threading Approach

This section describes how the bottlenecks may be solved using different methods of basic threading. The approaches in this as well as the next subsection are implemented using constructs available in all .NET Framework versions and are examples of how parallel programming was, and to some extent still is, implemented prior to the constructs added in version 4.0.

Bottleneck #1

For parallelizing the first bottleneck using threads there is typically a need for a thread object which stores work in a local queue. This method uses explicit locking around the queue for adding and removing work, which ensures that the thread objects are thread safe. As shown in listing 8.5, work is distributed in a round-robin fashion to ensure threads are started as soon as possible with interleaved locking. The workLeft variable is decremented from the ThreadObject (see A.2) using the Interlocked class and when its value reaches zero, the ManualResetEvent waited for is set.

Listing 8.5. Bottleneck #1: Reduce using explicit threading.

ThreadObject[] threads = new ThreadObject[thread_count];

workLeft = thread_count;

for (var i = 0; i < thread_count; i++)
{
    threads[i] = new ThreadObject(this);
    threads[i].Start();
}

int j = 0;
foreach (WorkObject work in this.ReadData())
    threads[j++ % thread_count].AddWork(work);

// This work will not be processed
for (var i = 0; i < thread_count; i++)
    threads[i].AddWork(new WorkObject() { isLast = true });

finished.WaitOne();


Bottleneck #2

As shown in listing 8.6, applying multi-threading to bottleneck 2.1 is a simple matter as methods can be supplied as delegates to Thread objects which are started and later joined. This is an example of the fork/join pattern.

Listing 8.6. Bottleneck #2.1: Geolocate using explicit threading.

Thread[] threads = new Thread[4];

threads[0] = new Thread(() => { Geolocate(ParkingForBusses); });
threads[1] = new Thread(() => { Geolocate(ParkingForDisabled); });
threads[2] = new Thread(() => { Geolocate(ParkingForMotorcycles); });
threads[3] = new Thread(() => { Geolocate(ParkingForTrucks); });

foreach (Thread t in threads)
    t.Start();

foreach (Thread t in threads)
    t.Join();

Finally, bottleneck 2.2 can be parallelized in a similar fashion to that of bottleneck 1. The difference is that since the work is available in memory from the beginning it may be partitioned in a way so that threads operate on different parts of the list concurrently. As shown in listing 8.7, thread i may be responsible for the work indexed

(f/t) · i          (8.1)

to

(f/t) · (i + 1)    (8.2)

where f is the number of features and t is the number of threads.

Listing 8.7. Bottleneck #2.2: Stretch using explicit threading.

var stretched = new List<Foreskrift>();
ThreadObjectStretch[] thread = new ThreadObjectStretch[threads];
globalInt = threads;
for (var i = 0; i < threads; i++)
{
    thread[i] = new ThreadObjectStretch(this, features, stretched,
        (features.Count() / threads) * i, (features.Count() / threads));
    thread[i].Start();
}

finished.WaitOne();
return stretched;


8.2.2 Thread Pool Queuing Approach

As mentioned in the previous section, thread pool queuing is the other method which can be used in .NET versions prior to 4.0. Thread pool queuing, however, is typically easier to implement than explicit threading.

Bottleneck #1

When work is to be queued onto the thread pool it needs to be decomposed into a callback function which is fed the data to be used. Note that handling of the workLeft variable needs to be interlocked to ensure thread safety.

Listing 8.8. Bottleneck #1: Reduce using thread pool queuing.

workLeft = 0;

foreach (WorkObject work in this.ReadData())
{
    Interlocked.Add(ref workLeft, 1);
    ThreadPool.QueueUserWorkItem(new WaitCallback(ReduceWorkAndDecrement), work);
}

finished.WaitOne(); // Manual Reset Event

Bottleneck #2

Geolocating and stretching work can be parallelized similarly to the reduction phase, as seen in listings 8.9 and 8.10.

Listing 8.9. Bottleneck #2.1: Geolocate using thread pool queuing.

workLeft = 4;

ThreadPool.QueueUserWorkItem(new WaitCallback(GeolocateWork), ParkingForBusses);
ThreadPool.QueueUserWorkItem(new WaitCallback(GeolocateWork), ParkingForDisabled);
ThreadPool.QueueUserWorkItem(new WaitCallback(GeolocateWork), ParkingForMotorcycles);
ThreadPool.QueueUserWorkItem(new WaitCallback(GeolocateWork), ParkingForTrucks);

finished.WaitOne();


Listing 8.10. Bottleneck #2.2: Stretch using thread pool queuing.

workLeft = features.Count();

foreach (Foreskrift feature in features)
    ThreadPool.QueueUserWorkItem(
        new WaitCallback(StretchFeatureAndDecrement),
        feature);

finished.WaitOne();

return globalList;

8.2.3 TPL Approaches

The TPL approaches described in this section include both the Parallel class and task parallelism where applicable. The methods described in this section as well as the next one (PLINQ) require .NET Framework 4.0 or later.

Bottleneck #1

Below is the reduction problem parallelized using TPL (the Parallel class). This step could also have been parallelized using task parallelism (Task.Factory.StartNew). Note that waiting for the work to complete is done implicitly.

Listing 8.11. Bottleneck #1: Reduce using TPL.

Parallel.ForEach(this.ReadData(), work =>
{
    ReduceWork(work);
});
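As noted above, the same step could also be expressed with explicit tasks instead of the parallel loop. The sketch below is only an illustration of the Task.Factory.StartNew alternative mentioned in the text, not the implementation used in this work; the explicit Task.WaitAll call replaces the implicit waiting of Parallel.ForEach.

var tasks = new List<Task>();

foreach (WorkObject work in this.ReadData())
    tasks.Add(Task.Factory.StartNew(() => ReduceWork(work)));   // one task per work item

Task.WaitAll(tasks.ToArray());   // wait explicitly for all reduce tasks to complete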

Bottleneck #2

The Parallel.Invoke method suits bottleneck 2.1 well because the number of tasks is known beforehand.

Listing 8.12. Bottleneck #2.1: Geolocate using TPL.

Action[] actions = {
    () => Geolocate(ParkingForBusses),
    () => Geolocate(ParkingForDisabled),
    () => Geolocate(ParkingForMotorcycles),
    () => Geolocate(ParkingForTrucks)
};

Parallel.Invoke(actions);


For bottleneck 2.2 we introduce the ConcurrentBag as a lock-free collection for the stretched features. Note in listing 8.13 that it is captured by the closure passed to the parallel loop.

Listing 8.13. Bottleneck #2.2: Stretch using TPL.

var stretched = new ConcurrentBag<Foreskrift>();

Parallel.ForEach(features, options, (feature) =>
{
    var copy = feature.Clone();
    copy.Extent = feature.Extent.Stretch();
    stretched.Add(copy);
});

return stretched.ToList();
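The options argument in listing 8.13 is not defined in the snippet. It is presumably a ParallelOptions instance; a minimal sketch is shown below, where the value of MaxDegreeOfParallelism is only an example and not taken from the thesis implementation.

// Hypothetical configuration of the parallel loop used in listing 8.13.
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 10   // example value; limits the number of concurrent iterations
};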

8.2.4 PLINQ Approach

Finally, this section describes the PLINQ approach, which in many ways is the simplest one.

Bottleneck #1

Below is the PLINQ query for reducing the work. Note its simplicity.

Listing 8.14. Bottleneck #1: Reduce using PLINQ.

this.ReadData()
    .AsParallel()
    .ForAll(work =>
    {
        ReduceWork(work);
    });

Bottleneck #2

Geolocating and stretching work as PLINQ queries. Parallelizing these types of problems is very similar to how it is done using TPL. LINQ does, however, offer rich and elegant functionality for more advanced queries when applicable.


Listing 8.15. Bottleneck #2.1: Geolocate using PLINQ.

var lists = new[] { ParkingForBusses,
                    ParkingForDisabled,
                    ParkingForMotorcycles,
                    ParkingForTrucks };

lists.AsParallel()
     .ForAll(list =>
     {
         Geolocate(list);
     });

Listing 8.16. Bottleneck #2.2: Stretch using PLINQ.

var stretched = new ConcurrentBag<Foreskrift>();

features.AsParallel()
        .ForAll(feature =>
        {
            var copy = feature.Clone();
            copy.Extent = feature.Extent.Stretch();
            stretched.Add(copy);
        });

return stretched.ToList();

Part IV

Execution

Chapter 9

Results

This chapter contains the results from running the implementations described in the previous chapter, followed by a discussion and the conclusions drawn from the work carried out. The chapter ends with a section about possible improvements.

9.1 Performance Survey

The wall times presented in this section are measured using the Stopwatch class. Each technique was run five times and the running times are presented below as bar graphs as well as box-and-whiskers graphs. The first two figures show the total running times and the following figures show the running times for the individual methods MapLastzon, Map, Geolocate and Stretch. The final two figures show the total running times when the number of cores used is altered and the running time as a function of the number of threads used in the explicit threading technique (see figure 9.5).

The application was run in Release mode to ensure a proper evaluation. No other heavy processes were run on the client machine. Although unlikely, there is no guarantee that other clients were not accessing the database while the tests were running. Appendix B shows the raw data that was used to create the diagrams.
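As a simple illustration of how such wall times can be collected with the Stopwatch class, a minimal sketch is given below. The method name MapLastzon is taken from the result tables, while the surrounding measurement loop is an assumption and not the actual measurement harness used in this work.

// Measure the wall time of one method, repeated five times (System.Diagnostics.Stopwatch).
for (int run = 1; run <= 5; run++)
{
    Stopwatch watch = Stopwatch.StartNew();
    MapLastzon();                           // the method under measurement
    watch.Stop();

    Console.WriteLine("Run {0}: {1}", run, watch.Elapsed);
}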


[Bar chart omitted: y-axis in seconds (0–5 000); x-axis: Sequential, 10 Threads, TP Queuing, TPL, PLINQ.]

Figure 9.1. Average total running times for different techniques.

[Box-and-whiskers chart omitted: Total Running Times in seconds (roughly 1 020–1 380); one box per technique: 10 Threads, TP Queuing, TPL, PLINQ.]

Figure 9.2. Total running times for the techniques represented as a box-and-whiskers graph.


[Four box-and-whiskers panels omitted, each comparing 10 Threads, TP Queuing, TPL and PLINQ: MapLastzon (roughly 60–105 seconds), Map (roughly 540–780 seconds), Geolocate (roughly 330–420 seconds) and Stretch (roughly 75–120 seconds).]

Figure 9.3. Box-and-whiskers diagrams for the running times of the methods MapLastzon, Map, Geolocate and Stretch.


[Bar chart omitted: y-axis in seconds (0–1 200); x-axis: 1 core, 2 cores, 3 cores.]

Figure 9.4. Average total running times for TPL using different numbers of cores.

[Line chart omitted: y-axis in seconds (roughly 1 200–1 900); x-axis: 2–20 threads.]

Figure 9.5. Total running times for different numbers of threads using the explicit threading approach.


Technique            Difficulty  Applicability
Explicit Threading   Hard        Higher
Thread Pool Queuing  Medium      Lower
TPL                  Easy        Higher
PLINQ                Easy        Lower

Table 9.2. Perceived difficulty and applicability levels of implementing the techniques of parallelization.

Sequential                           TPL
IDbCommand.ExecuteReader (95.7 %)    Threading.Monitor.Enter (50.9 %)
IDisposable.Dispose (2.8 %)          IDbCommand.ExecuteReader (41.7 %)
ILog.ErrorFormat (0.5 %)             Tasks.Parallel.Invoke (3.8 %)

Table 9.1. Methods with the highest elapsed exclusive time spent executing for the sequential and TPL-based approaches.

9.2 Discussion

This section includes a discussion of the new parallel extensions as well as the results obtained for the implementation of the LTF/Reduce Interpreter.

9.2.1 Implementation Analysis

Depending on the method used, the results show that parallelizing the LTF/Reduce Interpreter lowers the wall times of the parallelized parts by a factor of 3.5–4.1, reducing the total wall time of the LTF/Reduce Interpreter by about one hour. This can be considered a relatively good result. There are, however, two things that are unfortunate about the work:

• The application was highly I/O dependent; it would have been interesting to investigate an application that was more CPU-bound.

• Bottleneck #3, which consisted of a high number of INSERT queries, could not be parallelized due to database crashes for unknown reasons.

The parallel extensions of .NET 4.0 proved to be very easy to work with while being both comprehensive and efficient. The TPL methods proved useful for most patterns of parallel programming, while PLINQ provides additional support for query-like problems. The automatic thread pool usage ensured that a proper number of threads was used at all times, while issues such as aggregation, cancellation and exceptions were straightforward to work with, as opposed to the thread pool queuing technique. Explicit threading turned out to be the most difficult technique, as the actual threads had to be assigned work while being coordinated properly.

The number of cores used during execution turned out not to matter very much, as observed in figure 9.4. This was, however, expected after observing the initial

profiling (see section 6.4.2). After testing the application with different numbers of threads it was apparent that the efficiency gains started to stagnate after about 9 threads. This should correlate with the optimal number of connections to the database. The reason why fewer threads result in worse performance is the lower number of queries performed concurrently. A higher thread count, however, does not increase performance much, since the database cannot handle such a high number of connections.

PLINQ shows a slightly higher running time than the other two methods utilizing the thread pool. The reason behind this is most likely the design of PLINQ, which typically requires some form of thread communication. Increasing the number of threads used over time is therefore more limited than for TPL and TP queuing, which may explain the results. This is also the reason why there is no way of specifying a maximum degree of parallelism using PLINQ.

Table 9.1 shows the top three methods with the highest elapsed exclusive time spent executing (instrumentation). The TPL column is particularly interesting in that the time spent in Monitor.Enter amounts to 50.9 % of the total. This in turn means that threads are blocking half of the time when the application is parallelized and that the thread pool has decided that this is the optimal number. This also makes sense: for every thread accessing the database there should be one waiting, so that as soon as a thread completes, its connection is immediately occupied by another thread.

Even though thread pool queuing shows similar performance to the TPL method, the latter should be preferred due to its richer functionality. When choosing between PLINQ and TPL one should typically look at the problem at hand to figure out which method suits it best. If the maximum amount of parallelism is not known beforehand, though, the TPL method may perform better due to its dynamic thread balancing.

It should also be noted that the implementation of explicit threading using 10 threads turned out rather well. This is not very surprising, as the other methods boil down to explicit threading when actually executed. The benefits of implementing such an approach are that the overhead becomes minimal and that earlier versions of the .NET Framework may be used. The drawbacks, however, typically outweigh the benefits: decomposing work, handling errors, dynamically inserting threads and so on becomes much more difficult and is typically more time consuming to implement.

9.2.2 Parallel Extensions Design Analysis

The first issue to be discussed is the division of the parallel extensions into TPL (which includes the Parallel class and task parallelism) and PLINQ. The design of the Parallel class of TPL is worth discussing: why did Microsoft provide explicit support for parallel loops but not for other patterns? A possible answer is that the parallel loop is a clearly defined problem which is also common and quite difficult to implement otherwise. But this also raises the design

question: which patterns or algorithms should be provided in TPL?

The design of PLINQ is also quite interesting. First of all one might question why PLINQ is not the default instead of LINQ. The answer is that some properties, such as element ordering, are lost in the process and that PLINQ additionally introduces some overhead. Thus, there is a legitimate reason for separating LINQ from PLINQ. There are also many pitfalls in PLINQ which may confuse the naive programmer. Some of the query operators are more efficient than others, and some are even run sequentially when called (see section 3.4). It can also be difficult to decide when to use PLINQ and when not to. The naive programmer might simply try parallelizing every LINQ query for potential efficiency benefits. This is typically a bad approach, as problems for which LINQ is the best solution already tend to execute efficiently. Instead, one should identify CPU intensive bottlenecks and try to express them as PLINQ queries. [3]

The concurrent collections BlockingCollection, ConcurrentDictionary, ConcurrentQueue, ConcurrentStack and ConcurrentBag provide the programmer with easy-to-use thread-safe data structures. The pros of using these constructs are that they are well tested, efficient and simple. However, there are certain design issues which may be questioned. First of all, there is no straightforward way of determining the additional overhead posed by the constructs, and while they are often suitable, the naive programmer might shoehorn in the collections instead of rethinking the overall application design. This also poses the question: which data structures should have concurrent counterparts and be provided in libraries? As seen in section 3.5, there is currently no implementation of a concurrent list because such an implementation would typically not yield large efficiency benefits. In fact, some data structures could potentially give even worse performance than their sequential counterparts due to heavy synchronization. These are all important questions that need to be discussed sooner or later.

Along with the concurrent collections, slim constructs were introduced in .NET 4.0 to provide fast, intra-process usage of certain locks and event handlers. While useful and efficient, these classes also introduce some design issues. The addition of Slim to the class name might be contradictory to the actual functionality: it is true that the slim constructs are limited in that no inter-process communication is possible, but they can instead handle cancellation tokens, so in some ways their functionality actually proves richer than that of their respective counterparts (see the sketch at the end of this section).

The profiling tools of Visual Studio 2012 worked well for evaluating the application, though they are not optimal. The interface is quite disorderly and difficult to navigate. An example is the threads view of the concurrency visualizer, which becomes difficult to read when the number of threads used is high. Another slight annoyance is that it is not possible to perform more than one performance analysis per run (e.g. instrumentation and sampling). Visual Studio 2012 does its job, but some UI improvements could probably make it significantly better.
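To illustrate the cancellation support of the slim constructs mentioned above, a minimal standalone sketch using ManualResetEventSlim is given below; DoWork is a placeholder and the example is not code from the LTF/Reduce Interpreter.

var finished = new ManualResetEventSlim(false);
var cts = new CancellationTokenSource();

// A worker signals the slim event when its work is done.
ThreadPool.QueueUserWorkItem(_ =>
{
    DoWork();          // placeholder for the actual work
    finished.Set();
});

try
{
    // Unlike ManualResetEvent.WaitOne, the slim variant can wait with a cancellation token.
    finished.Wait(cts.Token);
}
catch (OperationCanceledException)
{
    // The wait was cancelled via cts.Cancel() before the event was set.
}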


9.3 Conclusion

Modern processor construction has taken a new turn in that adding more cores seems to be the new standard. As we transition from single-core to multi-core and from multi-core to many-core architectures, it is not surprising that better support for writing software that exploits the hardware is highly necessary. The parallel extensions of .NET 4.0 offer such functionality and, while possibly having some design issues, TPL and PLINQ have shown to be very efficient and simple ways of parallelizing applications in contrast to explicit threading techniques.

For query-like problems, PLINQ should be used; for others, TPL should. When the problem is of the embarrassingly parallel kind, Parallel.For and Parallel.ForEach should be used, and when the number of tasks is known beforehand, Parallel.Invoke should be used. For patterns such as parent-child relations, dynamic task parallelism and producer/consumer problems, explicit task creation is recommended because of its flexibility (a small sketch of a parent-child relation follows below). Additionally, the concurrent data structures and slim constructs introduced in .NET 4.0 are useful but should still be used with care and consideration (solutions based on immutability and low synchronization are typically more efficient).

Accessing databases in parallel has shown to give certain performance benefits. As long as there are resources sitting idle on the database side there is a possibility for parallelization gains. Depending on the database and the underlying hardware, it may be difficult to decide on the level of concurrency when accessing the database. Because of this, one should typically try different levels of parallelism while weighing in other applications that may be running on the database side. Such an experiment was carried out in this work and, for the Oracle database that was used, the appropriate number of threads (which corresponds to connections) turned out to be around 9–10. The total running time of the parts parallelized was lowered by a factor of about 3.5–4.1 depending on the technique used. Unfortunately, the number of cores used for executing the application did not have a significant impact on the running time due to the low amount of CPU intensive work carried out by the application.
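As a minimal sketch of the parent-child relation referred to above, the example below uses explicit task creation with TaskCreationOptions.AttachedToParent; the DoChildWork method is a placeholder and not part of the thesis implementation.

var parent = Task.Factory.StartNew(() =>
{
    // Child tasks attached to the parent; the parent does not complete until they do.
    Task.Factory.StartNew(() => DoChildWork(1), TaskCreationOptions.AttachedToParent);
    Task.Factory.StartNew(() => DoChildWork(2), TaskCreationOptions.AttachedToParent);
});

parent.Wait();   // waits for the parent and, implicitly, for both attached children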

9.4 Future Work

There are multiple ways in which the work of this thesis can be extended. First of all, one might investigate the problem of the database crashing when parallelizing the INSERT queries. An interesting approach would be to implement a bulk insert from a DataSet constructed locally using the parallel extensions and then executed using just one query. A second possible extension would be to try out the asynchronous programming model for introducing concurrency to the application (a small sketch is given at the end of this section). In doing so, the application could potentially be run without parallelism but still give good results. A third possible extension would be to perform parallelization on more CPU

intensive work, as such an experiment would better showcase threads actually performing work instead of just waiting for database calls to finish. Additionally, such problems are more likely to be less embarrassingly parallel and thus more interesting from an algorithmic standpoint.

Finally, it should be mentioned that technologies such as the .NET Framework change over time. The techniques described in this thesis may have changed completely a couple of years from now and, because of this, evaluations of newer versions of the .NET Framework will have to be carried out.
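As a starting point for the asynchronous programming model mentioned above, a minimal sketch using async/await is given below. It assumes a provider that exposes asynchronous commands, as System.Data.SqlClient does in .NET 4.5; the connection string, query and method name are placeholders and not taken from the LTF/Reduce Interpreter.

async Task<int> CountFeaturesAsync(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync();                        // asynchronous connect

        using (var command = new SqlCommand("SELECT ...", connection))
        using (var reader = await command.ExecuteReaderAsync())
        {
            int count = 0;
            while (await reader.ReadAsync())                 // asynchronous row reads
                count++;
            return count;
        }
    }
}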

Part V

Appendices

Appendix A

Code Snippets

A.1 The ReadData Method

Listing A.1. The ReadData method used in bottleneck #1.

IEnumerable<WorkObject> ReadData()
{
    string sql =
        string.Format(
            @"Select fs.ltf_beteckning, fs.ikrafttradandedatum, ...
              From {0}.lvd_object_version ov
              Inner Join {1}.ltf_foreskrift fs
                  On fs.foreteelse_id = ov.object_id
              Left Outer Join {1}.lvd_attribute_value fras118AF ...",
            ...);

    using (IDbConnection connection = CreateConnection())
    {
        using (var reader = Eval(sql, connection))
        {
            while (reader.Read())
            {
                DateTime? validFrom = DateTime.ParseExact(
                    reader.GetValue(...));
                int featureObjectId = Convert.ToInt32(
                    reader.GetValue(...));
                ...
                yield return new WorkObject(validFrom,
                    featureObjectId, phrase188AF, ...);
            }
        }
    }
}


A.2 The ThreadObject Class

Listing A.2. The ThreadObject class used for the explicit threading technique.

class ThreadObject
{
    private Queue<WorkObject> workQueue = new Queue<WorkObject>();
    private ManualResetEvent hasWork = new ManualResetEvent(false);
    private User user;

    public ThreadObject(User user) {
        this.user = user;
        new Thread(Run).Start();
    }

    public void Run() {
        bool threadAlive = true;
        while (threadAlive) {
            hasWork.WaitOne();
            WorkObject work = GetWork();
            if (work.isLast) {
                if (Interlocked.Decrement(ref user.workLeft) == 0) {
                    user.finished.Set();
                }
                threadAlive = false;
            }
            else {
                user.ProcessWork(work);
            }
        }
    }

    private WorkObject GetWork() {
        lock (workQueue) {
            if (workQueue.Count == 1)
                hasWork.Reset();
            return workQueue.Dequeue();
        }
    }

    public void AddWork(WorkObject work) {
        lock (workQueue) {
            hasWork.Set();
            workQueue.Enqueue(work);
        }
    }
}
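The listing shows the worker side only. A minimal sketch of how the ThreadObjects might be fed work for bottleneck #1 is given below; the round-robin distribution and the sentinel WorkObject marked with isLast are assumptions based on the class above, not code reproduced from the thesis.

// Create one ThreadObject per worker thread (the user object carries workLeft and finished).
ThreadObject[] workers = new ThreadObject[threads];
user.workLeft = threads;

for (int i = 0; i < threads; i++)
    workers[i] = new ThreadObject(user);

// Distribute the work items round-robin over the workers.
int next = 0;
foreach (WorkObject work in this.ReadData())
    workers[next++ % threads].AddWork(work);

// Queue one sentinel item per worker so that each Run loop terminates.
// Assumes isLast is settable; the actual WorkObject construction is not shown in the thesis.
foreach (ThreadObject worker in workers)
    worker.AddWork(new WorkObject { isLast = true });

// Wait until the last worker has decremented workLeft to zero.
user.finished.WaitOne();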

Appendix B

Raw Data

B.1 Different Technique Measurements

Method           MapLastzon  Map       Geolocate  Stretch   Total
Sequential (1)   09:58.29    43:30.07  09:58.29   10:15.62  01:13:42
Sequential (2)   04:11.19    42:50.56  09:11.58   11:10.72  01:07:24
Sequential (3)   09:04.14    53:31.63  09:45.03   11:22.70  01:23:44
Sequential (4)   05:03.91    46:44.10  10:19.35   13:22.49  01:15:30
Sequential (5)   07:10.55    54:56.03  09:41.93   14:45.74  01:26:34
10 Threads (1)   01:08.22    11:36.62  05:52.34   01:23.85  00:20:01
10 Threads (2)   01:07.92    12:15.16  06:23.46   01:34.74  00:21:21
10 Threads (3)   01:13.87    12:26.47  06:53.59   01:59.65  00:22:33
10 Threads (4)   01:10.18    12:08.52  06:00.99   01:41.20  00:21:01
10 Threads (5)   01:09.80    12:31.27  06:45.77   01:50.61  00:22:17
TP queuing (1)   01:07.58    09:13.04  06:04.20   01:20.38  00:17:45
TP queuing (2)   01:05.90    09:21.08  05:55.97   01:18.25  00:17:41
TP queuing (3)   01:14.83    10:03.67  05:38.09   01:18.14  00:18:15
TP queuing (4)   01:09.79    09:37.58  06:22.84   01:22.65  00:18:33
TP queuing (5)   01:11.66    10:30.17  06:31.70   01:40.92  00:18:54
TPL (1)          01:13.09    10:01.60  05:50.30   01:13.95  00:18:19
TPL (2)          01:06.97    10:35.25  05:56.93   01:19.34  00:18:57
TPL (3)          01:19.51    08:18.30  05:58.66   01:30.18  00:17:07
TPL (4)          01:39.81    08:53.60  06:04.49   01:29.04  00:18:07
TPL (5)          01:14.94    09:57.72  06:37.03   01:13.33  00:19:03
PLINQ (1)        01:20.09    12:03.31  06:16.14   01:38.28  00:21:18
PLINQ (2)        01:05.27    11:33.07  06:02.76   01:35.59  00:20:17
PLINQ (3)        01:09.60    11:46.46  06:58.00   01:56.42  00:21:50
PLINQ (4)        01:40.65    12:08.31  06:56.41   01:41.07  00:22:26
PLINQ (5)        01:18.08    12:33.15  06:53.83   01:57.68  00:22:43

Table B.1. Running times measured using the different techniques of parallelization.


B.2 Varying the Thread Count using Explicit Threading

Threads     MapLastzon  Map       Geolocate  Stretch   Total
3 threads   01:43.34    18:53.67  05:58.17   04:32.33  00:31:08
4 threads   01:40.12    15:57.46  05:09.39   02:57.53  00:25:45
5 threads   01:24.21    14:38.08  05:06.90   02:35.55  00:23:45
6 threads   01:32.10    13:44.93  05:58.32   02:30.09  00:23:45
7 threads   01:18.96    13:29.32  05:04.01   02:14.00  00:22:06
8 threads   01:20.87    12:47.08  06:04.83   02:01.48  00:22:14
9 threads   01:31.24    13:04.60  04:29.09   01:47.00  00:20:52
10 threads  01:19.40    13:35.64  05:52.43   01:39.77  00:22:27
11 threads  01:16.58    12:55.68  06:02.42   01:42.16  00:21:57
12 threads  01:19.02    13:05.22  06:06.06   01:37.61  00:22:08
13 threads  01:22.49    12:36.58  04:42.55   01:30.29  00:20:12
14 threads  01:21.52    12:43.15  04:51.83   01:36.60  00:20:33
16 threads  01:16.17    12:45.63  06:29.05   01:28.23  00:21:59
18 threads  01:36.72    12:33.37  06:45.20   01:26.08  00:22:21
20 threads  01:14.97    12:21.05  06:36.20   01:32.46  00:21:45

Table B.2. Running time measurements using explicit threading with different numbers of threads.

B.3 Varying the Core Count using TPL

Cores        MapLastzon  Map       Geolocate  Stretch   Total
1 core (1)   01:14.73    11:21.35  07:05.37   01:29.42  00:21:11
1 core (2)   01:20.25    11:26.30  05:08.84   01:33.71  00:19:29
1 core (3)   01:19.14    11:35.34  05:57.63   01:20.76  00:20:13
2 cores (1)  01:16.62    11:09.48  07:27.26   01:22.48  00:21:16
2 cores (2)  01:17.20    11:44.20  06:38.29   01:21.06  00:21:01
2 cores (3)  01:19.75    11:36.37  06:13.67   01:29.18  00:20:39
3 cores (1)  01:22.91    11:17.72  06:35.88   01:26.77  00:20:43
3 cores (2)  01:21.26    11:12.32  06:26.34   01:22.75  00:20:23
3 cores (3)  01:18.84    11:00.79  06:19.64   01:19.91  00:19:59

Table B.3. Running time measurements using TPL with different numbers of cores.

Bibliography

[1] Comparison of CPU Architectures [Wikipedia Page]. Wikipedia; [cited 2013 May 13]. Available from: http://en.wikipedia.org/wiki/Comparison_of_CPU_architectures/.

[2] AMBA Level 2 Cache Controller Technical Reference Manual. ARM; 2010.

[3] Threading in C# [homepage on the Internet]. Joseph Albahari; 2011 [updated 2011 Apr; cited 2013 Mar 11]. Available from: http://www.albahari.com/threading/.

[4] Omara E, editor. Performance Characteristics of New Synchronization Primitives in the .NET Framework 4; 2010.

[5] Parallel Programming in the .NET Framework [Microsoft Developer Network]. Microsoft MSDN; [cited 2013 Jun 1]. Available from: http://msdn.microsoft.com/en-us/library/dd460693.aspx/.

[6] Barney B, Livermore L, editors. Introduction to Parallel Computing [homepage on the Internet]; [cited 2013 May 25]. Available from: https://computing.llnl.gov/tutorials/parallel_comp/.

[7] Parallel Programming with Microsoft .NET [Microsoft Developer Network]. Microsoft MSDN; [cited 2013 Jun 1]. Available from: http://msdn.microsoft.com/en-us/library/ff963553.aspx/.

[8] Kanellos M, editor. Moore's Law to Roll On For Another Decade; 2003.

[9] Kanellos M, editor. New Life for Moore's Law; 2005.

[10] Markoff J, editor. Apple in Parallel: Turning the PC World Upside Down? New York Times; 2008.

[11] Box D, Sells C, editors. .NET Common Language Runtime Components; 2003.

[12] Toub S, Molloy R, McCrady D, editors. Concurrency and Parallelism: Native (C/C++) and Managed (.NET) Perspectives [Interview]; [cited 2013 Jun 1].

[13] Weik MH, editor. A Third Survey of Domestic Electronic Digital Computing Systems; 1961.


[14] Brorsson M, editor. Datorsystem: Program- och Maskinvara; 1999.

[15] Pessach Y, editor. Juice Up Your App with the Power of Hyper-Threading; 2005.

[16] Gove D, editor. Multicore Application Programming: for Windows, Linux and Oracle Solaris. 1st ed. Addison-Wesley; 2011.

[17] Duncan R, editor. A Survey of Parallel Computer Architectures. Survey and Tutorial Series; 1990.

[18] Hennessy JL, Patterson DA, editors. Computer Architecture: A Quantitative Approach; 2007.

[19] Duffy J, editor. Concurrent Programming on Windows. 1st ed. Pearson Education; 2009.

[20] Hoag JE, editor. A Tour of Various TPL Options; 2009.

[21] Kimmel P, editor. Parallel LINQ; 2009.

[22] Tan RP, editor. PLINQ's Ordering Model; 2010.

[23] Song C, Omara E, Liddell M, editors. Thread-safe Collections in .NET Framework 4 and Their Performance Characteristics; 2009.

[24] Stroustrup B, Dechev D, Pirkelbauer P, editors. Lock-free Dynamically Resizable Arrays; 2006.

[25] Vagata P, editor. When Should I Use Parallel.ForEach? When Should I Use PLINQ?; 2010.

[26] Ling Wo CM, editor. Parent-Child Task Relationships In The .NET Framework; 2009.

[27] Dean J, Ghemawat S, editors. MapReduce: Simplified Data Processing on Large Clusters; 2004.

[28] Asynchronous Programming with Async and Await (C# and Visual Basic) [Microsoft Developer Network]. Microsoft MSDN; [cited 2013 Jun 4]. Available from: http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx.

[29] Toub S, editor. Patterns of Parallel Programming; 2010.

[30] Microsoft Visual Studio [Wikipedia Page]. Wikipedia; [cited 2013 May 27]. Available from: http://en.wikipedia.org/wiki/Microsoft_Visual_Studio/.

[31] Debug Multithreaded Applications in Visual Studio [Microsoft Developer Network]. Microsoft MSDN; [cited 2013 Jun 1]. Available from: http://msdn.microsoft.com/en-us/library/ms164746.aspx/.


[32] George B, Nagpal P, editors. Optimizing Parallel Applications Using Concurrency Visualizer: A Case Study; 2010.

[33] Analyzing Application Performance by Using Profiling Tools [Microsoft Developer Network]. Microsoft MSDN; [cited 2013 Jun 1]. Available from: http://msdn.microsoft.com/en-us/library/z9z62c29.aspx/.

[34] RDT Handboken: BTR Teknisk Beskrivning. Transportstyrelsen; 2010.

[35] Silberschatz A, Korth H, Sudarshan S, editors. Database System Concepts. 5th ed.; 2006.

