Parallel Programming in .NET
Kevin Luty Department of Software Engineering University of Wisconsin-Platteville E-mail: [email protected]
Abstract
As hardware trends shift from increasing processor clock speeds to fitting multiple central processing units, or cores, into one processor, developers must now take the efficiency of their code into account. Software engineers need to understand the design patterns and tools available to them in order to create fast, responsive applications that continue to satisfy customers on the widely accepted business platform, Microsoft Windows. The .NET Framework 4 release from Microsoft has made parallel programs easier to design and code than ever before. To take advantage of it, the software engineer must understand the difference between data parallelism and task parallelism, and when to apply the appropriate software design to a given problem. This discussion defines parallelism, differentiates parallel loops from parallel tasks, discusses how and when to use the appropriate design patterns, and briefly describes supporting frameworks and tools for parallel programming.
History
History of Parallel Hardware
In the late 1960s and 1970s, computer scientists, with the help of advances in hardware architecture, made it possible to implement parallel computing in supercomputers. During the 1980s, continued development allowed scientists to build a supercomputer using 64 Intel 8086/8087 microprocessors. This proved that extreme performance could be obtained from massively parallel processors (MPPs) built with mass-market chips, so research and development efforts continued. [6]
In the 1980s, clusters came about as an alternative to applications built on MPPs. A cluster is essentially a parallel computing machine built from many off-the-shelf computers connected by a network. Clusters are now the dominant architecture in data centers around the world. [2]
Clock speeds rose for years as transistors shrank, but the number of cores in a processor is now the main focus of processor development. The reason for this shift is that the efficiency gained from multi-core processors outweighs the cost of raising clock speeds further. With increasing restrictions and standards on the energy efficiency of electronic devices, multi-core development will remain the primary focus. [1]
History of Parallel Software
In the early days of parallel computing, most computers shipped with a single processor containing a single core. As a result, sequential programs were, and still are, easier to write than parallel programs. Now that the number of cores in processors is increasing, it is the job of software architects to understand the changes in the environment and adapt to them.
As the number of cores in a processor started to increase quickly, developers found it hard to create parallel applications because few APIs supported parallel programming. (An API is a programming interface that lets the developer use already developed code, in this case parallel programming code.) The reason was that the industry had not created standards for parallel architectures [6]. In the 1990s, standards for concurrent programming began to emerge; by the year 2000, Message Passing Interface (MPI), POSIX threads (pthreads), and Open Multi-Processing (OpenMP) had all been created to help software developers write parallel programs successfully. [1]
The newest additions to parallel programming on Microsoft Windows arrived with the release of .NET Framework 4 and Visual Studio 2010. These include the Parallel Patterns Library (PPL), which targets C++, and the Task Parallel Library (TPL) and Parallel Language-Integrated Query (PLINQ), which are most commonly used from C#. These libraries have made it easier to implement the design patterns discussed later.
Benefits of Understanding Parallel Programming
As multi-core processors become more prominent in computing devices, an understanding of parallel programming makes a software developer a much more capable and sought-after resource in the workplace.
The main benefit of parallel programming is efficient use of the cores in a processor. Many software developers still write sequential programs, often because they have no knowledge of parallel programming, and so make no use of the other cores. For instance, if a program initializes an array of 1,000,000 elements, sequential code loops that many times on a single core. Using the TPL, a developer can write a parallel loop that automatically creates multiple tasks, running on separate threads, and divides the initialization work among the available cores. Dividing the work among the cores can significantly decrease the time it takes to complete. This, however, is a simple application of parallel programming. Later sections discuss when to apply parallel programming techniques.
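The array initialization described above can be sketched with the TPL's Parallel.For method. This is a minimal illustration rather than code from the original paper: the class name ArrayInitDemo, the method InitializeInParallel, and the choice to store twice the index in each slot are assumptions made for the example.

```csharp
using System;
using System.Threading.Tasks;

static class ArrayInitDemo
{
    // Fills every slot with a value derived only from its own index,
    // so no iteration depends on any other iteration.
    public static int[] InitializeInParallel(int size)
    {
        var data = new int[size];
        Parallel.For(0, size, i =>
        {
            data[i] = i * 2;   // assumed initialization rule, for illustration only
        });
        return data;
    }

    static void Main()
    {
        int[] data = InitializeInParallel(1000000);
        Console.WriteLine("data[10] = {0}, last = {1}", data[10], data[data.Length - 1]);
    }
}
```

Whether this beats a plain for-loop depends on how much work each iteration does; for a body this small the overhead of scheduling can outweigh the gain, so timing both versions is advisable.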
In addition, parallel programming in .NET was designed to adapt automatically to the capabilities of the hardware. Whether there is one core or many, the libraries handle the situation and make use of the hardware when it is available. In other words, a parallel program run on a processor with one core behaves much like a sequential program [1]. Because .NET handles this, it takes a great deal of pressure off the developer, who can focus on the software instead of worrying about how the hardware handles the code.
Another benefit of understanding parallel programming is that the developer can successfully debug software using the tools that accompany parallel programming libraries. In .NET, Visual Studio 2010 provides the Performance Profiler, which outputs visual representations of concurrency problems, CPU usage, and other useful information that helps the developer write more productive software.
There are many other benefits, but those above are the most important. Further advantages of understanding parallel programming will become apparent later in this discussion.
Parallel Programming in .NET Defined
Although the benefits of parallel programming are easy to see from the discussion above, it brings a whole new set of software design patterns and practices, along with the question of when to apply them. This discussion first defines data parallelism and task parallelism, then covers the programming techniques used for each type, as illustrated in Figure 1.
Figure 1: Parallel programming design patterns for each type of parallelism [1].
Identifying Data Parallelism and Task Parallelism
Data Parallelism
Data parallelism is the process of making changes to many data elements in a set simultaneously [3]. It can also be understood as performing the same operation on every element of a data set at the same time [4]. For instance, if an array of 50 strings needed to be reversed, it is reasonable to use parallel programming techniques to call the “stringName.Reverse()” function on each element, because the elements are independent of one another. On the opposite side of the spectrum, it would be counterproductive to try to concatenate pairs of strings in the array in parallel, because each result would then depend on more than one element and the data would no longer be independent.
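The string reversal example can be sketched as follows. The helper ReverseString is a hypothetical stand-in for the “stringName.Reverse()” call in the text (System.String has no built-in method that returns a reversed string), and the three-element array stands in for the 50 strings.

```csharp
using System;
using System.Threading.Tasks;

static class ReverseDemo
{
    // Hypothetical helper: returns a reversed copy of one string.
    static string ReverseString(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Each iteration reads input[i] and writes output[i] only,
    // so the iterations are independent and safe to run in parallel.
    public static string[] ReverseAll(string[] input)
    {
        var output = new string[input.Length];
        Parallel.For(0, input.Length, i =>
        {
            output[i] = ReverseString(input[i]);
        });
        return output;
    }

    static void Main()
    {
        string[] words = { "alpha", "beta", "gamma" };
        Console.WriteLine(string.Join(", ", ReverseAll(words)));   // prints "ahpla, ateb, ammag"
    }
}
```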
Task Parallelism
Task parallelism is the process of executing a number of tasks concurrently [4]. Since tasks run on separate threads, we can use task parallelism to complete operations in parallel, then later link the tasks to work together by using “fork/join” to achieve an expected outcome [1]. In general, task parallelism is much more difficult to design for due to the extensive scheduling that needs to take place when executing tasks in parallel; however, it can be done.
Design Patterns
The most important part of choosing the correct design for a parallel program is taking the potential parallelism into account. Potential parallelism is the notion of correctly identifying where it is acceptable to use parallel programming techniques so that the software runs faster when the hardware is available [1]. The design patterns covered in this discussion are parallel loops, parallel aggregation, parallel tasks, futures, and pipelines.
Parallel Loops
The key requirement of the Parallel Loops pattern is that the operations on the data set are independent of each other; in other words, the steps of a loop must not change shared variables or data elements used by other steps. Additionally, a developer should identify the problem at hand, and the opportunities for parallelism, before implementing parallel loops. The two types of parallel loops are the parallel for-loop and the parallel for-each loop.
Because converting a for-loop into a parallel for-loop is so simple, a common misunderstanding is that the two behave the same. The only guarantee a parallel loop makes is that all iterations will have run by the end of the loop, in no particular order, meaning loops that have loop body dependencies will produce incorrect results. The most prevalent case is a loop that sums the values of an array into a single shared variable. A good indicator of a loop body dependency is a variable that is declared outside the scope of the loop and written inside it. Lastly, a step size other than one often indicates a data dependency as well. [1]
A benefit of parallel loops is the exception handling that comes with them. Exception handling from sequential code carries over to parallel programming with one difference. When an exception is thrown inside a parallel loop, it is added to a set of exceptions of type “System.AggregateException.” From this set it is easy to see which exceptions occurred in the loop iterations. Additionally, each exception shows the developer which operation was executing at the time it was thrown. [1] [4] [10]
As an added bonus, parallel loops come with a safety harness. Although parallel loops are partition focused, meaning they split iterations among cores, the partitions still communicate at the system level. For instance, if a parallel loop throws an exception in the first of 50 iterations, the TPL stops starting new iterations on every core before the software becomes overwhelmed with exception handling. This is particularly helpful for loops that iterate over a large range. [1] [4]
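The exception behavior of parallel loops can be sketched as below. The class and method names, the 50-iteration range, and the failing index are assumptions chosen for illustration; the sketch only shows that an exception thrown in a loop body surfaces as a System.AggregateException whose InnerExceptions collection holds the original exception.

```csharp
using System;
using System.Threading.Tasks;

static class LoopExceptionDemo
{
    // Runs a parallel loop in which exactly one iteration fails and
    // returns the messages found inside the AggregateException.
    public static string[] RunAndCollectErrors(int count, int failingIndex)
    {
        try
        {
            Parallel.For(0, count, i =>
            {
                if (i == failingIndex)
                    throw new InvalidOperationException("iteration " + i + " failed");
            });
            return new string[0];   // no iteration failed
        }
        catch (AggregateException ae)
        {
            var messages = new string[ae.InnerExceptions.Count];
            for (int k = 0; k < messages.Length; k++)
                messages[k] = ae.InnerExceptions[k].Message;
            return messages;
        }
    }

    static void Main()
    {
        foreach (string message in RunAndCollectErrors(50, 3))
            Console.WriteLine(message);
    }
}
```

If several iterations fail before the loop stops, the collection can hold more than one exception, which is why the handler loops over InnerExceptions instead of assuming a single entry.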
Another advantage of the Parallel Loops design pattern is the ability to tune the performance of the loops. Microsoft has made it easy to change a few settings to increase or decrease the performance of a loop. This could be useful in making products more valuable, assuming the customer has the hardware to support the implemented parallelism. The TPL exposes options for parallel loops through the ParallelOptions class, most notably MaxDegreeOfParallelism, while the ThreadPool methods SetMinThreads and SetMaxThreads adjust the number of worker threads available; together these let the software developer limit how many cores the software runs on [1] [4]. Making use of these options would allow a business to throttle the speed of the software, ultimately increasing profits so long as the customer is willing to pay.
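A minimal sketch of throttling a loop with ParallelOptions follows. The method name MeasurePeakConcurrency and the 5 ms sleep that stands in for real work are assumptions for the example; the sketch counts how many iterations run at once to show that the cap is respected.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ThrottleDemo
{
    // Returns the highest number of iterations observed running at the same time.
    public static int MeasurePeakConcurrency(int iterations, int maxDegree)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        int running = 0;
        int peak = 0;
        object gate = new object();

        Parallel.For(0, iterations, options, i =>
        {
            int now = Interlocked.Increment(ref running);
            lock (gate) { if (now > peak) peak = now; }
            Thread.Sleep(5);                      // stand-in for real work
            Interlocked.Decrement(ref running);
        });
        return peak;
    }

    static void Main()
    {
        Console.WriteLine("Peak concurrency with a cap of 2: {0}", MeasurePeakConcurrency(40, 2));
    }
}
```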
One last common problem with parallel loops is oversubscription and undersubscription. Oversubscription occurs when there are too many threads for the number of logical cores, so the tasks on those threads take longer than normal to run [9]. Simply put, if eight busy threads are created with only four cores available, the cores have more threads subscribed to them than they can service. Undersubscription, on the other hand, occurs when cores sit idle although there is work they could do [9]. So, if four cores are available and the developer sets MaxDegreeOfParallelism to two, two cores do all of the work when it could be split evenly among all four. In short, the optimum number of threads for a parallel loop equals the number of logical cores divided by the average fraction of core utilization per task. Figure 2 represents this calculation for a processor with four cores where each task uses 10% of a single core’s resources, and gives the resulting optimum number of threads.
Figure 2: Calculation to find optimum number of threads per core.
Parallel Aggregation
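The rule stated above can be worked through for the four-core, 10% utilization case. The symbols below are assumptions introduced for this sketch and are not taken from the original figure: N_cores is the number of logical cores, u is the average fraction of a core that each task uses, and N_threads is the resulting optimum thread count.

```latex
N_{\text{threads}} = \frac{N_{\text{cores}}}{u} = \frac{4}{0.10} = 40,
\qquad
\frac{N_{\text{threads}}}{N_{\text{cores}}} = \frac{40}{4} = 10 \ \text{threads per core}
```

Under these assumptions the loop would ideally run 40 threads in total, or 10 per core, since each thread keeps a core busy only a tenth of the time.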
The Parallel Aggregation design pattern is somewhat similar to Parallel Loops. Parallel Aggregation, or the Map/Reduce pattern, is intended specifically for cases like the “computing a sum” example discussed under Parallel Loops. The difference is that in Parallel Aggregation each task accumulates its part of the sum in an unshared, local variable, and the partial results are merged at the end [1]. In short, Parallel Aggregation takes input from multiple elements and combines it into a single output [1].
As an example of Parallel Aggregation, suppose a developer is given N arrays that all contain similar data elements. The developer must accumulate a subtotal for each array and then add all of the subtotals together, resulting in one final total. Since there are multiple inputs and one final output, the Parallel Aggregation pattern is a natural fit [10]. It can be implemented with a parallel for-loop or parallel for-each loop that keeps local subtotals and merges them at the end, or with a PLINQ query whose aggregation operator performs the merge for the developer.
Parallel Aggregation is most effective when using PLINQ. Although the details of PLINQ syntax are outside the scope of this discussion, it is a helpful library to learn because it reduces the effort of writing parallel code.
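The two approaches to aggregation mentioned above can be sketched side by side. The class and method names are assumptions for illustration, and a single array is summed to keep the sketch short. The first method uses the Parallel.For overload that takes a local initializer, a body, and a final merge step; the second lets PLINQ's Sum operator do the merging.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class AggregationDemo
{
    // Each task keeps an unshared subtotal and merges it into the
    // shared total once, when the task has finished its iterations.
    public static long SumWithLocalState(int[] values)
    {
        long total = 0;
        Parallel.For<long>(0, values.Length,
            () => 0L,                                          // start value of each local subtotal
            (i, loopState, subtotal) => subtotal + values[i],  // accumulate without touching shared state
            subtotal => { Interlocked.Add(ref total, subtotal); });  // merge step
        return total;
    }

    // The same aggregation expressed as a PLINQ query.
    public static long SumWithPlinq(int[] values)
    {
        return values.AsParallel().Sum(v => (long)v);
    }

    static void Main()
    {
        int[] values = Enumerable.Range(1, 1000).ToArray();
        Console.WriteLine(SumWithLocalState(values));   // prints 500500
        Console.WriteLine(SumWithPlinq(values));        // prints 500500
    }
}
```

The shared variable is touched only in the merge step, which is what removes the loop body dependency that makes a naive parallel sum incorrect.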
Parallel Tasks
The Parallel Tasks design pattern has made asynchronous programming much easier thanks to the design of the System.Threading.Tasks namespace. In .NET it is now easy to run operations on separate threads using Task.Factory, which lets software developers write asynchronous code. Using these resources allows the .NET runtime to use its built-in task scheduler to distribute the work automatically across threads on different cores, thus increasing the speed of programs.
It is helpful to remember what a task is. A task is a single operation that can run asynchronously, on a different thread, without any noticeable changes happening in the software that creates it [5]. With that being said, the Parallel Task pattern can now be applied to an example.
A situation where the Parallel Tasks pattern should be used is an application in which multiple operations should run concurrently. A prime example would be a chart that trends real-time data being read asynchronously from an external hardware I/O card. Assuming the chart is tracking multiple variables (frequency, voltage, current, etc.) and has an axis for each variable, it should update all the information as fast as it can, at the same time for each variable. To do this, Task.Factory.StartNew() would be called for each task that needs to be started. In this case, the tasks would be ReadFrequency, ReadVoltage, and ReadCurrent, one for each variable being trended. Calling StartNew for each task inside a while-loop, followed by a call to WaitAll(), would queue each task to run on a worker thread, execute the communications to retrieve each value asynchronously, and then make the software wait for all of the tasks to complete. When that is done, the values produced by the completed tasks can be used to update the chart, and the while-loop can execute again.
Like the Parallel Loops design pattern, the TPL handles exceptions for the Task class by collecting them in an AggregateException. The added feature is that the developer can examine the InnerExceptions of the AggregateException to determine where specifically in the code each exception occurred. This is important because each inner exception holds key information, such as its stack trace, for any exception that occurred in the function calls inside the Task.
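The charting scenario can be sketched as follows. ReadFrequency, ReadVoltage, and ReadCurrent are named in the text, but their bodies here are hypothetical stand-ins that return fixed values instead of talking to an I/O card, and the while-loop is replaced by a bounded for-loop so that the sketch terminates.

```csharp
using System;
using System.Threading.Tasks;

static class ChartTasksDemo
{
    // Hypothetical stand-ins for the reads from the hardware I/O card.
    static double ReadFrequency() { return 60.0; }
    static double ReadVoltage()   { return 120.0; }
    static double ReadCurrent()   { return 1.5; }

    // Starts the three reads, waits for all of them, and returns
    // the results in the order frequency, voltage, current.
    public static double[] ReadAllOnce()
    {
        Task<double> frequency = Task.Factory.StartNew(() => ReadFrequency());
        Task<double> voltage   = Task.Factory.StartNew(() => ReadVoltage());
        Task<double> current   = Task.Factory.StartNew(() => ReadCurrent());

        Task.WaitAll(frequency, voltage, current);
        return new[] { frequency.Result, voltage.Result, current.Result };
    }

    static void Main()
    {
        for (int pass = 0; pass < 3; pass++)   // bounded stand-in for the while-loop
        {
            double[] v = ReadAllOnce();
            Console.WriteLine("f={0} Hz, V={1} V, I={2} A", v[0], v[1], v[2]);
        }
    }
}
```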
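Exception handling for tasks can be sketched in the same spirit. The class name, the failing action, and its message are assumptions for illustration. Waiting on the tasks rethrows any failure wrapped in an AggregateException, and Flatten is used so that nested aggregates, if any, are unwrapped before the inner exceptions are inspected.

```csharp
using System;
using System.Threading.Tasks;

static class TaskExceptionDemo
{
    // Starts one task that succeeds and one that fails, then returns
    // the message of the first exception observed by WaitAll.
    public static string FirstFailureMessage()
    {
        Action succeeding = () => { };
        Action failing = () => { throw new InvalidOperationException("sensor read failed"); };

        Task ok  = Task.Factory.StartNew(succeeding);
        Task bad = Task.Factory.StartNew(failing);
        try
        {
            Task.WaitAll(ok, bad);
            return null;   // not reached in this sketch, because one task always fails
        }
        catch (AggregateException ae)
        {
            Exception first = ae.Flatten().InnerExceptions[0];
            return first.Message;   // first.StackTrace shows where the exception was thrown
        }
    }

    static void Main()
    {
        Console.WriteLine(FirstFailureMessage());
    }
}
```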
Futures
Futures is a design pattern that can be compared with household activities. For example, “while brushing teeth, put slippers on and let the dog out.” A Future software design is based on how the developer forks the flow of control in a program [1]. The fork in the previous example would be at brushing teeth, and it would fork into two other tasks: putting slippers on and letting the dog out. In the end, there is one overall outcome: the dog will be let out, teeth will be brushed, and the slippers will be put on.
To better understand the analogy, the tasks in the Futures pattern are also described as continuation tasks [9]. The name Futures means that a task can be started on a separate thread while the software continues to run, producing a result that will be available in the future. Another function can then use the Future’s result as a parameter. If the task has not yet completed and returned its result, the function waits for the task to finish; if the task has already returned its result, the function begins immediately.
Figure 3 helps to identify when sequential code could be ported to a Future design. When there is code that depends on the result of a previous task, implementing a Future design is a good idea. In Figure 3, the variable f cannot be computed until the variables b and d are computed.
Figure 3: Sequential code.
Figure 4 is the parallel version of Figure 3. The task, named futureB, runs in parallel with the calculation of c and d, and its result is later used to calculate f. The advantage of the Result property is that the developer does not have to poll to check whether the task is done. If futureB isn’t finished, the TPL automatically waits for the result before f is calculated. This demonstrates the power of the TPL: the .NET Framework handles these cases automatically, whereas years ago everything had to be done by the developer.
Figure 4: Parallel code implementation of a Future design code snippet.
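Because the listings for Figures 3 and 4 are not reproduced here, the following is a sketch in their spirit rather than the original code. The variable names a, b, c, d, f and the task name futureB follow the text; the functions F1 through F4 and their arithmetic are hypothetical placeholders.

```csharp
using System;
using System.Threading.Tasks;

static class FuturesDemo
{
    // Hypothetical placeholder computations.
    static int F1(int a) { return a + 1; }
    static int F2(int a) { return a * 2; }
    static int F3(int c) { return c - 3; }
    static int F4(int b, int d) { return b + d; }

    // Sequential version: f needs both b and d.
    public static int Sequential(int a)
    {
        int b = F1(a);
        int c = F2(a);
        int d = F3(c);
        return F4(b, d);
    }

    // Future version: b is computed on another thread while c and d
    // are computed here; reading Result waits only if b is not ready yet.
    public static int WithFuture(int a)
    {
        Task<int> futureB = Task.Factory.StartNew(() => F1(a));
        int c = F2(a);
        int d = F3(c);
        return F4(futureB.Result, d);
    }

    static void Main()
    {
        Console.WriteLine(Sequential(10));   // prints 28
        Console.WriteLine(WithFuture(10));   // prints 28
    }
}
```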
Pipelines
The pipeline design is used when there is a specific process of tasks that will be completed in order every time. For instance, consider the process of preparing a bowl of cereal. One possibility of a process would be: get bowl, open cereal container, pour cereal into bowl, open milk, and pour milk into cereal. Following this process every time will create the same expected outcome, and because of this the software can be designed using the pipeline design.
The pipeline design usually uses the collection class BlockingCollection<T>. With this type, the developer can limit the capacity of items in the collection and so has a degree of control over how fast the stages process their work. Because BlockingCollection<T> is one of the thread-safe collections in the System.Collections.Concurrent namespace, it automatically blocks and releases the tasks that want to add or remove items.
Figure 5 helps visualize the usefulness of BlockingCollection when applied to a process like the one described above. BlockingCollections can be thought of as queues between the stages. When an item is added to a collection, the stage waiting on that collection takes it, performs its task on it, and, when done, passes the result to the next blocking collection. This eliminates the need to poll a thread to see whether it is done with its task, which increases the efficiency of a core and ultimately produces faster response times.
Figure 5: BlockingCollections acting as buffers for a pipeline design [1].
Tools
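A three-stage pipeline buffered by blocking collections can be sketched as below. The stages (produce numbers, square them, format them), the capacity of 4, and all names are assumptions chosen for illustration. Each stage runs as a long-running task so that a blocked stage does not tie up a thread pool worker.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class PipelineDemo
{
    public static List<string> Run(IEnumerable<int> input)
    {
        var raw = new BlockingCollection<int>(4);       // bounded buffer between stages 1 and 2
        var squared = new BlockingCollection<int>(4);   // bounded buffer between stages 2 and 3
        var results = new List<string>();

        // Stage 1: feed the input into the first buffer.
        Task produce = Task.Factory.StartNew(() =>
        {
            try { foreach (int n in input) raw.Add(n); }
            finally { raw.CompleteAdding(); }           // signals that no more items will arrive
        }, TaskCreationOptions.LongRunning);

        // Stage 2: take each item as it arrives, square it, pass it on.
        Task transform = Task.Factory.StartNew(() =>
        {
            try { foreach (int n in raw.GetConsumingEnumerable()) squared.Add(n * n); }
            finally { squared.CompleteAdding(); }
        }, TaskCreationOptions.LongRunning);

        // Stage 3: format the final results; only this task touches the list.
        Task consume = Task.Factory.StartNew(() =>
        {
            foreach (int n in squared.GetConsumingEnumerable()) results.Add("item " + n);
        }, TaskCreationOptions.LongRunning);

        Task.WaitAll(produce, transform, consume);
        return results;
    }

    static void Main()
    {
        foreach (string line in Run(new[] { 1, 2, 3, 4, 5 }))
            Console.WriteLine(line);
    }
}
```

GetConsumingEnumerable blocks while a buffer is empty and ends once CompleteAdding has been called and the buffer is drained, so no stage needs to poll the one before it.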
A quick Google search for “Performance Profiler” returns a list of third-party profilers available for multiple operating systems; however, the best profiler for the .NET developer is the native tool built into Visual Studio 2010.
Visual Studio 2010
Visual Studio 2010 Ultimate and Premium include a tool for analyzing .NET software written both sequentially and in parallel, called the Performance Profiler. With it, a profile can be created to show thread contention while the software runs. The output is a chart like the one in Figure 6.
Figure 6: Output from the thread contention tool in VS2010 [1].
Using a tool like this allows the developer to look at each thread created by the software and, using the x-axis as a time reference, see whether any threads are in contention. Furthermore, the developer can use the zoom and pan options to fit a given time window and determine how long threads are in contention. The tool also reports specifically which threads, and how many, are in contention.
This tool is important for debugging because it gets down to the nitty-gritty of the software. With it, the developer can not only see thread contention but also tell which specific pieces of code within a thread are causing the problems. This is a significant factor in determining whether the developer is using parallel programming techniques correctly.
Supportive Libraries
Although they are not required, Microsoft has introduced libraries that make parallel programming in .NET easier to do and faster to write. The Rx and PLINQ libraries are briefly described here to give a conceptual understanding.
PLINQ
Parallel Language-Integrated Query, or PLINQ, is the parallel implementation of LINQ, the integrated query language introduced with .NET Framework 3.5, and was built with the same intentions. Parallel LINQ makes it easy for the software developer to retrieve a collection of objects using syntax similar to that of database queries. Although it is not required to use PLINQ to implement a powerful parallel program, it does have advantages [8]. The most advantageous aspect of PLINQ is that a developer can write a query to retrieve custom objects.
With over 200 different extensions to PLINQ, using the correct syntax at the right time can make or break the speed of the software [5]. For example, it can be more efficient to write a for-loop over a small set of known data than to write a query statement, because the loop avoids the overhead of the query. There is no set way to determine whether a query is faster than a loop without actually timing both; however, unlike a plain loop, PLINQ can make use of multiple cores when executing the query. In short, it is up to the discretion of the developer to make correct use of PLINQ.
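A small PLINQ query, with the kind of timing the paragraph above recommends, can be sketched as follows. The query itself (squares of the even values), the class and method names, and the input size are assumptions for illustration. AsOrdered is used so that results come back in input order.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class PlinqDemo
{
    // Returns the squares of the even values, in input order.
    public static int[] EvenSquares(int[] values)
    {
        return values
            .AsParallel()
            .AsOrdered()               // keep results in the order of the source
            .Where(v => v % 2 == 0)
            .Select(v => v * v)
            .ToArray();
    }

    static void Main()
    {
        int[] values = Enumerable.Range(1, 10000).ToArray();

        // Time the query instead of guessing whether it beats a loop.
        Stopwatch watch = Stopwatch.StartNew();
        int[] squares = EvenSquares(values);
        watch.Stop();

        Console.WriteLine("{0} results in {1} ms", squares.Length, watch.ElapsedMilliseconds);
    }
}
```

For an input this small, a sequential loop may well be faster; the measurement is the only reliable way to tell.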
Rx
Reactive Extensions, or Rx, is a supporting library used by some parallel programmers. Available for .NET Framework 3.5 SP1 and later, it provides additional LINQ to Objects queries to the developer [7]. The main use of Rx in parallel programming is not only the additional queries but also the ability to implement the push or pull model in parallel programs. For example, Rx could be used to push the results of one task, Task A, to another task, Task B. In the pull model, Task B would instead implement a subscriber that watches for data to be ready in Task A and pulls it from Task A when it has a valid result. Reactive Extensions would therefore be highly valuable when implementing the Futures parallel design pattern.
Conclusion
Parallel computing has been around for decades, but only recently has it become easy to implement parallel software. With the release of .NET Framework 4, Microsoft has eased the pressure on software developers to account for the way their software uses the hardware. Thanks to the research put into parallel programming in .NET, software engineers can now apply design patterns such as Parallel Loops, Parallel Tasks, and Futures to develop software applications successfully. From old parallel libraries to new, it is always important to remember that timely response is a golden rule of software design, and parallel programming is now squarely in the developer’s field of view.
References
[1] Campbell, Colin, et al. Parallel Programming with Microsoft .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures. Microsoft. 2010. Print.
[2] Computer cluster (n.d.). FL: Wikimedia Foundation, Inc. Retrieved October 31, 2012, from http://en.wikipedia.org/wiki/Computer_cluster
[3] Data Parallelism (n.d.). FL: Wikimedia Foundation, Inc. Retrieved October 31, 2012, from http://en.wikipedia.org/wiki/Data_parallelism
[4] Hillar, Gaston C. Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4. Indiana: Wiley. 2011. Print.
[5] J. Albahari and B. Albahari. C# 4 in a Nutshell. O’Reilly, fourth edition, 2010. Print.
[6] MSDN. THE MANYCORE SHIFT: Microsoft Parallel Computing Initiative Ushers Computing into the Next Era. (2007, November). Retrieved October 31, 2012, from http://www.intel.com/pressroom/kits/upcrc/ParallelComputing_backgrounder.pdf
[7] Rx Extensions (n.d.). In MSDN. Retrieved October 31, 2012, from http://msdn.microsoft.com/en-us/data/gg577609.aspx
[8] Skeet, Jon. C# In Depth. Connecticut: Manning. 2011. Print.
[9] T. G. Mattson, B. A. Sanders, and B. L. Massingill. Patterns for Parallel Programming. Addison-Wesley, 2004. Print.
[10] Toub, Stephen. (2010, July 10). Patterns of Parallel Programming CSharp. Patterns of Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4 and Visual C#. Retrieved October 30, 2012, from http://www.microsoft.com/en-us/download/details.aspx?id=19222