
International Journal of Engineering Technology, Management and Applied Sciences

www.ijetmas.com September 2015, Volume 3, Issue 9, ISSN 2349-4476

A Review of Multicore Processors with Parallel Programming

Anchal Thakur, Research Scholar, CSE Department, L.R Institute of Engineering and Technology, Solan, India
Ravinder Thakur, Assistant Professor, CSE Department, L.R Institute of Engineering and Technology, Solan, India

ABSTRACT
When computers were first introduced in the market, they came with single processors, which limited their performance and efficiency. The classic way of overcoming this performance issue was to use bigger processors that executed data at higher speed. Bigger processors did improve performance to a certain extent, but they consumed a lot of power, which overheated the internal circuits. To achieve efficiency and speed simultaneously, CPU architects developed multicore processor units in which two or more cores are used to execute tasks. Multicore technology offers better response time while running big applications, along with better and faster execution times. Multicore processors also give developers the opportunity to use parallel programming to execute tasks in parallel. In parallel programming, a task is distributed into smaller instructions that are executed on different cores. By using parallel programming, the complex tasks carried out in a multicore environment can be executed with higher efficiency and performance.
Keywords: Multicore Processing, Multicore Utilization, Parallel Processing.

INTRODUCTION
From the day computers were invented, great importance has been given to their efficiency in executing tasks. Manufacturers are continuously researching and developing new technologies to improve computer performance and efficiency. In general, major components such as the RAM and CPU are the key factors that decide a computer's efficiency [1]. For instance, a computer with a bigger memory and a bigger processor will execute a task faster than a computer with a smaller CPU and less RAM. Of all the components mentioned above, the CPU, or central processing unit, is the most important part of the computer design. The CPU is like the computer's brain: it takes input from the user through user interfaces and processes it to execute the task. In the past, computers were manufactured with single-core processors, which dominated the computer industry for many years. The size of the processor affected the speed of task execution; earlier computers had smaller CPUs, which limited their speed. To overcome the speed issue, manufacturers started introducing bigger processors with higher clock rates. This did help to overcome the speed issue, but the bigger processors consumed a lot of energy, which overheated the internal circuits.

MULTICORE PROCESSORS
To achieve efficiency and speed simultaneously, CPU architects developed multicore processor units. A multi-core processor is an integrated circuit (IC) to which two or more processor cores have been attached to enhance performance and reduce power consumption [2]. Figure 1 shows a generic dual-core processor diagram.

Figure 1: Diagram of a generic dual-core processor with CPU-local level-1 caches and a shared, on-die level-2 cache [3].


In 2001 IBM introduced the first dual-core processor, the POWER4 [4]. The biggest difference that can be noticed in a multicore system is the improved response time while running big applications. Additional benefits such as better power management, faster execution times and multi-threading can also be achieved by using multicore processors. Since 2000, multicore processors have been used extensively to achieve better performance in multitasking environments. Examples of such processors are the Intel Core i3 (2 cores), Intel Core i5 (4 cores), etc. The design of multicore processors comes in two flavours: homogeneous multi-core processors and heterogeneous multi-core processors. Processors in which all the cores are manufactured identically and perform identical functions are called homogeneous multi-core processors; all the cores have a shared view of memory, and tasks are assigned with a dynamic task allocation method. Processors with multiple cores on a single chip, where each core has different functionality, are called heterogeneous multi-core processors; these processors use a static allocation method when assigning tasks, because each core is designed with a fixed type of functionality. The main applications of multi-core processors are found in embedded systems, data and web servers, web commerce, CAD/CAM, image processing, networking and graphics.
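As a small, concrete illustration (our own sketch, not taken from the paper), software running on a multicore chip can ask how many hardware threads the processor exposes. The following minimal C++ example uses the standard std::thread::hardware_concurrency() call; the printed wording is an assumption.

#include <iostream>
#include <thread>

int main() {
    // Ask the runtime how many concurrent hardware threads (logical cores)
    // the multicore processor exposes; 0 means the value is unknown.
    unsigned int cores = std::thread::hardware_concurrency();
    std::cout << "Logical cores available: " << cores << "\n";
    return 0;
}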

PARALLELISM IN MULTICORE
Multicore processors are specifically designed to run tasks in parallel. Parallelism exists at two levels in multicores: at the hardware level and at the software level.
Hardware parallelism runs tasks in parallel at the machine level and is controlled by the hardware. It is achieved in different forms:
a) Bit-level parallelism: It depends solely upon the word size of the processor. Increasing the word size reduces the number of instructions the processor must execute to perform an operation.
b) Instruction-level parallelism: It operates at the machine-instruction level and is a measure of how many of the instructions in a program can be executed simultaneously. Instructions that are independent of each other can run concurrently on the multiple cores.
Software parallelism also runs tasks in parallel, but it is expressed at the program level; the software has to keep pace with the hardware. It takes two main forms (a minimal sketch of data-level parallelism follows this list):
a) Data-level parallelism: A programming mechanism where a large data set is split into smaller chunks that can be operated on in parallel. Once the data is processed, it is combined back into a single data set.
b) Task-level parallelism: A single task is split into independent sub-tasks that are executed concurrently in software.
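As promised above, here is a minimal C++ sketch of data-level parallelism (our own illustration, not the paper's): a large array is split into chunks, each chunk is summed on its own std::thread, and the partial results are then combined into a single total. The array size and the fallback to a single thread are arbitrary assumptions.

#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);                       // the "large data" to be split
    unsigned int n = std::max(1u, std::thread::hardware_concurrency());

    std::vector<long long> partial(n, 0);                    // one partial result per chunk
    std::vector<std::thread> workers;
    std::size_t chunk = data.size() / n;

    for (unsigned int i = 0; i < n; ++i) {
        std::size_t begin = i * chunk;
        std::size_t end = (i == n - 1) ? data.size() : begin + chunk;
        // Each thread operates independently on its own partition of the data.
        workers.emplace_back([&, i, begin, end] {
            partial[i] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& t : workers) t.join();                        // wait for all chunks

    // Combine the processed chunks back into a single result.
    long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
    std::cout << "Sum = " << total << "\n";
    return 0;
}

The same structure applies to task-level parallelism; the difference is that each thread would run a different sub-task rather than the same operation on a different partition of the data.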

SERIAL AND PARALLEL PROGRAMMING
As mentioned earlier, computers were initially built with single processing units, and serial programming was used to execute tasks on these single processors. In serial programming the instructions are executed one at a time, which increases the task execution time [5]. Developing a bigger processor was the only solution back then to enhance the performance of a computer. Multicore processors allow developers to use parallel programming. In parallel programming, a complex task is distributed into smaller instructions that are executed on different cores, thereby reducing the task execution time. In computer software, a parallel programming model is a model for writing parallel programs which can be compiled and executed. The value of a programming model can be judged on its generality (how well a range of different problems can be expressed for a variety of different architectures) and its performance (how efficiently the resulting programs execute). The difference between the execution of serial and parallel programs is shown in Figures 2 and 3 (a small timing sketch follows the figure captions).

Figure 2: Parallel programming [5]. Figure 3: Serial programming [5].
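The contrast between Figure 2 and Figure 3 can also be shown in code. The following C++ sketch (ours, not from [5]) runs four placeholder tasks first one after another and then concurrently with std::async; the 100 ms sleep merely stands in for real work, and the observed speed-up depends on how many cores are actually free.

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Placeholder for one unit of work; the sleep is only a stand-in.
int do_work(int id) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return id;
}

int main() {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::milliseconds;

    // Serial programming: the four tasks execute one at a time (roughly 400 ms).
    auto t0 = clock::now();
    int serial_sum = 0;
    for (int i = 0; i < 4; ++i) serial_sum += do_work(i);
    auto t1 = clock::now();

    // Parallel programming: the same tasks are launched on separate threads
    // (roughly 100 ms when at least four cores are available).
    std::future<int> results[4];
    for (int i = 0; i < 4; ++i) results[i] = std::async(std::launch::async, do_work, i);
    int parallel_sum = 0;
    for (auto& r : results) parallel_sum += r.get();
    auto t2 = clock::now();

    std::cout << "serial:   " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n";
    std::cout << "parallel: " << std::chrono::duration_cast<ms>(t2 - t1).count() << " ms\n";
    std::cout << "sums: " << serial_sum << " " << parallel_sum << "\n";
    return 0;
}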



Classifications of parallel programming models can be divided broadly into two areas: process interaction and problem decomposition.
a) Process interaction: This relates to the mechanisms by which parallel processes are able to communicate with each other. The most common forms of interaction are shared memory and message passing, but interaction can also be implicit [7].
b) Shared memory: Shared memory is an efficient means of passing data between programs. Depending on the context, programs may run on a single processor or on multiple separate processors. In this model, parallel tasks share a global address space which they read and write to asynchronously. This requires protection mechanisms such as locks, semaphores and monitors to control concurrent access (a minimal synchronization sketch follows this list). Shared memory can be emulated on distributed-memory systems, but non-uniform memory access (NUMA) times can come into play. Sometimes memory is also shared between different sections of code of the same program, e.g. a loop that creates a thread for each iteration, with each thread updating a shared variable in parallel [7].
c) Message passing: Message passing is a concept from computer science that is used extensively in the design and implementation of modern software applications; it is key to some models of concurrency and object-oriented programming. In a message-passing model, parallel tasks exchange data by passing messages to one another. These communications can be asynchronous or synchronous. The Communicating Sequential Processes (CSP) formalisation of message passing employed communication channels to 'connect' processes, and led to a number of important languages such as Joyce, Occam and Erlang [7].
d) Implicit: In an implicit model, no process interaction is visible to the programmer; instead the compiler and/or runtime is responsible for performing it. This is most common with domain-specific languages where the concurrency within a problem can be more prescribed [7].
e) Problem decomposition: A parallel program is composed of simultaneously executing processes. Problem decomposition relates to the way in which these processes are formulated. This classification may also be referred to as algorithmic skeletons or parallel programming paradigms [7].
f) Task parallelism: A task-parallel model focuses on processes, or threads of execution. These processes will often be behaviourally distinct, which emphasises the need for communication. Task parallelism is a natural way to express message-passing communication. It is usually classified as MIMD/MPMD or MISD [7].
g) Data parallelism: A data-parallel model focuses on performing operations on a data set, which is usually regularly structured in an array. A set of tasks will operate on this data, but independently, on separate partitions. In a shared-memory system the data will be accessible to all, but in a distributed-memory system it will be divided between memories and worked on locally. Data parallelism is usually classified as SIMD/SPMD [7].
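The shared-memory model above depends on protecting data that several tasks write at the same time. Below is a minimal C++ sketch (our own, not from [7]) in which four threads increment one shared counter and a std::mutex plays the role of the lock mentioned in the text; the thread and iteration counts are arbitrary assumptions.

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    long long counter = 0;   // shared state, written asynchronously by all threads
    std::mutex m;            // protection mechanism controlling concurrent access

    auto worker = [&] {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lock(m);   // lock, update, unlock
            ++counter;
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();

    std::cout << "counter = " << counter << "\n";  // 400000 with the lock in place
    return 0;
}

Without the lock the increments would race and the final count would be unpredictable, which is exactly the class of problem that locks, semaphores and monitors are meant to prevent.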

MULTICORE PARALLEL PROGRAMMING
Programming for multicore processors poses new challenges. Since 2000, multicore processors have been used extensively to achieve better performance in multitasking environments. Parallel programming further enhances the performance of desktop applications, web applications and other software to a great extent. Multicore technology has the built-in capability to run tasks in parallel because multiple cores are available inside a single chip. The main objective of a multicore processor architecture is to extract higher performance from the multiple cores, which depends upon an efficient parallel programming mechanism and its implementation. Most software companies only consider user requirements when launching their software and give no consideration to the software's efficiency on a multicore platform. These days software developers are required to give full consideration to the kind of hardware the software will run on; this could include multi-processor and multicore systems. Applications are expected to perform better by using more cores, more hardware threads and more memory, thereby meeting the growing demands for performance and efficiency. Here are some points that must be considered while developing parallel software for the multicore platform.
1) Think parallel. Approach all problems looking for the parallelism. Understand where the parallelism is, and organize your thinking to express it. Decide on the best parallel approach before making other design or implementation decisions. Learn to "Think Parallel" [6].


2) Program using abstraction. Focus on writing code to express parallelism, but avoid writing code to manage threads or processor cores. Libraries, OpenMP, and Intel Threading Building Blocks are all examples of using abstractions. Do not use raw native threads (pthreads, Windows threads, Boost threads, and the like). Threads and MPI are the assembly languages of parallelism. They offer maximum flexibility, but require too much time to write, debug, and maintain. Programming should be at a high enough level that the code is about the problem, not about thread or core management [6].
3) Program in tasks (chores), not threads (cores). Leave the mapping of tasks to threads or processor cores as a distinctly separate operation in your program, preferably handled by an abstraction that takes care of thread/core management. Create an abundance of tasks in your program, or a task that can be spread across processor cores automatically (such as an OpenMP loop; see the sketch after this list) [6].
4) Design with the option to turn concurrency off. To make debugging simpler, create programs that can run without concurrency. This way, when debugging, you can run programs first with, and then without, concurrency, and check whether both runs fail or not. Debugging common issues is simpler when the program is not running concurrently, because this mode is more familiar and better supported by today's tools. Knowing that something fails only when run concurrently hints at the type of bug you are tracking down. If you ignore this rule and can't force your program to run in only one thread, you will spend too much time debugging. Since you want the capability to run in a single thread specifically for debugging, it doesn't need to be efficient. You need to avoid creating parallel programs that require concurrency to work correctly, such as many producer-consumer models. MPI programs often violate this rule, which is part of the reason MPI programs can be problematic to implement and debug [6].
5) Avoid using locks. Simply say "no" to locks. Locks slow programs, reduce their scalability, and are the source of bugs in parallel programs. Make implicit synchronization the solution for your program. When you need explicit synchronization, use atomic operations. Use locks only as a last resort. Work hard to design the need for locks completely out of your program [6].
6) Use tools and libraries designed to help with concurrency. Don't "tough it out" with old tools. Be critical of tool support with regard to how it presents and interacts with parallelism. Most tools are not yet ready for parallelism. Look for thread-safe libraries, ideally ones that are designed to utilize parallelism themselves [6].
7) Use scalable memory allocators. Threaded programs need to use scalable memory allocators. There are a number of solutions, and I'd guess that all of them are better than malloc(). Using scalable memory allocators speeds up applications by eliminating global bottlenecks, reusing memory within threads to better utilize caches, and partitioning properly to avoid cache-line sharing [6].
8) Design to scale through increased workloads. The amount of work your program needs to handle increases over time. Plan for that. Designed with scaling in mind, your program will handle more work as the number of processor cores increases. Every year, we ask our computers to do more and more. Your designs should favor using increases in parallelism to give you advantages in handling bigger workloads in the future [6].
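As an illustration of points 2, 3 and 5 (our own sketch, not taken from [6]), the following C++/OpenMP fragment expresses the work as parallel loop iterations rather than raw threads and uses a reduction clause instead of an explicit lock; the vector size and the g++ -fopenmp build command are assumptions.

// Build with an OpenMP-capable compiler, e.g.: g++ -fopenmp sum.cpp -o sum
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> v(1000000, 0.5);
    double sum = 0.0;

    // The loop iterations are the tasks; the OpenMP runtime maps them onto the
    // available cores, and the reduction clause combines the partial sums
    // without any explicit lock in user code.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        sum += v[i];

    std::printf("sum = %f\n", sum);
    return 0;
}

Turning concurrency off for debugging (point 4) is also simple here: compiling without -fopenmp makes the pragma a no-op, so the loop runs serially and should produce the same result.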

CONCLUSION
Higher performance and efficiency can be achieved by using multicore processors together with parallel programming. Multicore technology offers more than one core, which can be used to execute multiple tasks at the same time, whereas parallel programming offers the algorithms used to distribute a complex task into smaller instructions. These instructions are then executed on different cores. The performance of the system depends upon how efficiently the parallel mechanism has been implemented on the multicore system. Parallel programming on a multicore platform increases the operating efficiency and performance of a system and its applications to a great extent.

REFERENCES
[1] Alex P. " Factors." Internet: www.hitequest.com/Kiss/performance.htm.
[2] Margaret Rouse. "Multi-core processor." Internet: www.searchdatacenter.techtarget.com/definition/multi-core-processor.
[3] Wikipedia. "Multicore Processors." Internet: www.wikipedia.org/wiki/Multicore_processor#/media/File:Dual_Core_Generic.svg.
[4] Prerna Saini, Ankit Bansal and Abhishek Sharma. "Time Critical Multitasking For Multicore Using Xmos® Kit." International Journal of Embedded Systems and Applications (IJESA), Vol. 5, No. 1, March 2015.
[5] Ulrike Meier Yang. "A Tutorial." Internet: www.ima.umn.edu/2010-2011/T11.28-29.10/activities/Yang-Ulrike/IMA-PPtTutorial.pdf.
[6] James Reinders. "Rules for Parallel Programming for Multicore." Internet: www.drdobbs.com/parallel/rules-for-parallel-programming-for-multi/201804248.
[7] Wikipedia. "Parallel Programming Model." Internet: www.wikipedia.org/wiki/Parallel_programming_model.