A Review of Multicore Processors with Parallel Programming

International Journal of Engineering Technology, Management and Applied Sciences
www.ijetmas.com, September 2015, Volume 3, Issue 9, ISSN 2349-4476

Anchal Thakur, Research Scholar, CSE Department, L.R Institute of Engineering and Technology, Solan, India
Ravinder Thakur, Assistant Professor, CSE Department, L.R Institute of Engineering and Technology, Solan, India

ABSTRACT
When computers were first introduced to the market, they came with single processors, which limited their performance and efficiency. The classic way of overcoming the performance issue was to use bigger processors that executed instructions at higher speed. Bigger processors did improve performance to a certain extent, but they consumed a lot of power, which overheated the internal circuits. To achieve efficiency and speed simultaneously, CPU designers developed multicore processors, in which two or more processing cores are used to execute tasks. Multicore technology offers better response time when running large applications, better power management and faster execution. Multicore processors also give developers the opportunity to use parallel programming to execute tasks in parallel. Today, parallel programming is used to execute a task by dividing it into smaller units of work and executing them on different cores. With parallel programming, complex tasks in a multicore environment can be executed with higher efficiency and performance.

Keywords: Multicore Processing, Multicore Utilization, Parallel Processing.

INTRODUCTION
From the day computers were invented, great importance has been given to how efficiently they execute tasks. Manufacturers are continuously researching and developing new technologies to improve computer performance and efficiency.
In general, major components such as the RAM, the bus and the CPU are the key factors that determine a computer's efficiency [1]. For instance, a computer with more memory and a faster processor will execute a task faster than one with a slower CPU and less RAM. Of all these components, the CPU (central processing unit) is the most important part of the computer design. The CPU is like the computer's brain: it takes input from the user through user interfaces and processes it to execute the task. In the past, computers were manufactured with single-core processors, which dominated the computer industry for many years. The size of the processor determined how quickly the computer could execute tasks; early computers had smaller CPUs, which limited their speed. To overcome the speed issue, manufacturers started introducing bigger processors with higher clock rates. This did help, but the bigger processors consumed a lot of energy, which overheated the internal circuits.

MULTICORE PROCESSORS
To achieve efficiency and speed simultaneously, CPU designers developed multicore processors. A multicore processor is a single integrated circuit (IC) on which two or more processor cores have been placed to enhance performance and reduce power consumption [2]. Figure 1 shows a generic dual-core processor.

Figure 1: Diagram of a generic dual-core processor with CPU-local level-1 caches and a shared, on-die level-2 cache [3]

In 2001, IBM introduced the first dual-core microprocessor, the POWER4 [4].
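To a program, the cores of a multicore chip appear as separate logical processors that the operating system schedules work onto. The short Python sketch below is illustrative only (Python and the helper name `available_cores` are our choices, not the paper's): it asks the OS how many logical processors it exposes, a number a parallel program would typically use to size its pool of worker threads or processes.

```python
import os


def available_cores() -> int:
    """Hypothetical helper: number of logical CPUs the OS reports (at least 1).

    os.cpu_count() counts logical processors, so on chips with simultaneous
    multithreading it can exceed the physical core count quoted for a
    processor. It returns None if the count cannot be determined, in which
    case we fall back to 1.
    """
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"This machine exposes {available_cores()} logical processors")
```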
The biggest difference that can be noticed in a multicore system is the improved response time when running large applications. Additional benefits, such as better performance, better power management, faster execution and support for multithreading, can also be achieved with multicore processors. Since 2000, multicore processors have been used extensively to achieve better performance in multilevel environments. Examples of multicore processors include the Intel Core i3 (2 cores) and the Intel Core i5 (2 or 4 cores, depending on the model).

Multicore processor designs come in two flavors: homogeneous multicore processors and heterogeneous multicore processors. Processors in which all the cores are identical and perform the same functions are called homogeneous multicore processors. All their cores share a view of memory, and tasks are assigned to cores with a dynamic allocation method. Processors that place multiple cores on a single chip but give each core a different function are called heterogeneous multicore processors. They use a static allocation method when assigning tasks, because each core is designed for a fixed type of functionality.

The main applications of multicore processors are found in embedded systems, data and web servers, web commerce, signal processing, CAD/CAM, image processing, networking and graphics.

PARALLELISM IN MULTICORE
Multicore processors are specifically designed to run tasks in parallel. Parallelism in multicores exists at two levels: the hardware level and the software level. Hardware parallelism runs tasks in parallel at the machine level and is controlled by the operating system. It takes different forms at the hardware level:
a) Bit-level parallelism: This depends solely on the word size of the processor. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on values wider than a single word.
b) Instruction-level parallelism: This operates at the machine-instruction level. It is a measure of how many of the instructions in a program can be executed simultaneously. Instructions that are independent of each other can be executed concurrently by the processor hardware.

Software parallelism also runs tasks in parallel, but it is expressed at the program level; software has to keep pace with the hardware.
a) Data-level parallelism: A programming mechanism in which a large dataset is split into smaller chunks that can be processed in parallel. Once the chunks are processed, the results are combined into a single dataset.
b) Task-level parallelism: A single task is split into independent subtasks that are executed concurrently in software.

SERIAL AND PARALLEL PROGRAMMING
As mentioned earlier, computers were initially developed with single processing units, and serial programming was used to execute tasks on them. In serial programming, instructions are executed one at a time, which increases the task execution time [5]. Back then, developing a bigger processor was the only way to improve computer performance. Multicore processors allowed developers to use parallel programming, in which a complex task is generally divided into smaller parts that are executed on different cores, thereby reducing the task execution time. In computer software, a parallel programming model is a model for writing parallel programs that can be compiled and executed. The value of a programming model can be judged on its generality (how well a range of different problems can be expressed for a variety of different architectures) and on its performance (how efficiently the compiled programs execute). Figures 2 and 3 show the difference in how serial and parallel programs are executed.

Figure 2: Parallel programming [5].
Figure 3: Serial programming [5].
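The bit-level parallelism item above can be made concrete with a small simulation. The Python sketch below assumes a hypothetical processor whose adder handles only `word_bits` bits per instruction, so a wider addition has to be performed word by word, passing the carry along; the function `add_multiword` and its parameters are illustrative names, not from the paper.

```python
def add_multiword(a: int, b: int, total_bits: int, word_bits: int) -> tuple[int, int]:
    """Add two unsigned total_bits-wide integers on a hypothetical machine
    whose adder only handles word_bits at a time.

    Returns (sum modulo 2**total_bits, number of word-sized additions).
    """
    mask = (1 << word_bits) - 1
    result, carry, ops = 0, 0, 0
    for shift in range(0, total_bits, word_bits):
        # One simulated add-with-carry instruction on the current word.
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift
        carry = s >> word_bits  # carry into the next, more significant word
        ops += 1
    # Drop any carry out of the top word, like a fixed-width register.
    return result & ((1 << total_bits) - 1), ops
```

For a 64-bit addition (`total_bits=64`), the number of word-sized additions falls from 8 to 4, 2 and 1 as `word_bits` grows from 8 to 16, 32 and 64, which is the reduction in instruction count the text attributes to a wider word.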
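The serial-versus-parallel contrast, together with the split/process/combine pattern of data-level parallelism, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (the sum-of-squares workload, the function names and the default of four workers are hypothetical): the serial version walks the whole list in one instruction stream, while the parallel version splits the list into chunks, hands each chunk to a worker and combines the partial sums.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(data: list[int]) -> int:
    # Serial programming: a single instruction stream processes every element in turn.
    total = 0
    for x in data:
        total += x * x
    return total


def parallel_sum_of_squares(data: list[int], workers: int = 4) -> int:
    # Data-level parallelism, step 1: split the data into roughly equal chunks.
    size = max(1, -(-len(data) // workers))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Step 2: process every chunk independently on a worker from the pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum_of_squares, chunks))
    # Step 3: combine the partial results into a single answer.
    return sum(partials)
```

In CPython, threads executing pure-Python arithmetic are serialized by the global interpreter lock, so this sketch shows the structure of a data-parallel program rather than a real speedup. Because `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface, swapping it in (inside the usual `if __name__ == "__main__":` guard) lets the chunks run on separate cores, at the cost of copying each chunk to a worker process.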
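Task-level parallelism differs from the data-level case in that the concurrent units are different operations rather than the same operation applied to different chunks. The minimal Python sketch below assumes a toy text-analysis job whose three subtasks (the hypothetical helpers `word_count`, `longest_word` and `vowel_count`) are independent of each other; it submits all three to a thread pool at once and collects their results.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical subtasks for illustration; none of them depends on another's result.
def word_count(text: str) -> int:
    return len(text.split())


def longest_word(text: str) -> str:
    return max(text.split(), key=len, default="")


def vowel_count(text: str) -> int:
    return sum(ch in "aeiouAEIOU" for ch in text)


def analyse_text(text: str) -> dict:
    # Task-level parallelism: different, independent subtasks over the same
    # input are submitted together and may run concurrently.
    subtasks = {"words": word_count, "longest": longest_word, "vowels": vowel_count}
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in subtasks.items()}
        # Wait for each subtask and gather the results under its name.
        return {name: f.result() for name, f in futures.items()}
```

Because the subtasks share no state, no locking is needed; if one subtask needed another's output, they would no longer be independent and would have to be ordered.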
Parallel programming models can be classified broadly into two areas: process interaction and problem decomposition.
a) Process interaction: This relates to the mechanisms by which parallel processes communicate with each other. The most common forms of interaction are shared memory and message passing, but interaction can also be implicit [7].
b) Shared memory: Shared memory is an efficient means of passing data between programs. Depending on the context, programs may run on a single processor or on multiple separate processors. In this model, parallel tasks share a global address space, which they read and write asynchronously. This requires protection mechanisms such as locks, semaphores and monitors to control concurrent access. Shared memory can be emulated on distributed-memory systems, but non-uniform memory access (NUMA) times can come into play. Memory is sometimes also shared between different sections of code in the same program; for example, a for loop can create a thread for each iteration, with each thread updating a shared variable in parallel [7].
c) Message passing: Message passing is a concept from computer science that is used extensively in the design and implementation of modern software applications; it is key to some models of concurrency and to object-oriented programming. In a message-passing model, parallel tasks exchange data by passing messages to one another. These communications can be asynchronous or synchronous. The Communicating Sequential Processes (CSP) formalisation of message passing employed communication channels to 'connect' processes, and led to a number of important languages such as Joyce, Occam and Erlang [7].
d) Implicit: In an implicit model, no process interaction is visible to the programmer; instead, the compiler and/or runtime is responsible for performing it. This is most common with domain-specific languages where the concurrency within
