Multi-Core Microprocessors

GENERAL ARTICLE Multi-core Microprocessors V Rajaraman Multi-core microprocessor is an interconnected set of independent processors called cores integrated on a single silicon chip. These processing cores communicate and cooperate with one another to execute one or more programs faster than a single core processor. In this article we describe how and why these types of processors evolved. We also describe the basics of how multi-core microprocessors are programmed. V Rajaraman is at the Indian Institute of Science, Bengaluru. Several Introduction generations of scientists and engineers in India have learnt Today every computer uses a multi-core microprocessor. This is computer science using his not only true for desktop computers but also for laptops, tablets, lucidly written textbooks on and even microprocessors used in smartphones. Each ‘core’ is programming and computer fundamentals. His current an independent processor and in multi-core systems these cores research interests are parallel work in parallel to speed up the processing. computing and history of computing. Reviewing the evolution of microprocessors, the first processor that came integrated in a chip was the Intel 4004, a 4-bit microprocessor that was introduced in 1971 and was primarily used in pocket calculators. This was followed by the Intel 8008, an 8- bit microprocessor introduced in 1972 that was used to build a full-fledged computer. Based on empirical data, Gordon Moore of Fairchild Semiconductors predicted that the number of transistors which could be packed in an integrated circuit would double almost every two years. This is often called Moore’s law and has surprisingly been accurate till date (See Box 1). The increase in the number of transistors in a microprocessor was facilitated by decreasing the size of each transistor. Decreasing the size of transistors enabled faster clock speeds to be used to drive the circuits using these transistors and resultant improvement in the process- Keywords ing speed. Packing more transistors in a chip also enabled de- Moore’s law, evolution of multi- signers to improve the architecture of microprocessors in many core processors, programming multi-core processors. RESONANCE | December 2017 1175 GENERAL ARTICLE The speed mismatch ways. The first improvement was increase in the chunks of data between the processor that could be processed in each clock cycle. It was increased from and the main memory in 8 bits to 64 bits by doubling the data path width every few years a microcomputer was reduced by increasing (See Table 1). Increasing the data path width also allowed a mi- the number of registers croprocessor to directly address a larger main memory. Through- in the processor and by out the history of computers, the processor was much faster than introducing on-chip the main memory. Thus, reducing the speed mismatch between cache memory. the processor and the main memory in a microcomputer was the next problem to be addressed. This was done by increasing the number of registers in the processor and by introducing on-chip cache memory using the larger number of transistors that could be packed in a chip. Availability of a large number of transistors in a chip also enabled architects to increase the number of arithmetic units in a processor. Multiple arithmetic units allowed a processor to execute several instructions in one clock cycle. This is called instruction level parallel processing. Another architectural method of increasing the speed of processing besides increasing the clock speed was by pipelining. In pipelining, the instruction cycle of a processor is broken up into p steps, each taking ap- proximately equal time to execute. The p steps of a set of sequential instructions are overlapped when a set of instructions are executed sequentially (as in an assembly line) thereby increasing the speed of execution of a large sequence of independent instructions p fold. All these techniques, namely, increasing the clock frequency, increasing the data path width, executing several instructions in one clock cycle, increasing on-chip memory, and pipelining that were used to increase the speed of a single processor in a chip could not be sustained as will be explained in what follows. Why Multi-core? The number of We saw in the last section that the number of transistors packed transistors packed in a single silicon chip has in a single silicon chip has been doubling every two years with been doubling every two the result that around 2006 designers were able to pack about 240 years. million transistors in a chip. 1176 RESONANCE | December 2017 GENERAL ARTICLE Box 1. Moore’s Law In 1965 Gordon Moore* who was working at Fairchild Semiconductors predicted, based on empirical data available since integrated circuits were invented in 1958, that the number of transistors in an integrated circuit chip would double every two years. This became a self-fulfilling prophecy in the sense that the semiconductor industry took it as a challenge and as a benchmark to be achieved. A semiconductor manufacturers’ association was formed comprising chip fabricators and all the suppliers who supplied materials and components to the industry to cooperate and provide better components, purer materials, and better fabrication techniques that enabled doubling the number of transistors in chips every two years. Table 1 indicates how the number of transistors has increased in microprocessor chips made by Intel – a major manufacturer of integrated circuits. In 1971, there were 2300 transistors in the microprocessors made by Intel. By 2016, it had reached 7,200,000,000 – an increase by a factor about 3 × 106 in 23 years, very close to Moore’s law’s prediction. The width of the gate in a transistor in the present technology (called 14 nm technology) is about 50 times the diameter of a silicon atom. Any further increase in the number of transistors fabricated in a planar chip will lead to the transistor gate size approaching 4 to 5 atoms. At this size quantum effects will be evident and a transistor may not work reliably. Thus we may soon reach the end of Moore’s law as we know it now [1]. *Gordon E Moore, Cramming More Components, Electronics, Vol.38, No.8, pp.114–117, April 1965. The question arose on how to effectively use these transistors in designing microprocessors. One major point considered was that most programs are written using a sequential programming language such as C or C++. Many instructions in these languages could be transformed by compilers to a set of instructions that could be carried out in parallel. For example, in the instruction: a = (b + c) − (d ∗ f ) − (g/h), the add operation, the multiply operation, and the divide operation could be carried out simultaneously, i.e., in parallel. This is called instruction-level parallelism. This parallelism could be exploited provided there are three arithmetic units in the processor. Thus designers began putting many arithmetic units in processors. However, the number of arithmetic units could not be In some programs, a set increased beyond four as this simple type of parallelism does not of threads are occur often enough in programs and that resulted in idle arith- independent of one another and could be metic units. Another parallelism is thread-level parallelism [2]. carried out concurrently A thread in a program may be defined as a small set of sequential using multiple arithmetic units. RESONANCE | December 2017 1177 GENERAL ARTICLE Year Intel Processor Data Path Number of Clock Model Width Transistors Speed 1971 4004 4 2300 740 KHz 1972 8008 8 3500 500 KHz 1977 8085 8 6500 3MHz 1978 8086 16 29,000 5MHz 1982 80186 16 55,000 6MHz 1982 80286 16 134,000 6MHz 1985 80386 32 275,000 16–40 MHz 1989 80486 32 1,180,000 25 MHz 1993 Pentium 1 32 3,100,000 60–66 MHz 1995 Pentium Pro 32 5,500,000 150–200 MHz 1999 Pentium 3 32 9,500,000 450–660 MHz 2001 Itanium 1 64 25,000,000 733–800 MHz 2003 Pentium M 64 77,000,000 0.9–1.7 GHz 2006 Core 2 duo 64 291,000,000 1.8–2 GHz 2008 Core i7 Quad 64 730,000,000 2.66–3.2 GHz 2010 8 core Xeon 64 2,300,000,000 1.73-2.66 GHz Nehalem – Ex 2016 22 core Xeon 64 7,200,000,000 2.2–3.6 GHz Boradwell Table 1. Timeline of ∗ progress of Intel proces- (*Intel processors are chosen as they are widely used and represent how a major sors. manufacturer of microprocessors steadily increased the number of transistors in their processor and the clock speed. The data given above is abstracted from the articles on Intel microprocessor chronology and on transistor counts in microprocessors published by Wikipedia. The numbers in the table are indicative as there are numerous models of the same Intel microprocessor such as Pentium 3). instructions that can be scheduled to be processed as a unit shar- ing the same processor resources. A simple example of a thread is a set of sequential instructions in a for loop. In some programs, a set of threads are independent of one another and could be carried out concurrently using multiple arithmetic units. In practice, it is found that most programs have limited thread-level paral- 1178 RESONANCE | December 2017 GENERAL ARTICLE lelism. Thus it is not cost effective to increase arithmetic units in a processor beyond three or four to assist thread-level parallel processing. Novel architectures such as very long instruction word processors in which an instruction length was 128 to 256 bits and packed several instructions that could be carried out in parallel were also designed.

Multi-Core Microprocessors

Floating-Point Package for Intel 8008 and 8080 Microprocessors

2.5 Classification of Parallel Computers

Chapter 5 Multiprocessors and Thread-Level Parallelism

Computer Organization and Architecture Designing for Performance Ninth Edition

Chapter 5 Thread-Level Parallelism

Lecture 1: Course Introduction G Course Organization G Historical Overview G Computer Organization G Why the MC68000? G Why Assembly Language?

The Birth, Evolution and Future of Microprocessor

Trafficdb: HERE's High Performance Shared-Memory Data Store

Professor Won Woo Ro, School of Electrical and Electronic Engineering Yonsei University the Intel® 4004 Microprocessor, Introdu

Computer Organization EECC 550 • Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Week 1 Notation (RTN)

Microprocessors in the 1970'S

Scheduling of Shared Memory with Multi - Core Performance in Dynamic Allocator Using Arm Processor