Multi-Core Microprocessors


V Rajaraman

A multi-core microprocessor is an interconnected set of independent processors, called cores, integrated on a single silicon chip. These processing cores communicate and cooperate with one another to execute one or more programs faster than a single-core processor. In this article we describe how and why these types of processors evolved. We also describe the basics of how multi-core microprocessors are programmed.

V Rajaraman is at the Indian Institute of Science, Bengaluru. Several generations of scientists and engineers in India have learnt computer science using his lucidly written textbooks on programming and computer fundamentals. His current research interests are parallel computing and history of computing.

Keywords: Moore's law, evolution of multi-core processors, programming multi-core processors.

Introduction

Today every computer uses a multi-core microprocessor. This is true not only for desktop computers but also for laptops, tablets, and even the microprocessors used in smartphones. Each 'core' is an independent processor, and in multi-core systems these cores work in parallel to speed up the processing.

Reviewing the evolution of microprocessors, the first processor that came integrated in a chip was the Intel 4004, a 4-bit microprocessor introduced in 1971 and used primarily in pocket calculators. It was followed by the Intel 8008, an 8-bit microprocessor introduced in 1972 that was used to build a full-fledged computer. Based on empirical data, Gordon Moore of Fairchild Semiconductor predicted that the number of transistors that could be packed in an integrated circuit would double almost every two years. This is often called Moore's law, and it has remained surprisingly accurate to date (see Box 1). The increase in the number of transistors in a microprocessor was facilitated by decreasing the size of each transistor. Smaller transistors enabled faster clock speeds to drive the circuits built with them, with a resultant improvement in processing speed. Packing more transistors in a chip also enabled designers to improve the architecture of microprocessors in many ways. The first improvement was an increase in the chunk of data that could be processed in each clock cycle: it grew from 8 bits to 64 bits as the data path width doubled every few years (see Table 1). Increasing the data path width also allowed a microprocessor to directly address a larger main memory. Throughout the history of computers the processor has been much faster than the main memory, so reducing the speed mismatch between the processor and the main memory in a microcomputer was the next problem to be addressed. This was done by increasing the number of registers in the processor and by introducing on-chip cache memory, using the larger number of transistors that could now be packed in a chip. The availability of a large number of transistors in a chip also enabled architects to increase the number of arithmetic units in a processor. Multiple arithmetic units allowed a processor to execute several instructions in one clock cycle. This is called instruction-level parallel processing.
Another architectural method of increasing the speed of processing, besides increasing the clock speed, was pipelining. In pipelining, the instruction cycle of a processor is broken up into p steps, each taking approximately equal time to execute. When a set of instructions is executed sequentially, the p steps of successive instructions are overlapped (as in an assembly line), thereby increasing the speed of execution of a large sequence of independent instructions almost p-fold. (With a p-stage pipeline, n independent instructions complete in about p + n − 1 clock cycles instead of np, so the speedup approaches p as n grows large.) All these techniques, namely, increasing the clock frequency, increasing the data path width, executing several instructions in one clock cycle, increasing on-chip memory, and pipelining, which were used to increase the speed of a single processor in a chip, could not be sustained, as will be explained in what follows.

Why Multi-core?

We saw in the last section that the number of transistors packed in a single silicon chip has been doubling every two years, with the result that around 2006 designers were able to pack about 240 million transistors in a chip.

Box 1. Moore's Law

In 1965 Gordon Moore*, who was working at Fairchild Semiconductor, predicted, based on empirical data available since integrated circuits were invented in 1958, that the number of transistors in an integrated circuit chip would double every two years. This became a self-fulfilling prophecy in the sense that the semiconductor industry took it as a challenge and as a benchmark to be achieved. A semiconductor manufacturers' association was formed, comprising chip fabricators and all the suppliers of materials and components to the industry, to cooperate in providing better components, purer materials, and better fabrication techniques that enabled doubling the number of transistors in chips every two years. Table 1 indicates how the number of transistors has increased in microprocessor chips made by Intel – a major manufacturer of integrated circuits. In 1971, there were 2,300 transistors in the microprocessors made by Intel. By 2016, the count had reached 7,200,000,000 – an increase by a factor of about 3 × 10^6, or roughly 21.5 doublings in 45 years, very close to Moore's law's prediction. The width of the gate of a transistor in the present technology (called 14 nm technology) is about 50 times the diameter of a silicon atom. Any further increase in the number of transistors fabricated in a planar chip will lead to the transistor gate size approaching 4 to 5 atoms. At this size quantum effects will be evident and a transistor may not work reliably. Thus we may soon reach the end of Moore's law as we know it now [1].

*Gordon E Moore, Cramming More Components Onto Integrated Circuits, Electronics, Vol.38, No.8, pp.114–117, April 1965.

The question arose of how to use these transistors effectively in designing microprocessors. One major point considered was that most programs are written in a sequential programming language such as C or C++. Many statements in these languages can be transformed by compilers into sets of instructions that can be carried out in parallel. For example, in the statement:

a = (b + c) − (d ∗ f) − (g / h),

the add operation, the multiply operation, and the divide operation can be carried out simultaneously, i.e., in parallel. This is called instruction-level parallelism. This parallelism can be exploited provided there are three arithmetic units in the processor.
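As a concrete illustration, the following minimal C sketch (ours, not from the article; the temporaries t1, t2, and t3 are illustrative names) shows the kind of decomposition a compiler might perform internally:

    /* Decomposition of a = (b + c) - (d * f) - (g / h) into
       independent operations, as a compiler might do internally. */
    double eval(double b, double c, double d, double f,
                double g, double h)
    {
        double t1 = b + c;   /* add      -- independent of t2, t3 */
        double t2 = d * f;   /* multiply -- independent of t1, t3 */
        double t3 = g / h;   /* divide   -- independent of t1, t2 */

        /* A processor with three arithmetic units can compute t1, t2,
           and t3 in the same clock cycle; only the two subtractions
           below are serialized by data dependences. */
        return (t1 - t2) - t3;
    }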
Thus designers began putting many arithmetic units in processors. However, the number of arithmetic units could not be increased beyond four, as this simple type of parallelism does not occur often enough in programs, and the result was idle arithmetic units. Another form of parallelism is thread-level parallelism [2]. A thread in a program may be defined as a small set of sequential instructions that can be scheduled to be processed as a unit sharing the same processor resources. A simple example of a thread is the set of sequential instructions in a for loop. In some programs, a set of threads are independent of one another and can be carried out concurrently using multiple arithmetic units. In practice, it is found that most programs have limited thread-level parallelism. Thus it is not cost effective to increase the number of arithmetic units in a processor beyond three or four to assist thread-level parallel processing. Novel architectures, such as very long instruction word (VLIW) processors in which an instruction is 128 to 256 bits long and packs several operations that can be carried out in parallel, were also designed.

Table 1. Timeline of progress of Intel processors.*

    Year   Processor Model           Data Path Width   Transistors      Clock Speed
    1971   4004                      4 bits            2,300            740 kHz
    1972   8008                      8 bits            3,500            500 kHz
    1977   8085                      8 bits            6,500            3 MHz
    1978   8086                      16 bits           29,000           5 MHz
    1982   80186                     16 bits           55,000           6 MHz
    1982   80286                     16 bits           134,000          6 MHz
    1985   80386                     32 bits           275,000          16–40 MHz
    1989   80486                     32 bits           1,180,000        25 MHz
    1993   Pentium 1                 32 bits           3,100,000        60–66 MHz
    1995   Pentium Pro               32 bits           5,500,000        150–200 MHz
    1999   Pentium 3                 32 bits           9,500,000        450–660 MHz
    2001   Itanium 1                 64 bits           25,000,000       733–800 MHz
    2003   Pentium M                 64 bits           77,000,000       0.9–1.7 GHz
    2006   Core 2 Duo                64 bits           291,000,000      1.8–2 GHz
    2008   Core i7 Quad              64 bits           730,000,000      2.66–3.2 GHz
    2010   8-core Xeon Nehalem-EX    64 bits           2,300,000,000    1.73–2.66 GHz
    2016   22-core Xeon Broadwell    64 bits           7,200,000,000    2.2–3.6 GHz

(*Intel processors are chosen as they are widely used and represent how a major manufacturer of microprocessors steadily increased the number of transistors in its processors and the clock speed. The data given above are abstracted from the Wikipedia articles on Intel microprocessor chronology and on transistor counts in microprocessors. The numbers in the table are indicative, as there are numerous models of the same Intel microprocessor, such as the Pentium 3.)
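The for-loop example of a thread described above can be made concrete with a short, hedged C sketch. It uses OpenMP, a common way (though not one named in this section) of asking a compiler to run independent loop iterations on separate cores; with an OpenMP-capable compiler it is built with a flag such as -fopenmp, and without OpenMP the pragma is simply ignored and the loop runs sequentially:

    #include <stdio.h>

    #define N 1000000

    static double x[N], y[N], z[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) {   /* initialize the operands */
            x[i] = (double)i;
            y[i] = 2.0 * (double)i;
        }

        /* Every iteration is independent of the others, so the loop
           body forms thread-like units of work that can run
           concurrently on different cores. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            z[i] = x[i] + y[i];

        printf("z[%d] = %f\n", N - 1, z[N - 1]);
        return 0;
    }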