Reconfigurable Accelerators in the World of General-Purpose Computing


Reconfigurable Accelerators in the World of General-Purpose Computing

Dissertation

A thesis submitted to the Faculty of Electrical Engineering, Computer Science and Mathematics of Paderborn University in partial fulfillment of the requirements for the degree of Dr. rer. nat.

by Tobias Kenter

Paderborn, Germany, August 26, 2016

Acknowledgments

First and foremost, I would like to thank Prof. Dr. Christian Plessl for the advice and support during my research. I found particularly helpful his ability to tailor suggestions to the situation, either through open questions that give room to explore and learn, or through concrete recommendations that help to achieve results more directly. Special thanks go also to Prof. Dr. Marco Platzner for his advice and support. I profited especially from his experience and ability to systematically identify the essence of challenges and solutions.

Furthermore, I would like to thank:

• Prof. Dr. João M. P. Cardoso, for serving as external reviewer for my dissertation.
• Prof. Dr. Friedhelm Meyer auf der Heide and Dr. Matthias Fischer, for serving on my oral examination committee.
• All colleagues with whom I had the pleasure to work at the PC2 and the Computer Engineering Group: researchers, technical and administrative staff. In a variation on one of our coffee-kitchen puns, I would like to state that research without colleagues is possible, but pointless. However, I am not sure about the first part.
• My long-time office mates Lars Schäfers and Alexander Boschmann, for particularly extensive discussions on our research and far beyond.
• Gavin Vaz, Heinrich Riebler and Achim Lösch, for intensive and productive collaboration on joint research interests.
• The Bachelor and Master students and student assistants I have supervised, for their enthusiasm and for helping me understand the potential and limitations of the different approaches in their projects. The technical contributions of Henning Schmitz were particularly valuable to my research.
• The Deutsche Forschungsgemeinschaft (DFG) and Intel Labs Braunschweig, for funding different phases of my research in the Collaborative Research Centre (CRC) 901 On-The-Fly Computing and the project Multimodal Reconfigurable Processing Unit (MM-RPU).
• The colleagues in the CRC 901, for insightful discussions.
• Michael Kauschke and Matthias Gries, for contributing an industry perspective on my research.

Finally, I would like to thank my family for their encouragement and moral support. Special thanks go to Gaby Nordendorf. If anyone was ever more convinced than me that I would eventually complete this thesis, it was her.

Abstract

Growing and newly emerging computing workloads and markets keep pushing computer architecture forward. Power limitations and diminishing returns from wider and more parallel processors are driving the need for architectural innovations. By reducing overheads and customizing parallelism, specialized accelerators help to increase the performance of specific workloads efficiently. However, without programmability, they lack the flexibility for general-purpose computing and thus cannot profit from costs shared among different workloads, users and markets. The architecture of field-programmable gate arrays (FPGAs) combines programmability with a high potential for specialization for different workloads.
The main obstacle for FPGA adoption in general-purpose computing is the lack of productive methods and tools for designers and maintainers of implementations running entirely or partially on FPGAs. In this work, by analyzing current approaches to this productivity challenge along with their conceptual and practical trade-offs, we identify three pillars that can complement each other to jointly drive general-purpose adoption of FPGAs. These pillars combine, firstly, synthesis from parallel OpenCL designs; secondly, fast and automatic compilation targeting overlay architectures on FPGAs; and thirdly, the encapsulation of hand-optimized FPGA designs into application- or domain-specific libraries.

In this thesis, we focus on overlay architectures as one of these paths towards productive FPGA design processes. A large variety of such architectures has been presented over the last years, but for most overlays it was poorly understood which overheads they involve compared to custom designs implemented directly on FPGA resources. Our work quantifies such overheads for an instruction-programmable overlay with a diverse set of program loops from a state-of-the-art stereo-matching application. It demonstrates that the architecture can, despite overheads, serve as a practically usable accelerator, and it identifies specific differences to fully customized FPGA designs that may help to reduce overheads through overlay customization. One motivating aspect of related research on overlay architectures has been their potential for fast compilation or synthesis and for quick reconfiguration. However, to demonstrate their practical usefulness in raising productivity for FPGA acceleration, we had to go one step further with regard to the design entry. To this end, we present a fast tool flow that automatically extracts suitable loops from high-level source or binary code and offloads them to a vector coprocessor realized as an FPGA overlay.

Besides the focus on productivity to foster FPGA adoption, FPGAs need to architecturally coexist and cooperate with general-purpose processors in order to fit into established computing systems and markets. With a high-level performance estimation model that takes into account the interdependency of architectures and program designs, we explore the design space for such integrated systems. We highlight that integration of the memory hierarchy is not only helpful to reduce application design efforts, but also has considerable influence on the acceleration potential of the platform. Recent, high-profile trends in industry show interesting correspondences to our analysis. When the hardware integration of FPGA accelerators proceeds on this path, it will be foremost the interplay of design productivity and performance potential that governs the success of FPGAs in general-purpose computing. With our analysis and technical contributions in that field, we may help to shape the outcome of this process.

Zusammenfassung

The demand for ever higher computing performance for growing and newly emerging workloads and markets challenges computer architecture. Limits on power consumption and diminishing returns from larger and increasingly parallel processors make architectural innovations necessary. Through efficiency gains and individually tailored parallelism, specialized accelerators can help to increase the computing performance for specific workloads.
However, without programmability they lack the flexibility for general-purpose computing tasks and thus cannot profit from distributing costs across different usage scenarios and markets. The architecture of FPGAs, a particular variant of programmable logic devices, combines programmability with a high potential for specialization for different workloads. The biggest obstacle on the way to widespread use of FPGAs for general-purpose computing is the lack of productive methods and tools for developing and maintaining implementations that run entirely or partially on FPGAs. In this work, we analyze current approaches for increasing productivity in the development of FPGA applications and work out their conceptual and practical advantages and disadvantages. Building on this, we identify three pillars that can complement each other in driving forward the use of FPGAs for general-purpose computing. These pillars combine, firstly, configuration generation from parallel OpenCL implementations; secondly, fast and automated compilation for overlay architectures on FPGAs; and thirdly, the bundling of hand-optimized FPGA configurations into application- or domain-specific libraries.

In this thesis, we concentrate on overlay architectures as an approach towards productive development processes for FPGAs. Over the last years, a large number of such architectures has been presented, which raise the abstraction level of FPGAs through an intermediate layer. However, for most of these overlay architectures there was only an insufficient understanding of the additional area consumption or the reduced performance compared to specialized configurations that use the FPGA without an intermediate layer. For an instruction-based overlay architecture, our work quantifies such disadvantages using a selection of different program loops belonging to a modern application for stereoscopic image matching. We show that, despite these disadvantages, this architecture can serve as a practically usable accelerator, and we identify various differences compared to fully specialized configurations. In the future, this could help to further reduce these disadvantages by specializing the overlay architectures. The potential for fast compilation or configuration generation for overlay architectures, as well as the ability to exchange configurations quickly, has already been an incentive for existing research in this area. However, in order to demonstrate …
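The abstract above describes a tool flow that automatically extracts suitable loops from source or binary code and offloads them to a vector coprocessor realized as an FPGA overlay. As a purely illustrative sketch of the kind of decision such a flow must make, the following Python snippet models a hypothetical offload heuristic; the Loop fields and all thresholds are invented here and are not taken from the dissertation.

```python
# Illustrative sketch only: a hypothetical cost model for deciding whether a
# program loop is worth offloading to a vector-coprocessor overlay on an FPGA.
# All fields and thresholds are invented; the dissertation's actual tool flow
# analyzes high-level source or binary code.

from dataclasses import dataclass

@dataclass
class Loop:
    trip_count: int          # expected iterations per invocation
    vectorizable: bool       # free of loop-carried dependences
    bytes_transferred: int   # data moved between host and accelerator
    ops_per_iteration: int   # arithmetic operations per iteration

def should_offload(loop: Loop, min_arith_intensity: float = 4.0,
                   min_trip_count: int = 1024) -> bool:
    """Offload only loops whose accelerated work outweighs transfer cost."""
    if not loop.vectorizable or loop.trip_count < min_trip_count:
        return False
    total_ops = loop.trip_count * loop.ops_per_iteration
    arith_intensity = total_ops / max(loop.bytes_transferred, 1)
    return arith_intensity >= min_arith_intensity

# Example: a long, vectorizable loop with high arithmetic intensity qualifies.
print(should_offload(Loop(trip_count=65536, vectorizable=True,
                          bytes_transferred=262144, ops_per_iteration=32)))
```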
Recommended publications
  • AMD Athlon™ Processor x86 Code Optimization Guide
    AMD Athlon™ Processor x86 Code Optimization Guide. © 2000 Advanced Micro Devices, Inc. All rights reserved.

    The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

    Trademarks: AMD, the AMD logo, AMD Athlon, K6, 3DNow!, and combinations thereof, AMD-751, K86, and Super7 are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc. Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation.
  • The Central Processing Unit (CPU). The Brain of Any Computer System Is the CPU
    Computer Fundamentals, 1st stage, Lec. (8), College of Computer Technology, Dept. Information Networks

    The central processing unit (CPU). The brain of any computer system is the CPU. It controls the functioning of the other units and processes the data. The CPU is sometimes called the processor, or in the personal computer field the “microprocessor”. It is a single integrated circuit that contains all the electronics needed to execute a program. The processor calculates (adds, multiplies and so on), performs logical operations (compares numbers and makes decisions), and controls the transfer of data among devices. The processor acts as the controller of all actions or services provided by the system.

    Processor actions are synchronized to its clock input. A clock signal consists of clock cycles. The time to complete a clock cycle is called the clock period. Normally, we use the clock frequency, which is the inverse of the clock period, to specify the clock. The clock frequency is measured in hertz, which represents one cycle/second. Hertz is abbreviated as Hz. Usually, we use megahertz (MHz) and gigahertz (GHz), as in a 1.8 GHz Pentium.

    The processor can be thought of as executing the following cycle forever:
    1. Fetch an instruction from the memory,
    2. Decode the instruction (i.e., determine the instruction type),
    3. Execute the instruction (i.e., perform the action specified by the instruction).

    Execution of an instruction involves fetching any required operands, performing the specified operation, and writing the results back. This process is often referred to as the fetch-execute cycle, or simply the execution cycle.
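To make the cycle described in this excerpt concrete, here is a minimal Python sketch of a fetch-decode-execute loop for a toy accumulator machine; the three-instruction encoding and the memory layout are invented for illustration only.

```python
# A minimal sketch of the fetch-decode-execute cycle, using a toy accumulator
# machine with an invented instruction encoding.

memory = [
    ("LOAD", 7),   # acc = mem[7]
    ("ADD", 8),    # acc += mem[8]
    ("STORE", 9),  # mem[9] = acc
    ("HALT", 0),
    None, None, None,
    5, 37, 0,      # data cells 7, 8 and 9
]

pc, acc, running = 0, 0, True
while running:
    opcode, operand = memory[pc]   # 1. fetch the instruction
    pc += 1
    if opcode == "LOAD":           # 2. decode the instruction ...
        acc = memory[operand]      # 3. ... and execute it
    elif opcode == "ADD":
        acc += memory[operand]
    elif opcode == "STORE":
        memory[operand] = acc
    elif opcode == "HALT":
        running = False

print(memory[9])  # 42
```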
  • Benchmark Evaluations of Modern Multi-Processor VLSI DSPµPs
    Session 1220

    Benchmark Evaluations of Modern Multi-Processor VLSI DSPµPs. Aaron L. Robinson and Fred O. Simons, Jr. High-Performance Computing and Simulation (HCS) Research Laboratory, Electrical Engineering Department, Florida A&M University and Florida State University, Tallahassee, FL 32316-2175

    Abstract - The authors continue their tradition of presenting technical reviews and performance comparisons of the newest multi-processor VLSI DSPµPs with the intention of providing concise, focused analyses designed to help established or aspiring DSP analysts evaluate the applicability of new DSP technology to their specific applications. As in the past, the Analog Devices SHARC™ and Texas Instruments TMS320C80 families of DSPµPs will be the focus of our presentation, because these manufacturers continue to push the envelope of new DSPµP (Digital Signal Processing microProcessor) development. However, in addition to the standard performance analyses and benchmark evaluations, the authors will present a new image-processing benchmarking technique designed specifically for evaluating new DSPµP image-processing capabilities.

    1. Introduction

    The combination of constantly evolving DSP algorithm development and continually advancing DSPµP hardware has formed the basis for an exponential growth in DSP applications. The increased market demand for these applications and their required hardware has resulted in the production of DSPµPs with very powerful components capable of performing a wide range of complicated operations. Due to the complicated architectures of these new processors, it would be time-consuming for even experienced DSP analysts to review and evaluate these new DSPµPs. Furthermore, inexperienced DSP analysts would find it even more time-consuming, and possibly very difficult, to appreciate the significance of and opportunities for these new components.
  • The Intel Microprocessors: Architecture, Programming and Interfacing Introduction to the Microprocessor and Computer
    Microprocessors (0630371), Fall 2010/2011 - Lecture Notes #1. The Intel Microprocessors: Architecture, Programming and Interfacing. Introduction to the Microprocessor and Computer

    Outline of the lecture: evolution of programming languages; microcomputer architecture; instruction execution cycle.

    Evolution of programming languages:
    • Machine language - the programmer had to remember the machine codes for various operations, and had to remember the locations of the data in the main memory, like: 0101 0011 0111…
    • Assembly language - an instruction is written in an easy-to-remember form called a mnemonic code. Example: Load = 100100, ADD = 100101, SUB = 100011. We need a program called an assembler that translates the assembly language instructions into machine language.
    • High-level languages - Fortran, Cobol, Pascal, C++, C# and Java. We need a compiler to translate instructions written in high-level languages into machine code.

    Microprocessor-based system (microcomputer) architecture. [Figure: block diagram of a microprocessor-based system - the CPU (registers, ALU, control unit, clock) connected to the memory/storage unit and to I/O devices #1 and #2 via the data bus, I/O bus and address bus.] The figure shows the main components of a microprocessor-based system:
    • CPU - Central Processing Unit, where calculations and logic operations are done. The CPU contains registers, a high-frequency clock, a control unit (CU) and an arithmetic logic unit (ALU).
      o Clock: synchronizes the internal operations of the CPU with other system components using clock pulses at a constant rate (the basic unit of time for machine instructions is a machine cycle or clock cycle). A machine instruction requires at least one clock cycle; some instructions require 50 clocks.
      o Control Unit (CU) - generates the needed control signals to coordinate the sequencing of steps involved in executing machine instructions (fetches data and instructions and decodes addresses for the ALU).
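The role of the assembler described above can be illustrated in a few lines of Python; this sketch simply maps mnemonics to the machine codes from the excerpt's example table.

```python
# A minimal sketch of what an assembler does: translate mnemonic instructions
# into machine codes, here using the opcode table from the excerpt above.

OPCODES = {"LOAD": "100100", "ADD": "100101", "SUB": "100011"}

def assemble(program):
    """Translate a list of mnemonics into binary machine-code strings."""
    return [OPCODES[mnemonic.upper()] for mnemonic in program]

print(assemble(["Load", "ADD", "SUB"]))
# ['100100', '100101', '100011']
```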
  • Computer Organization & Architecture EIE
    COMPUTER ORGANIZATION & ARCHITECTURE, EIE 411. Course Lecturer: Engr. Banji Adedayo, Reg. COREN.

    The characteristics of different computers vary considerably from category to category. Computers for data processing activities have different features than those with scientific features. Even computers configured within the same application area have variations in design. Computer architecture is the science of integrating those components to achieve a level of functionality and performance. It is the logical organization or design of the hardware that makes up the computer system. The internal organization of a digital system is defined by the sequence of micro-operations it performs on the data stored in its registers. The internal structure of a microprocessor is called its architecture and includes the number, layout and functionality of registers, memory cells, decoders, controllers and clocks.

    HISTORY OF COMPUTER HARDWARE

    The first use of the word ‘computer’ was recorded in 1613, referring to a person who carried out calculations or computations. A brief history: the computer as we know it today had its beginning with the 19th-century English mathematics professor Charles Babbage. He designed the Analytical Engine, and it was this design that the basic framework of the computers of today is based on.

    1st Generation, 1937-1946: The first electronic digital computer was built by Dr. John V. Atanasoff and Clifford Berry (the Atanasoff-Berry Computer, ABC). In 1943 an electronic computer named Colossus was built for the military. 1946 - the first general-purpose digital computer, the Electronic Numerical Integrator and Computer (ENIAC), was built. This computer weighed 30 tons and had 18,000 vacuum tubes which were used for processing.
  • Lecture Notes
    Lecture #4-5: Computer Hardware (Overview and CPUs). CS106E Spring 2018, Young

    In these lectures, we begin our three-lecture exploration of computer hardware. We start by looking at the different types of computer components and how they interact during basic computer operations. Next, we focus specifically on the CPU (Central Processing Unit). We take a look at the machine language of the CPU and discover it’s really quite primitive. We explore how compilers and interpreters allow us to go from the high-level languages we are used to programming to the low-level machine language actually used by the CPU. Most modern CPUs are multicore. We take a look at when multicore provides big advantages and when it doesn’t. We also take a short look at Graphics Processing Units (GPUs) and what they might be used for. We end by taking a look at Reduced Instruction Set Computing (RISC) and Complex Instruction Set Computing (CISC). Stanford President John Hennessy won the Turing Award (computer science’s equivalent of the Nobel Prize) for his work on RISC computing.

    Hardware and Software: Hardware refers to the physical components of a computer. Software refers to the programs or instructions that run on the physical computer.
    - We can entirely change the software on a computer without changing the hardware, and it will transform how the computer works. I can take an Apple MacBook, for example, remove the Apple software and install Microsoft Windows, and I now have a Windows computer.
    - In the next two lectures we will focus entirely on hardware.
  • (12) Patent Application Publication (10) Pub. No.: US 2011/0231717 A1 - HUR et al.
    US 20110231717 A1. (19) United States. (12) Patent Application Publication (10) Pub. No.: US 2011/0231717 A1; HUR et al. (43) Pub. Date: Sep. 22, 2011
    (54) SEMICONDUCTOR MEMORY DEVICE
    (76) Inventors: Hwang HUR, Kyoungki-do (KR); Chang-Ho Do, Kyoungki-do (KR); Jae-Bum Ko, Kyoungki-do (KR); Jin-Il Chung, Kyoungki-do (KR)
    (21) Appl. No.: 13/149,683
    (22) Filed: May 31, 2011
    Related U.S. Application Data: (62) Division of application No. 12/154,870, filed on May 28, 2008, now Pat. No. 7,979,758.
    (30) Foreign Application Priority Data: Feb. 29, 2008 (KR) 2008-0018761
    (52) U.S. Cl.: 714/718; 714/E11.145
    (57) ABSTRACT: A semiconductor memory device includes a cell array including a plurality of unit cells, and a test circuit configured to perform a built-in self-stress (BISS) test for detecting a defect by performing a plurality of internal operations, including a write operation, through an access to the unit cells using a plurality of patterns during a test procedure carried out at wafer level.
    [Figure: test-circuit block diagram showing a generator, clock buffer, first and second refresh controllers, and a data buffer.]
  • Arithmetic and Logical Unit Design for Area Optimization for Microcontroller - Amrut Anilrao Purohit, Mohammed Riyaz Ahmed and R. Venkata Siva Reddy
    International Journal on Emerging Technologies 11(2): 668-673 (2020). ISSN No. (Print): 0975-8364; ISSN No. (Online): 2249-3255

    Arithmetic and Logical Unit Design for Area Optimization for Microcontroller. Amrut Anilrao Purohit (1,2), Mohammed Riyaz Ahmed (2) and R. Venkata Siva Reddy (2). (1) Research Scholar, VTU Belagavi (Karnataka), India. (2) School of Electronics and Communication Engineering, REVA University Bengaluru (Karnataka), India. (Corresponding author: Amrut Anilrao Purohit.) (Received 04 January 2020, Revised 02 March 2020, Accepted 03 March 2020.) (Published by Research Trend, Website: www.researchtrend.net)

    ABSTRACT: An Arithmetic and Logic Unit (ALU) can be understood with basic knowledge of digital electronics, and any engineer will go through the details only once. The advantage of knowing the ALU in detail is twofold: firstly, programming of the processing device can be efficient, and secondly, one can design a new ALU architecture to meet the various constraints of the use cases. The miniaturization of digital circuits can be achieved either by reducing the size of the transistor (Moore’s law) or by optimizing the gate count of the circuit. The first has been explored extensively, while the latter, which deals with the application of Boolean rules and requires sound knowledge of logic design, has been largely ignored. The ultimate outcome is an area-optimized architecture/approach that optimizes the circuit at the gate level. The design of an ALU for various processing devices varies with the device/system requirements. Area optimization plays a significant role in chip design. Here in this work, we have attempted to design an ALU which is area-efficient while being loaded with the additional functionality necessary for microcontrollers.
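The gate-count optimization through Boolean rules that this abstract refers to can be illustrated with a small, self-contained example (not taken from the paper): an exhaustive truth-table check confirms that an expression simplified with the distribution and absorption laws computes the same function with fewer two-input gates.

```python
# A minimal sketch of gate-level area optimization through Boolean rules:
# verify by exhaustive truth table that a simplified expression is equivalent
# to the original, then compare gate counts.

from itertools import product

def naive(a, b, c):
    # (a AND b) OR (a AND (NOT b) AND c): 1 NOT + 3 AND + 1 OR = 5 gates
    return (a & b) | (a & (1 - b) & c)

def optimized(a, b, c):
    # a AND (b OR c): 1 AND + 1 OR = 2 gates, same Boolean function
    return a & (b | c)

# Exhaustive equivalence check over all 8 input combinations.
assert all(naive(a, b, c) == optimized(a, b, c)
           for a, b, c in product((0, 1), repeat=3))
print("equivalent; gate count reduced from 5 two-input gates to 2")
```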
  • Unit 8: Microprocessor Architecture
    Unit 8: Microprocessor Architecture. Lesson 1: Microcomputer Structure

    1.1. Learning Objectives. On completion of this lesson you will be able to:
    ♦ draw the block diagram of a simple computer
    ♦ understand the function of different units of a microcomputer
    ♦ learn the basic operation of the microcomputer bus system.

    1.2. Digital Computer. A digital computer is a multipurpose, programmable machine that reads binary instructions from its memory, accepts binary data as input, processes data according to those instructions, and provides results as output.

    1.3. Basic Computer System Organization. Every computer contains five essential parts or units:
    i. the arithmetic logic unit (ALU)
    ii. the control unit
    iii. the memory unit
    iv. the input unit
    v. the output unit.

    1.3.1. The Arithmetic and Logic Unit (ALU). The arithmetic and logic unit (ALU) is that part of the computer that actually performs arithmetic and logical operations on data. All other elements of the computer system - control unit, registers, memory, I/O - are there mainly to bring data into the ALU to process and then to take the results back out. An arithmetic and logic unit and, indeed, all electronic components in the computer are based on the use of simple digital logic devices that can store binary digits and perform simple Boolean logic operations. Data are presented to the ALU in registers. These registers are temporary storage locations within the CPU that are connected by signal paths to the ALU.
  • Computer Architecture Out-Of-Order Execution
    Computer Architecture: Out-of-Order Execution. By Yoav Etsion. With acknowledgement to Dan Tsafrir, Avi Mendelson, Lihu Rappoport, and Adi Yoaz

    The need for speed: Superscalar
    • Remember our goal: minimize CPU time. CPU time = duration of clock cycle × CPI × IC
    • So far we have learned that in order to:
      minimize clock cycle ⇒ add more pipe stages
      minimize CPI ⇒ utilize the pipeline
      minimize IC ⇒ change/improve the architecture
    • Why not make the pipeline deeper and deeper? Beyond some point, adding more pipe stages doesn’t help, because control/data hazards increase and become costlier. (Recall that in a pipelined CPU, CPI = 1 only without hazards.)
    • So what can we do next? Reduce the CPI by utilizing ILP (instruction-level parallelism). We will need to duplicate HW for this purpose.

    A simple superscalar CPU
    • Duplicates the pipeline (IF ID EXE MEM WB) to accommodate ILP (IPC > 1). ILP = instruction-level parallelism.
    • Note that duplicating HW in just one pipe stage doesn’t help; e.g., when having 2 ALUs, the bottleneck moves to other stages.
    • Conclusion: getting IPC > 1 requires fetching/decoding/executing/retiring more than one instruction per clock.

    Example: Pentium Processor
    • The Pentium fetches and decodes 2 instructions per cycle (a u-pipe and a v-pipe).
    • Before the register file is read, it decides on pairing: can the two instructions be executed in parallel? (yes/no)
    • The pairing decision is based… on data
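To make the quoted performance equation concrete, here is a small worked example in Python; the clock rate, CPI values and instruction count are illustrative numbers, not measurements of any real processor.

```python
# A worked example of the performance equation quoted above:
#   CPU time = duration of clock cycle × CPI × IC
# Halving the effective CPI (i.e., doubling IPC via superscalar issue)
# halves the CPU time at a fixed clock rate and instruction count.

def cpu_time(clock_hz, cpi, instruction_count):
    return (1.0 / clock_hz) * cpi * instruction_count

ic = 1_000_000_000                       # one billion instructions
scalar      = cpu_time(2e9, 1.0, ic)     # ideal pipelined CPU: CPI = 1
superscalar = cpu_time(2e9, 0.5, ic)     # 2-wide issue with enough ILP: IPC = 2

print(f"scalar: {scalar:.3f} s, superscalar: {superscalar:.3f} s")
# scalar: 0.500 s, superscalar: 0.250 s
```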
  • Central Processing Unit (CPU). The CPU Is the Brains of the Computer, and Is Also Known As the Processor (a Single Chip Also Known As Microprocessor)
    Central processing unit (CPU). The CPU is the brains of the computer, and is also known as the processor (a single chip also known as the microprocessor). This electronic component interprets and carries out the basic instructions that operate the computer. The cache, as a rule, holds data waiting to be processed and instructions waiting to be executed. The main parts of the CPU are:
    • the control unit
    • the arithmetic logic unit (ALU), and
    • registers - also referred to as cache registers.
    The CPU is connected to a circuit board called the motherboard, also known as the system board.

    Let’s look inside the CPU and see what the different components actually do and how they interact.

    Control unit. The control unit directs and co-ordinates most of the operations in the computer. It is a bit similar to a traffic officer controlling traffic! It translates instructions received from a program/application and then begins the appropriate action to carry out the instruction. Specifically, the control unit:
    • controls how and when input devices send data
    • stores and retrieves data to and from specific locations in memory
    • decodes and executes instructions
    • sends data to other parts of the CPU during operations
    • sends data to output devices on request.

    Arithmetic Logic Unit (ALU). The ALU is the computer’s calculator. It handles all math operations such as: add, subtract, multiply, divide, and logical decisions - true or false, and/or, greater than, equal to, or less than.

    Registers. Registers are special temporary storage areas on the CPU. They are used to store items during arithmetic, logic or transfer operations.
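A minimal functional sketch of the ALU operations listed above, with the operation selected by an opcode, might look as follows in Python; the opcode names are invented for illustration.

```python
# A minimal functional sketch of an ALU: arithmetic plus logical decisions,
# selected by an operation code. Opcode names are illustrative only.

import operator

ALU_OPS = {
    "add": operator.add, "sub": operator.sub,
    "mul": operator.mul, "div": operator.truediv,
    "and": lambda a, b: bool(a) and bool(b),
    "or":  lambda a, b: bool(a) or bool(b),
    "gt":  operator.gt, "eq": operator.eq, "lt": operator.lt,
}

def alu(op, a, b):
    """Apply the selected arithmetic or logical operation to two operands."""
    return ALU_OPS[op](a, b)

print(alu("add", 6, 7))  # 13
print(alu("gt", 6, 7))   # False
```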
  • Thread Scheduling in Multi-Core Operating Systems - Redha Gouicem
    Thread Scheduling in Multi-core Operating Systems. Redha Gouicem

    To cite this version: Redha Gouicem. Thread Scheduling in Multi-core Operating Systems. Computer Science [cs]. Sorbonne Université, 2020. English. tel-02977242. HAL Id: tel-02977242, https://hal.archives-ouvertes.fr/tel-02977242. Submitted on 24 Oct 2020.

    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

    Ph.D. thesis in Computer Science. Thread Scheduling in Multi-core Operating Systems: How to Understand, Improve and Fix your Scheduler. Redha GOUICEM. Sorbonne Université, Laboratoire d’Informatique de Paris 6, Inria Whisper Team. Ph.D. defense: 23 October 2020, Paris, France.

    Jury members:
    Mr. Pascal Felber, Full Professor, Université de Neuchâtel - Reviewer
    Mr. Vivien Quéma, Full Professor, Grenoble INP (ENSIMAG) - Reviewer
    Mr. Rachid Guerraoui, Full Professor, École Polytechnique Fédérale de Lausanne - Examiner
    Ms. Karine Heydemann, Associate Professor, Sorbonne Université - Examiner
    Mr. Etienne Rivière, Full Professor, University of Louvain - Examiner
    Mr. Gilles Muller, Senior Research Scientist, Inria - Advisor
    Mr. Julien Sopena, Associate Professor, Sorbonne Université - Advisor

    ABSTRACT: In this thesis, we address the problem of schedulers for multi-core architectures from several perspectives: design (simplicity and correctness), performance improvement and the development of application-specific schedulers.