Operating System Support for Simultaneous Multithreaded Processors


UCAM-CL-TR-619
Technical Report
ISSN 1476-2986
Number 619

University of Cambridge Computer Laboratory

Operating system support for simultaneous multithreaded processors

James R. Bulpin
February 2005

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/

© 2005 James R. Bulpin

This technical report is based on a dissertation submitted September 2004 by the author for the degree of Doctor of Philosophy to the University of Cambridge, King's College. Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/TechReports/ ISSN 1476-2986

Summary

Simultaneous multithreaded (SMT) processors are able to execute multiple application threads in parallel in order to improve the utilisation of the processor's execution resources. The improved utilisation provides a higher processor-wide throughput at the expense of the performance of each individual thread.

Simultaneous multithreading has recently been incorporated into the Intel Pentium 4 processor family as "Hyper-Threading". While there is already basic support for it in popular operating systems, that support does not take advantage of any knowledge about the characteristics of SMT, and therefore does not fully exploit the processor.

SMT presents a number of challenges to operating system designers. The threads' dynamic sharing of processor resources means that there are complex performance interactions between threads. These interactions are often unknown, poorly understood, or hard to avoid. As a result such interactions tend to be ignored, leading to a lower processor throughput.

In this dissertation I start by describing simultaneous multithreading and the hardware implementations of it. I discuss areas of operating system support that are either necessary or desirable.

I present a detailed study of a real SMT processor, the Intel Hyper-Threaded Pentium 4, and describe the performance interactions between threads. I analyse the results using information from the processor's performance monitoring hardware.

Building on the understanding of the processor's operation gained from the analysis, I present a design for an operating system process scheduler that takes into account the characteristics of the processor and the workloads in order to improve the system-wide throughput. I evaluate designs exploiting various levels of processor-specific knowledge.

I finish by discussing alternative ways to exploit SMT processors. These include the partitioning onto separate simultaneous threads of applications and hardware interrupt handling. I present preliminary experiments to evaluate the effectiveness of this technique.

Acknowledgements

I would like to thank my supervisor, Ian Pratt, for his advice and guidance. I am also grateful to my colleagues in the Systems Research Group and local research labs for their friendship, support and many interesting discussions. In particular thanks are due to Jon Crowcroft, Tim Deegan, Keir Fraser, Steve Hand, James Hall, Tim Harris, Richard Mortier, Rolf Neugebauer, Dave Stewart and Andrew Warfield. I would like to thank Keir Fraser, Ian Pratt and Tim Harris for proof-reading earlier copies of this dissertation. Any errors that remain are my own.

I spent a summer as an intern at Microsoft Research Cambridge. I would like to thank my mentor, Rebecca Isaacs, and the others involved with the "Magpie" project for an interesting and enjoyable break from my own research.

Thanks are due to Intel Research Cambridge for their kind donation of a "Prescott" based computer which I used for some of the experiments in Chapter 3.

My funding was from a CASE award from Marconi Corporation plc. and the Engineering and Physical Sciences Research Council.

I would also like to thank my current employers for their patience and support while I finished writing this dissertation.

Table of contents

Glossary
1 Introduction
  1.1 Motivation
  1.2 Terminology
  1.3 Contribution
  1.4 Outline
2 Background
  2.1 Simultaneous Multithreading
    2.1.1 Multithreading
    2.1.2 Research Background of SMT
  2.2 Commercial SMT Processors
    2.2.1 Alpha 21464
    2.2.2 Intel Hyper-Threading
    2.2.3 IBM Power5
  2.3 Operating System Support
    2.3.1 An Extra Level of Hierarchy
    2.3.2 Simultaneous Execution
    2.3.3 Cache Considerations
    2.3.4 Energy Considerations
  2.4 Summary
3 Measuring SMT
  3.1 Related Work
    3.1.1 Simulated Systems
    3.1.2 Real Hardware
  3.2 Experimental Configurations
    3.2.1 Performance Counters
    3.2.2 Test Platforms
    3.2.3 Workloads
    3.2.4 Performance Metrics
  3.3 Thread Interactions
    3.3.1 Experimental Method
    3.3.2 Results
    3.3.3 Desktop Applications
    3.3.4 Summary
  3.4 Phases of Execution
    3.4.1 Performance Counter Correlation
  3.5 Asymmetry
  3.6 Summary
4 A Process Scheduler for SMT Processors
  4.1 The Context
    4.1.1 Problems with Traditional Schedulers
    4.1.2 Scheduling for Multithreading and Multiprocessing Systems
  4.2 Related Work
    4.2.1 Hardware Support
    4.2.2 SMT-Aware Scheduling
  4.3 Practical SMT-Aware Scheduling
    4.3.1 Design Space
  4.4 Implementation
    4.4.1 The Linux Scheduler
    4.4.2 Extensible Scheduler Modifications
    4.4.3 Performance Estimation
    4.4.4 SMT-Aware Schedulers
  4.5 Evaluation
    4.5.1 Method
    4.5.2 Throughput
    4.5.3 Fairness
  4.6 Applicability to Other Operating Systems
  4.7 Applicability to Other Processors
  4.8 Summary
5 Alternatives to Multiprogramming
  5.1 A Multithreaded Processor as a Single Resource
    5.1.1 Program Parallelisation
  5.2 Threads for Speculation and Prefetching
    5.2.1 Data Speculation
    5.2.2 Pre-execution
    5.2.3 Multiple Path Execution
  5.3 Threads for Management and Monitoring
    5.3.1 Mini-Threads
    5.3.2 Subordinate Threads
  5.4 Fault Tolerance
  5.5 Operating System Functions
    5.5.1 Exception Handling
    5.5.2 Privilege Level Partitioning
    5.5.3 Interrupt Handling Partitioning on Linux
  5.6 Summary
6 Conclusions
  6.1 Summary
  6.2 Further Research
References
A Monochrome Figures

Glossary

ALU    Arithmetic-Logic Unit
CMP    Chip Multiprocessor/Multiprocessing
DEC    Digital Equipment Corporation
D-TLB  Data Translation Lookaside Buffer
FP     Floating Point
HMT    Hardware Multithreading
IA32   Intel Architecture 32 bit
IBM    International Business Machines
ILP    Instruction Level Parallelism
I/O    Input/Output
IPC    Instructions per Cycle
IQR    Inter-Quartile Range (of a distribution)
IRQ    Interrupt Request
ISA    Instruction Set Architecture
ISR    Interrupt Service Routine
I-TLB  Instruction Translation Lookaside Buffer
HP     Hewlett-Packard (bought Compaq who bought DEC)
HT     Hyper-Threading
L1/L2  Level 1/Level 2 cache
LKM    Linux/Loadable Kernel Module
MP     Multiprocessor/Multiprocessing
MSR    Model Specific Register (Intel)
NUMA   Non-uniform Memory Architecture
OoO    Out of Order (superscalar processor)
OS     Operating System
P4     Intel Pentium 4
PID    Process Identifier
RISC   Reduced Instruction Set Computer
ROM    Read-only Memory
SMP    Symmetric Multiprocessing
SMT    Simultaneous Multithreaded/Multithreading
TC     Trace-cache
TLB    Translation Lookaside Buffer
TLP    Thread Level Parallelism
TLS    Thread Level Speculation
UP     Uniprocessor/Uniprocessing
VM     Virtual Memory

Chapter 1: Introduction

This dissertation is concerned with the interaction of software, particularly operating systems, and simultaneous multithreaded (SMT) processors. I measure how a selection of workloads perform on SMT processors and show how the scheduling of processes onto SMT threads can cause the performance to change. I demonstrate how improved knowledge of the characteristics of the processor can improve the way the operating system schedules tasks to run.

In this chapter I outline the ideas behind SMT processors and the difficulties they create. I state the contributions that are described in this dissertation. The chapter finishes with a brief summary of later chapters.

1.1 Motivation

The difference in the rate of growth between processor speed and memory latency has led to an increasing delay (relative to processor cycles) to access memory. This is a well known problem and there are a number of techniques in common use to work around it. While caches are very effective, a level 1 cache miss causing an access to a lower level cache can cost many processor cycles. To try to minimise the effect of a cache miss on program execution, dynamic issue superscalar (or out-of-order, OoO) processors attempt to execute other instructions not dependent on the cache access while waiting for it to complete. Moreover, these processors have multiple execution units enabling multiple independent instructions to be executed in parallel. The efficacy of this approach depends on the amount of instruction level parallelism (ILP) available in the code being executed. If code exhibits low ILP then there are few opportunities for parallel execution and for finding sufficient work to perform while waiting for a cache miss. Simultaneous multithreading (SMT) is an extension of dynamic issue superscalar processing. The aim.
Recommended publications
  • Chapter 1: Introduction What Is an Operating System?
    Chapter 1: Introduction

    What is an Operating System? Mainframe Systems; Desktop Systems; Multiprocessor Systems; Distributed Systems; Clustered Systems; Real-Time Systems; Handheld Systems; Computing Environments.

    What is an Operating System? A program that acts as an intermediary between a user of a computer and the computer hardware. Operating system goals: execute user programs and make solving user problems easier; make the computer system convenient to use; use the computer hardware in an efficient manner.

    Computer System Components: 1. Hardware – provides basic computing resources (CPU, memory, I/O devices). 2. Operating system – controls and coordinates the use of the hardware among the various application programs for the various users. 3. Applications programs – define the ways in which the system resources are used to solve the computing problems of the users (compilers, database systems, video games, business programs). 4. Users (people, machines, other computers).

    Abstract View of System Components (figure).

    Operating System Definitions: Resource allocator – manages and allocates resources. Control program – controls the execution of user programs and operations of I/O devices. Kernel – the one program running at all times (all else being application programs).

    (Slides from Silberschatz, Galvin and Gagne, Operating System Concepts, 2002.)
  • Introduction to Multi-Threading and Vectorization Matti Kortelainen Larsoft Workshop 2019 25 June 2019 Outline
    Introduction to Multi-Threading and Vectorization
    Matti Kortelainen, LArSoft Workshop 2019, 25 June 2019

    Outline – broad introductory overview:
    ● Why multithread?
    ● What is a thread?
    ● Some threading models: std::thread, OpenMP (fork-join), Intel Threading Building Blocks (TBB) (tasks)
    ● Race condition, critical region, mutual exclusion, deadlock
    ● Vectorization (SIMD)

    Motivations for multithreading:
    ● One process on a node: speedups from parallelizing parts of the programs
      – Any problem can get a speedup if the threads can cooperate on the same core (sharing the L1 cache) or on the L2 cache (which may be shared among a small number of cores)
    ● Fully loaded node: save memory and other resources
      – Threads can share objects, so N threads can use significantly less memory than N processes
      – If the smallest chunk of data is so big that only one fits in memory at a time, is there any other option?

    What is a (software) thread? (in POSIX/Linux)
    ● "Smallest sequence of programmed instructions that can be managed independently by a scheduler" [Wikipedia]
    ● A thread has its own program counter, registers, stack and thread-local memory (better to avoid in general)
    ● Threads of a process share everything else, e.g. program code, constants, heap memory, network connections and file handles

    (One motivation slide shows the well-known processor-trends figure; image courtesy of K. Rupp.)
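    The following is a minimal, hedged C++ sketch (not from the workshop slides; all names are invented for illustration) of the std::thread model listed in the outline, using a mutex to make the shared increment a critical region and so avoid the race condition mentioned above.

```cpp
// Hypothetical illustration of std::thread plus mutual exclusion on a critical region.
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    long counter = 0;
    std::mutex m;  // protects counter (mutual exclusion)

    auto work = [&](int iterations) {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m);  // critical region
            ++counter;  // without the lock this increment would be a race condition
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(work, 100000);  // the threads share the process's globals and heap
    for (auto& t : threads)
        t.join();

    std::cout << "counter = " << counter << '\n';  // 400000 with the lock in place
    return 0;
}
```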
  • Unit: 4 Processes and Threads in Distributed Systems
    Unit 4: Processes and Threads in Distributed Systems

    Thread: A program has one or more loci of execution; each is called a thread of execution. In traditional operating systems, each process has an address space and a single thread of execution. A thread is the smallest unit of processing that can be scheduled by an operating system. A thread is a single sequential stream within a process. Because threads have some of the properties of processes, they are sometimes called lightweight processes. Within a process, threads allow multiple streams of execution.

    Thread Structure: A process is used to group resources together; threads are the entities scheduled for execution on the CPU. A thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from. Although a thread must execute in some process, the thread and its process are different concepts and can be treated separately. What threads add to the process model is to allow multiple executions to take place in the same process environment, to a large degree independent of one another. Having multiple threads running in parallel in one process is similar to having multiple processes running in parallel in one computer.

    Figure: (a) Three processes each with one thread. (b) One process with three threads.

    In the former case, the threads share an address space, open files, and other resources. In the latter case, the processes share physical memory, disks, printers and other resources.
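    A small C++ sketch of the "one process with three threads" model described above (the variable and function names are invented for illustration): the threads share the process's address space, while each has its own stack; an atomic counter is used so the concurrent updates stay well defined.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> shared_counter{0};  // in the shared address space, visible to every thread

void worker(int id) {
    int local = id * 10;       // lives on this thread's own stack
    shared_counter += 1;       // all three threads update the same object
    std::printf("thread %d: local=%d\n", id, local);
}

int main() {
    std::thread t1(worker, 1), t2(worker, 2), t3(worker, 3);  // one process, three threads
    t1.join(); t2.join(); t3.join();
    std::printf("shared_counter=%d\n", shared_counter.load());  // prints 3
    return 0;
}
```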
  • Parallel Computing
    Parallel Computing

    Announcements:
    ● Midterm has been graded; it will be distributed after class along with solutions.
    ● SCPD students: midterms have been sent to the SCPD office and should be sent back to you soon.
    ● Assignment 6 due right now.
    ● Assignment 7 (Pathfinder) out, due next Tuesday at 11:30 AM. Play around with graphs and graph algorithms! Learn how to interface with library code. No late submissions will be considered; this is as late as we're allowed to have the assignment due.

    Why Algorithms and Data Structures Matter – Making Things Faster:
    ● Choose better algorithms and data structures: dropping from O(n²) to O(n log n) for large data sets will make your programs faster.
    ● Optimize your code: try to reduce the constant factor in the big-O notation. Not recommended unless all else fails.
    ● Get a better computer: having more memory and processing power can improve performance.
    ● New option: use parallelism.

    How Your Programs Run – Threads of Execution:
    ● When running a program, that program gets a thread of execution (or thread).
    ● Each thread runs through code as normal.
    ● A program can have multiple threads running at the same time, each of which performs different tasks.
    ● A program that uses multiple threads is called multithreaded; writing a multithreaded program or algorithm is called multithreading.

    Threads in C++:
    ● The newest version of C++ (C++11) has libraries that support threading.
    ● To create a thread: write the function that you want to execute, then construct an object of type thread to run that function (the <thread> header is needed for this). That function will run in parallel alongside the original program.
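    A short sketch of the pattern the last bullet describes, using the C++11 <thread> header (the function name countToTen is invented for this example):

```cpp
#include <iostream>
#include <thread>

// The function we want to execute in its own thread of execution.
void countToTen() {
    for (int i = 1; i <= 10; ++i)
        std::cout << "worker: " << i << '\n';
}

int main() {
    std::thread worker(countToTen);   // needs <thread>; the new thread starts running immediately

    // The original program keeps running in parallel with the worker thread.
    std::cout << "main: doing other work\n";

    worker.join();                    // wait for the worker to finish before exiting
    return 0;
}
```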
  • Threading SIMD and MIMD in the Multicore Context the Ultrasparc T2
    Overview:
    ● (note: Tute 02 this Weds – handouts)
    ● Flynn's Taxonomy
    ● multicore architecture concepts
      ■ hardware threading
      ■ SIMD vs MIMD in the multicore context
    ● T2: design features for multicore
      ■ system on a chip
      ■ execution: (in-order) pipeline, instruction latency
      ■ thread scheduling
      ■ caches: associativity, coherence, prefetch
      ■ memory system: crossbar, memory controller
      ■ intermission
      ■ speculation; power savings
      ■ OpenSPARC
    ● T2 performance (why the T2 is designed as it is)
    ● the Rock processor (slides by Andrew Over; ref: Tremblay, IEEE Micro 2009)

    SIMD and MIMD in the Multicore Context – Flynn's taxonomy:

                    Single Instruction   Multiple Instruction
    Single Data     SISD                 MISD
    Multiple Data   SIMD                 MIMD

    ● for SIMD, the control unit and processor state (registers) can be shared
    ● however, SIMD is limited to data parallelism (through multiple ALUs)
      ■ algorithms need a regular structure, e.g. dense linear algebra, graphics
      ■ SSE2, Altivec, Cell SPE (128-bit registers); e.g. 4×32-bit add:
          Rx: x3 x2 x1 x0
        + Ry: y3 y2 y1 y0
        = Rz: z3 z2 z1 z0 (zi = xi + yi)
      ■ design requires massive effort; requires support from a commodity environment
      ■ massive parallelism (e.g. nVidia GPGPU) but memory is still a bottleneck
    ● multicore (CMT) is MIMD; hardware threading can be regarded as MIMD
      ■ higher hardware costs; also larger shared resources (caches, TLBs) needed ⇒ less parallelism than for SIMD

    Hardware (Multi)threading:
    ● recall concurrent execution on a single CPU: switching between threads (or processes) requires the saving (in memory) of thread state (register values)
      ■ motivation: utilize the CPU better when a thread is stalled for I/O (6300 Lect O1, p9–10)
      ■ what are the costs? do the same for smaller stalls? (e.g.

    The UltraSPARC T2: System on a Chip:
    ● OpenSparc Slide Cast Ch 5: p79–81,89
    ● aggressively multicore: 8 cores, each with 8-way hardware threading (64 virtual CPUs)

    (COMP8320 Lecture 2: Multicore Architecture and the T2, 2011.)
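    A hedged sketch of the 4×32-bit SIMD add pictured in the SIMD/MIMD slide above, written with SSE2 intrinsics in C++ (this assumes an x86 compiler with SSE2 support; it is one possible mapping of Rz = Rx + Ry onto 128-bit registers, not material from the lecture):

```cpp
#include <cstdio>
#include <emmintrin.h>  // SSE2 intrinsics

int main() {
    alignas(16) int x[4] = {1, 2, 3, 4};
    alignas(16) int y[4] = {10, 20, 30, 40};
    alignas(16) int z[4];

    __m128i rx = _mm_load_si128(reinterpret_cast<const __m128i*>(x));  // Rx: x3 x2 x1 x0
    __m128i ry = _mm_load_si128(reinterpret_cast<const __m128i*>(y));  // Ry: y3 y2 y1 y0
    __m128i rz = _mm_add_epi32(rx, ry);                                // zi = xi + yi, all four lanes at once
    _mm_store_si128(reinterpret_cast<__m128i*>(z), rz);

    std::printf("%d %d %d %d\n", z[0], z[1], z[2], z[3]);  // 11 22 33 44
    return 0;
}
```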
  • Personal-Computer Systems • Parallel Systems • Distributed Systems • Real-Time Systems
    Module 1: Introduction

    What is an operating system? Simple Batch Systems; Multiprogramming Batched Systems; Time-Sharing Systems; Personal-Computer Systems; Parallel Systems; Distributed Systems; Real-Time Systems.

    What is an Operating System? A program that acts as an intermediary between a user of a computer and the computer hardware. Operating system goals: execute user programs and make solving user problems easier; make the computer system convenient to use; use the computer hardware in an efficient manner.

    Computer System Components: 1. Hardware – provides basic computing resources (CPU, memory, I/O devices). 2. Operating system – controls and coordinates the use of the hardware among the various application programs for the various users. 3. Applications programs – define the ways in which the system resources are used to solve the computing problems of the users (compilers, database systems, video games, business programs). 4. Users (people, machines, other computers).

    Abstract View of System Components (figure).

    Operating System Definitions: Resource allocator – manages and allocates resources. Control program – controls the execution of user programs and operations of I/O devices. Kernel – the one program running at all times (all else being application programs).

    Memory Layout for a Simple Batch System (figure).

    Multiprogrammed Batch Systems: several jobs are kept in main memory at the same time, and the CPU is multiplexed among them.

    (Slides from Silberschatz, Galvin, and Gagne, Applied Operating System Concepts, © 1999.)
  • Engineering Definitional Interpreters
    Reprinted from the 15th International Symposium on Principles and Practice of Declarative Programming (PPDP 2013)

    Engineering Definitional Interpreters
    Jan Midtgaard (Department of Computer Science, Aarhus University), Norman Ramsey (Department of Computer Science, Tufts University), Bradford Larsen (Veracode)
    [email protected] [email protected] [email protected]

    Abstract: A definitional interpreter should be clear and easy to write, but it may run 4–10 times slower than a well-crafted bytecode interpreter. In a case study focused on implementation choices, we explore ways of making definitional interpreters faster without expending much programming effort. We implement, in OCaml, interpreters based on three semantics for a simple subset of Lua. We compile the OCaml to x86 native code, and we systematically investigate hundreds of combinations of algorithms and data structures. In this experimental context, our fastest interpreters are based on natural semantics; good algorithms and data structures make them 2–3 times faster than naïve interpreters. Our best interpreter, created using only modest effort, runs only 1.5 times slower than a well-

    In detail, we make the following contributions:
    ● We investigate three styles of semantics, each of which leads to a family of definitional interpreters. A "family" is characterized by its representation of control contexts. Our best-performing interpreter, which arises from a natural semantics, represents control contexts using control contexts of the metalanguage.
    ● We evaluate other implementation choices: how names are represented, how environments are represented, whether the interpreter has a separate "compilation" step, where intermediate results are stored, and how loops are implemented.
    ● We identify combinations of implementation choices that work well together.
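    To make the "natural semantics" idea concrete, here is a minimal, hedged sketch of a definitional interpreter, written in C++ rather than the paper's OCaml, over a toy expression language invented for this example. The point it illustrates is that the interpreter's control context is simply the host language's call stack: evaluation is a plain recursive function.

```cpp
#include <iostream>
#include <memory>

struct Expr {                       // tiny expression language: literals and addition
    enum class Kind { Lit, Add } kind;
    int value = 0;                  // used when kind == Lit
    std::shared_ptr<Expr> lhs, rhs; // used when kind == Add
};

std::shared_ptr<Expr> lit(int v) {
    auto e = std::make_shared<Expr>(); e->kind = Expr::Kind::Lit; e->value = v; return e;
}
std::shared_ptr<Expr> add(std::shared_ptr<Expr> a, std::shared_ptr<Expr> b) {
    auto e = std::make_shared<Expr>(); e->kind = Expr::Kind::Add; e->lhs = a; e->rhs = b; return e;
}

// Definitional interpreter: one clause per syntactic form, recursion for sub-terms,
// so control contexts are represented by the metalanguage's own call stack.
int eval(const Expr& e) {
    switch (e.kind) {
        case Expr::Kind::Lit: return e.value;
        case Expr::Kind::Add: return eval(*e.lhs) + eval(*e.rhs);
    }
    return 0; // unreachable
}

int main() {
    auto program = add(lit(1), add(lit(2), lit(3)));
    std::cout << eval(*program) << '\n';  // prints 6
    return 0;
}
```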
  • Parallel Programming with Openmp
    Parallel Programming with OpenMP

    Introduction: OpenMP Programming Model. OpenMP provides thread-based parallelism on shared-memory platforms. Parallelization is either explicit, where the programmer has full control over it, or achieved through compiler directives placed in the source code. A thread is a piece of code that is being executed; a thread of execution is the smallest unit of processing. Multiple threads can exist within the same process and share resources such as memory.

    The master thread is a single thread that runs sequentially; parallel execution occurs inside parallel regions, and between two parallel regions only the master thread executes the code. This is called the fork-join model.

    OpenMP Parallel Computing Hardware: Shared memory allows immediate access to all data from all processors without explicit communication. Shared memory: multiple CPUs are attached to the bus; all processors share the same primary memory; the same memory address on different CPUs refers to the same memory location. The CPU-to-memory connection becomes a bottleneck, so shared memory computers cannot scale very well.

    OpenMP versus MPI: OpenMP (Open Multi-Processing) is easy to use and provides loop-level parallelism; non-loop-level parallelism is more difficult; it is limited to shared memory computers and cannot handle very large problems. An alternative is MPI (Message Passing Interface): it requires low-level programming; more difficult programming
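    A small fork-join sketch of the model described above (a hedged illustration, assuming a compiler with OpenMP support, e.g. built with -fopenmp on GCC or Clang): the master thread forks a team at the parallel region and joins it implicitly at the end.

```cpp
#include <cstdio>

int main() {
    const int n = 8;
    int squares[n];

    // The master thread runs sequentially until this parallel region, where it forks a team.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        squares[i] = i * i;  // loop iterations are divided among the team's threads
    }
    // Implicit join: only the master thread continues past the region.

    for (int i = 0; i < n; ++i)
        std::printf("%d ", squares[i]);
    std::printf("\n");  // 0 1 4 9 16 25 36 49
    return 0;
}
```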
  • Operating System
    OPERATING SYSTEM

    Index:
    Lesson 1: Introduction to Operating System
    Lesson 2: File System – I
    Lesson 3: File System – II
    Lesson 4: CPU Scheduling
    Lesson 5: Memory Management – I
    Lesson 6: Memory Management – II
    Lesson 7: Disk Scheduling
    Lesson 8: Process Management
    Lesson 9: Deadlocks
    Lesson 10: Case Study of UNIX
    Lesson 11: Case Study of MS-DOS
    Lesson 12: Case Study of MS-Windows NT

    Lesson Number: 1 – Introduction to Operating System. Writer: Dr. Rakesh Kumar. Vetter: Prof. Dharminder Kr.

    1.0 Objective: The objective of this lesson is to make the students familiar with the basics of operating systems. After studying this lesson they will be familiar with: 1. What is an operating system? 2. Important functions performed by an operating system. 3. Different types of operating systems.

    1.1 Introduction: An operating system (OS) is a program or set of programs which acts as an interface between a user of the computer and the computer hardware. The main purpose of an OS is to provide an environment in which we can execute programs. The main goals of the OS are (i) to make the computer system convenient to use, and (ii) to make use of the computer hardware in an efficient way. An operating system is system software, which may be viewed as a collection of software consisting of procedures for operating the computer and providing an environment for execution of programs. It is an interface between user and computer. So an OS makes everything in the computer work together smoothly and efficiently.

    Figure 1: The relationship between application & system software.
  • Intel's Hyper-Threading
    Intel's Hyper-Threading
    Cody Tinker & Christopher Valerino

    Introduction:
    ● How a thread, the smallest portion of code that can be run independently, is handled plays a core role in enhancing a program's parallelism.
    ● Because modern computers process many tasks and programs simultaneously, techniques that allow threads to be handled more efficiently, lowering processor downtime, are valuable.
    ● The motive is to reduce the number of idle resources of a processor.
    ● Examples of processors that use hyper-threading: Intel Xeon D-1529, Intel i7-6785R, Intel Pentium D1517.
    ● Two main trends have been followed to increase parallelism: increase the number of cores on a chip, and increase per-core throughput.

    What is Multithreading?
    ● Executing more than one thread at a time.
    ● Normally, an operating system handles a program by scheduling individual threads and then passing them to the processor.
    ● There are two different types of hardware multithreading: temporal (one thread per pipeline stage) and simultaneous (SMT, multiple threads per pipeline stage).
    ● Intel's Hyper-Threading is an SMT design.

    Hyper-threading:
    ● Hyper-threading is the hardware solution to increasing processor throughput by decreasing resource idle time.
    ● It allows multiple concurrent threads to be executed: threads are interleaved so that resources not being used by one thread are used by others. Most processors are limited to 2 concurrent threads per physical core; some support 8 concurrent threads per physical core.
    ● Needs the ability to fetch instructions from
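    As a small, hedged aside (not part of the slides): from software, the extra hardware threads show up as additional logical processors, which C++ can query with std::thread::hardware_concurrency(); on a Hyper-Threaded part this typically reports twice the number of physical cores, though the exact value is platform-dependent.

```cpp
#include <iostream>
#include <thread>

int main() {
    unsigned logical = std::thread::hardware_concurrency();  // may return 0 if the count is unknown
    if (logical == 0) {
        std::cout << "logical processor count not available\n";
    } else {
        std::cout << "logical processors visible to this process: " << logical << '\n';
        // e.g. a 4-core CPU with 2-way SMT would usually report 8 here
    }
    return 0;
}
```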
  • COSC 6385 Computer Architecture - Multi-Processors (IV) Simultaneous Multi-Threading and Multi-Core Processors Edgar Gabriel Spring 2011
    COSC 6385 Computer Architecture – Multi-Processors (IV): Simultaneous multi-threading and multi-core processors
    Edgar Gabriel, Spring 2011

    Moore's Law: a long-term trend in the number of transistors per integrated circuit; the number of transistors doubles roughly every 18 months. (Source: http://en.wikipedia.org/wki/Images:Moores_law.svg)

    What do we do with that many transistors? Optimizing the execution of a single instruction stream through:
    – Pipelining: overlap the execution of multiple instructions. Example: all RISC architectures; Intel x86 underneath the hood.
    – Out-of-order execution: allow instructions to overtake each other in accordance with code dependencies (RAW, WAW, WAR). Example: all commercial processors (Intel, AMD, IBM, SUN).
    – Branch prediction and speculative execution: reduce the number of stall cycles due to unresolved branches. Example: (nearly) all commercial processors.
    – Multi-issue processors: allow multiple instructions to start execution per clock cycle. Superscalar (Intel x86, AMD, ...) vs. VLIW architectures.
    – VLIW/EPIC architectures: allow compilers to indicate independent instructions per issue packet. Example: Intel Itanium series.
    – Vector units: allow for the efficient expression and execution of vector operations. Example: SSE, SSE2, SSE3, SSE4 instructions.

    Limitations of optimizing a single instruction
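    A hedged C++ illustration (not from the lecture notes) of why these single-stream techniques depend on instruction-level parallelism: the first loop's iterations are independent, so a superscalar or vector unit can overlap them (and a compiler may auto-vectorise it with SSE), while the second loop carries a RAW dependency through acc, which serialises the additions.

```cpp
#include <cstdio>

int main() {
    const int n = 1024;
    static float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // High ILP: c[i] depends only on a[i] and b[i]; iterations can execute in parallel.
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];

    // Low ILP: the loop-carried dependency on 'acc' forces each addition to wait for the previous one.
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc = acc + c[i];

    std::printf("acc = %f\n", acc);  // 1024 * 3.0 = 3072
    return 0;
}
```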
  • Open Programming Language Interpreters
    Open Programming Language Interpreters
    Walter Cazzola and Albert Shaqiri, Università degli Studi di Milano, Italy

    Abstract

    Context: This paper presents the concept of open programming language interpreters, a model to support them and a prototype implementation in the Neverlang framework for modular development of programming languages.

    Inquiry: We address the problem of dynamic interpreter adaptation to tailor the interpreter's behaviour to the task to be solved and to introduce new features to fulfil unforeseen requirements. Many languages provide a meta-object protocol (MOP) that to some degree supports reflection. However, MOPs are typically language-specific, their reflective functionality is often restricted, and the adaptation and application logic are often mixed, which makes the source code harder to understand and maintain. Our system overcomes these limitations.

    Approach: We designed a model and implemented a prototype system to support open programming language interpreters. The implementation is integrated in the Neverlang framework, which now exposes the structure, behaviour and runtime state of any Neverlang-based interpreter with the ability to modify it.

    Knowledge: Our system provides complete control over an interpreter's structure, behaviour and runtime state. The approach is applicable to every Neverlang-based interpreter. Adaptation code can potentially be reused across different language implementations.

    Grounding: Having a prototype implementation, we focused on feasibility evaluation. The paper shows that our approach addresses problems commonly found in the research literature. We have a demonstrative video and examples that illustrate our approach on dynamic software adaptation, aspect-oriented programming, debugging and context-aware interpreters.

    Importance: Our paper presents the first reflective approach targeting a general framework for language development.