RedHawk Linux User's Guide


0898004-710
August 2014

Copyright 2014 by Concurrent Computer Corporation. All rights reserved. This publication or any part thereof is intended for use with Concurrent products by Concurrent personnel, customers, and end-users. It may not be reproduced in any form without the written permission of the publisher.

The information contained in this document is believed to be correct at the time of publication. It is subject to change without notice. Concurrent makes no warranties, expressed or implied, concerning the information contained in this document.

To report an error or comment on a specific portion of the manual, photocopy the page in question and mark the correction or comment on the copy. Mail the copy (and any additional comments) to Concurrent Computer Corporation, 2881 Gateway Drive, Pompano Beach, Florida, 33069. Mark the envelope “Attention: Publications Department.” This publication may not be reproduced for any other reason in any form without written permission of the publisher.

Concurrent Computer Corporation and its logo are registered trademarks of Concurrent Computer Corporation. All other Concurrent product names are trademarks of Concurrent while all other product names are trademarks or registered trademarks of their respective owners. Linux® is used pursuant to a sublicense from the Linux Mark Institute.

Printed in U.S.A.

Revision History:

Date              Level   Effective With
August 2002       000     RedHawk Linux Release 1.1
September 2002    100     RedHawk Linux Release 1.1
December 2002     200     RedHawk Linux Release 1.2
April 2003        300     RedHawk Linux Release 1.3, 1.4
December 2003     400     RedHawk Linux Release 2.0
March 2004        410     RedHawk Linux Release 2.1
July 2004         420     RedHawk Linux Release 2.2
May 2005          430     RedHawk Linux Release 2.3
March 2006        500     RedHawk Linux Release 4.1
May 2006          510     RedHawk Linux Release 4.1
May 2007          520     RedHawk Linux Release 4.2
April 2008        600     RedHawk Linux Release 5.1
June 2008         610     RedHawk Linux Release 5.1
October 2008      620     RedHawk Linux Release 5.2
December 2009     630     RedHawk Linux Release 5.4
May 2011          640     RedHawk Linux Release 6.0
March 2012        650     RedHawk Linux Release 6.0
September 2012    660     RedHawk Linux Release 6.3
January 2013      670     RedHawk Linux Release 6.3
August 2013       680     RedHawk Linux Release 6.3
May 2014          700     RedHawk Linux Release 6.5
August 2014       710     RedHawk Linux Release 6.5

Preface

Scope of Manual

This manual consists of three parts. The information in Part 1 is directed towards real-time users. Part 2 is directed towards system administrators. Part 3 consists of backmatter: appendixes, glossary and index. An overview of the contents of the manual follows.

Structure of Manual

This guide consists of the following sections:

Part 1 - Real-Time User

• Chapter 1, Introduction, provides an introduction to the RedHawk Linux operating system and an overview of the real-time features included.
• Chapter 2, Real-Time Performance, discusses issues involved with achieving real-time performance including interrupt response, process dispatch latency and deterministic program execution. The shielded CPU model is described.
• Chapter 3, Real-Time Interprocess Communication, discusses procedures for using the POSIX® and System V message-passing and shared memory facilities.
• Chapter 4, Process Scheduling, provides an overview of process scheduling and describes POSIX scheduling policies and priorities.
• Chapter 5, Interprocess Synchronization, describes the interfaces provided by RedHawk Linux for cooperating processes to synchronize access to shared resources. Included are: POSIX counting semaphores, System V semaphores, rescheduling control tools and condition synchronization tools.
• Chapter 6, Programmable Clocks and Timers, provides an overview of some of the RCIM and POSIX timing facilities available under RedHawk Linux.
• Chapter 7, System Clocks and Timers, describes system timekeeping and the per-CPU local timer.
• Chapter 8, File Systems and Disk I/O, explains the xfs journaling file system and procedures for performing direct disk I/O on the RedHawk Linux operating system.
• Chapter 9, Memory Mapping, describes the methods provided by RedHawk Linux for a process to access the contents of another process’ address space.
• Chapter 10, Non-Uniform Memory Access (NUMA), describes the NUMA support available on certain systems.

Part 2 - Administrator

• Chapter 11, Configuring and Building the Kernel, provides information on how to configure and build a RedHawk Linux kernel.
• Chapter 12, Kernel Debugging, provides guidelines for saving, restoring and analyzing the kernel memory image using kdump and crash and basic use of the kdb kernel debugger.
• Chapter 13, Pluggable Authentication Modules (PAM), describes the PAM authentication capabilities of RedHawk Linux.
• Chapter 14, Device Drivers, describes RedHawk functionality and real-time issues involved with writing device drivers.
• Chapter 15, PCI-to-VME Support, describes RedHawk’s support for a PCI-to-VME bridge.

Part 3 - Common Material

• Appendix A, Example Message Queue Programs, contains example programs illustrating the POSIX and System V message queue facilities.
• Appendix B, Kernel Tunables for Real-time Features, contains a listing of the kernel tunables that control unique features in RedHawk Linux and their default values in pre-built kernels.
• Appendix C, Capabilities, lists the capabilities included in RedHawk Linux and the permissions provided by each.
• Appendix D, Kernel Trace Events, lists pre-defined kernel trace points and methods for defining and logging custom events within kernel modules.
• Appendix D, Migrating 32-bit Code to 64-bit Code, provides information needed to migrate 32-bit code to 64-bit processing on an x86_64 processor.
• Appendix E, Kernel-level Daemons on Shielded CPUs, describes how kernel-level daemons execute on shielded CPUs and provides methods for improving performance.
• Appendix F, Cross Processor Interrupts on Shielded CPUs, describes how cross-processor interrupts execute on shielded CPUs and provides methods for improving performance.
• Appendix G, Serial Console Setup, provides instructions for configuring a serial console.
• Appendix H, Boot Command Line Parameters, discusses the boot parameters unique to RedHawk.
• The Glossary provides definitions for terms used throughout this Guide.
• The Index contains an alphabetical reference to key terms and concepts and the pages where they occur in the text.

Syntax Notation

The following notation is used throughout this manual:

italic
Books, reference cards, and items that the user must specify appear in italic type. Special terms may also appear in italic.

list bold
User input appears in list bold type and must be entered exactly as shown. Names of directories, files, commands, options and man page references also appear in list bold type.

list
Operating system and program output such as prompts, messages and listings of files and programs appears in list type.

[]
Brackets enclose command options and arguments that are optional. You do not type the brackets if you choose to specify these options or arguments.

hypertext links
When viewing this document online, clicking on chapter, section, figure, table and page number references will display the corresponding text.
Clicking on Internet URLs provided in blue type will launch your web browser and display the web site. Clicking on publication names and numbers in red type will display the corresponding manual PDF, if accessible.

Related Publications

The following table lists RedHawk Linux documentation. Click on the red entry to display the document PDF (optional product documentation is available for viewing only if the optional product has been installed). These documents are also available by clicking on the “Documents” icon on the desktop and from Concurrent’s web site at www.ccur.com.

RedHawk Linux Operating System Documentation                      Pub. Number
RedHawk Linux Release Notes                                       0898003
RedHawk Linux User’s Guide                                        0898004
Real-Time Clock & Interrupt Module (RCIM) User’s Guide            0898007
RedHawk Linux FAQ                                                 N/A

Optional RedHawk Product Documentation                            Pub. Number
RedHawk Linux Frequency-Based Scheduler (FBS) User’s Guide        0898005

Contents

Preface

Chapter 1 Introduction
    Overview
    RedHawk Linux Kernels
    System Updates
    Real-Time Features
    Processor Shielding
    Processor Affinity
    User-level Preemption Control
    Fast Block/Wake Services
    RCIM Driver
    Frequency-Based Scheduler
    /proc Modifications
    Kernel Trace Facility
    ptrace Extensions
    Kernel Preemption
    Real-Time Scheduler
    Low Latency Enhancements
    Priority Inheritance
    High Resolution Process Accounting
    Capabilities Support
    Kernel Debuggers
    Kernel Core Dumps/Crash Analysis
    User-level Spin Locks
    usermap and /proc mmap
    Hyper-threading
    XFS Journaling File System
    POSIX Real-Time Extensions
    User Priority Scheduling
    Memory Resident Processes
    Memory Mapping and Data Sharing
    Process Synchronization
    Asynchronous Input/Output
    Synchronized Input/Output
    Real-Time Signal Behavior
    Clocks and Timers
    Message Queues

Chapter 2 Real-Time Performance
    Overview of the Shielded CPU Model
    Overview of Determinism
    Process Dispatch Latency
    Effect of Disabling Interrupts
    Effect of Interrupts
    Effect of Disabling Preemption
    Effect of Open Source Device Drivers
    How Shielding Improves Real-Time Performance
    Shielding From Background Processes
    Shielding From Interrupts
    Shielding From Local Interrupt
    Interfaces to CPU Shielding
    Shield Command
    Shield Command Examples
    ..