Thread Evolution Kit for Optimizing Thread Operations on CE/Iot Devices
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
NUMA-Aware Thread Migration for High Performance NVMM File Systems
NUMA-Aware Thread Migration for High Performance NVMM File Systems Ying Wang, Dejun Jiang and Jin Xiong SKL Computer Architecture, ICT, CAS; University of Chinese Academy of Sciences fwangying01, jiangdejun, [email protected] Abstract—Emerging Non-Volatile Main Memories (NVMMs) out considering the NVMM usage on NUMA nodes. Besides, provide persistent storage and can be directly attached to the application threads accessing file system rely on the default memory bus, which allows building file systems on non-volatile operating system thread scheduler, which migrates thread only main memory (NVMM file systems). Since file systems are built on memory, NUMA architecture has a large impact on their considering CPU utilization. These bring remote memory performance due to the presence of remote memory access and access and resource contentions to application threads when imbalanced resource usage. Existing works migrate thread and reading and writing files, and thus reduce the performance thread data on DRAM to solve these problems. Unlike DRAM, of NVMM file systems. We observe that when performing NVMM introduces extra latency and lifetime limitations. This file reads/writes from 4 KB to 256 KB on a NVMM file results in expensive data migration for NVMM file systems on NUMA architecture. In this paper, we argue that NUMA- system (NOVA [47] on NVMM), the average latency of aware thread migration without migrating data is desirable accessing remote node increases by 65.5 % compared to for NVMM file systems. We propose NThread, a NUMA-aware accessing local node. The average bandwidth is reduced by thread migration module for NVMM file system. -
Demarinis Kent Williams-King Di Jin Rodrigo Fonseca Vasileios P
sysfilter: Automated System Call Filtering for Commodity Software Nicholas DeMarinis Kent Williams-King Di Jin Rodrigo Fonseca Vasileios P. Kemerlis Department of Computer Science Brown University Abstract This constant stream of additional functionality integrated Modern OSes provide a rich set of services to applications, into modern applications, i.e., feature creep, not only has primarily accessible via the system call API, to support the dire effects in terms of security and protection [1, 71], but ever growing functionality of contemporary software. How- also necessitates a rich set of OS services: applications need ever, despite the fact that applications require access to part of to interact with the OS kernel—and, primarily, they do so the system call API (to function properly), OS kernels allow via the system call (syscall) API [52]—in order to perform full and unrestricted use of the entire system call set. This not useful tasks, such as acquiring or releasing memory, spawning only violates the principle of least privilege, but also enables and terminating additional processes and execution threads, attackers to utilize extra OS services, after seizing control communicating with other programs on the same or remote of vulnerable applications, or escalate privileges further via hosts, interacting with the filesystem, and performing I/O and exploiting vulnerabilities in less-stressed kernel interfaces. process introspection. To tackle this problem, we present sysfilter: a binary Indicatively, at the time of writing, the Linux -
Resource Access Control in Real-Time Systems
Resource Access Control in Real-time Systems Advanced Operating Systems (M) Lecture 8 Lecture Outline • Definitions of resources • Resource access control for static systems • Basic priority inheritance protocol • Basic priority ceiling protocol • Enhanced priority ceiling protocols • Resource access control for dynamic systems • Effects on scheduling • Implementing resource access control 2 Resources • A system has ρ types of resource R1, R2, …, Rρ • Each resource comprises nk indistinguishable units; plentiful resources have no effect on scheduling and so are ignored • Each unit of resource is used in a non-preemptive and mutually exclusive manner; resources are serially reusable • If a resource can be used by more than one job at a time, we model that resource as having many units, each used mutually exclusively • Access to resources is controlled using locks • Jobs attempt to lock a resource before starting to use it, and unlock the resource afterwards; the time the resource is locked is the critical section • If a lock request fails, the requesting job is blocked; a job holding a lock cannot be preempted by a higher priority job needing that lock • Critical sections may nest if a job needs multiple simultaneous resources 3 Contention for Resources • Jobs contend for a resource if they try to lock it at once J blocks 1 Preempt J3 J1 Preempt J3 J2 blocks J2 J3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Priority inversion EDF schedule of J1, J2 and J3 sharing a resource protected by locks (blue shading indicated critical sections). -
Memory and Cache Contention Denial-Of-Service Attack in Mobile Edge Devices
applied sciences Article Memory and Cache Contention Denial-of-Service Attack in Mobile Edge Devices Won Cho and Joonho Kong * School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea; [email protected] * Correspondence: [email protected] Abstract: In this paper, we introduce a memory and cache contention denial-of-service attack and its hardware-based countermeasure. Our attack can significantly degrade the performance of the benign programs by hindering the shared resource accesses of the benign programs. It can be achieved by a simple C-based malicious code while degrading the performance of the benign programs by 47.6% on average. As another side-effect, our attack also leads to greater energy consumption of the system by 2.1× on average, which may cause shorter battery life in the mobile edge devices. We also propose detection and mitigation techniques for thwarting our attack. By analyzing L1 data cache miss request patterns, we effectively detect the malicious program for the memory and cache contention denial-of-service attack. For mitigation, we propose using instruction fetch width throttling techniques to restrict the malicious accesses to the shared resources. When employing our malicious program detection with the instruction fetch width throttling technique, we recover the system performance and energy by 92.4% and 94.7%, respectively, which means that the adverse impacts from the malicious programs are almost removed. Keywords: memory and cache contention; denial of service attack; shared resources; performance; en- Citation: Cho, W.; Kong, J. Memory ergy and Cache Contention Denial-of-Service Attack in Mobile Edge Devices. -
A Case for NUMA-Aware Contention Management on Multicore Systems
A Case for NUMA-aware Contention Management on Multicore Systems Sergey Blagodurov Sergey Zhuravlev Mohammad Dashti Simon Fraser University Simon Fraser University Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract performance of individual applications or threads by as much as 80% and the overall workload performance by On multicore systems, contention for shared resources as much as 12% [23]. occurs when memory-intensive threads are co-scheduled Unfortunately studies of contention-aware algorithms on cores that share parts of the memory hierarchy, such focused primarily on UMA (Uniform Memory Access) as last-level caches and memory controllers. Previous systems, where there are multiple shared LLCs, but only work investigated how contention could be addressed a single memory node equipped with the single memory via scheduling. A contention-aware scheduler separates controller, and memory can be accessed with the same competing threads onto separate memory hierarchy do- latency from any core. However, new multicore sys- mains to eliminate resource sharing and, as a conse- tems increasingly use the Non-Uniform Memory Access quence, to mitigate contention. However, all previous (NUMA) architecture, due to its decentralized and scal- work on contention-aware scheduling assumed that the able nature. In modern NUMA systems, there are mul- underlying system is UMA (uniform memory access la- tiple memory nodes, one per memory domain (see Fig- tencies, single memory controller). Modern multicore ure 1). Local nodes can be accessed in less time than re- systems, however, are NUMA, which means that they mote ones, and each node has its own memory controller. feature non-uniform memory access latencies and multi- When we ran the best known contention-aware sched- ple memory controllers. -
Red Hat Enterprise Linux for Real Time 7 Tuning Guide
Red Hat Enterprise Linux for Real Time 7 Tuning Guide Advanced tuning procedures for Red Hat Enterprise Linux for Real Time Radek Bíba David Ryan Cheryn Tan Lana Brindley Alison Young Red Hat Enterprise Linux for Real Time 7 Tuning Guide Advanced tuning procedures for Red Hat Enterprise Linux for Real Time Radek Bíba Red Hat Customer Content Services [email protected] David Ryan Red Hat Customer Content Services [email protected] Cheryn Tan Red Hat Customer Content Services Lana Brindley Red Hat Customer Content Services Alison Young Red Hat Customer Content Services Legal Notice Copyright © 2015 Red Hat, Inc. This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License. If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. -
Computer Architecture Lecture 12: Memory Interference and Quality of Service
Computer Architecture Lecture 12: Memory Interference and Quality of Service Prof. Onur Mutlu ETH Zürich Fall 2017 1 November 2017 Summary of Last Week’s Lectures n Control Dependence Handling q Problem q Six solutions n Branch Prediction n Trace Caches n Other Methods of Control Dependence Handling q Fine-Grained Multithreading q Predicated Execution q Multi-path Execution 2 Agenda for Today n Shared vs. private resources in multi-core systems n Memory interference and the QoS problem n Memory scheduling n Other approaches to mitigate and control memory interference 3 Quick Summary Papers n "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems” n "The Blacklisting Memory Scheduler: Achieving High Performance and Fairness at Low Cost" n "Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems” n "Parallel Application Memory Scheduling” n "Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning" 4 Shared Resource Design for Multi-Core Systems 5 Memory System: A Shared Resource View Storage 6 Resource Sharing Concept n Idea: Instead of dedicating a hardware resource to a hardware context, allow multiple contexts to use it q Example resources: functional units, pipeline, caches, buses, memory n Why? + Resource sharing improves utilization/efficiency à throughput q When a resource is left idle by one thread, another thread can use it; no need to replicate shared data + Reduces communication latency q For example, -
SUSE Linux Enterprise Server 12 SP4 System Analysis and Tuning Guide System Analysis and Tuning Guide SUSE Linux Enterprise Server 12 SP4
SUSE Linux Enterprise Server 12 SP4 System Analysis and Tuning Guide System Analysis and Tuning Guide SUSE Linux Enterprise Server 12 SP4 An administrator's guide for problem detection, resolution and optimization. Find how to inspect and optimize your system by means of monitoring tools and how to eciently manage resources. Also contains an overview of common problems and solutions and of additional help and documentation resources. Publication Date: September 24, 2021 SUSE LLC 1800 South Novell Place Provo, UT 84606 USA https://documentation.suse.com Copyright © 2006– 2021 SUSE LLC and contributors. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”. For SUSE trademarks, see https://www.suse.com/company/legal/ . All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its aliates. Asterisks (*) denote third-party trademarks. All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its aliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof. Contents About This Guide xii 1 Available Documentation xiii -
Greg Kroah-Hartman [email protected] Github.Com/Gregkh/Presentation-Kdbus
kdbus IPC for the modern world Greg Kroah-Hartman [email protected] github.com/gregkh/presentation-kdbus Interprocess Communication ● signal ● synchronization ● communication standard signals realtime The Linux Programming Interface, Michael Kerrisk, page 878 POSIX semaphore futex synchronization named eventfd unnamed semaphore System V semaphore “record” lock file lock file lock mutex threads condition variables barrier read/write lock The Linux Programming Interface, Michael Kerrisk, page 878 data transfer pipe communication FIFO stream socket pseudoterminal POSIX message queue message System V message queue memory mapping System V shared memory POSIX shared memory shared memory memory mapping Anonymous mapping mapped file The Linux Programming Interface, Michael Kerrisk, page 878 Android ● ashmem ● pmem ● binder ashmem ● POSIX shared memory for the lazy ● Uses virtual memory ● Can discard segments under pressure ● Unknown future pmem ● shares memory between kernel and user ● uses physically contigous memory ● GPUs ● Unknown future binder ● IPC bus for Android system ● Like D-Bus, but “different” ● Came from system without SysV types ● Works on object / message level ● Needs large userspace library ● NEVER use outside an Android system binder ● File descriptor passing ● Used for Intents and application separation ● Good for small messages ● Not for streams of data ● NEVER use outside an Android system QNX message passing ● Tight coupling to microkernel ● Send message and control, to another process ● Used to build complex messages -
Optimizing Kubernetes Performance by Handling Resource Contention with Custom Scheduler
Optimizing Kubernetes Performance by Handling Resource Contention with Custom Scheduler MSc Research Project Cloud Computing Akshatha Mulubagilu Nagaraj Student ID: 18113575 School of Computing National College of Ireland Supervisor: Mr. Vikas Sahni www.ncirl.ie National College of Ireland Project Submission Sheet School of Computing Student Name: Akshatha Mulubagilu Nagaraj Student ID: 18113575 Programme: Cloud Computing Year: 2020 Module: Research Project Supervisor: Mr. Vikas Sahni Submission Due Date: 17/08/2020 Project Title: Optimizing Kubernetes Performance by Handling Resource Contention with Custom Scheduler Word Count: 5060 Page Count: 19 I hereby certify that the information contained in this (my submission) is information pertaining to research I conducted for this project. All information other than my own contribution will be fully referenced and listed in the relevant bibliography section at the rear of the project. ALL internet material must be referenced in the bibliography section. Students are required to use the Referencing Standard specified in the report template. To use other author's written or electronic work is illegal (plagiarism) and may result in disciplinary action. I agree to an electronic copy of my thesis being made publicly available on TRAP the National College of Ireland's Institutional Repository for consultation. Signature: Date: 17th August 2020 PLEASE READ THE FOLLOWING INSTRUCTIONS AND CHECKLIST: Attach a completed copy of this sheet to each project (including multiple copies). Attach a Moodle submission receipt of the online project submission, to each project (including multiple copies). You must ensure that you retain a HARD COPY of the project, both for your own reference and in case a project is lost or mislaid. -
Futexes Are Tricky
Futexes Are Tricky Ulrich Drepper Red Hat, Inc. [email protected] January 31, 2008 Abstract Starting with early version of the 2.5 series, the Linux kernel contains a light-weight method for process synchronization. It is used in the modern thread library implementation but is also useful when used directly. This article introduces the concept and user level code to use them. 1 Preface addr1. It has a size of 4 bytes on all platforms, 32-bit and 64-bit. The value of the variable is fully under the control 1 The base reference for futexes has been “Fuss, Futexes of the application. No value has a specific meaning. and Furwocks: Fast User Level Locking in Linux” writ- ten by Franke, Russell, and Kirkwood, released in the Any memory address in regular memory (excluding some- proceedings of the 2002 OLS [1]. This document is still thing like DMA areas etc) can be used for the futex. The mostly valid. But the kernel functionality got extended only requirement is that the variable is aligned at a mul- and generally improved. The biggest weakness, though, tiple of sizeof(int). is the lack of instruction on how to use futexes correctly. Rusty Russell distributes a package containing user level It is not obvious from the prototype, but the kernel han- code (ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/) dles the actual physical addresses of the futexes. I.e., if but unfortunately this code is not very well documented two processes reference a futex in a memory region they and worse, as of this writing the code is actually incor- share, they will reference the same futex object. -
Addressing Shared Resource Contention in Multicore Processors Via Scheduling
Addressing Shared Resource Contention in Multicore Processors via Scheduling Sergey Zhuravlev Sergey Blagodurov Alexandra Fedorova School of Computing Science, Simon Fraser University, Vancouver, Canada fsergey zhuravlev, sergey blagodurov, alexandra [email protected] Abstract 70 Contention for shared resources on multicore processors remains 60 BEST an unsolved problem in existing systems despite significant re- 50 search efforts dedicated to this problem in the past. Previous solu- WORST 40 tions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how 30 and to what extent contention for shared resource can be mitigated 20 via thread scheduling. Scheduling is an attractive tool, because it does not require extra hardware and is relatively easy to integrate Down to solo realtive 10 into the system. Our study is the first to provide a comprehensive 0 analysis of contention-mitigating techniques that use only schedul- SOPLEX SPHINX GAMESS NAMD AVERAGE % Slow-Down realtive Slow-Down to solo % realtive -10 ing. The most difficult part of the problem is to find a classification Benchmark scheme for threads, which would determine how they affect each other when competing for shared resources. We provide a com- Figure 1. The performance degradation relative to running solo for prehensive analysis of such classification schemes using a newly two different schedules of SPEC CPU2006 applications on an Intel proposed methodology that enables to evaluate these schemes sep- Xeon X3565 quad-core processor (two cores share an LLC). arately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classifi- cation scheme that addresses not only contention for cache space, but contention for other shared resources, such as the memory con- power budgets have greatly staggered the development of large sin- troller, memory bus and prefetching hardware.