Rethinking the I/O Memory Management Unit (IOMMU)


Rethinking the I/O Memory Management Unit (IOMMU)

Research Thesis. Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Moshe Malka

Submitted to the Senate of the Technion – Israel Institute of Technology. Adar 5775, Haifa, March 2015. (Technion, Computer Science Department, M.Sc. Thesis MSC-2015-10, 2015.)

This research was carried out under the supervision of Prof. Dan Tsafrir, in the Faculty of Computer Science. Some results in this thesis have been published as articles by the author and research collaborators in conferences and journals during the course of the author's research period, the most up-to-date versions of which are:

1. Moshe Malka, Nadav Amit, Muli Ben-Yehuda, and Dan Tsafrir. rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015).
2. Moshe Malka, Nadav Amit, and Dan Tsafrir. Efficient IOMMU Intra-Operating System Protection. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST 2015).

Acknowledgements. I would like to thank my advisor Dan Tsafrir for his devoted guidance and help, my research team Nadav Amit and Muli Ben-Yehuda, my parents, and my friends. The generous financial help of the Technion is gratefully acknowledged.

Contents

List of Figures
Abstract
Abbreviations and Notations
1 Introduction
2 Background
  2.1 Virtual Memory
    2.1.1 Physical and Virtual Addressing
    2.1.2 Address Spaces
    2.1.3 Page Table
    2.1.4 Virtual Memory as a Tool for Memory Protection
    2.1.5 Address Translation
  2.2 Direct Memory Access
    2.2.1 Transferring Data from the Memory to the Device
    2.2.2 Transferring Data from the Device to the Memory
  2.3 Adding Virtual Memory to I/O Transactions
3 rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers
  3.1 Introduction
  3.2 Background
    3.2.1 Operating System DMA Protection
    3.2.2 IOMMU Design and Implementation
    3.2.3 I/O Devices Employing Ring Buffers
  3.3 Cost of Safety
    3.3.1 Overhead Components
    3.3.2 Protection Modes and Measured Overhead
    3.3.3 Performance Model
  3.4 Design
  3.5 Evaluation
    3.5.1 Methodology
    3.5.2 Results
    3.5.3 When IOTLB Miss Penalty Matters
    3.5.4 Comparing to TLB Prefetchers
  3.6 Related Work
4 Efficient IOMMU Intra-Operating System Protection
  4.1 Introduction
  4.2 Intra-OS Protection
  4.3 IOVA Allocation and Mapping
  4.4 Long-Lasting Ring Interference
  4.5 The EiovaR Optimization
    4.5.1 EiovaR with Strict Protection
    4.5.2 EiovaR with Deferred Protection
  4.6 Evaluation
    4.6.1 Methodology
    4.6.2 Results
  4.7 Related Work
5 Reducing the IOTLB Miss Overhead
  5.1 Introduction
  5.2 General Description of All the Prefetchers We Explore
  5.3 Markov Prefetcher (MP)
    5.3.1 Markov Chain Theorem
    5.3.2 Prefetching Using the Markov Chain
    5.3.3 Extension to IOMMU
  5.4 Recency-Based Prefetching (RP)
    5.4.1 TLB Hit
    5.4.2 TLB Miss
    5.4.3 Extension to IOMMU
  5.5 Distance Prefetching (DP)
  5.6 Evaluation
    5.6.1 Methodology
    5.6.2 Results
  5.7 Measuring the Cost of an Intel IOTLB Miss
6 Conclusions
  6.1 rIOMMU
  6.2 EiovaR
  6.3 Reducing the IOTLB Miss Overhead
Hebrew Abstract

List of Figures

2.1 A system that uses physical addressing.
2.2 A system that uses virtual addressing.
2.3 Flat page table.
2.4 Allocating a new virtual page.
2.5 Using virtual memory to provide page-level memory protection.
2.6 Address translation with a page table.
2.7 Page hit.
2.8 Components of a virtual address that are used to access the TLB.
2.9 TLB hit.
2.10 TLB miss.
2.11 A two-level page table hierarchy. Notice that addresses increase from top to bottom.
2.12 Address translation with a k-level page table.
2.13 Addressing for a small memory system. Assume 14-bit virtual addresses (n = 14), 12-bit physical addresses (m = 12), and 64-byte pages (P = 64).
2.14 TLB, page table, and cache for a small memory system. All values in the TLB, page table, and cache are in hexadecimal notation.
2.15 DMA transaction flow with the IOMMU (sequence diagram).
3.1 The IOMMU is for devices what the MMU is for processes.
3.2 Intel IOMMU data structures for IOVA translation.
3.3 A driver drives its device through a ring. With an IOMMU, pointers are IOVAs (both registers and target buffers).
3.4 The I/O device driver maps an IOVA v to a physical target buffer p. It then assigns v to the DMA descriptor.
3.5 The I/O device writes the packet it receives to the target buffer through v, which the IOMMU translates to p.
3.6 After the DMA completes, the I/O device driver unmaps v and passes p to a higher-level software layer.
3.7 CPU cycles used for processing one packet. The top bar labels are relative to Cnone = 1,816 (bottommost grid line).
3.8 Throughput of Netperf TCP stream as a function of the average number of cycles spent on processing one packet.
3.9 The rIOMMU data structures. (e) is used only by hardware. The last two fields of rRING are used only by software.
3.10 rIOMMU data structures for IOVA translation.
3.11 Outline of the rIOMMU logic. All DMAs are carried out with IOVAs that are translated by the rtranslate routine.
3.12 Outline of the rIOMMU OS driver, implementing map and unmap, which respectively correspond to Figures 3.4 and 3.6.
3.13 Absolute performance numbers of the IOMMU modes when using the Mellanox (top) and Broadcom (bottom) NICs.
4.1 IOVA translation using the Intel IOMMU.
4.2 Pseudocode of the baseline IOVA allocation scheme. The functions rb_next and rb_prev return the successor and predecessor of the node they receive, respectively.
4.3 The length of each alloc_iova search loop in a 40K (sub)sequence of alloc_iova calls performed by one Netperf run. One Rx-Tx interference leads to regular linearity.
4.4 Netperf TCP stream iteratively executed under strict protection. The x axis shows the iteration number.
4.5 Average cycle breakdown of map with Netperf/strict.
4.6 Average cycle breakdown of unmap with Netperf/strict.
4.7 Netperf TCP stream iteratively executed under deferred protection. The x axis shows the iteration number.
4.8 Under deferred protection, EiovaR-k eliminates costly linear searches when k exceeds the high-water mark W.
4.9 Length of the alloc_iova search loop under the EiovaR-k deferred protection regime for three k values when running Netperf TCP Stream. Bigger capacity implies that the searches become shorter on average. Big enough capacity (k ≥ W = 250) eliminates the searches altogether.
4.10 The performance of baseline vs. EiovaR allocation, under strict and deferred protection regimes, for the Mellanox (top) and Broadcom (bottom) setups. Except in the case of Netperf RR, higher values indicate better performance.
4.11 Netperf Stream throughput (top) and CPU usage (bottom) for different message sizes in the Broadcom setup.
4.12 Impact of increased concurrency on Memcached in the Mellanox setup. EiovaR allows the performance to scale.
5.1 General scheme.
5.2 Markov state transition diagram, represented as a directed graph (right) or a matrix (left).
5.3 Schematic implementation of the Markov prefetcher.
5.4 Schematic depiction of the recency prefetcher on a TLB hit.
5.5 Schematic depiction of the recency prefetcher on a TLB miss.
5.6 Schematic depiction of the distance prefetcher on a TLB miss.
5.7 Hit rate simulation of Apache benchmarks with message sizes of 1K (top) and 1M (bottom).
5.8 Hit rate simulation of Netperf stream with message sizes of 1K (top) and 4K (bottom).
5.9 Hit rate simulation of Netperf RR (top) and Memcached (bottom).
5.10 Difference between the RTT when the IOMMU is enabled and the RTT when the IOMMU is disabled.
Recommended publications
  • Memory Protection at Option

Memory Protection at Option: Application-Tailored Memory Safety in Safety-Critical Embedded Systems (German title: "Speicherschutz nach Wahl – Auf die Anwendung zugeschnittene Speichersicherheit in sicherheitskritischen eingebetteten Systemen"). Dissertation submitted to the Faculty of Engineering of the Universität Erlangen-Nürnberg for the degree of Doktor-Ingenieur by Michael Stilkerich, Erlangen, 2012. Accepted as a dissertation by the Faculty of Engineering (submitted 09.07.2012, defended 30.11.2012; dean: Prof. Dr.-Ing. Marion Merklein; reviewers: Prof. Dr.-Ing. Wolfgang Schröder-Preikschat and Prof. Dr. Michael Philippsen).

Abstract: With the increasing capabilities and resources available on microcontrollers, there is a trend in the embedded industry to integrate multiple software functions on a single system to save cost, size, weight, and power. This integration raises new requirements, among them the need for spatial isolation, which is commonly established by using a memory protection unit (MPU) that can constrain access to the physical address space to a fixed set of address regions. MPU-based protection is limited in terms of available hardware, flexibility, granularity, and ease of use. Software-based memory protection can provide an alternative to, or a complement of, MPU-based protection, but has found little attention in the embedded domain. In this thesis, I evaluate the qualitative and quantitative advantages and limitations of MPU-based memory protection and of software-based protection based on a multi-JVM. I developed a framework composed of the AUTOSAR-OS-like operating system CiAO and KESO, a Java implementation for deeply embedded systems. The framework allows choosing from no memory protection, MPU-based protection, software-based protection, and a combination of the two.
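The MPU mechanism this abstract contrasts with software-based protection boils down to a small table of permitted address regions consulted on every access. Below is a schematic C model of that check; it is illustrative only, and the region count, field names, and permission flags are assumptions for the sketch, not CiAO/KESO code or any real MPU's register layout.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Schematic model of an MPU: a fixed, small set of address regions,
 * each granting read/write/execute permission. Real MPUs typically
 * offer on the order of 8-16 such regions. */
enum { MPU_REGIONS = 8, PERM_R = 1, PERM_W = 2, PERM_X = 4 };

struct mpu_region {
    uintptr_t base;
    uintptr_t size;   /* region length in bytes; 0 = slot unused */
    uint8_t   perms;
};

static struct mpu_region mpu[MPU_REGIONS];

/* Spatial isolation check: an access is allowed only if it falls
 * inside some configured region that grants the requested permission. */
static bool mpu_allows(uintptr_t addr, uint8_t perm)
{
    for (int i = 0; i < MPU_REGIONS; i++)
        if (mpu[i].size != 0 &&
            addr >= mpu[i].base &&
            addr <  mpu[i].base + mpu[i].size &&
            (mpu[i].perms & perm) == perm)
            return true;
    return false;   /* fault: outside every permitted region */
}

int main(void)
{
    /* Grant read/write to one hypothetical 4 KiB RAM region. */
    mpu[0] = (struct mpu_region){ .base = 0x20000000, .size = 0x1000,
                                  .perms = PERM_R | PERM_W };

    printf("%d %d\n", mpu_allows(0x20000010, PERM_W),   /* 1: allowed */
                      mpu_allows(0x10000000, PERM_R));  /* 0: fault   */
    return 0;
}
```

Software-based protection, as with the KESO multi-JVM approach, obtains the same spatial isolation by construction (checked references in the language runtime) instead of from such hardware region registers.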
  • A Minimal PowerPC™ Boot Sequence for Executing Compiled C Programs

Order Number: AN1809/D, Rev. 0, 3/2000. Semiconductor Products Sector Application Note. A Minimal PowerPC™ Boot Sequence for Executing Compiled C Programs. PowerPC Systems Architecture & Performance, [email protected].

This document describes the procedures necessary to successfully initialize a PowerPC processor and begin executing programs compiled using the PowerPC embedded application binary interface (EABI). The items discussed in this document have been tested for MPC603e™, MPC750, and MPC7400 microprocessors. The methods and source code presented in this document may work unmodified on similar PowerPC platforms as well. This document contains the following topics:

• Part I, "Overview," provides an overview of the conditions and exceptions for the procedures described in this document.
• Part II, "PowerPC Processor Initialization," provides information on the general setup of the processor registers, caches, and MMU.
• Part III, "PowerPC EABI Compliance," discusses aspects of the EABI that apply directly to preparing to jump into a compiled C program.
• Part IV, "Sample Boot Sequence," describes the basic operation of the boot sequence and the many configuration options, explains in detail a sample configurable boot and how the code may be modified for use in different environments, and discusses the compilation procedure using the supporting GNU build environment.
• Part V, "Source Files," contains the complete source code for the files ppcinit.S, ppcinit.h, reg_defs.h, ld.script, and Makefile.

This document contains information on a new product under development by Motorola. Motorola reserves the right to change or discontinue this product without notice. © Motorola, Inc., 2000. All rights reserved.

Part I, Overview: the procedures discussed in this document perform only the minimum amount of work necessary to execute a user program.
  • Memory Management

Memory Management. These slides are created by Dr. Huang of George Mason University (CS471). Students registered in Dr. Huang's courses at GMU can make a single machine-readable copy and print a single copy of each slide for their own reference, as long as the slide contains the copyright statement and the GMU facilities are not used to produce the paper copies. Permission for any other use, either in machine-readable or printed form, must be obtained from the author in writing.

Memory: a set of data entries indexed by addresses (the slide depicts byte addresses 0000 through 000F). Typically the basic data unit is the byte; in 32-bit machines, 4 bytes are grouped into words. Have you seen those DRAM chips in your PC?

Logical vs. Physical Address Space. The addresses used by the RAM chips are called physical addresses. In primitive computing devices, the address a programmer/processor uses is the actual address: when the process fetches byte 000A, the content of 000A is provided. In advanced computers, the processor operates in a separate address space, called the logical (or virtual) address space. A Memory Management Unit (MMU) is used to map logical addresses to physical addresses. Various mapping technologies are discussed below; the MMU is a hardware component, and modern processors have their MMU on the chip (Pentium, Athlon, ...).

Continuous Mapping: Dynamic Relocation. With the process loaded at physical address 4000, the processor wants byte 0010 and the 4010th byte is fetched; an MMU for dynamic relocation implements exactly this, as sketched below.

Segmented Mapping. Obviously, a more sophisticated MMU is needed to implement this.

Swapping. A process can be swapped temporarily out of memory to a backing store (a hard drive), and then brought back into memory for continued execution.
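As a concrete illustration of the continuous-mapping scheme, here is a minimal C sketch of dynamic relocation. The register names BASE and LIMIT are invented for the example; a real MMU performs this addition and bounds check in hardware on every access.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical relocation registers for one process; in the slides'
 * example the process image is loaded at physical address 4000. */
static const uint32_t BASE  = 4000;   /* start of the process in RAM */
static const uint32_t LIMIT = 8192;   /* size of its logical space   */

/* Dynamic relocation: physical = BASE + logical, after a bounds
 * check that provides the protection the MMU enforces. */
static uint32_t translate(uint32_t logical)
{
    if (logical >= LIMIT) {               /* out of range: the MMU  */
        fprintf(stderr, "fault at %u\n", logical); /* raises a trap */
        exit(1);
    }
    return BASE + logical;
}

int main(void)
{
    /* The processor asks for byte 0010; byte 4010 of RAM is fetched. */
    printf("logical 10 -> physical %u\n", translate(10));
    return 0;
}
```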
  • Arm System Memory Management Unit Architecture Specification

Arm® System Memory Management Unit Architecture Specification, SMMU architecture version 3. Document number ARM IHI 0070, document version D.a, non-confidential. Copyright © 2016-2020 Arm Limited or its affiliates. All rights reserved.

Release information:
• 2020/Aug/31, version D.a: update with SMMUv3.3 architecture; amendments and clarifications.
• 2019/Jul/18, version C.a: amendments and clarifications.
• 2018/Mar/16, version C: update with SMMUv3.2 architecture; further amendments and clarifications.
• 2017/Jun/15, version B: amendments and clarifications.
• 2016/Oct/15, version A: first release.

Proprietary Notice. This document is protected by copyright and other related rights, and the practice or implementation of the information contained in this document may be protected by one or more patents or pending patent applications. No part of this document may be reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated. Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations infringe any third party patents. THIS DOCUMENT IS PROVIDED "AS IS". ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to, and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.
  • Quantifying the Performance of Garbage Collection Vs. Explicit Memory Management

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management. Matthew Hertz, Computer Science Department, Canisius College, Buffalo, NY 14208, [email protected]. Emery D. Berger, Dept. of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, [email protected].

Abstract: Garbage collection yields numerous software engineering benefits, but its quantitative impact on performance remains elusive. One can compare the cost of conservative garbage collection to explicit memory management in C/C++ programs by linking in an appropriate collector. This kind of direct comparison is not possible for languages designed for garbage collection (e.g., Java), because programs in these languages naturally do not contain calls to free. Thus, the actual gap between the time and space performance of explicit memory management and precise, copying garbage collection remains unknown. We introduce a novel experimental methodology that lets us quantify the performance of precise garbage collection versus explicit memory management. Our system allows us to treat unaltered Java programs as if they used explicit memory management by relying on oracles to insert calls to free. These oracles are generated from profile information gathered in earlier application runs. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Dynamic storage management; D.3.4 [Processors]: Memory management (garbage collection). General Terms: Experimentation, Measurement, Performance. Keywords: oracular memory management, garbage collection, explicit memory management, performance analysis, time-space tradeoff, throughput, paging.

1. Introduction. Garbage collection, or automatic memory management, provides …
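The oracular mechanism can be pictured with a toy trace-driven sketch. This is entirely illustrative: the real system instruments unaltered Java programs with profile-derived oracles, whereas the mock-up below just shows the idea of freeing each object right after its recorded last access.

```c
#include <stdio.h>
#include <stdlib.h>

#define NOBJ 3

/* A toy "oracle": from a profiled earlier run we know, for each
 * object, the index of the program step that touches it last. */
static const int last_access_step[NOBJ] = {4, 2, 5};

int main(void)
{
    void *obj[NOBJ];
    for (int i = 0; i < NOBJ; i++)
        obj[i] = malloc(64);

    for (int step = 0; step <= 5; step++) {
        /* ... the unaltered program would run its step here ... */

        /* The oracle inserts free() immediately after an object's
         * recorded last access, emulating explicit management. */
        for (int i = 0; i < NOBJ; i++)
            if (obj[i] && last_access_step[i] == step) {
                free(obj[i]);
                obj[i] = NULL;
                printf("oracle freed object %d after step %d\n", i, step);
            }
    }
    return 0;
}
```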
  • Introduction to uClinux

Introduction to uClinux. Michael Opdenacker, Free Electrons, http://free-electrons.com. Created with OpenOffice.org 2.x. Thanks to Nicolas Rougier (Copyright 2003, http://webloria.loria.fr/~rougier/) for the Tux image. © Copyright 2004-2007, Free Electrons, Creative Commons Attribution-ShareAlike 2.5 license. Nov 20, 2007.

Rights to copy (Attribution – ShareAlike 2.5; © Copyright 2004-2007 Free Electrons, [email protected]). You are free: to copy, distribute, display, and perform the work; to make derivative works; to make commercial use of the work. Under the following conditions: Attribution — you must give the original author credit; Share Alike — if you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. License text: http://creativecommons.org/licenses/by-sa/2.5/legalcode. Document sources, updates and translations: http://free-electrons.com/articles/uclinux. Corrections, suggestions, contributions and translations are welcome!

Best viewed with... This document is best viewed with a recent PDF reader or with OpenOffice.org itself! Take advantage of internal and external hyperlinks, so don't hesitate to click on them! Find pages quickly thanks to automatic search.
  • I.T.S.O. PowerPC An Inside View

SG24-4299-00. PowerPC: An Inside View. IBM.

Take Note! Before using this information and the product it supports, be sure to read the general information under "Special Notices" on page xiii. First Edition (September 1995). This edition applies to the IBM PC PowerPC hardware and software products currently announced at the date of publication. Order publications through your IBM representative or the IBM branch office serving your locality; publications are not stocked at the address given below. An ITSO Technical Bulletin Evaluation Form for reader's feedback appears facing Chapter 1. If the form has been removed, comments may be addressed to: IBM Corporation, International Technical Support Organization, Dept. JLPC, Building 014, Internal Zip 5220, 1000 NW 51st Street, Boca Raton, Florida 33431-1328. When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. Copyright International Business Machines Corporation 1995. All rights reserved. Note to U.S. Government Users: documentation related to restricted rights; use, duplication or disclosure is subject to restrictions set forth in the GSA ADP Schedule Contract with IBM Corp.

Abstract: This document provides technical details on the PowerPC technology. It focuses on the features and advantages of the PowerPC Architecture and includes an historical overview of the development of reduced instruction set computer (RISC) technology. It also describes in detail the IBM Power Series product family based on PowerPC technology, including the IBM Personal Computer Power Series 830 and 850 and the IBM ThinkPad Power Series 820 and 850.
  • Understanding the Linux Kernel, 3rd Edition by Daniel P. Bovet and Marco Cesati

Understanding the Linux Kernel, 3rd Edition. By Daniel P. Bovet, Marco Cesati. Publisher: O'Reilly. Pub date: November 2005. ISBN: 0-596-00565-2. Pages: 942.

In order to thoroughly understand what makes Linux tick and why it works so well on a wide variety of systems, you need to delve deep into the heart of the kernel. The kernel handles all interactions between the CPU and the external world, and determines which programs will share processor time, and in what order. It manages limited memory so well that hundreds of processes can share the system efficiently, and expertly organizes data transfers so that the CPU isn't kept waiting any longer than necessary for the relatively slow disks.

The third edition of Understanding the Linux Kernel takes you on a guided tour of the most significant data structures, algorithms, and programming tricks used in the kernel. Probing beyond superficial features, the authors offer valuable insights to people who want to know how things really work inside their machine. Important Intel-specific features are discussed. Relevant segments of code are dissected line by line. But the book covers more than just the functioning of the code; it explains the theoretical underpinnings of why Linux does things the way it does. This edition of the book covers version 2.6, which has seen significant changes to nearly every kernel subsystem, particularly in the areas of memory management and block devices. The book focuses on the following topics:

• Memory management, including file buffering, process swapping, and Direct Memory Access (DMA)
• The Virtual Filesystem layer and the Second and Third Extended Filesystems
• Process creation and scheduling
• Signals, interrupts, and the essential interfaces to device drivers
• Timing
• Synchronization within the kernel
• Interprocess Communication (IPC)
• Program execution

Understanding the Linux Kernel will acquaint you with all the inner workings of Linux, but it's more than just an academic exercise.
  • 18-447: Computer Architecture Lecture 18: Virtual Memory III

18-447: Computer Architecture, Lecture 18: Virtual Memory III. Yoongu Kim, Carnegie Mellon University, Spring 2013, 3/1.

Upcoming schedule. Today: Lab 3 due; lecture/recitation. Monday (3/4): lecture, Q&A session. Wednesday (3/6): Midterm 1, 12:30-2:20, closed book, one letter-sized cheat sheet (can be double-sided, typed or written).

Readings. Required: P&H, Chapter 5.4; Hamacher et al., Chapter 8.8. Recommended: Denning, P. J., Virtual Memory, ACM Computing Surveys, 1970; Jacob, B., and Mudge, T., Virtual Memory in Contemporary Microprocessors, IEEE Micro, 1998. References: Intel manuals for 8086/80286/80386/IA32/Intel64; MIPS manual.

Review of last lecture. Two approaches to virtual memory: 1. segmentation (not as popular today) and 2. paging (what is usually meant today by "virtual memory"). Virtual memory requires HW+SW support; the HW component is called the MMU (memory management unit). How to translate between virtual and physical addresses?

1. Segmentation: divide the address space into segments, with Physical Address = BASE + Virtual Address. Case studies: Intel 8086, 80286, x86, x86-64. Advantages: modularity/isolation/protection, and translation is simple. Disadvantages: complicated management, fragmentation, and only a few segments are addressable at the same time.

2. Paging: the virtual address space is large, contiguous, and imaginary; a page is a fixed-size chunk of the address space; the mapping takes virtual pages to physical pages; the page table is the data structure that stores the mappings. Problem #1: the page table is too large; solution: hierarchical page tables (see the sketch below). Problem #2: large translation latency; solution: the Translation Lookaside Buffer (TLB). Case study: Intel 80386. Today, we'll talk more about paging.
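To make the hierarchical solution concrete, here is a small C sketch of a software walk of a hypothetical two-level page table with 4 KiB pages and 10-bit indices per level. The layout and constants are illustrative for the example, not the 80386's actual entry format.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                 /* 4 KiB pages             */
#define LVL_BITS   10                 /* 10 index bits per level */
#define LVL_MASK   ((1u << LVL_BITS) - 1)

typedef struct {
    uint64_t pte[1 << LVL_BITS];      /* level 2: frame base | valid bit */
} l2_table;

typedef struct {
    l2_table *next[1 << LVL_BITS];    /* level 1: pointers to L2 tables  */
} l1_table;

/* Walk a two-level page table: split the virtual address into
 * (L1 index, L2 index, page offset) and follow the pointers.
 * Returns 0 on a page fault (missing mapping). */
uint64_t walk(const l1_table *root, uint32_t vaddr)
{
    uint32_t off = vaddr & ((1u << PAGE_SHIFT) - 1);
    uint32_t i2  = (vaddr >> PAGE_SHIFT) & LVL_MASK;
    uint32_t i1  = (vaddr >> (PAGE_SHIFT + LVL_BITS)) & LVL_MASK;

    const l2_table *l2 = root->next[i1];
    if (!l2) return 0;                /* page fault: no L2 table */

    uint64_t pte = l2->pte[i2];
    if (!(pte & 1)) return 0;         /* page fault: not present */

    return (pte & ~1ull) | off;       /* frame base | page offset */
}

int main(void)
{
    static l2_table l2;
    static l1_table l1;
    l2.pte[5] = 0x40000 | 1;          /* map one page, valid bit set */
    l1.next[0] = &l2;

    uint32_t vaddr = (5u << PAGE_SHIFT) | 0x2A;
    printf("vaddr 0x%x -> paddr 0x%llx\n", vaddr,
           (unsigned long long)walk(&l1, vaddr));
    return 0;
}
```

A hardware TLB simply caches the translations this walk produces, so the extra memory references per access are paid only on a miss.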
  • Virtual Memory and Linux

Virtual Memory and Linux. Matt Porter, Embedded Linux Conference Europe, October 13, 2016. About the original author, Alan Ott: unfortunately, he is unable to be here at ELCE 2016. A veteran embedded systems and Linux developer; Linux architect at SoftIron (64-bit ARM servers and data center appliances; a hardware company, strong on software; Overdrive 3000, with more products in process).

Physical Memory: Single Address Space. Simple systems have a single address space, which memory and peripherals share: memory is mapped to one part, peripherals to another. All processes and the OS share the same memory space, so there is no memory protection: processes can stomp one another, and user space can stomp kernel memory! CPUs with a single address space include the 8086-80286, ARM Cortex-M, 8- and 16-bit PIC, AVR, SH-1, SH-2, and most 8- and 16-bit systems.

x86 Physical Memory Map. Lots of legacy: RAM is split (DOS area and extended), hardware is mapped between the RAM areas, and high and extended memory are accessed differently.

Limitations. Portable C programs expect flat memory, and multiple memory access methods limit portability. Management is tricky: you need to know or detect total RAM and keep processes separated. No protection: rogue programs can corrupt the entire system.

Virtual Memory: What is Virtual Memory? Virtual memory is a system that uses an address mapping: it maps the virtual address space to the physical address space, mapping virtual addresses both to physical RAM and to hardware devices (PCI devices, GPU RAM, on-SoC IP blocks). Advantages: each process can have a different memory mapping, so one process's RAM is inaccessible (and invisible) to other processes.
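On Linux, the virtual-to-physical mapping described above can actually be inspected from user space through /proc/self/pagemap, which holds one 64-bit entry per virtual page. A minimal sketch follows; the interface is real, but note that modern kernels report the frame number as zero to unprivileged readers, so run it as root to see a nonzero physical address.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Look up the physical frame backing a virtual address via
 * /proc/self/pagemap: bit 63 = page present, bits 0-54 = frame number. */
int main(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    char *buf = malloc(psize);
    buf[0] = 1;                               /* touch it so it's mapped */

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    uint64_t entry;
    off_t idx = (uintptr_t)buf / psize;       /* pagemap index for buf */
    pread(fd, &entry, sizeof entry, idx * sizeof entry);

    if (entry >> 63)                          /* page present in RAM?  */
        printf("virtual %p -> physical 0x%llx\n", (void *)buf,
               (unsigned long long)((entry & ((1ull << 55) - 1)) * psize
                                    + (uintptr_t)buf % psize));
    else
        printf("virtual %p not present\n", (void *)buf);

    close(fd);
    free(buf);
    return 0;
}
```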
  • ACDC: Towards a Universal Mutator for Benchmarking Heap Management Systems

ACDC: Towards a Universal Mutator for Benchmarking Heap Management Systems. Martin Aigner, Christoph M. Kirsch, University of Salzburg, [email protected].

Abstract: We present ACDC, an open-source benchmark that may be configured to emulate explicit single- and multi-threaded memory allocation, sharing, access, and deallocation behavior to expose virtually any relevant allocator performance differences. ACDC mimics periodic memory allocation and deallocation (AC) as well as persistent memory (DC). Memory may be allocated thread-locally and shared among multiple threads to study multicore scalability and even false sharing. Memory may be deallocated by threads other than the allocating threads to study blowup memory fragmentation. Memory may be accessed and deallocated sequentially in allocation order or in tree-like traversals to expose allocator deficiencies in exploiting spatial locality. We demonstrate ACDC's capabilities with seven state-of-the-art allocators for C/C++ in an empirical study which also reveals interesting performance differences between the allocators.

[Figure 1: the lifecycle of an object, shown along a time axis spanning allocation, accesses, last access, and deallocation, with the liveness and deallocation-delay intervals marked.]

The lifecycle of an object begins with its allocation, continues with accesses to the allocated memory, and ends with the deallocation of the allocated memory. The time from allocation to deallocation is called the lifetime of an object. The time from allocation to last access is called the liveness of an object, which ACDC, unlike other benchmarking tools, also emulates explicitly by controlling object access. The difference between the lifetime and liveness of an object, here called the deallocation delay, emulates mutator inefficiencies in identifying dead objects for deallocation, which may in turn expose allocator inefficiencies in handling dead memory.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Memory management. General Terms: Performance, Measurement. Keywords: benchmark; explicit heap management; multicore.
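The three intervals from Figure 1 can be read directly off a fragment of mutator code. This is a toy illustration of the terminology only, not ACDC's actual workload loop.

```c
#include <stdlib.h>
#include <string.h>

static void do_unrelated_work(void) { /* mutator works on other data */ }

/* The lifecycle ACDC emulates, in miniature:
 *   allocation ... accesses ... last access ........ deallocation
 *   |-------------- liveness --------------|
 *   |----------------------- lifetime ----------------------|
 * The gap between last access and free() is the deallocation delay. */
void lifecycle(void)
{
    char *obj = malloc(64);      /* lifetime and liveness begin   */

    memset(obj, 0, 64);          /* accesses...                   */
    obj[0] = 42;                 /* ...last access: liveness ends */

    do_unrelated_work();         /* deallocation delay: obj is    */
                                 /* dead but still allocated      */
    free(obj);                   /* lifetime ends                 */
}

int main(void)
{
    lifecycle();
    return 0;
}
```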
  • Memory Management In

Memory Management. Prof. James L. Frankel, Harvard University. Version of 7:34 PM, 2-Oct-2018. Copyright © 2018, 2017, 2015 James L. Frankel. All rights reserved.

Memory management. The ideal memory is large, fast, and non-volatile (keeps state without power). The memory hierarchy: an extremely limited number of registers in the CPU; a small amount of fast, expensive memory (caches); lots of medium-speed, medium-price main memory; and terabytes of slow, cheap disk storage. The memory manager handles the memory hierarchy.

Basic memory management: three simple ways of organizing memory for monoprogramming without swapping or paging (that is, an operating system with one user process).

Multiprogramming with fixed partitions: fixed memory partitions, with either separate input queues for each partition or a single input queue.

Probabilistic model of multiprogramming. Each process is in CPU wait for a fraction f of the time, and there are n processes with one processor. If the processes were independent of each other, the probability that all processes are in CPU wait at once would be f^n, so the probability that the CPU is busy is 1 - f^n (see the sketch below). However, the processes are not independent: they are all competing for one processor, and more than one process may be using any one I/O device, so a better model would be constructed using queuing theory. Modeling multiprogramming: CPU utilization as a function of the number of processes in memory (the degree of multiprogramming).

Analysis of multiprogramming system performance: arrival and work requirements of 4 jobs; CPU utilization for 1-4 jobs with 80% I/O wait; the sequence of events as jobs arrive and finish (the numbers show the amount of CPU time jobs get in each interval).

Relocation and protection. At the time a program is written, it is uncertain where the program will be loaded in memory; therefore, the address locations of variables and code cannot be absolute, and relocation must be enforced. We must also ensure that a program does not access other processes' memory, enforcing protection. Static vs.
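A quick way to see the model's point is to tabulate the CPU utilization 1 - f^n referenced above. A toy C program using the slides' 80% I/O-wait figure:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double f = 0.8;   /* fraction of time a process waits on I/O */

    /* CPU utilization = 1 - f^n: the CPU is idle only when all n
     * (assumed independent) processes are in I/O wait at once. */
    for (int n = 1; n <= 10; n++)
        printf("n = %2d  ->  utilization = %4.1f%%\n",
               n, 100.0 * (1.0 - pow(f, n)));
    return 0;
}
```

With f = 0.8, one process keeps the CPU only 20% busy, four processes about 59%, and ten about 89%, which is the usual argument for a higher degree of multiprogramming.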