Operating Systems for Industrial Computers: Linux OS, Part 3


Iwona Kochańska
Katedra Systemów Sonarowych, WETI PG
November 2019
GUT – Intel 2015/16

Physical memory: linear ("flat") model
- All processes share the same memory area.
- Example CPUs: 8086 to 80286, ARM Cortex-M, 8- and 16-bit PIC, AVR, most 8- and 16-bit systems.

Physical memory: x86 model
- Physical memory is divided into pages.
- The page size differs between architectures; in most cases it is 4096 B.
- Limited portability of C programs.
- The total amount of RAM has to be known (or checked).
- Care must be taken that processes use separate memory areas.
- Badly written programs can crash the whole system.

Linux is a virtual-memory operating system
- The virtual address space is mapped onto:
  - the physical RAM address space,
  - devices, e.g. PCI, GPU RAM.
- Advantages:
  - each process can have its own address space, and address spaces are isolated from one another,
  - the kernel address space is not visible from the user address space,
  - physical RAM can be mapped by many processes (shared memory),
  - memory regions can have different access permissions (read, write, execute).
- Two address spaces:
  - physical addresses, used by devices (DMA, peripherals),
  - virtual addresses, used by programs (load/store instructions on RISC processors, any instruction that accesses RAM on CISC processors).
- Every memory address (physical or virtual) can be expressed as a multiple of the page size plus an offset.
- Virtual address space:
  - on 32-bit processors: from 0 to 0xffffffff, divided into 4 kB pages,
  - on 64-bit processors: 48 address bits (at most 256 TB), split in half between the kernel and user space.
- The address space is divided into:
  - user space, at the lower addresses,
  - kernel space, at the upper addresses.
  (In Linux terminology, "low memory" and "high memory" describe physical RAM that is or is not permanently mapped into kernel space; they do not name these two halves.)

Layout of a process address space
- Data segment: global and static variables with explicit non-zero initial values, i.e. initialized variables defined outside any function.
- BSS segment: global and static variables initialized to 0 or left uninitialized.
- Heap: dynamically allocated variables. The area is managed with malloc, calloc, realloc and free and is shared by all threads, shared libraries and modules.
- Stack: local variables. If the stack pointer meets the heap pointer, free memory has run out.
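The statement above that every address is a multiple of the page size plus an offset can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper names `split_address` and `join_address` are made up for this example, and Python is used purely for convenience.

```python
import mmap

# Page size reported by the running system (4096 on most platforms).
PAGE_SIZE = mmap.PAGESIZE


def split_address(addr, page_size=PAGE_SIZE):
    """Return (page_number, offset) with addr == page_number * page_size + offset."""
    return divmod(addr, page_size)


def join_address(page_number, offset, page_size=PAGE_SIZE):
    """Rebuild an address from its page number and the offset inside the page."""
    if not 0 <= offset < page_size:
        raise ValueError("offset must lie inside one page")
    return page_number * page_size + offset


if __name__ == "__main__":
    # An address just above a typical 32-bit PAGE_OFFSET, with 4 kB pages.
    page, offset = split_address(0xC0001234, 4096)
    print(hex(page), hex(offset))  # prints 0xc0001 0x234
```

With 4 kB pages the offset is simply the low 12 bits of the address, which is why the split can also be done with a shift and a mask.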
Linux is a virtual-memory operating system
- The split between user space and kernel space is set by the kernel configuration parameter PAGE_OFFSET.
- Typical 32-bit system: PAGE_OFFSET = 0xc0000000, so the lower 3 GB are user space.
- User space is allocated per process.
- Kernel space is common to all processes.

Memory management unit
- The Memory Management Unit (MMU) is the set of circuits that carries out the accesses to physical memory requested by the CPU. Its tasks include:
  - translating virtual memory addresses to physical memory addresses,
  - memory protection,
  - cache control,
  - bus arbitration,
  - memory bank switching (in 8-bit systems).
- The MMU divides the logical (virtual) address space into pages of 2^M bytes (a few kilobytes).
- Page frame number (PFN): the index of a memory page (the upper N bits of the address). The lower M bits of the address (the offset) are left unchanged by the translation.

MMU: translation lookaside buffer
- The translation lookaside buffer (TLB) is an associative cache that holds mappings between virtual and physical addresses.
- If the TLB has no matching entry (its capacity is limited), slower hardware mechanisms of the processor search data structures kept in memory, which sometimes requires help from software (the operating system).
- The items in these structures are called page table entries (PTEs), and the structure as a whole is called the page table.
- The complete physical address is obtained by appending the offset bits to the translated page number.
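The lookup order described above (TLB first, then the page table, then recombining the page number with the untouched offset) can be modelled in a few lines. This is a toy model under stated assumptions, not how any real MMU is implemented: the TLB and the page table are plain dictionaries, 4 kB pages are assumed, and the names `translate` and `PageFault` are hypothetical.

```python
PAGE_SHIFT = 12                  # assumed 4 kB pages: 2**12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


class PageFault(Exception):
    """Stands in for the exception the MMU raises when no mapping exists."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address to a physical one.

    tlb and page_table are dicts mapping virtual page numbers to frame numbers.
    """
    vpn = vaddr >> PAGE_SHIFT        # upper bits: virtual page number
    offset = vaddr & OFFSET_MASK     # lower bits: passed through unchanged
    pfn = tlb.get(vpn)
    if pfn is None:                  # TLB miss: fall back to the page table
        if vpn not in page_table:
            raise PageFault(hex(vaddr))
        pfn = page_table[vpn]
        tlb[vpn] = pfn               # cache the mapping for the next access
    return (pfn << PAGE_SHIFT) | offset


if __name__ == "__main__":
    tlb, page_table = {}, {0x30: 0x92}
    print(hex(translate(0x30408, tlb, page_table)))  # prints 0x92408
```

A second call with an address on the same page is served from `tlb` without touching `page_table`, which is the whole point of the cache.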
MMU: translation lookaside buffer
- If neither the TLB nor the page table entries describe the logical page being accessed, or the entry that is found forbids access in the current mode, the MMU signals an exception to the CPU for the invalid page access: a page fault.

Virtual memory mapping
Each virtual memory page can be:
- unmapped (an access attempt leads to SIGSEGV and the "segmentation fault" message),
- mapped to a physical page that is private to the process,
- mapped to a physical page that is shared with another process,
- mapped to a physical page used by the kernel.
In addition, the kernel can map virtual pages onto memory areas reserved for, e.g., device driver buffers.

Virtual memory mapping: copy on write (COW)
- Pages are mapped and shared with the "copy on write" flag.
- This is useful when a large amount of data has to be shared.
- Instead of an actual, costly copy of the memory, a pointer to the original data is returned; the copy is made only when the data needs to be modified.
- Example: the fork() function
  - It creates a child process that has an exact copy of the parent's context and a copy of its memory (mapped to the same physical memory).
  - Pages that can be modified by either the parent or the child are marked "copy on write".
  - When one of the processes modifies memory, the kernel intercepts the write and copies the affected pages, so that changes made by one process are not visible to the other. From then on the parent and the child refer to physically different pages.
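The visible effect of the fork() behaviour described above can be observed from user space. The sketch below is an assumption-laden illustration: it runs only on systems that provide `os.fork`, the function name `fork_and_modify` is made up, and it shows the isolation of the two processes after a write rather than the page copy itself, which the kernel performs transparently.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a buffer, and return (parent_view, child_view)."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write below touches pages shared with the parent,
        # so the kernel gives the child its own copy of them.
        os.close(read_fd)
        data[:] = b"child!"
        os.write(write_fd, bytes(data))
        os.close(write_fd)
        os._exit(0)
    # Parent: collect what the child saw, then reap the child.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    print(fork_and_modify())  # prints (b'parent', b'child!')
```

The parent's buffer still holds its original contents after the child has written to its own copy, even though both started from the same physical pages.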
Advantages of virtual memory
- Invalid memory references are caught and reported with SIGSEGV.
- A process runs in its own memory space, isolated from other processes.
- Memory is used efficiently because parts of code and data (e.g. libraries) are shared.
- The amount of memory can be extended with swap files.

Disadvantages of virtual memory
- It is hard to determine how much memory (an application) is currently using.
- "Overcommit": the practice of handing out virtual memory without a guarantee that enough physical memory is available.
- Memory management and exception handling (page faults) introduce delays.

Kernel memory space
In kernel memory space every allocation in virtual space is backed by physical memory. Regions:
- the kernel itself (code and data loaded from the kernel image at boot):
  - segments: .text, .init, .data, .bss
- memory for device drivers
- kernel modules
- memory allocated with void *kmalloc(size_t size, gfp_t flags)
  - the slab allocator is an efficient mechanism for allocating physically contiguous memory;
  - the behaviour of kmalloc depends on the flags parameter:
    - GFP_KERNEL: normal allocation; the caller may be put to sleep while memory is being freed up.
    - GFP_ATOMIC: the allocation never sleeps, so it either succeeds immediately or fails; suitable for interrupt context.
  - freeing the memory: void kfree(const void *objp);
- memory allocated with void *vmalloc(unsigned long size);
  - for larger areas than kmalloc() (the virtual address range is contiguous, the physical one not necessarily),
  - the counterpart of malloc() in user space,
  - freeing the memory: void vfree(const void *addr);
- Example program: http://www.roman10.net/2011/07/29/linux-kernel-programmingmemory-allocation/

How much memory does the kernel use?
- The size of the kernel image.
- The kernel is usually small compared with the total amount of memory.
- Building a small kernel (Linux-tiny, Linux Kernel Tinification): project https://tiny.wiki.kernel.org/.

Read /proc/meminfo. Kernel memory usage is the sum of:
- Slab: memory allocated by the slab allocator
  - more information: read /proc/slabinfo
- KernelStack: memory used by kernel stacks
- PageTables: memory used to store page tables
- VmallocUsed: memory allocated with vmalloc()
  - more information: read /proc/vmallocinfo

lsmod shows how much memory the code and data of kernel modules take up.

User memory space
- Demand paging: Linux maps physical memory pages only when the program refers to the corresponding location in virtual memory.
- malloc(3) returns a pointer to virtual memory.
- A read/write operation is intercepted by
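The /proc/meminfo sum given earlier (Slab + KernelStack + PageTables + VmallocUsed) is easy to compute with a short script. This is a rough sketch rather than an official accounting method: the function names are hypothetical, the field names are taken from the list above, and the values are assumed to be reported in kB as on typical Linux systems.

```python
import os

# Fields that the slide adds up as kernel memory usage.
KERNEL_FIELDS = ("Slab", "KernelStack", "PageTables", "VmallocUsed")


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kernel_memory_kb(fields):
    """Sum the kernel-related fields; fields that are absent count as 0."""
    return sum(fields.get(name, 0) for name in KERNEL_FIELDS)


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):         # only present on Linux
        with open(path) as f:
            print(kernel_memory_kb(parse_meminfo(f.read())), "kB")
```

The result is only an estimate of kernel usage; /proc/slabinfo and /proc/vmallocinfo give the per-cache and per-area breakdown.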