CS 351: Systems Programming
Virtual Memory
Michael Saelee <[email protected]>

[Figure: the memory hierarchy: registers, cache (SRAM), main memory (DRAM), local hard disk drive (HDD/SSD), remote storage (networked drive / cloud).]

- previously: SRAM ⇔ DRAM
- next: DRAM ⇔ HDD, SSD, etc.; i.e., memory as a "cache" for disk

main goals:
1. maximize memory throughput
2. maximize memory utilization
3. provide address space consistency & memory protection to processes

throughput = # bytes per second
- depends on access latencies (DRAM, HDD) and "hit rate"

utilization = fraction of allocated memory that contains "user" data (aka payload)
- vs. metadata and other overhead required for memory management

address space consistency → provide a uniform "view" of memory to each process

[Figure: the Linux process address space, from 0 up to 0xffffffff: kernel virtual memory (code, data, heap, stack) above 0xc0000000, invisible to user code; the user stack (created at runtime), topped by %esp (the stack pointer); the memory-mapped region for shared libraries at 0x40000000; the run-time heap (created by malloc), topped by brk; the read/write segment (.data, .bss) and the read-only segment (.init, .text, .rodata), loaded from the executable file starting at 0x08048000; unused space below.]

memory protection → prevent processes from directly accessing each other's address space

[Figure: three processes P0, P1, P2, each presented with an identical copy of the address space layout above.]

i.e., every process should be provided with a managed, virtualized address space

"memory addresses": what are they, really?

[Figure: the CPU issues address N to main memory and receives data back (cache not shown).]

"physical" address: (byte) index into DRAM
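The gdb session that follows steps a small program through fork() in both the parent and the child. Below is a minimal reconstruction of that program, pieced together from the source fragments interleaved in the slide's transcript; the #include and the explicit int return type are assumptions, and the line numbers cited in the transcript (7, 8, 9) refer to the original file, not to this sketch.

    /* memtest.c (reconstructed) */
    #include <unistd.h>   /* assumed: declares fork() */

    int glob = 0xDEADBEEE;

    int main() {
        fork();           /* transcript line 7: parent and child both continue here */
        glob += 1;        /* transcript line 8: each process increments its own glob */
    }                     /* transcript line 9 */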
(gdb) set detach-on-fork off
(gdb) break main
Breakpoint 1 at 0x400508: file memtest.c, line 7.
(gdb) run
Breakpoint 1, main () at memtest.c:7
7           fork();
(gdb) next
[New process 7450]
8           glob += 1;
(gdb) print &glob
$1 = (int *) 0x6008d4
(gdb) next
9       }
(gdb) print /x glob
$2 = 0xdeadbeef                        (parent)
(gdb) inferior 2
[Switching to inferior 2 [process 7450]]
#0  0x000000310acac49d in __libc_fork ()
131         pid = ARCH_FORK ();
(gdb) finish
Run till exit from #0 in __libc_fork ()
8           glob += 1;
(gdb) print /x glob
$4 = 0xdeadbeee                        (child)
(gdb) print &glob
$5 = (int *) 0x6008d4

parent and child report the same address for glob (0x6008d4) yet read different values from it: instructions executed by the CPU do not refer directly to physical addresses!

processes reference virtual addresses; the CPU relays virtual address requests to the memory management unit (MMU), which translates them to physical addresses

[Figure: the CPU sends a virtual address to the MMU's address translation unit, which yields either a physical address into main memory or a disk address into "swap" space (cache not shown).]

essential problem: translate a request for a virtual address → a physical address
… this must be FAST, as every memory access from the CPU must be translated

both hardware and software are involved:
- the MMU (hw) handles simple and fast operations (e.g., table lookups)
- the kernel (sw) handles complex tasks (e.g., eviction policy)

§Virtual Memory Implementations

keep in mind the goals:
1. maximize memory throughput
2. maximize memory utilization
3. provide address space consistency & memory protection to processes

1. simple relocation

[Figure: process P0's virtual addresses 0 through N are placed contiguously in main memory starting at base address B, so virtual address N lands at physical address N+B.]

- the MMU holds a relocation register B; virtual address N translates to physical address N+B
- the per-process relocation address is loaded by the kernel on every context switch
- problem: processes may easily overextend their bounds and trample on each other
- fix: incorporate a limit register L to provide memory protection; the MMU asserts (0 ≤ N ≤ L) before issuing PA = N+B, confining the process to a sandbox of size L
- an assertion failure triggers a fault, which summons the kernel (which signals the process)

pros:
- simple & fast!
- provides protection

but: available memory for mapping depends on the value of the base address, i.e., address spaces are not consistent!

[Figure: the same virtual address space (stack, heap, data, code) placed at two different base addresses B leaves different amounts of main memory above it for growth.]

also: all of a process below the address limit must be loaded in memory, i.e., memory may be vastly under-utilized

[Figure: the possibly unused region between the code segment and the stack in the virtual address space still occupies physical memory between B and B+L.]
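To make the translation step concrete, here is a minimal sketch of base-and-limit translation as an MMU might perform it; the struct, field names, and fault handling are illustrative assumptions, not a real hardware interface.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <inttypes.h>

    /* Per-process relocation state, reloaded by the kernel on every
     * context switch (names are assumptions for this sketch). */
    struct reloc_regs {
        uint32_t base;   /* B: physical address where the process begins */
        uint32_t limit;  /* L: highest valid virtual address             */
    };

    /* Translate virtual address va; returns false on a bounds fault. */
    bool translate(const struct reloc_regs *r, uint32_t va, uint32_t *pa) {
        if (va > r->limit)     /* assert (0 <= N <= L)                           */
            return false;      /* fault: kernel is summoned, signals the process */
        *pa = va + r->base;    /* PA = N + B                                     */
        return true;
    }

    int main(void) {
        struct reloc_regs r = { .base = 0x8000, .limit = 0x0fff };  /* example values */
        uint32_t pa;
        if (translate(&r, 0x0042, &pa))
            printf("VA 0x0042 -> PA 0x%04" PRIx32 "\n", pa);  /* prints PA 0x8042 */
        return 0;
    }

Translation is just one comparison and one addition per access, which is why the scheme is so fast; the cost shows up instead in consistency and utilization, as noted above.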
2. segmentation
- partition the virtual address space into multiple logical segments
- individually map them onto physical memory with relocation registers

[Figure: a segmented virtual address space (Seg #0: Code, Seg #1: Data, Seg #2: Heap, Seg #3: Stack), each segment starting at offset 0, mapped onto main memory via a segment table of (Base, Limit) pairs; segment i occupies physical addresses Bi through Bi+Li.]

- a virtual address has the form seg#:offset
- the MMU looks up the segment's (Base, Limit) entry, asserts (offset ≤ Limit), and computes PA = offset + Base (e.g., for seg #2: assert offset ≤ L2, then PA = offset + B2)

Segment Table
  Seg#   Base   Limit
   0      B0     L0
   1      B1     L1
   2      B2     L2
   3      B3     L3
- implemented as MMU registers
- part of kernel-maintained, per-process metadata (aka the "process control block")
- re-populated on each context switch

pros:
- still very fast: translation = register access & addition
- memory protection via limits
- segmented addresses improve consistency

[Figure: under simple relocation, the possibly unused region between code and stack still consumes main memory; under segmentation, each segment starts at virtual address 0 and only the segments themselves are mapped. Better!]

[Figure: segments of different sizes from two processes mapped into main memory leave scattered free gaps.]
- variable segment sizes → memory fragmentation
- fragmentation potentially lowers utilization
- can fix through compaction, but expensive!

3. paging
- partition virtual and physical address spaces into uniformly sized pages
- virtual pages map onto physical pages

[Figure: pages of the stack, heap, data, and code scattered across physical memory page frames.]
- minimum mapping granularity = page
- not all of a given segment need be mapped

modified mapping problem:
- a virtual address is broken down into a virtual page number & page offset
- determine which physical page (if any) a given virtual page is loaded into
- if a physical page is found, use the page offset to access the data

given a page size of 2^p bytes, the low p bits of an address are the page offset:
  VA: [ virtual page number  | virtual page offset (p bits)  ]
  PA: [ physical page number | physical page offset (p bits) ]
address translation maps the virtual page number to a physical page number; the page offset passes through unchanged

translation structure: the page table
- indexed by virtual page number; an n-bit virtual page number means 2^n entries
- each entry holds a valid bit and a physical page number (PPN)
- if the entry is invalid, the page is not mapped

page table entries (PTEs) typically contain additional metadata, e.g.:
- dirty (modified) bit
- access bits (shared or kernel-owned pages may be read-only or inaccessible)

e.g., 32-bit virtual addresses, 4KB (2^12) pages, 4-byte PTEs; how big is the page table?
- # pages = 2^32 ÷ 2^12 = 2^20 = 1M
- page table size = 1M × 4 bytes = 4MB

4MB is much too large to fit in the MMU (insufficient registers and SRAM!), so the page table resides in main memory.

The translation process (aka the page table walk) is performed in hardware by the MMU.
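A minimal sketch of that walk for a single-level table under the parameters above (32-bit virtual addresses, 4KB pages); the pte_t layout, field names, and page_walk function are illustrative assumptions, not any particular architecture's format.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <inttypes.h>

    #define PAGE_SHIFT 12                        /* 4KB pages: 2^12 bytes   */
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
    #define NUM_PAGES  (1u << 20)                /* 2^32 / 2^12 = 2^20 PTEs */

    /* One page table entry: valid bit, dirty bit, physical page number.
     * (Field layout is an assumption for this sketch.) */
    typedef struct {
        unsigned valid : 1;
        unsigned dirty : 1;
        unsigned ppn   : 20;
    } pte_t;

    /* Split the VA into VPN and offset, index the table, and splice the
     * PPN back onto the unchanged offset. Returns false on a page fault. */
    bool page_walk(const pte_t table[], uint32_t va, uint32_t *pa) {
        uint32_t vpn    = va >> PAGE_SHIFT;
        uint32_t offset = va & PAGE_MASK;
        if (!table[vpn].valid)
            return false;                        /* not mapped: kernel takes over */
        *pa = ((uint32_t)table[vpn].ppn << PAGE_SHIFT) | offset;
        return true;
    }

    int main(void) {
        static pte_t table[NUM_PAGES];           /* 2^20 entries x 4 bytes = 4MB */
        uint32_t va = 0x00403a20, pa;
        table[va >> PAGE_SHIFT] = (pte_t){ .valid = 1, .ppn = 0x2b };
        if (page_walk(table, va, &pa))
            printf("VA 0x%08" PRIx32 " -> PA 0x%08" PRIx32 "\n", va, pa);  /* PA 0x0002ba20 */
        return 0;
    }

With sizeof(pte_t) coming out to 4 bytes on typical compilers, the table above is exactly the 4MB computed on the previous slide, which is why it has to live in main memory rather than inside the MMU.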