X86 Memory Protection and Translation

2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Logical Diagram Binary Memory x86 Memory Protection and Threads Formats Allocators Translation User System Calls Kernel Don Porter RCU File System Networking Sync Memory Device CPU Today’s Management Drivers Scheduler Lecture Hardware Interrupts Disk Net Consistency 1 Today’s Lecture: Focus on Hardware ABI 2 1 2 COMP 790: OS Implementation COMP 790: OS Implementation Lecture Goal Undergrad Review • Understand the hardware tools available on a • What is: modern x86 processor for manipulating and – Virtual memory? protecting memory – Segmentation? • Lab 2: You will program this hardware – Paging? • Apologies: Material can be a bit dry, but important – Plus, slides will be good reference • But, cool tech tricks: – How does thread-local storage (TLS) work? – An actual (and tough) Microsoft interview question 3 4 3 4 COMP 790: OS Implementation COMP 790: OS Implementation Memory Mapping Two System Goals 1) Provide an abstraction of contiguous, isolated virtual Process 1 Process 2 memory to a program Virtual Memory Virtual Memory 2) Prevent illegal operations // Program expects (*x) – Prevent access to other application or OS memory 0x1000 Only one physical 0x1000 address 0x1000!! // to always be at – Detect failures early (e.g., segfault on address 0) // address 0x1000 – More recently, prevent exploits that try to execute int *x = 0x1000; program data 0x1000 Physical Memory 5 6 5 6 1 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Outline x86 Processor Modes • x86 processor modes • Real mode – walks and talks like a really old x86 chip • x86 segmentation – State at boot • x86 page tables – 20-bit address space, direct physical memory access • 1 MB of usable memory • Advanced Features – Segmentation available (no paging) • Interesting applications/problems • Protected mode – Standard 32-bit x86 mode – Segmentation and paging – Privilege levels (separate user and kernel) 7 8 7 8 COMP 790: OS Implementation COMP 790: OS Implementation x86 Processor Modes Translation Overview • Long mode – 64-bit mode (aka amd64, x86_64, etc.) – Very similar to 32-bit mode (protected mode), but bigger 0xdeadbeef Segmentation 0x0eadbeef Paging 0x6eadbeef – Restrict segmentation use – Garbage collect deprecated instructions Virtual Address Linear Address Physical Address • Chips can still run in protected mode with old instructions • Even more obscure modes we won’t discuss today Protected/Long mode only • Segmentation cannot be disabled! – But can be a no-op (aka flat mode) 9 10 9 10 COMP 790: OS Implementation COMP 790: OS Implementation x86 Segmentation Programming model • A segment has: • Segments for: code, data, stack, “extra” – Base address (linear address) – A program can have up to 6 total segments – Length – Segments identified by registers: cs, ds, ss, es, fs, gs – Permissions • Prefix all memory accesses with desired segment: – mov eax, ds:0x80 (load offset 0x80 from data into eax) – jmp cs:0xab8 (jump execution to code offset 0xab8) – mov ss:0x40, ecx (move ecx to stack offset 0x40) 11 12 11 12 2 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Segmented Programming Pseudo-example Programming, cont. • This is cumbersome, so infer code, data and stack // global int x = 1 ds:x = 1; // data segments by instruction type: int y; // stack ss:y; // stack – Control-flow instructions use code segment (jump, call) if (x) { if (ds:x) { – Stack management (push/pop) uses stack y = 1; ss:y = 1; – Most loads/stores use data segment printf (“Boo”); cs:printf (ds:“Boo”); } else } else • Extra segments (es, fs, gs) must be used explicitly y = 0; ss:y = 0; Segments would be used in assembly, not C 13 14 13 14 COMP 790: OS Implementation COMP 790: OS Implementation Segment management Segment registers • For safety (without paging), only the OS should define segments. Why? Global or Local Table Index (13 bits) Ring (2 bits) • Two segment tables the OS creates in memory: Table? (1 bit) – Global – any process can use these segments – Local – segment definitions for a specific process • Set by the OS on fork, context switch, etc. • How does the hardware know where they are? – Dedicated registers: gdtr and ldtr – Privileged instructions: lgdt, lldt 15 16 15 16 COMP 790: OS Implementation COMP 790: OS Implementation Segments Illustrated Sample Problem: Low 3 bits 0 th Index 1 (4 bit) cs: 0x8 ds: 0xf (Old) jOS Bootloader • Suppose my kernel is compiled to be in upper 256 MB of a 32-bit address space (i.e., 0xf0100000) 0, 0x123000, 0x423000, gdtr … 0B 1MB 1MB – Common to put OS kernel at top of address space • Bootloader starts in real mode (only 1MB of addressable physical memory) • Bootloader loads kernel at 0x00100000 call cs:0xf150 0x123000 + 0xf150 – Can’t address 0xf0100000 = 0x123150 17 18 17 18 3 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Booting problem Segmentation to the Rescue! • Kernel needs to set up and manage its own page • kern/entry.S: tables – What is this code doing? – Paging can translate 0xf0100000 to 0x00100000 mygdt: • But what to do between the bootloader and kernel SEG_NULL # null seg code that sets up paging? SEG(STA_X|STA_R, -KERNBASE, 0xffffffff) # code seg SEG(STA_W, -KERNBASE, 0xffffffff) # data seg 19 20 19 20 COMP 790: OS Implementation COMP 790: OS Implementation JOS ex 1, cont. Flat segmentation SEG(STA_X|STA_R, -KERNBASE, 0xffffffff) # code seg • The above trick is used for booting. We eventually Execute and want to use paging. Offset Segment Length Read -0xf0000000 (4 GB) • How can we make segmentation a no-op? permission • From kern/pmap.c: // 0x8 - kernel code segment jmp 0xf01000db8 # virtual addr. (implicit cs seg) [GD_KT >> 3] = SEG(STA_X | STA_R, 0x0, 0xffffffff, 0), Execute and Offset Segment Length Read Ring 0 jmp (0xf01000db8 + -0xf0000000) 0x00000000 (4 GB) permission jmp 0x001000db8 # linear addr. 21 22 21 22 COMP 790: OS Implementation COMP 790: OS Implementation Outline Paging Model • x86 processor modes • 32 (or 64) bit address space. • x86 segmentation • Arbitrary mapping of linear to physical pages • x86 page tables • Pages are most commonly 4 KB • Advanced Features – Newer processors also support page sizes of 2 MB and 1 GB • Interesting applications/problems 23 24 23 24 4 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation How it works Translation Overview • OS creates a page table – Any old page with entries formatted properly – Hardware interprets entries • cr3 register points to the current page table – Only ring0 can change cr3 25 From Intel 80386 Reference Programmer’s Manual 26 25 26 COMP 790: OS Implementation COMP 790: OS Implementation Example Page Table Entries 0xf1084150 0x3b4 0x84 0x150 cr3 Physical Address Upper (20 bits) Flags (12 bits) Page Dir Offset Page Table Offset Physical Page Offset (Top 10 addr bits: 0x00384 PTE_W|PTE_P|PTE_U (Next 10 addr bits) (Low 12 addr bits) 0xf10 >> 2) 0 0 cr3 0x28370 PTE_W|PTE_P 0 0 0 0 0 0 0 0 Entry at cr3+0x3b4 * 0 0 sizeof(PTE) Entry at 0x84 * Data we want at sizeof(PTE) offset 0x150 27 28 27 28 COMP 790: OS Implementation COMP 790: OS Implementation Page Table Entries Page flags • Top 20 bits are the physical address of the mapped • 3 for OS to use however it likes page • 4 reserved by Intel, just in case – Why 20 bits? • 3 for OS to CPU metadata – 4k page size == 12 bits of offset – User/vs kernel page, • Lower 12 bits for flags – Write permission, – Present bit (so we can swap out pages) • 2 for CPU to OS metadata – Dirty (page was written), Accessed (page was read) 29 30 29 30 5 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Page Table Entries Back of the envelope • If a page is 4K and an entry is 4 bytes, how many User, writable, present entries per page? cr3 Physical Address – 1k Upper (20 bits) Flags (12 bits) 0x00384 No mapping PTE_W|PTE_P|PTE_U • How large of an address space can 1 page represent? 0 0 – 1k entries * 1page/entry * 4K/page = 4MB 0x28370 PTE_W|PTE_P|PTE_DIRTY • How large can we get with a second level of … … translation? Writeable, kernel-only, present, and dirty – 1k tables/dir * 1k entries/table * 4k/page = 4 GB (Dirty set by CPU on write) – Nice that it works out that way! 31 32 31 32 COMP 790: OS Implementation COMP 790: OS Implementation Challenge questions TLB Entries • What is the space overhead of paging? • The CPU caches address translations in the TLB – I.e., how much memory goes to page tables for a 4 GB – Translation Lookaside Buffer address space? • What is the optimal number of levels for a 64 bit page table? cr3 Virt Phys • When would you use a 2 MB or 1 GB page size? 0xf0231000 0x1000 0x00b31000 0x1f000 0xb0002000 0xc1000 - - Page Traversal is Slow Table Lookup is Fast 33 34 33 34 COMP 790: OS Implementation COMP 790: OS Implementation TLB Entries TLB Entries • The CPU caches address translations in the TLB • The TLB is not coherent with memory, meaning: • Translation Lookaside BufferThe TLB is not coherent – If you cHange a PTE, you need to manually invalidate with memory, meaning: cacHed values – If you cHange a PTE, you need to manually invalidate – See the tlb_invalidate() function in jOS cacHed values – See the tlb_invalidate() function in jOS cr3 Virt Phys 0xf0231000 0x1000 0x00b31000 0x1f000 0xb0002000 0xc1000 - - Same No 35 Virt Addr. Change!!! 36 35 36 6 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Outline Physical Address Extension (PAE) • x86 processor modes • Period with 32-bit machines + >4GB RAM (2000’s) • x86 segmentation • Essentially, an early deployment

X86 Memory Protection and Translation

Virtual Memory

The Case for Compressed Caching in Virtual Memory Systems

Memory Protection at Option

Virtual Memory

Extracting Compressed Pages from the Windows 10 Virtual Store WHITE PAPER | EXTRACTING COMPRESSED PAGES from the WINDOWS 10 VIRTUAL STORE 2

Memory Protection File B [RWX] File D [RW] File F [R] OS GUEST LECTURE

LECTURE NOTE 5: the MEMORY MANAGEMENT REAL MODE Allows the Microprocessor to Address Only the First 1Mbyte of Memory Even If Its Pentium II Microprocessor

Paging: Smaller Tables

SYSCALL, IRET ● Opportunistic SYSRET Failed, Do IRET

The Pentium Processor

Intermediate X86 Part 2

Multiprocessing Contents