Process Address Spaces and Binary Formats
Total Page:16
File Type:pdf, Size:1020Kb
2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Logical Diagram Binary Memory Process Address Spaces Threads Formats AllocatorsToday’s and Binary Formats Lecture User System Calls Kernel Don Porter RCU File System Networking Sync Memory Device CPU Management Drivers Scheduler Hardware Interrupts Disk Net Consistency 1 2 1 2 COMP 790: OS Implementation COMP 790: OS Implementation Review Definitions (can vary) • We’ve seen how paging and segmentation work on • Process is a virtual address space x86 – 1+ threads of execution work within this address space – Maps logical addresses to physical pages • A process is composed of: – These are the low-level hardware tools – Memory-mapped files • This lecture: build up to higher-level abstractions • Includes program binary • Namely, the process address space – Anonymous pages: no file backing • When the process exits, their contents go away 3 4 3 4 COMP 790: OS Implementation COMP 790: OS Implementation Address Space Layout Simple Example • Determined (mostly) by the application • Determined at compile time Virtual Address Space – Link directives can influence this hello heap stk libc.so • See kern/kernel.ld in JOS; specifies kernel starting address • OS usually reserves part of the address space to map 0 0xffffffff itself • “Hello world” binary specified load address – Upper GB on x86 Linux • Also specifies where it wants libc • Application can dynamically request new mappings from the OS, or delete mappings • Dynamically asks kernel for “anonymous” pages for its heap and stack 5 6 5 6 1 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation In practice Problem 1: How to represent in the kernel? • You can see (part of) the requested memory layout • What is the best way to represent the components of of a program using ldd: a process? $ ldd /usr/bin/git – Common question: is mapped at address x? linux-vdso.so.1 => (0x00007fff197be000) • Page faults, new memory mappings, etc. libz.so.1 => /lib/libz.so.1 (0x00007f31b9d4e000) • Hint: a 64-bit address space is seriously huge libpthread.so.0 => /lib/libpthread.so.0 (0x00007f31b9b31000) • Hint: some programs (like databases) map tons of libc.so.6 => /lib/libc.so.6 (0x00007f31b97ac000) data /lib64/ld-linux-x86-64.so.2 (0x00007f31b9f86000) – Others map very little • No one size fits all 7 8 7 8 COMP 790: OS Implementation COMP 790: OS Implementation Sparse representation Linux: vm_area_struct • Naïve approach might make a big array of pages • Linux represents portions of a process with a – Mark empty space as unused vm_area_struct, or vma – But this wastes OS memory • Includes: • Better idea: only allocate nodes in a data structure – Start address (virtual) for memory that is mapped to something – End address (first address after vma) – why? – Kernel data structure memory use proportional to • Memory regions are page aligned complexity of address space! – Protection (read, write, execute, etc) – implication? • Different page protections means new vma – Pointer to file (if one) – Other bookkeeping 9 10 9 10 COMP 790: OS Implementation COMP 790: OS Implementation Simple list representation Simple list • Linear traversal – O(n) 0 Process Address Space 0xffffffff – Shouldn’t we use a data structure with the smallest O? • Practical system building question: start end – What is the common case? next – Is it past the asymptotic crossover point? vma vma vma • If tree traversal is O(log n), but adds bookkeeping /bin/ls anon libc.so (data) overhead, which makes sense for: – 10 vmas: log 10 =~ 3; 10/2 = 5; Comparable either way – 100 vmas: log 100 starts making sense mm_struct (process) 11 12 11 12 2 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Common cases Red-black trees • Many programs are simple • (Roughly) balanced tree – Only load a few libraries • Read the wikipedia article if you aren’t familiar with – Small amount of data them • Some programs are large and complicated • Popular in real systems – Databases – Asymptotic average == worst case behavior • Linux splits the difference and uses both a list and a • Insertion, deletion, search: log n red-black tree • Traversal: n 13 14 13 14 COMP 790: OS Implementation COMP 790: OS Implementation Optimizations Memory mapping recap • Using an RB-tree gets us logarithmic search time • VM Area structure tracks regions that are mapped • Other suggestions? – Efficiently represent a sparse address space • Locality: If I just accessed region x, there is a – On both a list and an RB-tree • reasonably good chance I’ll access it again Fast linear traversal • Efficient lookup in a large address space – Linux caches a pointer in each process to the last vma – Cache last lookup to exploit temporal locality looked up – Source code (mm/mmap.c) claims 35% hit rate 15 16 15 16 COMP 790: OS Implementation COMP 790: OS Implementation Linux APIs Example 1: • mmap(void *addr, size_t length, int prot, int flags, • Let’s map a 1 page (4k) anonymous region for data, int fd, off_t offset); read-write at address 0x40000 • munmap(void *addr, size_t length); • mmap(0x40000, 4096, PROT_READ|PROT_WRITE, MAP_ANONYMOUS, -1, 0); • How to create an anonymous mapping? – Why wouldn’t we want exec permission? • What if you don’t care where a memory region goes (as long as it doesn’t clobber something else)? 17 18 17 18 3 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Insert at 0x40000 Scenario 2 • What if there is something already mapped there 0x1000-0x4000 0x20000-0x21000 0x100000-0x10f000 with read-only permission? – Case 1: Last page overlaps – Case 2: First page overlaps – Case 3: Our target is in the middle 1) Is anything already mapped at 0x40000-0x41000? 2) If not, create a new vma and insert it 3) Recall: pages will be allocated on demand mm_struct (process) 19 20 19 20 COMP 790: OS Implementation COMP 790: OS Implementation Case 1: Insert at 0x40000 Case 3: Insert at 0x40000 0x1000-0x4000 0x20000-0x41000 0x100000-0x10f000 0x1000-0x4000 0x20000-0x50000 0x100000-0x10f000 1) Is anything already mapped at 0x40000-0x41000? 1) Is anything already mapped at 0x40000-0x41000? 2) If at the end and different permissions: 2) If in the middle and different permissions: 1) Truncate previous vma 1) Split previous vma mm_struct mm_struct 2) Insert new vma 2) Insert new vma (process) (process) 3) If permissions are the same, one can replace pages and/or extend previous vma 21 22 21 22 COMP 790: OS Implementation COMP 790: OS Implementation Demand paging Unix fork() • Creating a memory mapping (vma) doesn’t • Recall: this function creates and starts a copy of the necessarily allocate physical memory or setup page process; identical except for the return value table entries • Example: – What mechanism do you use to tell when a page is needed? int pid = fork(); • It pays to be lazy! if (pid == 0) { – A program may never touch the memory it maps. // child code • Examples? – Program may not use all code in a library } else if (pid > 0) { – Save work compared to traversing up front // parent code – Hidden costs? Optimizations? • Page faults are expensive; heuristics could help performance } else // error 23 24 23 24 4 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Copy-On-Write (COW) How does COW work? • Naïve approach would march through address space • Memory regions: and copy each page – New copies of each vma are allocated for child during fork – Most processes immediately exec() a new binary – As are page tables without using any of these pages • Pages in memory: – Again, lazy is better! – In page table (and in-memory representation), clear write bit, set COW bit • Is the COW bit hardware specified? • No, OS uses one of the available bits in the PTE – Make a new, writeable copy on a write fault 25 26 25 26 COMP 790: OS Implementation COMP 790: OS Implementation New Topic: Stacks Idiosyncrasy 1: Stacks Grow Down • In Linux/Unix, as you add frames to a stack, they actually decrease in virtual address order • Example: Stack “bottom” – 0x13000 main() 0x12600 foo() 0x12300 bar() 0x11900 Exceeds stack OS allocates a page new page 27 28 27 28 COMP 790: OS Implementation COMP 790: OS Implementation Problem 1: Expansion Feed 2 Birds with 1 Scone • Recall: OS is free to allocate any free page in the • Unix has been around longer than paging virtual address space if user doesn’t specify an – Remember data segment abstraction? address – Unix solution: • What if the OS allocates the page below the “top” of Grows Grows the stack? Heap Stack – You can’t grow the stack any further Data Segment – Out of memory fault with plenty of memory spare • OS must reserve stack portion of address space • Stack and heap meet in the middle – Fortunate that memory areas are demand paged – Out of memory when they meet 29 30 29 30 5 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation But now we have paging Windows Comparison • Unix and Linux still have a data segment abstraction • LPVOID VirtualAllocEx(__in HANDLE hProcess, – Even though they use flat data segmentation! __in_opt LPVOID lpAddress, • sys_brk() adjusts the endpoint of the heap __in SIZE_T dwSize, – Still used by many memory allocators today __in DWORD flAllocationType, __in DWORD flProtect); • Library function applications program to – Provided by ntdll.dll – the rough equivalent of Unix libc – Implemented with an undocumented system call 31 32 31 32 COMP 790: OS Implementation COMP 790: OS Implementation Windows Comparison Windows Comparison • LPVOID VirtualAllocEx(__in HANDLE hProcess, • LPVOID VirtualAllocEx(__in HANDLE hProcess,