Administrivia Confining Code with Legacy Oses Using Chroot Escaping Chroot System Call Interposition Limitations of Syscall Inte
Administrivia
• Guest lecture Thursday
  - Mark Lentczner (Google) on the Belay project
  - Please attend lecture if at all possible
• Last project due Thursday
  - No extensions unless all non-SCPD group members at lecture
  - If staff grants you extension, means only if you attend lecture
  - We will have a more stringent enforcement mechanism
• Final Exam
  - Wednesday March 16, 12:15-3:15pm
  - Open book, covers all 19 lectures (possibly including topics already on the midterm)
• Televised final review session Friday
  - Bring questions on lecture material

Confining code with legacy OSes
• Often want to confine code on legacy OSes
• Analogy: Firewalls
  [Figure: attackers and a “Hopelessly Insecure Server”]
  - Your machine runs hopelessly insecure software
  - Can’t fix it: no source or too complicated
  - Can reason about network traffic
• Similarly block untrusted code within a machine
  - By limiting what it can interact with

Using chroot
• chroot (char *dir) “changes root directory”
  - Kernel stores root directory of each process
  - File name “/” now refers to dir
  - Accessing “..” in dir now returns dir
• Need root privs to call chroot
  - But subsequently can drop privileges
• Ideally “Chrooted process” wouldn’t affect parts of the system outside of dir

Escaping chroot
• Re-chroot to a lower directory, then chroot ..
  - Each process has one root directory, so chrooting to a new directory can put you above your new root
• Create devices that let you access raw disk
• Send signals to or ptrace non-chrooted processes
• Create setuid program for non-chrooted proc. to run
• Bind privileged ports, mess with clock, reboot, etc.
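The “re-chroot to a lower directory, then chroot ..” escape can be illustrated with a toy model. This is a minimal sketch in Python, not kernel code: the Dir and Process classes and the escape helper are hypothetical names invented for illustration. It assumes, as with the real chroot system call, that chroot changes only the root and leaves the current directory alone, that “..” is only blocked at the process's own root, and that the jailed process is still running as root.

```python
class Dir:
    """A directory node in a toy file system (directories only)."""

    def __init__(self, name, parent=None):
        self.name = name
        # The real root is its own parent, so ".." there goes nowhere.
        self.parent = parent if parent is not None else self
        self.children = {}

    def mkdir(self, name):
        child = Dir(name, self)
        self.children[name] = child
        return child

    def path(self):
        """Absolute path in the real (un-chrooted) name space."""
        parts = []
        node = self
        while node.parent is not node:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


class Process:
    """Toy process: the 'kernel' stores one root and one cwd per process."""

    def __init__(self, root, cwd):
        self.root = root
        self.cwd = cwd

    def resolve(self, path):
        node = self.root if path.startswith("/") else self.cwd
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                # ".." is only stopped at the process's *current* root.
                if node is not self.root:
                    node = node.parent
            else:
                node = node.children[part]
        return node

    def chdir(self, path):
        self.cwd = self.resolve(path)

    def chroot(self, path):
        # Assumption: root changes, current directory does not.
        self.root = self.resolve(path)


def escape(proc):
    """Re-chroot to a lower directory, then walk up with '..'."""
    proc.cwd.mkdir("sub")
    proc.chroot("sub")          # cwd is now above the new root
    for _ in range(32):         # ".." never meets the new root on the way up
        proc.chdir("..")
    proc.chroot(".")            # root is now the real root


if __name__ == "__main__":
    real_root = Dir("")
    jail = real_root.mkdir("jail")
    real_root.mkdir("etc")
    proc = Process(root=jail, cwd=jail)
    print("before:", proc.resolve("/../..").path())
    escape(proc)
    print("after: ", proc.resolve("/etc").path())
```

Before the escape, “/../..” still resolves inside the jail; afterwards the process can name directories outside it. The toy model also shows why the fix is not obvious: the check on “..” is purely relative to one stored root per process.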
Using chroot (continued)
  - Even process still running as root shouldn’t escape chroot
• In reality, many ways to cause damage outside dir

Escaping chroot (continued)
• Problem: chroot was not originally intended for security
  - FreeBSD jail, Linux vserver have tried to address problems

System call interposition
• Why not use ptrace or other debugging facilities to control untrusted programs?
• Almost any “damage” must result from system call
  - delete files → unlink
  - overwrite files → open/write
  - attack over network → socket/bind/connect/send/recv
  - leak private data → open/read/socket/connect/write ...
• So enforce policy by allowing/disallowing each syscall
  - Theoretically much more fine-grained than chroot
  - Plus don’t need to be root to do it
• Q: Why is this not a panacea?

Limitations of syscall interposition
• Hard to know exact implications of a system call
  - Too much context not available outside of kernel (e.g., what does this file descriptor number mean?)
  - Context-dependent (e.g., /proc/self/cwd)
• Indirect paths to resources
  - File descriptor passing, core dumps, “unhelpful processes”
• Race conditions
  - Remember difficulty of eliminating TOCTTOU bugs?
  - Now imagine malicious application deliberately doing this
  - Symlinks, directory renames (so “..” changes), ...
• See [Garfinkel] for a more detailed discussion

Review: What is an OS
• OS is software between applications and reality
  - Abstracts hardware and makes portable
  - Makes finite into (near) infinite
  - Provides protection

What if...
• The process abstraction looked just like hardware?

Virtual Machine Monitor
• Thin layer of software that virtualizes the hardware
  - Exports a virtual machine abstraction that looks like the hardware

How is a process different from HW?
Process
• CPU – Non-Privileged registers and instructions.
• Memory – Virtual memory.
• Exceptions – signals, errors.
Hardware
• CPU – All registers and instructions.
• Memory – Both virtual and
  physical memory, memory management, TLB/page tables, etc.
• Exceptions – Trap architecture, interrupts, etc.
• I/O – I/O devices accessed using programmed I/O, DMA, interrupts.
Process (continued)
• I/O – File System, Directory, Files, raw devices.

Old idea from the 1960s
• See [Goldberg] from 1974
• IBM VM/370 – A VMM for IBM mainframe
  - Multiplex multiple OS environments on expensive hardware
  - Desirable when few machines around
• Interest died out in the 1980s and 1990s
  - Hardware got cheap
  - Compare Windows NT vs. N DOS machines
• Interesting again today
  - Different problems today – software management
  - VMM attributes still relevant

VMM benefits
• Software compatibility
  - Runs pretty much all software
  - Trick: Make virtual hardware match real hardware
• Can get low overheads/high performance
  - Near “raw” machine performance for many workloads
  - With tricks can have direct execution on CPU/MMU
• Isolation
  - Seemingly total data isolation between virtual machines
  - Use hardware protection
• Encapsulation
  - Virtual machines are not tied to physical machines
  - Checkpoint/Migration

OS backwards compatibility
• Backward compatibility is bane of new OSes
  - Huge effort required to innovate but not break
• Security considerations may make it impossible

Logical partitioning of servers
• Run multiple servers on same box (e.g., Amazon EC2)
  - Ability to give away less than one machine
    Modern CPUs more powerful than most services need
  - 0.10U rack space machine – less power, cooling, space, etc.
OS backwards compatibility (continued)
  - Choice: Close security hole and break apps or be insecure
• Example: Not all WinNT applications run on WinXP or XP on Vista
  - In spite of a huge compatibility effort
  - Given the number of applications that ran on WinNT, practically any change would break something
    if (OS == WinNT) ...
• Solution: Use a VMM to run both WinNT and WinXP
  - Obvious for OS migration as well: Windows → Linux

Logical partitioning of servers (continued)
  - Server consolidation trend: N machines → 1 real machine
• Isolation of environments
  - Printer server doesn’t take down Exchange server
  - Compromise of one VM can’t get at data of others (though in practice not so simple because of side-channel attacks [Ristenpart])
• Resource management
  - Provide service-level agreements
• Heterogeneous environments
  - Linux, FreeBSD, Windows, etc.

Complete Machine Simulation
• Simplest VMM approach, used by bochs
• Build a simulation of all the hardware.
  - CPU – A loop that fetches each instruction, decodes it, simulates its effect on the machine state
  - Memory – Physical memory is just an array, simulate the MMU on all memory accesses
  - I/O – Simulate I/O devices, programmed I/O, DMA, interrupts
• Problem: Too slow!
  - 100x slowdown makes it not too useful
  - CPU/Memory – 100x CPU/MMU simulation
  - I/O Device – < 2× slowdown.

Virtualizing the CPU
• Observations: Most instructions are the same regardless of processor privilege level
  - Example: incl %eax
• Why not just give instructions to CPU to execute?
  - One issue: Safety – How to get the CPU back? Or stop it from stepping on us? How about cli/halt?
  - Solution: Use protection mechanisms already in CPU
• Run virtual machine’s OS directly on CPU in unprivileged user mode
  - “Trap and emulate” approach
  - Most instructions just work
  - Privileged instructions trap into monitor and run simulator on
    instruction
  - Makes some assumptions about architecture

Complete Machine Simulation (continued)
• Need faster ways of emulating CPU/MMU

Virtualizing traps
• What happens when an interrupt or trap occurs
  - Like normal kernels: we trap into the monitor
• What if the interrupt or trap should go to guest OS?
  - Example: Page fault, illegal instruction, system call, interrupt
  - Re-start the guest OS simulating the trap
• x86 example:
  - Give CPU an IDT that vectors back to VMM
  - Look up trap vector in VM’s “virtual” IDT
  - Push virtualized %cs, %eip, %eflags on stack
  - Switch to virtualized privileged mode

Virtualizing memory
• Basic MMU functionality:
  - OS manages physical memory (0 ... MAX MEM)
  - OS sets up page tables mapping VA → PA
  - CPU accesses to VA should go to PA (Paging off: PA=VA)
  - Used for every instruction fetch, load, or store
• Need to implement a virtual “physical memory”
  - Logically need additional level of indirection
  - VM’s VA → VM’s PA → machine address
  - Note “physical memory” no longer means hardware bits – machine memory is hardware bits
• Trick: Use hardware MMU to simulate virtual MMU
  - Can be folded into page tables: VA → machine address

Shadow page tables
• Monitor keeps shadow of VM’s page table
  - Shadow PT is map from VA → machine address
• Treat shadow page tables as a cache
  - Have true page faults when a page not in VM’s page table
  - Have hidden page faults when just misses in shadow page table
• On a page fault, VMM must:
  - Lookup VPN → PPN in VM’s (guest OS’s) page table
  - Determine where PPN is in machine memory (MPN)

Shadow PT issues
• Hardware only ever sees shadow page table
  - Guest OS only sees its own VM page table, never shadow PT
• Consider the following
  - OS has a page table $T$ mapping $V_U \to P_U$
  - $T$ itself resides at physical address $P_T$
  - Another page table maps $V_T \to P_T$
  - VMM stores $P_U$ in machine address $M_U$ and $P_T$ in $M_T$
• What can VMM put in shadow page table?
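The extra level of indirection from “Virtualizing memory” (VM's VA → VM's PA → machine address) and the trick of folding it into one table can be sketched as follows. This is an illustrative Python model under stated assumptions, not how any particular VMM is written: page tables are plain dictionaries keyed by page number, and the names PAGE_SIZE, guest_pt, pmap, fold, and the two translate functions are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


def split(addr):
    """Split an address into (page number, offset within page)."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE


def translate_two_level(va, guest_pt, pmap):
    """Logical view: guest VA -> guest 'physical' -> machine address."""
    vpn, offset = split(va)
    ppn = guest_pt[vpn]     # guest OS's page table: VPN -> PPN
    mpn = pmap[ppn]         # VMM's bookkeeping:     PPN -> MPN
    return mpn * PAGE_SIZE + offset


def fold(guest_pt, pmap):
    """Compose both maps into a single VPN -> MPN table.

    Guest pages the VMM has not backed with machine memory are skipped,
    so an access to them would still fault into the monitor.
    """
    return {vpn: pmap[ppn] for vpn, ppn in guest_pt.items() if ppn in pmap}


def translate_folded(va, shadow):
    """What the hardware MMU would do with the folded table."""
    vpn, offset = split(va)
    return shadow[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    guest_pt = {0: 5, 1: 7}      # set up by the guest OS
    pmap = {5: 42, 7: 13}        # chosen by the VMM
    shadow = fold(guest_pt, pmap)
    va = 1 * PAGE_SIZE + 0x10
    assert translate_two_level(va, guest_pt, pmap) == translate_folded(va, shadow)
    print(shadow)
```

The folded table gives the same answer as the two-step lookup while needing only one walk per access, which is why the hardware can be handed the folded table directly.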
Shadow page tables (continued)
• On a page fault, VMM must (continued):
  - Insert VPN → MPN mapping in shadow page table
  - Note: Monitor can demand-page the virtual machine
• Uses hardware protection
  - Monitor never maps itself into VM’s page table
  - Never maps other VMs’ memory in VM’s page table

Shadow PT issues (continued)
• What can VMM put in shadow page table? (continued)
  - Safe to map $V_U \to M_U$ or $V_T \to M_T$
• Not safe to map both simultaneously!
  - If OS writes to $P_T$, may make $V_U \to M_U$ in shadow PT incorrect
  - If OS reads/writes $V_U$, may require accessed/dirty bits to be changed in $P_T$ (hardware can only change shadow PT)

Tracing
• VMM needs to get control on some memory accesses
• Guest OS changes VM page table
  - OS should use invlpg instruction, which would trap to VMM, but in practice many/most OSes are sloppy about this
  - Must invalidate stale mapping in shadow page table
• Guest OS accesses page when VM PT accessible
  - Accessed/dirty bits in VM PT will no longer be correct
  - Must make VM PT inaccessible in shadow PT
• Solution: Tracing
  - To track page access, make VPN(s) invalid in shadow PT
  - If guest OS accesses page, will trap to VMM w.

Tracing vs. hidden faults
• Suppose VMM never allowed access to VM PTs?
  - Every PTE access would incur the cost of a tracing fault
  - Very expensive when OS changes lots of PTEs
• Suppose OS allowed access to most page tables (except very recently accessed regions)
  - Now lots of hidden faults when accessing new region
  - Plus overhead to pre-compute accessed/dirty bits from shadow page tables when switching between VMs
• Makes for complex trade-offs
  - But adaptive binary translation (later) can make this better
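Treating the shadow page table as a cache, with true faults, hidden faults, and invalidation when the guest changes its own page table, can be sketched roughly as below. This is a toy Python model under stated assumptions: page tables are dictionaries, the ShadowMMU class, the TruePageFault exception, and the guest_writes_pte method are hypothetical names, and the traced write is modeled as an explicit method call rather than a real protection fault.

```python
class TruePageFault(Exception):
    """The VPN is missing from the guest's own page table; the guest OS
    must handle this fault itself."""


class ShadowMMU:
    def __init__(self, guest_pt, pmap):
        self.guest_pt = guest_pt   # VPN -> PPN, owned by the guest OS
        self.pmap = pmap           # PPN -> MPN, owned by the VMM
        self.shadow = {}           # VPN -> MPN, the only table hardware sees
        self.hidden_faults = 0

    def access(self, vpn):
        """Return the machine page for vpn, filling the shadow on a miss."""
        if vpn in self.shadow:
            return self.shadow[vpn]        # hit: no VMM involvement
        if vpn not in self.guest_pt:
            raise TruePageFault(vpn)       # true fault: reflect to guest
        # Hidden fault: the guest mapping exists, the shadow just missed.
        self.hidden_faults += 1
        ppn = self.guest_pt[vpn]           # look up VPN -> PPN
        mpn = self.pmap[ppn]               # find where PPN lives (MPN)
        self.shadow[vpn] = mpn             # insert VPN -> MPN
        return mpn

    def guest_writes_pte(self, vpn, new_ppn):
        """Model of a traced write to the guest page table: apply the
        change, then invalidate the stale shadow mapping."""
        if new_ppn is None:
            self.guest_pt.pop(vpn, None)
        else:
            self.guest_pt[vpn] = new_ppn
        self.shadow.pop(vpn, None)


if __name__ == "__main__":
    mmu = ShadowMMU(guest_pt={0: 5}, pmap={5: 42, 6: 99})
    print(mmu.access(0), mmu.hidden_faults)   # first access misses the shadow
    print(mmu.access(0), mmu.hidden_faults)   # second access hits
    mmu.guest_writes_pte(0, 6)                # guest remaps the page
    print(mmu.access(0), mmu.hidden_faults)   # stale entry was dropped
```

Without the invalidation in guest_writes_pte, the third access would keep returning the old machine page, which is the stale-mapping problem the tracing slides describe.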