Verifying Practical Garbage Collectors Virtual
Total Page:16
File Type:pdf, Size:1020Kb
Verifying Practical Garbage Collectors Chris Hawblitzel Joint work with Erez Petrank (Technion) 1/27 Virtual machines for languages Java bytecode / .NET bytecode / JavaScript / ActionScript Virtual Machine compiler libraries run‐time system exceptio garbage concurrenc x86 n collector y handling 2/27 Virtual machine vulnerabilities Opera JavaScript Pointer Dereference Lets Microsoft Security Bulletin MS07‐040 ‐ Critical Remote Users Execute Arbitrary Code Vulnerabilities in .NET Framework Could Allow Original Entry Date: Aug 15 2007 Remote Code Execution A remote user can create specially crafted Published: July 10, 2007 JavaScript that, when loaded by the target ... user, will make a virtual function call on an What causes the vulnerability? invalid pointer to trigger a memory An unchecked buffer in the .NET Framework 2.0 dereference error and execute arbitrary JIT Compiler service within the .NET Framework. code on the target system. From: Apple Product Security Mozilla Firefox Bug in JavaScript Garbage Date: Fri, 11 Jul 2008 Collector Lets Remote Users Deny Service ... Advisory: Mozilla Foundation Security Advisory Available for: iPhone v1.0 through Date: Apr 16 2008 v1.1.4,iPod touch v1.1 through v1.1.4 A remote user can create specially crafted HTML Description: A memory corruption issue that, when loaded by the target user, will trigger exists in JavaScriptCore's handling of runtime a flaw in the JavaScript garbage collector code garbage collection. Visiting a maliciously and cause the target user's browser to crash. ... crafted website may lead to an unexpected The vendor indicates that similar crashes have application termination or arbitrary code been exploitable in the past, so arbitrary code execution. This update addresses the issue execution may be possible... through improved garbage collection. 3/27 Outline • Virtual machines, vulnerabilities • Proof‐carrying code – Goals: • verify properties of compiler output (e.g. type safety) • verify properties of run‐time system (e.g. correctness) • Simple verified GC • Practical verified GCs 4/27 Proof‐carrying code (Necula, Lee 1996) Java bytecode / .NET bytecode / JavaScript / ActionScript Virtual Machine compiler libraries run‐time system exceptio garbage concurrenc x86 n collector y proof handlingproof proof proof Verifier 5/27 Type‐preserving compilation in Bartok CIL (.NET bytecode) Bartok Compiler (~200,000 LOC) Optimizations: copy propagation, constant propagation,propagation, constant folding, HIR types CSE,CSE, dead‐code elimination, loop‐ invariant removal, loop rotation, inlining,inlining, tree shaking, array bounds check elimination (ABCD, induction MIR types var),var), reverse CSE for LEA instruction, peephole optimizations,optimizations, elimination of redundant CC ops, optimize boolean test and branch, floating point stack LIR types optimizations, convert “ADD” to “LEA”, graph‐coloring register allocation,allocation, code layout, ... x86 types “Typed assembly language” (Morrisett et al 1998) 6/27 Typed output from Bartok B1: {eax:int, ebx:int} cmp eax, 5 jne B3 B2: B3: mov ecx, eax mov ecx, ebx jmp B1 add ecx, 1 B4: {ecx:int} add ecx, 1 … 7/27 Array bounds safety in Bartok CIL (.NET bytecode) Bartok Compiler (~200,000 LOC) Optimizations: copy propagation, constant propagation,propagation, constant folding, HIR types CSE,CSE, dead‐code elimination, loop‐ invariant removal, loop rotation, inlining,inlining, tree shaking, array bounds unsafe check elimination (ABCD, induction MIR types proofs var),var), reverse CSE for LEA instruction, peephole optimizations,optimizations, elimination of redundant CC ops, optimize boolean test and branch, floating point stack LIR types proofs optimizations, convert “ADD” to “LEA”, graph‐coloring register allocation,allocation, code layout, ... x86 types proofs 8/27 A bug in Bartok: ABCD void F(int[] arr, int i) { if (i <= arr.Length) { counterexample: // i ≤ arr.Length i = ‐231 int j = i ‐ 1; j = 231 ‐ 1 // j < i ≤ arr.Length j > i if (j >= 0) { // 0 ≤ j < i ≤ arr.Length // no bounds check required! vulnerability: arr[j]++; address = &(arr[0]) + 4*j } = &(arr[0]) + 4*(231 ‐ 1) } = &(arr[0]) ‐ 4 } 9/27 Outline • Virtual machines, vulnerabilities • Proof‐carrying code • Simple verified GC – Program verification with Boogie/Coq, Boogie/Z3 – Miniature mark‐sweep collector – Integers, bit vectors, arrays, quantifiers • Practical verified GCs 10/27 Program verification (demo) 11/27 Mark‐sweep and copying collectors abstract A BC graph (root) mark‐sweep copying from copying to A A A C C B B B 12/27 Garbage collector properties • safety: gc does no harm – graph isomorphism • concrete graph represents abstract graph verified (McCreight, Shao et al 2007) • effectiveness – after gc, unreachable objects reclaimed • termination not • efficiency verified 13/27 Proving safety abstract $AbsMem A BC graph (root) $toAbs $toAbs concrete Mem graph A B procedure GarbageCollect() requires MutatorInv(root, Color, $toAbs, $AbsMem, Mem); functionmodifies MutatorInv(...) Mem, Color, $toAbs; returns (bool) { ensuresWellFormed($toAbs) MutatorInv(root, && Color, memAddr(root) $toAbs, $AbsMem, && $toAbs[root] Mem); != NO_ABS &&... (forall i:int::{T(i)} T(i) && memAddr(i) ==> ObjInv(i, $toAbs, $AbsMem, Mem)) {&& (forall i:int::{T(i)} T(i) && memAddr(i) ==> White(Color[i])) &&... call (forall Mark(root); i:int::{T(i)} ...T(i) && memAddr(i) ==> ($toAbs[i]==NO_ABS <==> Unalloc(Color[i])))} functioncall Sweep(); ObjInv(...) returns (bool) { $toAbs[i] != NO_ABS ==> } ... $toAbs[Mem[i, field]] != NO_ABS ... ... $toAbs[Mem[i, field]] == $AbsMem[$toAbs[i], field] ... } 14/27 Verifying Mark procedure Mark(ptr:int) requires GcInv(Color, $toAbs, $AbsMem, Mem); requires memAddr(ptr) && T(ptr); requires $toAbs[ptr] != NO_ABS; Mark creates no new gray, white objects modifies Color; ensures GcInv(Color, $toAbs, $AbsMem, Mem); ensures (forall i:int::{T(i)} T(i) && !Black(Color[i]) ==> Color[i] == old(Color)[i]); ensures !White(Color[ptr]); { if (White(Color[ptr])) { Color[ptr] := 2; // make gray call Mark(Mem[ptr,0]); call Mark(Mem[ptr,1]); Color[ptr] := 3; // make black } } 15/27 Verifying Mark procedure Mark(ptr:int) requires GcInv(Color, $toAbs, $AbsMem, Mem); requires memAddr(ptr) && T(ptr); requires $toAbs[ptr] != NO_ABS; modifies Color; ensures GcInv(Color, $toAbs, $AbsMem, Mem); ensures (forall i:int::{T(i)} T(i) && !Black(Color[i]) ==> Color[i] == old(Color)[i]); ensures !White(Color[ptr]); { if (White(Color[ptr])) { A Color[ptr] := 2; // make gray B call Mark(Mem[ptr,0]); call Mark(Mem[ptr,1]); Color[ptr] := 3; // make black } A B } forallforall i:int i:int::{T(i)}::{T(i)} T(i) T(i) && && memAddr(i) memAddr(i) ==> ==> Black(Color[i]) ObjInv(i, $toAbs, ==> !White(Color[Mem[i,field]]$AbsMem, Mem) 16/27 Integers, bit vectors, arrays (decidable) procedure Mark(ptr:int) memAddr(i) <==> memLo <= i && i < memHi requires GcInv(Color, $toAbs, $AbsMem, Mem); requires memAddr(ptr) && T(ptr); requires $toAbs[ptr] != NO_ABS; modifies Color; ensures GcInv(Color, $toAbs, $AbsMem, Mem); ensures (forall i:int::{T(i)} T(i) && !Black(Color[i]) ==> Color[i] == old(Color)[i]); ensures !White(Color[ptr]); { if (White(Color[ptr])) { Color[ptr] := 2; // make gray (equivalent to “Color := Color[ptr := 2];” call Mark(Mem[ptr,0]); call Mark(Mem[ptr,1]); booleanColor[ptr] expressions := 3; // makee black ::= true | false | x | !e | e && e | e ==> e | ... linear} integer arithmetic e ::= ... | ‐2 | ‐1 | 0 | 1 | 2 | e + e | e –e | e <= e | ... } bit vector arithmetic e ::= ... | e & e | e << e | ... arrays e ::= ... | e[e] | e[e := e] 17/27 Quantifiers (undecidable) procedure Mark(ptr:int) requires GcInv(Color, $toAbs, $AbsMem, Mem); requires memAddr(ptr) && T(ptr); requires $toAbs[ptr] != NO_ABS; explicit introductions of T(...) modifies Color; to provoke quantifier instantiations ensures GcInv(Color, $toAbs, $AbsMem, Mem); ensures (forall i:int::{T(i)} T(i) && !Black(Color[i]) ==> Color[i] == old(Color)[i]); ensures !White(Color[ptr]); “trigger” quantifier instantiation { i =e when T(e) seen if (White(Color[ptr])) { Color[ptr] := 2; // make gray (equivalent to “Color := Color[ptr := 2];” call Mark(Mem[ptr,0]); T is a dummy function: call Mark(Mem[ptr,1]); Color[ptr] := 3; // make black T(e) <==> true } } 18/27 Outline • Virtual machines, vulnerabilities • Proof‐carrying code • Simple verified GC • Practical verified GCs – Mark‐sweep collector • Mark stack • Table of color bits (2 bits per color) • Allocate small objects from cache • Allocate large objects, caches from free list – Copying collector • Simple 2‐space Cheney queue algorithm • Not generational 19/27 Mutator‐GC interface The mutator (the program) calls these procedures: ¾ Initialize() ¾ read($ptr:int, $field:int) prove that it’s safe to read/write ¾ write($ptr:int, $field:int, $val:int) (mutator then performs read/write) ¾ AllocateObject(VTable vtable) ¾ AllocateString(int stringLength) ¾ AllocateVector(VTable vtable, int numElems) ¾ AllocateArray(VTable vtable, int numDims, int numElems) (Note: we define a field to be an aligned, 4‐byte word) 20/27 Object layouts (multi‐dimensional) Class object String Vector Array preheader preheader preheader preheader vtable vtable vtable vtable data field numChars numElems numDims (private) elements numElems data field char char base/length elements char char base/length ... elements tag(vtable) == ?STRING_TAG ==> numFields(abs) >= 4 && 4 * numFields(abs) == pad(16 + 2 * numChars) ... elements 21/27 VTables and GC information Class object VTable preheader vtable data field gc info data