Attacking Edge Through the Javascript Compiler Bluehatil 2019 and Offensivecon 2019 Bruno Keith Whoami

Attacking Edge through the JavaScript compiler BlueHatIL 2019 And OffensiveCon 2019 Bruno Keith whoami ● 24, French ● Started playing CTF in 2016 (ESPR) ● Started vulnerability research full time in 2018 ● Mainly looking at JavaScript engines ● @bkth_ (DMs are open) Agenda 1. ChakraCore 2. JavaScript engine primer 3. ChakraCore internals basics 4. Just-In-Time (JIT) compilation of JavaScript and its problematic 5. Chakra’s JIT compiler 6. Case study of a bug What is ChakraCore ● Chakra is the JavaScript engine powering Microsoft Edge (not for long anymore :( ) ● ChakraCore is the open-sourced version of Chakra minus a few things (COM API, Edge bindings, etc…) ● Available on GitHub ● Written mainly in C++ JavaScript engine primer What makes up a JavaScript engine? ● Parser ● Interpreter ● Runtime ● Garbage Collector ● JIT compiler(s) What makes up a JavaScript engine? ● Parser Entrypoint, parses the source code and produces custom bytecode ● Interpreter ● Runtime ● Garbage Collector ● JIT compiler(s) What makes up a JavaScript engine? ● Parser ● Interpreter Virtual machine that processes and “executes” the bytecode ● Runtime ● Garbage Collector ● JIT compiler(s) What makes up a JavaScript engine? ● Parser ● Interpreter ● Runtime Basic data structures, standard library, builtins, etc. ● Garbage Collector ● JIT compiler(s) What makes up a JavaScript engine? ● Parser ● Interpreter ● Runtime ● Garbage Collector Freeing of dead objects ● JIT compiler(s) What makes up a JavaScript engine? ● Parser ● Interpreter ● Runtime ● Garbage Collector ● JIT compiler(s) Consumes the bytecode to produce optimized machine code ChakraCore internals basics Representing JSValues ● Every “value” is of type Var (just an alias for void*) ● NaN-boxing: trick to encode both value and some type information in 8 bytes ● Use the upper 17 bits of a 64 bits value to encode some type information var a = 0x41414141 represented as 0x0001000041414141 var b = 5.40900888e-315 represented as 0xfffc000041414141 ● Upper bits cleared => pointer to an object which represents the actual value Representing JSObjects ● JavaScript objects are basically a collection of key-value pairs called properties ● The object does not maintain its own map of property names to property values. ● The object only has the property values and a Type which describes that object’s layout. ○ Saves space by reusing that type across objects ○ Allows for optimisations such as inline caching (more on that later) ● Bunch of different layouts for performance. Objects internal representation var a = {}; a.x = 0x414141; 0x00010000414141 0x00010000424242 a.y = 0x424242; __vfptr type auxSlots objectArray Objects internal representation var a = {x: 0x414141, y:0x424242}; stored with a layout called ObjectHeaderInlined __vfptr type 0x0001000000414141 0x0001000000424242 Property access var a = {}; 0x00010000414141 0x00010000424242 a.x = 0x414141; print(a.x); __vfptr type auxSlots objectArray Type is used to know where the property is stored Property access var a = {}; 0x00010000414141 0x00010000424242 a.x = 0x414141; print(a.x); __vfptr type auxSlots objectArray Type is used to know where the property is stored Can map a property name (PropertyId) to an index Property access var a = {}; 0x00010000414141 0x00010000424242 a.x = 0x414141; print(a.x); __vfptr type auxSlots objectArray Type is used to know where the property is stored Can map a property name (PropertyId) to an index Interpreter will call a->GetDynamicType()->GetTypeHandler()->GetProperty(PropertyId(“x”)); Representing JSArrays ● Standard-defined as an exotic object having a “length” property defined ● Most engines implement basic and efficient optimisations for Arrays internally ● Chakra uses a segment-based implementation ● Three main classes to allow storage optimization: ○ JavascriptNativeIntArray ○ JavascriptNativeFloatArray ○ JavascriptArray Representing JSArrays var arr = [1,2,3]; Representing JSArrays var arr = [1,2,3]; left: 0 length: 3 size: 6 next: 0 1 2 3 Representing JSArrays var arr = [1,2,3]; left: 0 length: 3 size: 6 next 1 2 3 arr[100] = 4; left: 100 length: 1 size: 18 next: 0 4 ... ... JavaScript JIT compilation and its problematic JIT compilation Goal is to generate highly optimized machine code Pros: much better code speed Problems: higher startup time, no type information In practice, execution starts in the interpreter. If a function gets called repeatedly, it will be compiled to machine code Problematic function addition(x, y) { return x + y; } Problematic 1. Let lref be the result of evaluating AdditiveExpression. function addition(x, y) { 2. Let lval be ? GetValue(lref). return x + y; 3. Let rref be the result of evaluating MultiplicativeExpression. } 4. Let rval be ? GetValue(rref). 5. Let lprim be ? ToPrimitive(lval). 6. Let rprim be ? ToPrimitive(rval). 7. If Type(lprim) is String or Type(rprim) is String, then a. Let lstr be ? ToString(lprim). b. Let rstr be ? ToString(rprim). c. Return the String that is the result of concatenating lstr and rstr. 8. Let lnum be ? ToNumber(lprim). 9. Let rnum be ? ToNumber(rprim). 10. Return the result of applying the addition operation to lnum and rnum. See the Note below 12.8.5. Problematic function addition(x, y) { If we only call this function with numbers we probably want it return x + y; compiled to something that looks like: } lea rax, [rdi+rsi] ret However, JavaScript has no type information. Problematic function addition(x, y) { If we only call this function with numbers we probably want it return x + y; compiled to something that looks like: } lea rax, [rdi+rsi] ret However, JavaScript has no type information. SOLUTION: collect profile information and generate optimized code based on that Problematic function addition(x, y) { return x + y; } for (var i = 0; i < 1000; ++i) { addition(i, 1337); } Collect type information on the parameters Assumption: will be called with same arguments type Idea: Check type at the beginning for the compiled function and optimize based on that Problematic: another example function getX(o) { We want to optimize the object access. return o.x; } But we don’t want to compile down the whole object lookup for (var i = 0; i < 1000; ++i) { getX({x:1337}); } Problematic: another example function getX(o) { We want to optimize the object access. return o.x; } But we don’t want to compile down the whole object lookup for (var i = 0; i < 1000; ++i) { SOLUTION: Assume the object type will stay the same and getX({x:1337}); use direct index access (inline caching), if not fall back to } the interpreter. Key concept: Slow path If assumptions do not hold, we might have to call back into the runtime/interpreter via a so-called “slow path” to execute a certain operation Bad news: we get a performance hit Good news: Execution returns in the JIT compiled function Key concept: Bailout Sometimes, if an assumption does not hold, the JIT code is completely unusable. The whole function has to continue in the interpreter Cons: bigger performance hit (bailing out is a non-trivial process) Pros: it actually works? In a nutshell JIT compilation of JavaScript relies on profile information collected during execution in the interpreter Highly optimized code is generated based on that information JIT code has to be responsible for generating checks in the code to make sure the assumption is true and deal with cases when they are not Problems arise when the engine assumes something which is not true Chakra’s JIT compiler Chakra JIT pipeline Interpreter keeps track of how many times a function has been called Past a certain threshold, change the entrypoint of the function to a thunk to start JIT compilation. Compilation happens out of process in Edge where the JIT runs in its own process so the content process can benefit from Arbitrary Code Guard. When code generation is done, change entrypoint to the native code address. Chakra JIT pipeline Chakra has a two-tiered JIT compiler: SimpleJit and FullJit Operates on a Control-Flow Graph (CFG) and a custom Intermediate Representation (IR) generated from the function’s bytecode. Main steps of compilation are roughly: ● IRBuilderPhase: builds the IR from the bytecode ● InlinePhase: check if some things can be inlined ● FGBuildPhase: builds the CFG from the IR ● GlobOptPhase: global optimizer, where most of the magic happens ○ SimpleJit: one backward pass (deadstore pass) on the CFG ○ FullJit: one backward pass, one forward pass, one backward pass (deadstore pass) ● LowererPhase: lowers the IR to machine dependent operations ● RegAllocPhase ● …. Chakra JIT pipeline Chakra has a two-tiered JIT compiler: SimpleJit and FullJit Operates on a Control-Flow Graph (CFG) and a custom Intermediate Representation (IR) generated from the function’s bytecode. Main steps of compilation are roughly: ● IRBuilderPhase: builds the IR from the bytecode ● InlinePhase: check if some things can be inlined This is what interests us the most ! ● FGBuildPhase: builds the CFG from the IR ● GlobOptPhase: global optimizer, where most of the magic happens ○ SimpleJit: one backward pass (deadstore pass) on the CFG ○ FullJit: one backward pass, one forward pass, one backward pass (deadstore pass) ● LowererPhase: lowers the IR to machine dependent operations ● RegAllocPhase ● …. How it looks in Chakra: simple integer addition Line 2: return x + y; Col 5: ^ StatementBoundary #0 #0000 function addition(x, y) { return x + y; GLOBOPT INSTR: s0[LikelyCanBeTaggedValue_Int].var = Add_A } s2[LikelyCanBeTaggedValue_Int].var!, s3[LikelyCanBeTaggedValue_Int].var! #0000 for

Load more