Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer

Suyoung Lee, HyungSeok Han, Sang Kil Cha, Sooel Son
Graduate School of Information Security (GSIS), KAIST

Popularity of Web Browsers
4 billion users
Source: https://www.statista.com/statistics/543218/worldwide-internet-users-by-browser/

Vulnerable Web Browsers
▪ Browser-based cyber threats
▪ Most exploited applications in 2018: Office 47%, Browser 24%, Android 21%, Java 5%
Sources:
https://www.kaspersky.com/about/press-releases/2018_microsoft-office-exploits
https://www.theguardian.com/technology/2017/jan/10/browser-autofill-used-to-steal-personal-details-in-new-phising-attack-chrome-safari
https://searchsecurity.techtarget.com/news/252458522/MarioNet-attack-exploits-HTML5-to-create-botnets
https://cointelegraph.com/news/recent-firefoxs-zero-day-flaw-was-used-in-attacks-against-coinbases-employees
https://www.tomsguide.com/us/ios-malvertising-barrage,news-29880.html

JS Engine Vulnerabilities
[Figure: Chrome (JS engine: V8) loads https://leeswimming.com; a shell prompt shows "# id" printing "uid=0(root) gid=0(root) groups=0(root)"]
JS engine vulnerabilities pose a critical security threat!

JS Engine Fuzzing
▪ JS engines: V8, SpiderMonkey, ChakraCore, JavaScriptCore
▪ Fuzzing (fuzz testing)
- An automated software testing technique that involves providing invalid or unexpected input to a program under test (PUT).
[Figure: JS engine fuzzer -> JS test code -> ChakraCore (PUT)]

How Can We (Fuzzer) Generate Test Input?
[Figure: a proof of concept (PoC) for CVE-2017-8586 crashes ChakraCore (PUT)]

Previous Work
Abstract Syntax Tree (AST)
1. Mutation-based fuzzers
- LangFuzz, IFuzzer, and GramFuzz
- Combining AST subtrees extracted from JS seeds
2. Generation-based fuzzers
- jsfunfuzz
- Applying JS grammar rules from scratch

Previous Work: Mutation-based JS fuzzers
▪ LangFuzz, IFuzzer, and GramFuzz
- They combine AST subtrees of seed JS tests
[Figure: Subtree 1 + Subtree 2 -> generated code]

Relationship between Building Blocks
[Figure: the current AST next to a set of applicable AST subtrees]
None of the existing fuzzers consider the relationships between these building blocks!
Which combination is more likely to trigger JS engine bugs?

Motivational Question
[Figure: the current AST next to a set of applicable AST subtrees]
Are there any similar patterns between bug-triggering JS code?

Study on JS Engine Vulnerabilities
• Analyzed patches of 50 CVEs assigned to ChakraCore (CVE-2017-0071, CVE-2017-0141, CVE-2017-0196, ..., CVE-2018-0953)
- 18% of the patches revised GlobOpt.cpp, i.e. they are related to global optimization
- 14% of the patches revised JavascriptArray.cpp, i.e. they are related to JavaScript arrays

Study on JS Engine Vulnerabilities
• Compared AST subtrees from two sets
- As of August 2016: 2038 JS tests from the ChakraCore repo (JS Test 1, JS Test 2, ..., JS Test 2038)
- After August 2016: 67 PoCs triggering ChakraCore CVEs (CVE-2016-3247, CVE-2016-7203, ..., CVE-2018-0980)
Over 95% of the subtrees from the PoCs exist in the regression tests.

Our Goal
1. To leverage the functionality of JS regression tests
- Mutation-based approach
2. To learn the relationship of AST subtrees
- Modeling the relationship between AST subtrees

Our Building Block – Fragments
Example: const v0 = 0;
AST:
Program
  body: VarDeclaration
    kind: const
    decl: VarDeclarator
      id: Identifier (name: v0)
      init: Literal (value: 0)
Fragments:
Program (body: VarDeclaration)
VarDeclaration (kind: const, decl: VarDeclarator)
VarDeclarator (id: Identifier, init: Literal)
Identifier (name: v0)
Literal (value: 0)
A fragment is a subtree of depth 1.

Montage Overview
[Figure: JS -> Preprocessing -> a sequence of fragments -> Training -> trained NNLM -> AST Mutation -> JS]

Preprocessing – Normalization
• Normalizing IDs
- To decrease the number of unique fragments, rename IDs
[Figure: two VarDeclarator subtrees (id: Identifier, init: Literal with value 0) whose identifier names cnt and idx are both renamed to v0, after which the two subtrees are identical]

Preprocessing – Fragmentation
• Fragmentation: the normalized AST is split into a sequence of fragments, one fragment at a time
Normalized AST:
Program
  body: VarDeclaration
    kind: const
    decl: VarDeclarator
      id: Identifier (name: v0)
      init: Literal (value: 0)
A sequence of fragments:
1. Program (body: VarDeclaration)
2. VarDeclaration (kind: const, decl: VarDeclarator)
3. VarDeclarator (id: Identifier, init: Literal)
4. Identifier (name: v0)
5. Literal (value: 0)
Montage captures the global compositional relationships between fragments.

Training Objectives
1. Given a sequence, predict the distribution of next fragments
[Figure: a sequence of preceding fragments -> LSTM model -> probability distribution of the next fragment]

Training Objectives
2. Prioritize the fragments that have a correct type
A given sequence of preceding fragments:
Program (body: FunctionDecl)
FunctionDecl (id: Identifier, params: Identifier, body: BlockStmt)
To be syntactically correct, the root of the next fragment should be Identifier!
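The normalization and fragmentation steps above can be sketched in a few lines of JavaScript. This is only an illustrative sketch, not Montage's implementation: the AST is built by hand with the node and field names shown on the slides, child lists are stored as arrays (an assumption borrowed from ESTree-style ASTs), the helper names `isNode`, `normalize`, `summarize`, and `fragmentize` are hypothetical, and a pre-order traversal is assumed because it matches the fragment order shown on the slides.

```javascript
// Hand-built AST for `const cnt = 0;` using the node/field names from the slides.
const ast = {
  type: "Program",
  body: [{
    type: "VarDeclaration",
    kind: "const",
    decl: [{
      type: "VarDeclarator",
      id: { type: "Identifier", name: "cnt" },
      init: { type: "Literal", value: 0 },
    }],
  }],
};

// An AST node is any object carrying a string `type` field.
const isNode = (x) => x !== null && typeof x === "object" && typeof x.type === "string";

// Normalization: rename identifiers to v0, v1, ... in order of first appearance
// (assumed naming scheme), so `cnt` and `idx` yield the same fragment.
function normalize(node, names = new Map()) {
  if (Array.isArray(node)) return node.map((n) => normalize(n, names));
  if (!isNode(node)) return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) out[key] = normalize(value, names);
  if (node.type === "Identifier") {
    if (!names.has(node.name)) names.set(node.name, `v${names.size}`);
    out.name = names.get(node.name);
  }
  return out;
}

// Depth-1 view of a field: child nodes collapse to their type, terminals stay as-is.
const summarize = (x) => (Array.isArray(x) ? x.map(summarize) : isNode(x) ? x.type : x);

// Fragmentation: emit one depth-1 fragment per node, visiting nodes in pre-order.
function fragmentize(node, out = []) {
  const fragment = {};
  for (const [key, value] of Object.entries(node)) fragment[key] = summarize(value);
  out.push(fragment);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) if (isNode(child)) fragmentize(child, out);
  }
  return out;
}

const fragments = fragmentize(normalize(ast));
console.log(fragments.map((f) => JSON.stringify(f)).join("\n"));
```

Running the sketch prints five fragments, one per AST node, starting with Program and ending with Literal. Replacing `cnt` with `idx` in the input yields exactly the same sequence, which is the point of the normalization step.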
AST Mutation
1. Start from a seed AST, here for const v1 = v0 + 1;
Program
  body: VarDeclaration
    kind: const
    decl: VarDeclarator
      id: Identifier (name: v1)
      init: BinaryExpr
        left: Identifier (name: v0)
        operator: +
        right: Literal (value: 1)
2. Remove a subtree, here everything below BinaryExpr:
Program
  body: VarDeclaration
    kind: const
    decl: VarDeclarator
      id: Identifier (name: v1)
      init: BinaryExpr
3. Input the sequence of 4 fragments representing the current AST to the trained LSTM model.
4. The model outputs the probability distribution of the next fragment (0.76, 0.12, ..., 0.001).
5. Randomly select one from the Top K fragments, here BinaryExpr (left: Identifier, operator: /, right: Identifier).
6. Repeat until all leaf nodes become terminal symbols. Result: const v1 = v1 / v0;

Resolving Reference Errors
Code generated from the previous step:
    var str = 'Hello World';
    foo();
    function foo () {
        var obj = Object();
        num = 10;
        b.toUpperCase(); // reference error
    }
Identifier map:
    global scope:
        str => string
        foo => function
        num => number
    foo:
        obj => object
If possible, statically infer the type of undeclared identifiers! Here, b is a string.
Replace b with a declared identifier of that type, str:
    var str = 'Hello World';
    foo();
    function foo () {
        var obj = Object();
        num = 10;
        str.toUpperCase();
    }

Experiment Setup
[Figure labels, in order: Dataset, Unpatched Bugs, January, 2017, February, 2017, Dataset, ChakraCore 1.4.1]
• Collected
