JIT Spraying Threats on ARM and Defense by Diversification

Total Page:16

File Type:pdf, Size:1020Kb

JIT Spraying Threats on ARM and Defense by Diversification UC San Diego UC San Diego Electronic Theses and Dissertations Title JIT Spraying Threats on ARM and Defense by Diversification Permalink https://escholarship.org/uc/item/0dw3f9k1 Author Lian, Wing-Soon Wilson Publication Date 2016 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA, SAN DIEGO JIT Spraying Threats on ARM and Defense by Diversification A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Wing-Soon Wilson Lian Committee in charge: Professor Stefan Savage, Co-Chair Professor Hovav Shacham, Co-Chair Professor Ranjit Jhala Professor Gert Lanckriet Professor Geoffrey M. Voelker 2016 Copyright Wing-Soon Wilson Lian, 2016 All rights reserved. The Dissertation of Wing-Soon Wilson Lian is approved and is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Co-Chair University of California, San Diego 2016 iii TABLE OF CONTENTS Signature Page . iii Table of Contents . iv List of Figures . vii List of Tables . ix Acknowledgements . x Vita................................................................. xii Abstract of the Dissertation . xiii Introduction . 1 Chapter 1 Background . 6 Chapter 2 Assumptions and Threat model . 11 Chapter 3 ARM Architecture . 13 3.1 Instruction sets . 13 3.2 Core registers . 15 3.3 Endianness . 17 3.4 Conditional execution . 17 Chapter 4 JIT Spraying Payloads on ARM. 19 4.1 Introduction . 19 4.2 Controlling JIT compiler output. 21 4.2.1 Attacker-controlled bits . 21 4.2.2 Immediate bits . 22 4.2.3 Register fields . 24 4.2.4 Arithmetic woes . 26 4.3 Same-instruction set self-sustaining payloads on ARM . 27 4.4 Cross-instruction set self-sustaining payloads . 30 4.4.1 Thumb-to-ARM self-sustaining payloads . 30 4.4.2 ARM-to-Thumb self-sustaining payloads . 38 4.5 Gadget chaining payloads . 44 Chapter 5 Thumb gadget chaining against JavaScriptCore . 51 5.1 The JavaScriptCore JavaScript Engine . 51 5.1.1 Low Level Interpreter . 52 iv 5.1.2 Baseline JIT . 52 5.1.3 Data Flow Graph (DFG) JIT . 53 5.1.4 Fourth Tier LLVM (FTL) JIT . 53 5.1.5 JavaScript value representation . 53 5.1.6 JavaScript call stack and calling convention . 54 5.2 Proof of concept gadget chaining attack . 55 5.2.1 Gadget generation . 57 5.2.2 Pinpointing gadgets in memory . 58 5.2.3 Preparing registers and branching to gadgets from JavaScript . 62 5.2.4 Returning from gadgets without crashing . 62 5.2.5 Analysis of the proof of concept attack . 64 Chapter 6 ARM Gadget Chaining against V8 . 67 6.1 The V8 JavaScript Engine . 67 6.2 Proof of concept gadget chaining attack . 68 6.2.1 Gadget layout and creation . 70 6.2.2 Artificial control flow vulnerability . 71 6.2.3 Failure-tolerant invocation. 72 6.2.4 Analysis . 76 Chapter 7 ARM-to-Thumb Self-sustaining JIT Spraying against SpiderMonkey 77 7.1 SpiderMonkey JavaScript Engine . 77 7.1.1 Bytecode Interpreter. 77 7.1.2 Baseline JIT . 78 7.1.3 IonMonkey JIT . 79 7.2 Proof of concept Turing-complete self-sustaining payload . 79 7.2.1 Implementing an SBNZ One Instruction Computer . 79 7.2.2 Encoding challenges . 80 7.2.3 Encoding a NOP sled . 86 7.2.4 System calls . 87 7.2.5 Design shortcomings . 88 Chapter 8 Defensive Just-In-Time Code Emission on ARM . 90 8.1 Introduction . 90 8.2 Survey of Proposed JIT Spraying Mitigations . 91 8.2.1 Capability confinement . 91 8.2.2 Memory protection . 98 8.2.3 Diversification mechanisms . 106 8.2.4 Concrete diversification proposals . 111 8.3 State of Mitigation Deployment . 121 8.3.1 JavaScriptCore . 121 8.3.2 V8 . 125 v 8.3.3 SpiderMonkey . 127 8.3.4 Chakra . 128 8.4 Understanding the costs and benefits of diversification mitigations . 130 8.4.1 Implementations . 131 8.4.2 Evaluation . 146 Chapter 9 Conclusion . 157 Bibliography . 160 vi LIST OF FIGURES Figure 1.1. Illustration of a NOP sled encoded in the bytes implementing the statement x = 0x3c909090 ^0x3c909090 ^0x3c909090; . 9 Figure 4.1. Example of two possible Thumb-mode decodings of a sequence of halfwords. 29 Figure 4.2. Example decoding of four consecutive bytes of little endian instruc- tion memory into two Thumb halfwords and an ARM instruction with both a condition flag and an ALU destination register. 32 Figure 4.3. Illustrations of the three classes of halfwords from which un- intended ARM instructions can draw their most significant half. 33 Figure 4.4. Diagram of the second halfword encoding found in many Thumb ALU instructions with an immediate operand. 34 Figure 4.5. Diagram illustrating the use of 16-bit Thumb branch instructions as the most significant half of unintended ARM instructions.. 37 Figure 4.6. Illustration of ARM-to-Thumb payloads. 40 Figure 4.7. Illustration of how the immediate-operand bitwise AND instruction from the ARM instruction set (top row) can be decoded as two 16-bit Thumb-2 instructions (bottom row). 41 Figure 4.8. Illustration of using a virtual PC (in this case R6) to more efficiently utilize the space skipped over by branches.. 43 Figure 4.9. Diagram of the invocation of the read gadget with arrows showing control flow. ..
Recommended publications
  • Technical Report TR-2020-10
    Simulation-Based Engineering Lab University of Wisconsin-Madison Technical Report TR-2020-10 CryptoEmu: An Instruction Set Emulator for Computation Over Ciphers Xiaoyang Gong and Dan Negrut December 28, 2020 Abstract Fully homomorphic encryption (FHE) allows computations over encrypted data. This technique makes privacy-preserving cloud computing a reality. Users can send their encrypted sensitive data to a cloud server, get encrypted results returned and decrypt them, without worrying about data breaches. This project report presents a homomorphic instruction set emulator, CryptoEmu, that enables fully homomorphic computation over encrypted data. The software-based instruction set emulator is built upon an open-source, state-of-the-art homomorphic encryption library that supports gate-level homomorphic evaluation. The instruction set architecture supports multiple instructions that belong to the subset of ARMv8 instruction set architecture. The instruction set emulator utilizes parallel computing techniques to emulate every functional unit for minimum latency. This project re- port includes details on design considerations, instruction set emulator architecture, and datapath and control unit implementation. We evaluated and demonstrated the instruction set emulator's performance and scalability on a 48-core workstation. Cryp- toEmu shown a significant speed up in homomorphic computation performance when compared with HELib, a state-of-the-art homomorphic encryption library. Keywords: Fully Homomorphic Encryption, Parallel Computing, Homomorphic Instruction Set, Homomorphic Processor, Computer Architecture 1 Contents 1 Introduction 3 2 Background 4 3 TFHE Library 5 4 CryptoEmu Architecture Overview 7 4.1 Data Processing . .8 4.2 Branch and Control Flow . .9 5 Data Processing Units 9 5.1 Load/Store Unit . .9 5.2 Adder .
    [Show full text]
  • A Methodology for Assessing Javascript Software Protections
    A methodology for Assessing JavaScript Software Protections Pedro Fortuna A methodology for Assessing JavaScript Software Protections Pedro Fortuna About me Pedro Fortuna Co-Founder & CTO @ JSCRAMBLER OWASP Member SECURITY, JAVASCRIPT @pedrofortuna 2 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Agenda 1 4 7 What is Code Protection? Testing Resilience 2 5 Code Obfuscation Metrics Conclusions 3 6 JS Software Protections Q & A Checklist 3 What is Code Protection Part 1 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Intellectual Property Protection Legal or Technical Protection? Alice Bob Software Developer Reverse Engineer Sells her software over the Internet Wants algorithms and data structures Does not need to revert back to original source code 5 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Intellectual Property IP Protection Protection Legal Technical Encryption ? Trusted Computing Server-Side Execution Obfuscation 6 A methodology for Assessing JavaScript Software Protections Pedro Fortuna Code Obfuscation Obfuscation “transforms a program into a form that is more difficult for an adversary to understand or change than the original code” [1] More Difficult “requires more human time, more money, or more computing power to analyze than the original program.” [1] in Collberg, C., and Nagra, J., “Surreptitious software: obfuscation, watermarking, and tamperproofing for software protection.”, Addison- Wesley Professional, 2010. 7 A methodology for Assessing
    [Show full text]
  • La Promotion Du Web Ouvert a Bien Changé Mais Mozilla Est Toujours Là
    La promotion du Web Ouvert a bien changé mais Mozilla est toujours là Promouvoir le Web ouvert est l’une des missions de Mozilla. Mission parfaitement assumée et réussie il y a quelques années avec l’avènement de Firefox qui obligea Internet Explorer à quitter son arrogance pour rentrer dans le rang et se montrer plus respectueux des standards et donc des internautes. Sauf qu’aujourd’hui la donne a sensiblement changé. Avec la mobilité, les stores, les apps, les navigateurs intégrés, etc. c’est en effet un Web bien plus complexe qui se présente devant nous. Un Web enthousiasmant[1] mais plein d’embûches pour ceux qui sont attachés à son ouverture et à sa neutralité. C’est tout l’objet de ce très intéressant récent billet du développeur Mozilla Robert O’Callahan. Des changements dans la façon de promouvoir le Web Ouvert Shifts In Promoting The Open Web Robert O’Callahan – 30 septembre 201 – Blog personnel (Traduction Framalang : Antistress et Goofy) Historiquement Mozilla a dépensé pas mal d’énergie pour promouvoir l’usage du « Web ouvert » plutôt que de plateformes propriétaires et de code spécifique à des navigateurs non standards (IE6). Cette évangélisation reste nécessaire mais le paysage s’est modifié et je pense que notre discours doit s’adapter. Les plateformes dont nous devons nous préoccuper ont beaucoup changé. Au lieu de WPF, Slivertlight and Flash, les outils propriétaires pour développeurs avec lesquelles il faut rivaliser dorénavant sont iOS et Android. En conséquence, les fonctionnalités que le Web doit intégrer sont à présent orientées vers la mobilité.
    [Show full text]
  • Attacking Client-Side JIT Compilers.Key
    Attacking Client-Side JIT Compilers Samuel Groß (@5aelo) !1 A JavaScript Engine Parser JIT Compiler Interpreter Runtime Garbage Collector !2 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !3 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !4 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !5 A JavaScript Engine • Parser: entrypoint for script execution, usually emits custom Parser bytecode JIT Compiler • Bytecode then consumed by interpreter or JIT compiler • Executing code interacts with the Interpreter runtime which defines the Runtime representation of various data structures, provides builtin functions and objects, etc. Garbage • Garbage collector required to Collector deallocate memory !6 Agenda 1. Background: Runtime Parser • Object representation and Builtins JIT Compiler 2. JIT Compiler Internals • Problem: missing type information • Solution: "speculative" JIT Interpreter 3.
    [Show full text]
  • CS153: Compilers Lecture 19: Optimization
    CS153: Compilers Lecture 19: Optimization Stephen Chong https://www.seas.harvard.edu/courses/cs153 Contains content from lecture notes by Steve Zdancewic and Greg Morrisett Announcements •HW5: Oat v.2 out •Due in 2 weeks •HW6 will be released next week •Implementing optimizations! (and more) Stephen Chong, Harvard University 2 Today •Optimizations •Safety •Constant folding •Algebraic simplification • Strength reduction •Constant propagation •Copy propagation •Dead code elimination •Inlining and specialization • Recursive function inlining •Tail call elimination •Common subexpression elimination Stephen Chong, Harvard University 3 Optimizations •The code generated by our OAT compiler so far is pretty inefficient. •Lots of redundant moves. •Lots of unnecessary arithmetic instructions. •Consider this OAT program: int foo(int w) { var x = 3 + 5; var y = x * w; var z = y - 0; return z * 4; } Stephen Chong, Harvard University 4 Unoptimized vs. Optimized Output .globl _foo _foo: •Hand optimized code: pushl %ebp movl %esp, %ebp _foo: subl $64, %esp shlq $5, %rdi __fresh2: movq %rdi, %rax leal -64(%ebp), %eax ret movl %eax, -48(%ebp) movl 8(%ebp), %eax •Function foo may be movl %eax, %ecx movl -48(%ebp), %eax inlined by the compiler, movl %ecx, (%eax) movl $3, %eax so it can be implemented movl %eax, -44(%ebp) movl $5, %eax by just one instruction! movl %eax, %ecx addl %ecx, -44(%ebp) leal -60(%ebp), %eax movl %eax, -40(%ebp) movl -44(%ebp), %eax Stephen Chong,movl Harvard %eax,University %ecx 5 Why do we need optimizations? •To help programmers… •They write modular, clean, high-level programs •Compiler generates efficient, high-performance assembly •Programmers don’t write optimal code •High-level languages make avoiding redundant computation inconvenient or impossible •e.g.
    [Show full text]
  • Quaxe, Infinity and Beyond
    Quaxe, infinity and beyond Daniel Glazman — WWX 2015 /usr/bin/whoami Primary architect and developer of the leading Web and Ebook editors Nvu and BlueGriffon Former member of the Netscape CSS and Editor engineering teams Involved in Internet and Web Standards since 1990 Currently co-chair of CSS Working Group at W3C New-comer in the Haxe ecosystem Desktop Frameworks Visual Studio (Windows only) Xcode (OS X only) Qt wxWidgets XUL Adobe Air Mobile Frameworks Adobe PhoneGap/Air Xcode (iOS only) Qt Mobile AppCelerator Visual Studio Two solutions but many issues Fragmentation desktop/mobile Heavy runtimes Can’t easily reuse existing c++ libraries Complex to have native-like UI Qt/QtMobile still require c++ Qt’s QML is a weak and convoluted UI language Haxe 9 years success of Multiplatform OSS language Strong affinity to gaming Wide and vibrant community Some press recognition Dead code elimination Compiles to native on all But no native GUI… platforms through c++ and java Best of all worlds Haxe + Qt/QtMobile Multiplatform Native apps, native performance through c++/Java C++/Java lib reusability Introducing Quaxe Native apps w/o c++ complexity Highly dynamic applications on desktop and mobile Native-like UI through Qt HTML5-based UI, CSS-based styling Benefits from Haxe and Qt communities Going from HTML5 to native GUI completeness DOM dynamism in native UI var b: Element = document.getElementById("thirdButton"); var t: Element = document.createElement("input"); t.setAttribute("type", "text"); t.setAttribute("value", "a text field"); b.parentNode.insertBefore(t,
    [Show full text]
  • X86 Assembly Language Syllabus for Subject: Assembly (Machine) Language
    VŠB - Technical University of Ostrava Department of Computer Science, FEECS x86 Assembly Language Syllabus for Subject: Assembly (Machine) Language Ing. Petr Olivka, Ph.D. 2021 e-mail: [email protected] http://poli.cs.vsb.cz Contents 1 Processor Intel i486 and Higher – 32-bit Mode3 1.1 Registers of i486.........................3 1.2 Addressing............................6 1.3 Assembly Language, Machine Code...............6 1.4 Data Types............................6 2 Linking Assembly and C Language Programs7 2.1 Linking C and C Module....................7 2.2 Linking C and ASM Module................... 10 2.3 Variables in Assembly Language................ 11 3 Instruction Set 14 3.1 Moving Instruction........................ 14 3.2 Logical and Bitwise Instruction................. 16 3.3 Arithmetical Instruction..................... 18 3.4 Jump Instructions........................ 20 3.5 String Instructions........................ 21 3.6 Control and Auxiliary Instructions............... 23 3.7 Multiplication and Division Instructions............ 24 4 32-bit Interfacing to C Language 25 4.1 Return Values from Functions.................. 25 4.2 Rules of Registers Usage..................... 25 4.3 Calling Function with Arguments................ 26 4.3.1 Order of Passed Arguments............... 26 4.3.2 Calling the Function and Set Register EBP...... 27 4.3.3 Access to Arguments and Local Variables....... 28 4.3.4 Return from Function, the Stack Cleanup....... 28 4.3.5 Function Example.................... 29 4.4 Typical Examples of Arguments Passed to Functions..... 30 4.5 The Example of Using String Instructions........... 34 5 AMD and Intel x86 Processors – 64-bit Mode 36 5.1 Registers.............................. 36 5.2 Addressing in 64-bit Mode.................... 37 6 64-bit Interfacing to C Language 37 6.1 Return Values..........................
    [Show full text]
  • Bitwise Operators in C Examples
    Bitwise Operators In C Examples Alphonse misworships discommodiously. Epigastric Thorvald perishes changefully while Oliver always intercut accusatively.his anthology plumps bravely, he shown so cheerlessly. Baillie argufying her lashkar coweringly, she mass it Find the program below table which is c operators associate from star from the positive Left shift shifts each bit in its left operand to the left. Tying up some Arduino loose ends before moving on. The next example shows you how to do this. We can understand this better using the following Example. These operators move each bit either left or right a specified number of times. Amazon and the Amazon logo are trademarks of Amazon. Binary form of these values are given below. Shifting the bits of the string of bitwise operators in c examples and does not show the result of this means the truth table for making. Mommy, security and web. When integers are divided, it shifts the bits to the right. Operators are the basic concept of any programming language, subtraction, we are going to see how to use bitwise operators for counting number of trailing zeros in the binary representation of an integer? Adding a negative and positive number together never means the overflow indicates an exception! Not valid for string or complex operands. Les cookies essentiels sont absolument necessaires au bon fonctionnement du site. Like the assembly in C language, unlike other magazines, you may treat this as true. Bitwise OR operator is used to turn bits ON. To me this looks clearer as is. We will talk more about operator overloading in a future section, and the second specifies the number of bit positions by which the first operand is to be shifted.
    [Show full text]
  • 1 More Register Allocation Interference Graph Allocators
    More Register Allocation Last time – Register allocation – Global allocation via graph coloring Today – More register allocation – Clarifications from last time – Finish improvements on basic graph coloring concept – Procedure calls – Interprocedural CS553 Lecture Register Allocation II 2 Interference Graph Allocators Chaitin Briggs CS553 Lecture Register Allocation II 3 1 Coalescing Move instructions – Code generation can produce unnecessary move instructions mov t1, t2 – If we can assign t1 and t2 to the same register, we can eliminate the move Idea – If t1 and t2 are not connected in the interference graph, coalesce them into a single variable Problem – Coalescing can increase the number of edges and make a graph uncolorable – Limit coalescing coalesce to avoid uncolorable t1 t2 t12 graphs CS553 Lecture Register Allocation II 4 Coalescing Logistics Rule – If the virtual registers s1 and s2 do not interfere and there is a copy statement s1 = s2 then s1 and s2 can be coalesced – Steps – SSA – Find webs – Virtual registers – Interference graph – Coalesce CS553 Lecture Register Allocation II 5 2 Example (Apply Chaitin algorithm) Attempt to 3-color this graph ( , , ) Stack: Weighted order: a1 b d e c a1 b a e c 2 a2 b a1 c e d a2 d The example indicates that nodes are visited in increasing weight order. Chaitin and Briggs visit nodes in an arbitrary order. CS553 Lecture Register Allocation II 6 Example (Apply Briggs Algorithm) Attempt to 2-color this graph ( , ) Stack: Weighted order: a1 b d e c a1 b* a e c 2 a2* b a1* c e* d a2 d * blocked node CS553 Lecture Register Allocation II 7 3 Spilling (Original CFG and Interference Graph) a1 := ..
    [Show full text]
  • Feasibility of Optimizations Requiring Bounded Treewidth in a Data Flow Centric Intermediate Representation
    Feasibility of Optimizations Requiring Bounded Treewidth in a Data Flow Centric Intermediate Representation Sigve Sjømæling Nordgaard and Jan Christian Meyer Department of Computer Science, NTNU Trondheim, Norway Abstract Data flow analyses are instrumental to effective compiler optimizations, and are typically implemented by extracting implicit data flow information from traversals of a control flow graph intermediate representation. The Regionalized Value State Dependence Graph is an alternative intermediate representation, which represents a program in terms of its data flow dependencies, leaving control flow implicit. Several analyses that enable compiler optimizations reduce to NP-Complete graph problems in general, but admit linear time solutions if the graph’s treewidth is limited. In this paper, we investigate the treewidth of application benchmarks and synthetic programs, in order to identify program features which cause the treewidth of its data flow graph to increase, and assess how they may appear in practical software. We find that increasing numbers of live variables cause unbounded growth in data flow graph treewidth, but this can ordinarily be remedied by modular program design, and monolithic programs that exceed a given bound can be efficiently detected using an approximate treewidth heuristic. 1 Introduction Effective program optimizations depend on an intermediate representation that permits a compiler to analyze the data and control flow semantics of the source program, so as to enable transformations without altering the result. The information from analyses such as live variables, available expressions, and reaching definitions pertains to data flow. The classical method to obtain it is to iteratively traverse a control flow graph, where program execution paths are explicitly encoded and data movement is implicit.
    [Show full text]
  • Communications Cacm.Acm.Org of Theacm 06/2009 Vol.52 No.06
    COMMUNICATIONS CACM.ACM.ORG OF THEACM 06/2009 VOL.52 NO.06 One Laptop Per Child: Vision vs. Reality Hard-Disk Drives: The Good, The Bad, and the Ugly How CS Serves The Developing World Network Front-End Processors The Claremont Report On Database Research Autonomous Helicopters Association for Computing Machinery Think Parallel..... It’s not just what we make. It’s what we make possible. Advancing Technology Curriculum Driving Software Evolution Fostering Tomorrow’s Innovators Learn more at: www.intel.com/thinkparallel ACM Ad.indd 1 4/17/2009 11:20:03 AM ABCD springer.com Noteworthy Computer Science Journals Autonomous Biological Personal and Robots Cybernetics Ubiquitous G. Sukhatme, University W. Senn, Universität Bern, Computing of Southern California, Physiologisches Institut; ACM Viterbi School of Engi- J. Rinzel, National neering, Dept. Computer Institutes of Health (NIH), P. Thomas, Univ. Coll. Science Dept. Health Education & London Interaction Centre Autonomous Robots Welfare; J. L. van Hemmen, reports on the theory and TU München, Abt. Physik Personal and Ubiquitous applications of robotic systems capable of Biological Cybernetics is an interdisciplinary Computing publishes peer-reviewed some degree of self-sufficiency. It features medium for experimental, theoretical and international research on handheld, wearable papers that include performance data on actual application-oriented aspects of information and mobile information devices and the robots in the real world. The focus is on the processing in organisms, including sensory, pervasive communications infrastructure that ability to move and be self-sufficient, not on motor, cognitive, and ecological phenomena. supports them to enable the seamless whether the system is an imitation of biology.
    [Show full text]
  • Iterative-Free Program Analysis
    Iterative-Free Program Analysis Mizuhito Ogawa†∗ Zhenjiang Hu‡∗ Isao Sasano† [email protected] [email protected] [email protected] †Japan Advanced Institute of Science and Technology, ‡The University of Tokyo, and ∗Japan Science and Technology Corporation, PRESTO Abstract 1 Introduction Program analysis is the heart of modern compilers. Most control Program analysis is the heart of modern compilers. Most control flow analyses are reduced to the problem of finding a fixed point flow analyses are reduced to the problem of finding a fixed point in a in a certain transition system, and such fixed point is commonly certain transition system. Ordinary method to compute a fixed point computed through an iterative procedure that repeats tracing until is an iterative procedure that repeats tracing until convergence. convergence. Our starting observation is that most programs (without spaghetti This paper proposes a new method to analyze programs through re- GOTO) have quite well-structured control flow graphs. This fact cursive graph traversals instead of iterative procedures, based on is formally characterized in terms of tree width of a graph [25]. the fact that most programs (without spaghetti GOTO) have well- Thorup showed that control flow graphs of GOTO-free C programs structured control flow graphs, graphs with bounded tree width. have tree width at most 6 [33], and recent empirical study shows Our main techniques are; an algebraic construction of a control that control flow graphs of most Java programs have tree width at flow graph, called SP Term, which enables control flow analysis most 3 (though in general it can be arbitrary large) [16].
    [Show full text]